What is big data?
The term “big data” means large quantities of raw information – typically terabytes or more – collated from multiple sources both within and outside of an organisation. The term can also loosely refer to the process of analysing such large sets of data to discover insights.
What sort of information is included?
Big data can include everything that could conceivably be relevant to a company’s operations, such as sales data, website activity, CRM records, network logs and sensor information. External information may include details of social media activity or currency exchange rates. Big data sets often focus on “high-velocity” sources, which grow rapidly as new data is continually added.
Another hallmark of big data is the inclusion of unstructured or semi-structured content, such as social media posts or web pages. This contrasts with traditional database-driven approaches to business intelligence, although big-data sources generally still need to be translated into a consistent format for analysis.
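To illustrate what "translating into a consistent format" can look like, here is a minimal Python sketch that flattens semi-structured social-media posts from two hypothetical sources (the field names are invented for illustration) into one uniform record layout:

```python
import json

# Hypothetical semi-structured posts from two different sources,
# each using its own field names and nesting
raw_posts = [
    '{"user": "alice", "text": "Loving the new product!", "likes": 12}',
    '{"author": {"name": "bob"}, "body": "Delivery was slow", "reactions": {"like": 3}}',
]

def normalise(record: dict) -> dict:
    """Map the differing source schemas onto one consistent structure."""
    return {
        "user": record.get("user") or record.get("author", {}).get("name"),
        "text": record.get("text") or record.get("body"),
        "likes": record.get("likes") or record.get("reactions", {}).get("like", 0),
    }

rows = [normalise(json.loads(p)) for p in raw_posts]
for row in rows:
    print(row)
```

Real pipelines handle far messier inputs than this, but the principle is the same: every source is mapped onto a shared schema before analysis begins.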
Why is big data useful?
Big data analysis uses powerful computing resources to process data sets that would be too large and diverse for a human to work with. Subtle trends and correlations can be spotted, and actionable insights can be generated – perhaps relating to customer behaviour, or to inefficiencies in the company’s workflow – that would be missed by traditional approaches, or uncovered much more slowly.
Does big data use AI?
Big data analysis doesn’t necessarily involve artificial intelligence. However, the task of finding patterns and connections in very large, unorganised data sets is a natural fit for machine learning. AI logic can be used at multiple stages of a big data process, such as standardising the data and making predictions from incomplete information.
How is the data processed?
There is no off-the-shelf tool for big data analysis: the process needs to be custom-coded to suit the available data sources and business parameters. Many solutions use the open-source Apache Hadoop framework, which has built-in capabilities for handling the ingestion, storage and processing of large data stores.
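Hadoop jobs are commonly structured as a "map" step that tags each record with a key, followed by a "reduce" step that aggregates records sharing a key. The sketch below imitates that pattern in plain Python on an invented product-activity log; in a real cluster, the mapper and reducer would run as separate processes distributed across many machines:

```python
from itertools import groupby

# Hypothetical activity log: date, event type, product
sample_log = [
    "2024-05-01 view widget-a",
    "2024-05-01 buy widget-b",
    "2024-05-02 view widget-a",
    "2024-05-02 view widget-b",
]

def mapper(lines):
    """Emit a (product, 1) pair for every log line."""
    for line in lines:
        product = line.split()[-1]
        yield product, 1

def reducer(pairs):
    """Group pairs by product key and sum the counts."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

counts = dict(reducer(mapper(sample_log)))
print(counts)
```

The value of the framework is that this same mapper/reducer logic scales unchanged from a four-line sample to billions of records spread across a cluster.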
What sorts of organisation can make use of big data?
Big data is of particular interest to enterprise-scale businesses: these are the companies most likely to generate the huge quantities of data required for big data analysis. Large companies are also most likely to have the resources to invest in the necessary computing power, and can afford to hire professional developers and analysts to realise big data projects.
However, big data techniques are open to businesses of all sizes. Hosted services such as Google Cloud BigQuery, IBM Cloud Pak for Data and Microsoft Azure Databricks let businesses of any size assemble their own data analysis processes, using a variety of languages and frameworks, on a pay-as-you-go basis.
Summary
- Big data refers to very large collections of structured and unstructured data, and the analysis that can be performed on them.
- Applying AI-type logic to big data stores can unearth insights that a human data worker would never discover.
- Big data processing normally entails some degree of custom coding, using a suitable framework such as Hadoop.
- Small businesses can take advantage of numerous cloud-based big data services.
NEXT UP
Phil Robinson, Principal Security Consultant and Founder at Prism Infosec: “Ethical hackers serve as the frontline defence against cybercriminals”
We interview Phil Robinson, Principal Security Consultant and Founder at Prism Infosec, who shares his views on ethical hackers and the latest ransomware trends.
What is Thunderbolt Share?
Intel has just announced Thunderbolt Share, which can link two PCs together in a way that we’ve never seen before. To discover how it works, and what you need, read our explainer.
Ghostbusters proton packs in real life
Would Ghostbusters proton packs be useful in the real world? Richard Trenholm speaks to scientist James Maxwell to find out.