The term “big data” means large quantities of raw information – typically terabytes or more – collated from multiple sources both within and outside of an organisation. The term can also loosely refer to the process of analysing such large sets of data to discover insights.

What sort of information is included?

Big data can include everything that could conceivably be relevant to a company’s operations, such as sales data, website activity, CRM records, network logs and sensor information. External information may include details of social media activity or currency exchange rates. Big data sets often focus on “high-velocity” sources, which grow rapidly as new data is continually added.

Another hallmark of big data is the inclusion of unstructured or semi-structured content, such as social media posts or web pages. This contrasts with traditional database-driven approaches to business intelligence, although big-data sources generally still need to be translated into a consistent format for analysis.
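
As a simple illustration of that translation step, the Python sketch below flattens a handful of hypothetical social media posts – arriving as JSON with inconsistent fields – into a single consistent table using the pandas library. The field names are invented for the example.

```python
import json

import pandas as pd

# Hypothetical semi-structured records: social media posts as JSON strings,
# with fields that vary from one post to the next.
raw_posts = [
    '{"user": "alice", "text": "Loving the new release!", "likes": 42}',
    '{"user": "bob", "text": "Support queue is slow today", "likes": 3, "reply_to": "alice"}',
]

# Parse each post and flatten it into one row of a consistent table;
# any field a post lacks simply becomes a missing value (NaN).
records = [json.loads(post) for post in raw_posts]
posts = pd.json_normalize(records)

print(posts.columns.tolist())  # ['user', 'text', 'likes', 'reply_to']
print(posts)
```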

Why is big data useful?

Big data analysis uses powerful computing resources to process data sets that would be too large and diverse for a human to work with. Subtle trends and correlations can be spotted, and actionable insights can be generated – perhaps relating to customer behaviour, or to inefficiencies in the company’s workflow – that would be missed by traditional approaches, or uncovered much more slowly.
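
As a toy-scale illustration, the Python sketch below checks a hypothetical export of daily business metrics for correlations between columns. The file and column names are invented for the example; a real big data pipeline would run the same idea over far larger, distributed data sets.

```python
import pandas as pd

# Hypothetical export of daily metrics (date, sales, website_visits, ad_spend, ...)
# collated from several internal systems into one CSV file.
metrics = pd.read_csv("daily_metrics.csv", parse_dates=["date"])

# Pairwise correlations between the numeric columns; values near +1 or -1
# flag relationships that merit a closer look.
correlations = metrics.corr(numeric_only=True)

# Surface the columns most strongly correlated with sales.
strongest = correlations["sales"].drop("sales").sort_values(key=abs, ascending=False)
print(strongest.head())
```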

Does big data use AI?

Big data analysis doesn’t necessarily involve artificial intelligence. However, the task of finding patterns and connections in very large, unorganised data sets is a natural fit for machine learning. AI logic can be used at multiple stages of a big data process, such as standardising the data and making predictions from incomplete information.
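
The Python sketch below shows the idea in miniature, using the scikit-learn library to fill gaps in some invented customer records and then group them into behavioural segments without any human-defined rules.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical customer records: [orders per month, average spend, support tickets].
# np.nan marks gaps, as often happens when records are merged from many sources.
customers = np.array([
    [12.0, 250.0, 1.0],
    [3.0, np.nan, 0.0],
    [45.0, 900.0, 7.0],
    [np.nan, 120.0, 2.0],
    [40.0, 850.0, 6.0],
])

# "Standardising the data": fill the gaps with column means, then put every
# feature on a comparable scale so no single column dominates.
filled = SimpleImputer(strategy="mean").fit_transform(customers)
scaled = StandardScaler().fit_transform(filled)

# Let the algorithm find its own groupings: k-means splits the customers
# into two behavioural segments without being told what to look for.
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(segments)  # e.g. [0 0 1 0 1] -- low-activity vs high-activity customers
```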

How is the data processed?

There is no off-the-shelf tool for big data analysis: the process needs to be custom-coded to suit the available data sources and the questions the business wants answered. Many solutions build on the open-source Apache Hadoop framework, which provides distributed storage and processing for very large data sets, along with an ecosystem of tools for ingesting data from different sources.
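
As a rough sketch of the MapReduce model at the heart of Hadoop, the Python snippet below counts sales per product from a hypothetical comma-separated log. In a real deployment the two functions would be separate scripts submitted to the cluster via Hadoop Streaming, with Hadoop itself splitting the input across machines, sorting the mapper output and collecting the results.

```python
#!/usr/bin/env python3
# A Hadoop Streaming-style sketch: in practice mapper() and reducer() would be
# two standalone scripts, each reading standard input and writing standard output.
import sys


def mapper():
    # Each input line is assumed to look like "2024-01-05,widget,19.99".
    for line in sys.stdin:
        fields = line.strip().split(",")
        if len(fields) >= 2:
            product = fields[1]
            print(f"{product}\t1")  # emit key<TAB>value pairs


def reducer():
    # Hadoop sorts the mapper output by key, so identical products arrive on
    # consecutive lines and can be summed with a simple running total.
    current, count = None, 0
    for line in sys.stdin:
        key, value = line.strip().split("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = key, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")
```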

What sorts of organisation can make use of big data?

Big data is of particular interest to enterprise-scale businesses: these are the companies most likely to generate the huge quantities of data required for big data analysis. Large companies are also most likely to have the resources to invest in the necessary computing power, and can afford to hire professional developers and analysts to realise big data projects.

However, big data techniques are open to businesses of all sizes. Hosted services such as Google Cloud BigQuery, IBM Cloud Pak for Data and Microsoft Azure Databricks let businesses of any size assemble their own data analysis processes, using a variety of languages and frameworks, on a pay-as-you-go basis.
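
For example, a query can be run against BigQuery from a few lines of Python using the google-cloud-bigquery client library. The sketch below assumes a hypothetical project and sales table; with on-demand pricing, the cost depends on how much data the query scans rather than on any servers being provisioned.

```python
# A minimal sketch using the google-cloud-bigquery client library
# (pip install google-cloud-bigquery); the project, dataset and table
# names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # picks up credentials from the environment

query = """
    SELECT product, SUM(amount) AS total_sales
    FROM `my-project.sales.transactions`
    GROUP BY product
    ORDER BY total_sales DESC
    LIMIT 10
"""

# The query runs on Google's infrastructure and returns only the results.
for row in client.query(query).result():
    print(row["product"], row["total_sales"])
```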

Summary

  • Big data refers to very large collections of structured and unstructured data, and the analysis that can be performed on them. 
  • Applying machine learning and other AI techniques to big data stores can unearth insights that a human analyst would struggle to find unaided. 
  • Big data processing normally entails some degree of custom coding, using a suitable framework such as Hadoop. 
  • Small businesses can take advantage of numerous cloud-based big data services. 

Darien Graham-Smith

Darien is one of the UK's most knowledgeable technical journalists. You will find him in PC Pro magazine, writing reviews for a variety of sites and on guitar with his band The Red Queens. His explainer articles help TechFinitive's audience understand how technology works.
