BIG DATA GLOSSARY: Key Big Data Concepts for Separating Meaning from the Hype


An algorithm is mathematical “logic” or a set of rules used to make calculations. Starting with an initial input (which may be zero or null), the logic or rules are coded or written into software as a set of steps to be followed in conducting calculations, processing data or performing other functions, eventually leading to an output.

Teradata Take: Within the context of big data, algorithms are the primary means for uncovering insights and detecting patterns. Thus, they are essential to realizing the big data business case.
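The definition above can be made concrete with a classic example: Euclid's algorithm for the greatest common divisor. A minimal Python sketch — an initial input, a fixed rule applied step by step, and an eventual output:

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: repeat a fixed rule until a stopping condition."""
    while b != 0:          # the rule, applied step by step
        a, b = b, a % b    # replace (a, b) with (b, a mod b)
    return a               # output: when the remainder reaches zero

print(gcd(48, 18))  # 6
```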


An analytics platform is a full-featured technology solution designed to address the needs of large enterprises. Typically, it “joins different tools and analytics systems together with an engine to execute, a database or repository to store and manage the data, data mining processes, and techniques and mechanisms for obtaining and preparing data that is not stored. This solution can be conveyed as a software-only application or as a cloud-based software as a service (SaaS) provided to organizations in need of contextual information that all their data points to, in other words, analytical information based on current data records.” Source: Techopedia

Behavioral Analytics is a subset of business analytics that focuses on understanding what consumers and applications do, as well as how and why they act in certain ways. It is particularly prevalent in the realm of eCommerce and online retailing, online gaming and Web applications. In practice, behavioral analytics seeks to connect seemingly unrelated data points and explain or predict outcomes, future trends or the likelihood of certain events. At the heart of behavioral analytics is such data as online navigation paths, clickstreams, social media interactions, purchases or shopping cart abandonment decisions, though it may also include more specific metrics.

Teradata Take: But behavioral analytics can be more than just tracking people. Its principles also apply to the interactions and dynamics between processes, machines and equipment, even macroeconomic trends.
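A toy illustration of the idea in Python: scanning clickstream events per session to flag cart abandonment. The session data and event names are invented for illustration.

```python
# Hypothetical clickstream sessions; behavioral analytics connects the
# events in each session to explain or predict an outcome.
sessions = {
    "s1": ["view", "add_to_cart", "checkout"],
    "s2": ["view", "add_to_cart", "exit"],
    "s3": ["view", "exit"],
}

# Flag sessions that added to cart but never checked out.
abandoned = [sid for sid, events in sessions.items()
             if "add_to_cart" in events and "checkout" not in events]
print(abandoned)  # sessions a retailer might target with a reminder offer
```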


“Big data is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process them using traditional data-processing applications.” Source: Wikipedia

Teradata Take: Big data is often described in terms of several “V’s” – volume, variety, velocity, variability, veracity – which speak collectively to the complexity and difficulty in collecting, storing, managing, analyzing and otherwise putting big data to work in creating the most important “V” of all – value.


“Big data analytics refers to the strategy of analyzing large volumes of data … gathered from a wide variety of sources, including social networks, videos, digital images, sensors and sales transaction records. The aim in analyzing all this data is to uncover patterns and connections that might otherwise be invisible, and that might provide valuable insights about the users who created it. Through this insight, businesses may be able to gain an edge over their rivals and make superior business decisions.” Source: Techopedia

Teradata Take: Big data analytics isn’t one practice or one tool. Big data visualizations are needed in some situations, while connected analytics are the right answer in others.


“Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.” Source: Gartner “Companies use BI to improve decision making, cut costs and identify new business opportunities. BI is more than just corporate reporting and more than a set of tools to coax data out of enterprise systems. CIOs use BI to identify inefficient business processes that are ripe for re-engineering.” Source:


Cluster analysis or clustering is a statistical classification technique or activity that involves grouping a set of objects or data so that those in the same group (called a cluster) are similar to each other, but different from those in other clusters. It is essential to data mining and discovery, and is often used in the context of machine learning, pattern recognition, image analysis and in bioinformatics and other sectors that analyze large data sets.
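A minimal sketch of the grouping principle in Python: a naive k-means pass over one-dimensional data, where each point joins its nearest centroid and each centroid moves to the mean of its group. Real clustering work would use a dedicated library; this only illustrates the idea.

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Naive k-means on 1-D data: assign each point to the nearest
    centroid, then move each centroid to the mean of its group."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(p - centroids[j]))
            groups[nearest].append(p)
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.9]
print(kmeans_1d(data, k=2))  # two centroids, one per natural cluster
```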


Comparative analysis refers to the comparison of two or more processes, documents, data sets or other objects. Pattern analysis, filtering and decision-tree analytics are forms of comparative analysis. In healthcare, comparative analysis is used to compare large volumes of medical records, documents, images, sensor data and other information to assess the effectiveness of medical diagnoses.


Connection analytics is an emerging discipline that helps to discover interrelated connections and influences between people, products, processes, machines and systems within a network by mapping those connections and continuously monitoring interactions between them. It has been used to address difficult and persistent business questions relating to, for instance, the influence of thought leaders, the impact of external events or players on financial risk, and the causal relationships between nodes in assessing network performance.


Correlation analysis refers to the application of statistical analysis and other mathematical techniques to evaluate or measure the relationships between variables. It can be used to define the most likely set of factors that will lead to a specific outcome – like a customer responding to an offer or the performance of financial markets.
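As a concrete sketch, the Pearson correlation coefficient is one common measure of such a relationship between two variables. A pure-Python version (the offer/response figures are invented):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

offers_sent = [10, 20, 30, 40, 50]   # e.g., offers sent per campaign
responses   = [2, 4, 5, 9, 10]       # customer responses received
print(round(pearson_r(offers_sent, responses), 3))  # close to +1
```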


The main tasks of data analysts are to collect, manipulate and analyze data, as well as to prepare reports, which may include graphs, charts, dashboards and other visualizations. Data analysts also generally serve as guardians or gatekeepers of an organization’s data, ensuring that information assets are consistent, complete and current. Many data analysts and business analysts are known for having considerable technical knowledge and strong industry expertise.

Teradata Take: Data analysts serve the critical purpose of helping to operationalize big data within specific functions and processes, with a clear focus on performance trends and operational information.


“Data mining is the process of analyzing hidden patterns of data according to different perspectives for categorization into useful information, which is collected and assembled in common areas, such as data warehouses, for efficient analysis, data mining algorithms, facilitating business decision making and other information requirements to ultimately cut costs and increase revenue. Data mining is also known as data discovery and knowledge discovery.” Source: Techopedia


“Data modeling is the analysis of data objects that are used in a business or other context and the identification of the relationships among these data objects. A data model can be thought of as a diagram or flowchart that illustrates the relationships between data.” Source: TechTarget

Teradata Take: Data models that are tailored to specific industries or business functions can provide a strong foundation or “jump-start” for big data programs and investments.
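A data model's entities and relationships can be sketched directly in code. A hypothetical, minimal retail example in Python — the entity names are illustrative, not part of any standard industry model:

```python
from dataclasses import dataclass, field

# Hypothetical retail data model: two entities and a one-to-many
# relationship (a Customer places many Orders).
@dataclass
class Order:
    order_id: int
    amount: float

@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list = field(default_factory=list)  # the relationship

alice = Customer(1, "Alice")
alice.orders.append(Order(101, 59.90))
print(len(alice.orders))  # 1
```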


“In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons. The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc.)”  Source: Wikipedia

Considered the most basic type of analytics, descriptive analytics involves the breaking down of big data into smaller chunks of usable information so that companies can understand what happened with a specific operation, process or set of transactions. Descriptive analytics can provide insight into current customer behaviors and operational trends to support decisions about resource allocations, process improvements and overall performance management. Most industry observers believe it represents the vast majority of the analytics in use at companies today.

Teradata Take: A strong foundation of descriptive analytics – based on a solid and flexible data architecture – provides the accuracy and confidence in decision making most companies need in the big data era (especially if they wish to avoid being overwhelmed by large data volumes). More importantly, it ultimately enables more advanced analytics capabilities – especially predictive and prescriptive analytics.
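A minimal Python sketch of the “what happened?” question: summarizing a set of invented transaction records per region.

```python
from collections import defaultdict

# Hypothetical transaction records; descriptive analytics breaks them
# down into usable summaries of what happened.
sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 200.0},
]

totals = defaultdict(float)
for row in sales:
    totals[row["region"]] += row["amount"]   # aggregate per region

for region, total in sorted(totals.items()):
    print(region, total)
```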


Hadoop is a distributed data management platform or open-source software framework for storing and processing big data. It is sometimes described as a cut-down distributed operating system. It is designed to manage and work with immense volumes of data, and to scale linearly to large clusters of thousands of commodity computers. It was originally developed at Yahoo!, but is now available free and publicly through the Apache Software Foundation, though it usually requires extensive programming knowledge to use.
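Hadoop's core MapReduce programming model can be sketched in a single Python process: a map step emits key/value pairs, the framework groups them by key, and a reduce step aggregates each group. This shows only the model, not Hadoop's distributed execution.

```python
from collections import defaultdict

# Single-process sketch of the MapReduce model (classic word count).
def map_phase(line):
    return [(word, 1) for word in line.split()]   # emit (key, value) pairs

def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big insight", "data at scale"]
grouped = defaultdict(list)
for line in lines:
    for key, value in map_phase(line):   # map
        grouped[key].append(value)       # shuffle: group by key
print(reduce_phase(grouped))             # reduce
```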

The Internet of Things (IoT) is a concept that describes the connection of everyday physical objects and products to the Internet so that they are recognizable by other devices (through unique identifiers) and can relate to them. The term is closely identified with machine-to-machine communications and the development of, for example, “smart grids” for utilities, remote monitoring and other innovations. Gartner estimates that 26 billion devices will be connected by 2020, including cars and coffee makers.

Teradata Take: Big data will only get bigger in the future, and the IoT will be a major driver. The connectivity from wearables and sensors means bigger volumes, more variety and higher-velocity feeds.


“Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. It focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data. The process of machine learning is similar to that of data mining. Both systems search through data to look for patterns. However, instead of extracting data for human comprehension – as is the case in data mining applications – machine learning uses that data to improve the program’s own understanding. Machine learning programs detect patterns in data and adjust program actions accordingly.” Source: TechTarget

Teradata Take: Machine learning is especially powerful in a big data context in that machines can test hypotheses using large data volumes, refine business rules as conditions change and identify anomalies and outliers quickly and accurately.
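A minimal sketch of “learning without being explicitly programmed”: a perceptron that adjusts its weights from labeled examples until its predictions match the data, here learning the logical AND function. No rule for AND is ever coded; it emerges from the weight updates.

```python
# Labeled training examples for the logical AND function.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w0, w1, b, lr = 0.0, 0.0, 0.0, 0.1     # weights, bias, learning rate
for _ in range(20):                     # repeated passes over the data
    for (x0, x1), target in examples:
        pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
        err = target - pred             # adjust weights only when wrong
        w0 += lr * err * x0
        w1 += lr * err * x1
        b += lr * err

print([1 if w0 * x0 + w1 * x1 + b > 0 else 0
       for (x0, x1), _ in examples])    # learned AND: [0, 0, 0, 1]
```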


“Metadata is data that describes other data. Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created and date modified and file size are very basic document metadata. In addition to document files, metadata is used for images, videos, spreadsheets and web pages.” Source: TechTarget

Teradata Take: The effective management of metadata is an essential part of solid and flexible big data “ecosystems” in that it helps companies more efficiently manage their data assets and make them available to data scientists and other analysts.
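A small Python illustration of metadata as “data about data”: reading a file's size from the filesystem — information about the file rather than its contents.

```python
import os
import tempfile

# Create a throwaway file, then read its metadata rather than its contents.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"hello metadata")
    path = f.name

info = os.stat(path)        # filesystem metadata for the file
print(info.st_size)         # size in bytes, not the text itself
os.remove(path)             # clean up the temporary file
```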


A branch of artificial intelligence, natural language processing (NLP) deals with making human language (in both written and spoken forms) comprehensible to computers. As a scientific discipline, NLP involves tasks such as identifying sentence structures and boundaries in documents, detecting key words or phrases in audio recordings, extracting relationships between documents, and uncovering meaning in informal or slang speech patterns. NLP can make it possible to analyze and recognize patterns in verbal data that is currently unstructured.

Teradata Take: NLP holds a key for enabling major advancements in text analytics and for garnering deeper and potentially more powerful insights from social media data streams, where slang and unconventional language are prevalent.
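Two of the tasks mentioned above — sentence boundary detection and keyword spotting — can be caricatured with simple regular expressions in Python; real NLP systems are far more sophisticated, but the inputs and outputs look like this.

```python
import re

# Toy sketch of two basic NLP tasks on a snippet of customer feedback.
text = "Shipping was slow. The product itself is great! Would buy again."

# Sentence boundaries: split after ., ! or ? followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)

# Keyword spotting: find occurrences of a few terms of interest.
keywords = re.findall(r"\b(great|slow|buy)\b", text.lower())

print(sentences)
print(keywords)
```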


Pattern recognition occurs when an algorithm locates recurrences or regularities within large data sets or across disparate data sets. This visibility can help researchers discover insights or reach conclusions that would otherwise be obscured. Pattern recognition is closely linked to, and even considered synonymous with, machine learning and data mining.


Predictive analytics refers to the analysis of big data to make predictions and determine the likelihood of future outcomes, trends or events. In business, it can be used to model various scenarios for how customers react to new product offerings or promotions and how the supply chain might be affected by extreme weather patterns or demand spikes. Predictive analytics may involve various statistical techniques, such as modeling, machine learning and data mining.
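A minimal predictive sketch in Python: fitting a least-squares line to past (spend, sales) pairs, then predicting the outcome for a new input. The figures are invented for illustration.

```python
# Invented historical data: ad spend vs. resulting sales.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.1, 4.9, 7.2, 8.8]

# Ordinary least squares for a single predictor.
n = len(spend)
mx, my = sum(spend) / n, sum(sales) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(spend, sales))
         / sum((x - mx) ** 2 for x in spend))
intercept = my - slope * mx

# Predict sales at a spend level not seen in the data.
print(round(intercept + slope * 5.0, 2))
```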


A type or extension of predictive analytics, prescriptive analytics is used to recommend or prescribe specific actions when certain information states are reached or conditions are met. It uses algorithms, mathematical techniques and/or business rules to choose among several different actions that are aligned to an objective (such as improving business performance) and that recognize various requirements or constraints.
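A toy Python illustration of prescribing an action under a constraint: choosing, among affordable customer-retention actions, the one with the highest expected payoff. The action names, costs and lift figures are invented.

```python
# Hypothetical actions a churn model could prescribe, with invented
# costs and expected retention lifts.
actions = [
    {"name": "do_nothing",     "cost": 0,  "retention_lift": 0.00},
    {"name": "email_coupon",   "cost": 5,  "retention_lift": 0.04},
    {"name": "agent_callback", "cost": 25, "retention_lift": 0.15},
]

def prescribe(churn_risk, customer_value, budget):
    """Pick the affordable action maximizing expected value gained."""
    affordable = [a for a in actions if a["cost"] <= budget]  # constraint
    best = max(affordable,
               key=lambda a: churn_risk * customer_value * a["retention_lift"]
                             - a["cost"])                      # objective
    return best["name"]

print(prescribe(churn_risk=0.6, customer_value=400, budget=30))
```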


Semi-structured data refers to data that is not captured or formatted in conventional ways, such as those associated with traditional database fields or common data models. It is also not raw or totally unstructured and may contain some data tables, tags or other structural elements. Graphs and tables, XML documents and email are examples of semi-structured data, which is very prevalent across the World Wide Web and is often found in object-oriented databases.

Teradata Take: As semi-structured data proliferates, and because it contains some relational data, companies must account for it within their big data programs and data architectures.
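A short Python example of semi-structured data in practice: an XML fragment carries tags (structure) but no rigid schema, so one record can omit an optional element and parsing still works.

```python
import xml.etree.ElementTree as ET

# An XML fragment: tagged, but not bound to fixed database fields --
# the second order omits the optional <note> element entirely.
doc = """
<orders>
  <order id="1"><amount>10.5</amount><note>gift</note></order>
  <order id="2"><amount>7.0</amount></order>
</orders>
"""

root = ET.fromstring(doc)
amounts = [float(o.find("amount").text) for o in root.findall("order")]
print(sum(amounts))  # total across records of varying shape
```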


Sentiment analysis involves the capture and tracking of opinions, emotions or feelings expressed by consumers in various types of interactions or documents, including social media, calls to customer service representatives, surveys and the like. Text analytics and natural language processing are typical activities within a process of sentiment analysis. The goal is to determine or assess the sentiments or attitudes expressed toward a company, product, service, person or event.

Teradata Take: Sentiment analysis is particularly important in tracking emerging trends or changes in perceptions on social media. Within big data environments, sentiment analysis combined with behavioral analytics and machine learning is likely to yield even more valuable insights.
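A toy lexicon-based scorer in Python shows the counting principle behind simple sentiment analysis; production systems rely on NLP models and far larger lexicons, but the input/output shape is the same.

```python
# Tiny invented sentiment lexicons for illustration only.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "broken", "terrible"}

def sentiment(text):
    """Score a text by counting positive vs. negative lexicon hits."""
    words = text.lower().split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("The support was great and shipping was fast"))
print(sentiment("Arrived broken and replacement was slow"))
```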


Big data – and the business challenges and opportunities associated with it – are often discussed or described in the context of multiple V’s:

Value: the most important “V” from the perspective of the business; the value of big data usually comes from insight discovery and pattern recognition that lead to more effective operations, stronger customer relationships and other clear and quantifiable business benefits

Variability: the changing nature of the data companies seek to capture, manage and analyze – e.g., in sentiment or text analytics, changes in the meaning of key words or phrases

Variety: the diversity and range of different data types, including unstructured data, semi-structured data and raw data

Velocity: the speed at which companies receive, store and manage data – e.g., the specific number of social media posts or search queries received within a day, hour or other unit of time

Veracity: the “truth” or accuracy of data and information assets, which often determines executive-level confidence

Volume: the size and amounts of big data that companies manage and analyze

