Glossary: AI and advanced analytics terms

What is A/B testing?

A/B testing, also known as split testing, is a form of business experimentation that compares a test group with a control group to see which responds more favorably. → multivariate testing, test & learn
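
As a rough sketch of how such a comparison might be evaluated, the Python below applies a two-proportion z-test to conversion counts from a control group and a test group. The function name and the counts are hypothetical, and this is only one of several statistical tests an analyst could choose.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: control converts 200 of 5000, test converts 260 of 5000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests that the difference between the two groups is unlikely to be due to chance alone.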

What is analysis of variance (ANOVA)?

Analysis of variance (ANOVA) is a statistical test of whether the mean of a dependent variable differs across groups defined by fixed independent variables, based on comparing the variance between groups with the variance within them. → regression analysis
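
A minimal sketch of a one-way ANOVA, assuming each group is a plain list of measurements: the F statistic is the ratio of between-group variance to within-group variance. The function name and the sample values are illustrative only.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over lists of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical measurements for three groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f}")
```

A larger F statistic indicates that the group means differ more than the spread within the groups would suggest.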

What is artificial intelligence (AI)?

Artificial intelligence (AI) is the ability of a computer to mimic a human by developing reasoning skills, particularly associated with the use of a neural network (NN) for deep learning (DL). → predictive AI, generative AI (genAI)

What is an artificial neural network (ANN)?

An artificial neural network (ANN) is an alternative term for neural network (NN).

What is bias error?

Bias error is an error resulting from incorrect associations during machine learning (ML), typically because a model is too simple to capture the relevant patterns in its training data. → variance error, bias–variance tradeoff

What is the bias–variance tradeoff?

The bias–variance tradeoff is the quest for both low bias and low variance in machine learning (ML), two goals that pull against each other: complex models tend to have lower bias but higher variance, while simpler models tend to have higher bias but lower variance.

What is the black box problem?

The black box problem is the inability to fully understand how machine learning (ML) and artificial intelligence (AI) systems reason.

What is bootstrapping?

Bootstrapping is a form of cross validation that resamples data randomly with replacement, so any data point may appear more than once in a given resample.
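
The following sketch shows one common use of bootstrapping, assuming a small list of numeric observations: it draws resamples with replacement and uses the spread of their means to form a rough percentile interval. The function names, the data and the seed are hypothetical.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of data."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Sampling with replacement: the same value may be drawn repeatedly.
        resample = rng.choices(data, k=len(data))
        means.append(statistics.fmean(resample))
    return means

def percentile_interval(values, lower=0.025, upper=0.975):
    """Return an approximate (lower, upper) percentile interval."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

# Hypothetical observations.
data = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7, 10.4, 16.0]
low, high = percentile_interval(bootstrap_means(data))
print(f"approximate 95% interval for the mean: {low:.2f} to {high:.2f}")
```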

What is business experimentation?

Business experimentation is a way for businesses to test ideas through techniques such as A/B testing, multivariate testing, test & learn or a natural experiment.

What is a cohort analysis?

A cohort analysis is customer segmentation pertaining to a specific action or time period.
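
To illustrate, the sketch below groups customers into cohorts by signup month and reports what share of each cohort stayed active for a given number of months. The record layout, field meanings and values are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical records: (customer_id, signup_month, months_active)
customers = [
    ("c1", "2024-01", 3),
    ("c2", "2024-01", 1),
    ("c3", "2024-02", 2),
    ("c4", "2024-02", 2),
    ("c5", "2024-02", 1),
]

def retention_by_cohort(records, min_months):
    """Share of each signup cohort that was active for at least min_months."""
    totals = defaultdict(int)
    retained = defaultdict(int)
    for _, cohort, months_active in records:
        totals[cohort] += 1
        if months_active >= min_months:
            retained[cohort] += 1
    return {cohort: retained[cohort] / totals[cohort] for cohort in totals}

for cohort, share in retention_by_cohort(customers, min_months=2).items():
    print(f"{cohort}: {share:.0%} retained")
```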

What is cross validation in machine learning?

Cross validation is the use of resampling to ensure accuracy in machine learning (ML) through methods such as bootstrapping, permutation testing, jackknife and Monte Carlo.

What is customer segmentation?

Customer segmentation is the separation of customers into groups based on common traits that machine learning (ML) identifies in customer data, campaign data and transaction data, often for purposes such as lookalike modeling or business experimentation. → cohort analysis

What is data augmentation?

Data augmentation is the artificial generation of data for machine learning (ML) by making minor changes to existing data, which is a process increasingly conducted by generative AI (genAI).
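
As a simple, non-generative illustration, the sketch below creates extra training rows by nudging each numeric value by a small random percentage. The function name, the noise level and the sample rows are hypothetical, and real augmentation pipelines are usually tailored to the type of data involved.

```python
import random

def augment(rows, copies=2, noise=0.05, seed=0):
    """Return perturbed copies of each row, with values changed by up to +/- noise."""
    rng = random.Random(seed)
    augmented = []
    for row in rows:
        for _ in range(copies):
            # Scale every value by a random factor close to 1.
            augmented.append([v * (1 + rng.uniform(-noise, noise)) for v in row])
    return augmented

# Hypothetical rows: [basket size, monthly spend]
original = [[3.0, 120.0], [5.0, 310.0]]
for row in augment(original):
    print([round(v, 2) for v in row])
```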

What is deep learning (DL)?

Deep learning (DL) is a form of machine learning (ML) using a neural network (NN) with more than one layer of artificial neurons between its input layer and output layer. → large language model (LLM)

What is generative AI (genAI)?

Generative AI (genAI) is a form of artificial intelligence (AI) that goes beyond just mimicking humans in predictive AI to generate new content as if created by humans. → generative pre-trained transformer (GPT)

What is a generative pre-trained transformer (GPT)?

A generative pre-trained transformer (GPT) is a type of large language model (LLM) using generative AI (genAI).

What is hallucination in AI?

Hallucination is an incorrect or misleading output from an artificial intelligence (AI) system.

What is jackknife cross validation?

Jackknife is a form of cross validation that systematically omits one observation, or one block of observations, at a time when resampling.
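
A minimal sketch of the leave-one-out idea, assuming a list of numeric observations: each resample drops a single value, and the spread of the resulting means gives a jackknife estimate of the standard error. The function name and data are illustrative.

```python
import math
import statistics

def jackknife_se_mean(data):
    """Jackknife estimate of the standard error of the mean."""
    n = len(data)
    total = sum(data)
    # Mean of the data with each observation left out in turn.
    leave_one_out_means = [(total - x) / (n - 1) for x in data]
    center = sum(leave_one_out_means) / n
    variance = (n - 1) / n * sum((m - center) ** 2 for m in leave_one_out_means)
    return math.sqrt(variance)

# Hypothetical observations.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
print(f"jackknife SE:  {jackknife_se_mean(data):.4f}")
# For the mean, the jackknife agrees with the textbook formula s / sqrt(n).
print(f"classical SE:  {statistics.stdev(data) / math.sqrt(len(data)):.4f}")
```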

What is a large language model (LLM)?

A large language model (LLM) is a type of neural network (NN) based on deep learning (DL) that relies on text as input data. → natural language processing (NLP), generative pre-trained transformer (GPT)

What is linear regression?

Linear regression is a form of regression analysis, often used in supervised learning, that plots a best-fit line to see how changing one or more variables may continuously affect another.
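
For a single input variable, the best-fit line can be sketched with ordinary least squares as below. The function name and the data points are hypothetical, and the example covers only the one-variable case.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend versus sales.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"sales = {slope:.1f} * spend + {intercept:.1f}")
```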

What is logistic regression?

Logistic regression is a form of regression analysis, often used in supervised learning, that uses binary categories to see how changing one or more variables may affect the likelihood of A or B happening.
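
The sketch below fits a one-variable logistic regression by plain gradient descent, assuming labels of 0 and 1 for the two categories. The function names, learning rate, number of passes and data are assumptions chosen for illustration, not a recommended configuration.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Return (weight, bias) fitted by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: hours spent browsing versus whether a purchase happened.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(f"estimated purchase probability at 3 hours: {sigmoid(w * 3.0 + b):.2f}")
```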

What is a lookalike audience?

A lookalike audience is a group of potential customers that is identified through lookalike modeling.

What is lookalike modeling?

Lookalike modeling is a use of machine learning (ML) to match a small seed audience of existing customers with a large reference audience of non-customers to establish a lookalike audience of potential customers.
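
As a deliberately simplified stand-in for the machine learning step, the sketch below ranks a reference audience by distance to the average profile of the seed audience and keeps the closest matches. This nearest-centroid approach, the function name and the feature values are all assumptions for illustration; production lookalike models are typically more sophisticated.

```python
import math

def lookalike_audience(seed, reference, top_k):
    """Return the top_k reference names whose features are closest to the seed average."""
    n_features = len(next(iter(seed.values())))
    # Average profile of the existing customers in the seed audience.
    centroid = [
        sum(features[i] for features in seed.values()) / len(seed)
        for i in range(n_features)
    ]
    ranked = sorted(reference, key=lambda name: math.dist(reference[name], centroid))
    return ranked[:top_k]

# Hypothetical feature vectors: [visits per month, average basket size]
seed = {"s1": [1.0, 1.0], "s2": [3.0, 1.0]}
reference = {"r1": [2.0, 1.5], "r2": [9.0, 9.0], "r3": [2.5, 0.5]}
print(lookalike_audience(seed, reference, top_k=2))
```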

What is machine learning (ML)?

Machine learning (ML) is a computer finding patterns in data based on training data from supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning, which is often a basis for artificial intelligence (AI). → deep learning (DL), tinyML

What is Monte Carlo cross validation?

Monte Carlo is a form of cross validation that involves repeated random resampling.
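
A minimal sketch, assuming a list of numeric observations and a trivial "model" that predicts the mean of its training data: the data is randomly split into training data and validation data several times, and the validation error is averaged across splits. All names and values are hypothetical.

```python
import random

def monte_carlo_splits(data, n_splits=5, validation_fraction=0.25, seed=0):
    """Yield (training, validation) pairs from repeated random splits."""
    rng = random.Random(seed)
    n_val = max(1, int(len(data) * validation_fraction))
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        yield shuffled[n_val:], shuffled[:n_val]

def validation_error(train, validation):
    """Mean squared error when predicting the training mean for every point."""
    prediction = sum(train) / len(train)
    return sum((v - prediction) ** 2 for v in validation) / len(validation)

# Hypothetical observations.
data = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7, 10.4, 16.0]
errors = [validation_error(tr, va) for tr, va in monte_carlo_splits(data)]
print(f"average validation error over {len(errors)} splits: {sum(errors) / len(errors):.2f}")
```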

What is multivariate testing?

Multivariate testing is A/B testing extended to several variables at once, comparing combinations of their variations rather than a single change.

What is a natural experiment?

A natural experiment is a form of business experimentation where the object of the experiment is already naturally exposed to the ideas being tested.

What is natural language processing (NLP)?

Natural language processing (NLP) is a type of machine learning (ML) based on supervised learning that relies on text as input data. It is suited to small-scale tasks where quality and efficiency matter, and it requires less processing power than a large language model (LLM).

What is a neural network (NN)?

A neural network (NN), also known as an artificial neural network (ANN), is a set of connections between artificial neurons, weighted by the importance of the associated data, that runs from an input layer through one or more subsequent layers to an output layer. → deep learning (DL), large language model (LLM)
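
To make the layered structure concrete, the sketch below passes inputs through one hidden layer to a single output neuron. The weights, biases and the choice of a sigmoid activation are arbitrary assumptions for illustration; in practice the weights are learned from training data.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Compute the output of a network with one hidden layer."""
    # Each hidden neuron takes a weighted sum of all inputs plus its bias.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron takes a weighted sum of the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
output = forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.3], [0.8, 0.2]],
    hidden_biases=[0.1, -0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(f"network output: {output:.3f}")
```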

What is permutation testing?

Permutation testing, also known as rerandomization testing, is a form of cross validation that pools two or more samples and repeatedly reshuffles the observations among them, drawing on all possible rearrangements when resampling.
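
A minimal sketch for two samples, assuming the statistic of interest is the difference in means: the pooled observations are shuffled many times, and the share of shuffles that produce a difference at least as large as the observed one serves as an approximate p-value. The function name, the number of shuffles and the data are hypothetical, and random shuffles are used here in place of enumerating every rearrangement.

```python
import random

def permutation_test(sample_a, sample_b, n_permutations=5000, seed=0):
    """Approximate two-sided p-value for the difference in means."""
    rng = random.Random(seed)
    mean = lambda values: sum(values) / len(values)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical order values for a control group and a test group.
control = [20.1, 22.4, 19.8, 21.0, 20.6]
test = [23.9, 24.5, 22.8, 25.1, 23.3]
print(f"approximate p-value: {permutation_test(control, test):.4f}")
```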

What is predictive AI?

Predictive AI is a retronym for artificial intelligence (AI) to distinguish it from newer forms of generative AI (genAI).

What is a reference audience?

A reference audience is a larger group of non-customers that is combined with a smaller seed audience of existing customers for lookalike modeling.

What is regression analysis?

Regression analysis is a way to estimate how independent variables influence a dependent variable, often used with supervised learning. → linear regression, logistic regression, analysis of variance (ANOVA)

What is reinforcement learning?

Reinforcement learning is a way to conduct machine learning (ML) using unlabeled datasets as inputs alongside performance metrics that guide learning through rewards and punishments. → supervised learning, semi-supervised learning, unsupervised learning

What is rerandomization testing?

Rerandomization testing is an alternative term for permutation testing.

What is resampling?

Resampling is the repeated drawing of alternative training data and validation data samples from a dataset for cross validation.

What is a seed audience?

A seed audience is a smaller group of existing customers that is combined with a larger reference audience of non-customers for lookalike modeling.

What is semi-supervised learning?

Semi-supervised learning is a way to conduct machine learning (ML) using focused supervised learning to guide broader unsupervised learning. → reinforcement learning

What is split testing?

Split testing is an alternative term for A/B testing.

What is supervised learning?

Supervised learning is a way to conduct machine learning (ML) using labeled datasets as inputs and is often associated with linear regression and logistic regression. → unsupervised learning, semi-supervised learning, reinforcement learning

What is test & learn?

Test & Learn, also the registered trademark of Mastercard’s Test & Learn® platform, is a form of business experimentation involving small-scale tests to predict the success of an initiative when launched on a larger scale.

What is training data?

Training data is the part of a dataset, separate from validation data, that trains a computer for machine learning (ML).

What is unsupervised learning?

Unsupervised learning is a way to conduct machine learning (ML) using unlabeled datasets as inputs. → supervised learning, semi-supervised learning, reinforcement learning

What is validation data?

Validation data is the part of a dataset that serves to avoid overfitting of machine learning (ML) to its training data. → cross validation

What is variance error?

Variance error is a measure of the spread of data from its mean, which results from unexamined variables during machine learning (ML). → bias error, bias–variance tradeoff, analysis of variance (ANOVA)
