Glossary of AI Terms

Artificial intelligence is transforming every field, and it is becoming increasingly important to understand it. This glossary covers terms commonly used when talking about artificial intelligence.

Active Learning 

When you have a large amount of data but lack labels for much of it, you can design a system that queries a user for labels on the fly. Active learning is often considered a form of semi-supervised learning and is useful when the cost of acquiring labels is high. You may have seen real-world examples of active learning in software such as Facebook, where a user is asked to tag a photo. Negative tweet recognition on Twitter is another example: a label (positive vs. negative tweet) is collected and stored as data for training a model.
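
A minimal sketch of one common query strategy, uncertainty sampling, may make the idea concrete. The function names, the example texts and the probability scores are hypothetical stand-ins for a real model and a real human annotator.

```python
def most_uncertain(probabilities):
    """Return the index of the item whose predicted probability is closest to 0.5."""
    return min(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))


def active_learning_round(unlabeled, predict_proba, ask_user, labeled):
    """Ask for a label on the single most uncertain item and move it to the labeled set."""
    probs = [predict_proba(x) for x in unlabeled]
    idx = most_uncertain(probs)
    item = unlabeled.pop(idx)
    labeled.append((item, ask_user(item)))
    return item


# Hypothetical model scores: probability that a tweet is positive.
scores = {"great day": 0.9, "not sure how I feel": 0.55, "awful service": 0.1}
unlabeled = list(scores)
labeled = []
queried = active_learning_round(
    unlabeled,
    predict_proba=lambda text: scores[text],
    ask_user=lambda text: "negative",  # stand-in for a human answer
    labeled=labeled,
)
print(queried)
```

Only the item the model is least sure about is sent to the user, which is how the approach keeps labeling costs down.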

Ancillary price optimization

An approach designed to increase airline revenues through analytics-driven pricing. It allows data scientists to learn about a traveler’s tendency to buy ancillaries such as checked baggage. Specialists determine in which markets and on what days people are likely to pay more to check their bags. “For example, if I book tickets for three people with a child, then I’m ready to pay X euros more than if I flew alone somewhere on a weekend,” explains Konstantin Vandyshev.

Artificial Intelligence (AI)

An area of computer science that emphasizes the creation of intelligent machines that work and react like humans. Some of the activities computers with artificial intelligence are designed for include:

  1. Speech recognition
  2. Learning
  3. Planning
  4. Problem solving

Autonomous car

A vehicle that can guide itself without a human driver. This kind of vehicle has become a concrete reality and may pave the way for future systems in which computers take over the art of driving.

Big Data

A term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

Branch and Bound (BB, B&B, or BnB)

An algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.
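
As an illustration, here is a minimal sketch of branch and bound applied to the 0/1 knapsack problem, a maximization problem chosen here only as an example. The function name and the sample items are assumptions, and the bound used is the standard fractional relaxation; item weights are assumed to be positive.

```python
def knapsack_branch_and_bound(items, capacity):
    """items: list of (value, weight) pairs with positive weights. Returns the best total value."""
    # Sort by value density so the greedy fractional fill is a valid upper bound.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0

    def upper_bound(index, value, room):
        # Optimistic estimate: fill the remaining room greedily,
        # allowing a fraction of the first item that does not fit.
        for v, w in items[index:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    def explore(index, value, room):
        nonlocal best
        best = max(best, value)
        if index == len(items):
            return
        if upper_bound(index, value, room) <= best:
            return  # this branch cannot beat the best solution found so far
        v, w = items[index]
        if w <= room:
            explore(index + 1, value + v, room - w)  # branch: take the item
        explore(index + 1, value, room)  # branch: skip the item

    explore(0, 0, capacity)
    return best


result = knapsack_branch_and_bound([(60, 10), (100, 20), (120, 30)], capacity=50)
print(result)  # 220
```

Each call to `explore` is one node of the rooted tree; the bound check is what lets whole subtrees be discarded without enumeration.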

British Museum Algorithm

A general approach to finding a solution by checking all possibilities one by one, beginning with the smallest. The term refers to a conceptual, not a practical, technique where the number of possibilities is enormous. For instance, one may, in theory, find the smallest program that solves a particular problem in the following way: Generate all possible source codes of length one character. Check each one to see if it solves the problem. (Note: the halting problem makes this check troublesome.) If none does, generate and check all programs of two characters, three characters, etc. Conceptually, this finds the smallest program, but in practice it tends to take an unacceptable amount of time (more than the lifetime of the universe, in many instances). Similar arguments can be made to show that optimization, theorem proving, language recognition, etc. are possible or impossible.

Source: enacademic

Catastrophic Forgetting

The tendency of an AI to entirely and abruptly forget information it previously knew after learning new information, essentially overwriting past knowledge with new knowledge.


Classification

Classification is a technique by which you determine to what group a certain observation belongs, such as when biologists categorize plants, animals, and other lifeforms into different taxonomies. It is one of the primary uses of data science and machine learning.

In order to determine the correct category for a given observation, machine learning technology does the following:

  1. Applies a classification algorithm to identify shared characteristics of certain classes.
  2. Compares those characteristics to the data you’re trying to classify.
  3. Uses that information to estimate how likely it is that the observation belongs to a particular class.


Clustering

In clustering or unsupervised learning, the target features are not given in the training examples. The aim is to construct a natural classification that can be used to cluster the data. The general idea behind clustering is to partition the examples into clusters or classes. Each class predicts feature values for the examples in the class. Each clustering has a prediction error on the predictions. The best clustering is the one that minimizes the error. In hard clustering, each example is placed definitively in a class. The class is then used to predict the feature values of the example. The alternative to hard clustering is soft clustering, in which each example has a probability distribution over the classes. The prediction of the values for the features of an example is the weighted average of the predictions of the classes the example is in, weighted by the probability of the example being in the class.
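
Hard clustering and its prediction error can be sketched with k-means on one-dimensional data. This is one possible algorithm, chosen here for brevity; the function name, the starting centers and the sample points are assumptions.

```python
def kmeans_1d(points, centers, iterations=10):
    """Hard clustering of numbers: each point belongs to exactly one cluster."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each cluster predicts the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    # Prediction error: squared distance from each point to its cluster's prediction.
    error = sum((p - centers[i]) ** 2 for i, c in enumerate(clusters) for p in c)
    return centers, clusters, error


centers, clusters, error = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers, clusters, error)
```

Each cluster's center plays the role of the class prediction, and the summed squared distance is the error that a better clustering would reduce.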

Convolutional Neural Networks

These are deep artificial neural networks that are used primarily to classify images (e.g. name what they see), cluster them by similarity (photo search), and perform object recognition within scenes. They are algorithms that can identify faces, individuals, street signs, tumors, platypuses and many other aspects of visual data.

Data Labeling

One of the cornerstones of AI. Without labeled data, computers would be unable to understand what they are looking at. The situation is somewhat like that of pre-language children, who cannot remember much because they have no words to attach to images and therefore gain little understanding or knowledge. Without labeled data, no supervised learning can occur.

Deep Learning

A type of machine learning that trains a computer to perform human-like tasks, such as recognizing speech, identifying images or making predictions. Instead of organizing data to run through predefined equations, deep learning sets up basic parameters about the data and trains the computer to learn on its own by recognizing patterns using many layers of processing.

Dimensionality Reduction

Dimensionality reduction reduces the number of features in a dataset without losing much information, while keeping (or improving) the model’s performance. It’s a powerful way to deal with huge datasets and can be done in two different ways:

  1. By keeping only the most relevant variables from the original dataset (this technique is called feature selection)
  2. By finding a smaller set of new variables, each a combination of the input variables, containing basically the same information as the input variables (this technique is called feature extraction)

Benefits of applying dimensionality reduction to a dataset include:

  1. The space required to store the data is reduced as the number of dimensions comes down.
  2. Fewer dimensions lead to less computation and training time.
  3. Some algorithms do not perform well when there are many dimensions, so the dimensions must be reduced for the algorithm to be useful.
  4. It takes care of multicollinearity by removing redundant features. For example, you have two variables – ‘time spent on treadmill in minutes’ and ‘calories burnt’. These variables are highly correlated, as the more time you spend running on a treadmill, the more calories you will burn. Hence, there is no point in storing both, as just one of them does what you require.
  5. It helps in visualizing data. It is very difficult to visualize data in higher dimensions, so reducing the space to 2D or 3D may allow us to plot and observe patterns more clearly.
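
The treadmill example can be sketched as a simple correlation-based feature selection. The function names, the 0.95 threshold and the data are made up for illustration, and the sketch assumes no feature is constant (which would make the correlation undefined).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Made-up measurements for illustration only.
features = {
    "minutes_on_treadmill": [10, 20, 30, 40, 50],
    "calories_burnt": [95, 210, 290, 405, 500],
    "resting_heart_rate": [62, 71, 58, 66, 60],
}
selected = list(drop_redundant(features))
print(selected)
```

Here `calories_burnt` is dropped because it carries almost the same information as `minutes_on_treadmill`, while the weakly correlated heart-rate feature is kept.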

Expert System

In artificial intelligence, an expert system is a computer system that emulates the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning through bodies of knowledge, represented mainly as if–then rules rather than through conventional procedural code. The first expert systems were created in the 1970s and then proliferated in the 1980s. Expert systems were among the first truly successful forms of artificial intelligence (AI) software. However, some experts point out that expert systems were not part of true artificial intelligence since they lack the ability to learn autonomously from external data. An expert system is divided into two subsystems: the inference engine and the knowledge base. The knowledge base represents facts and rules. The inference engine applies the rules to the known facts to deduce new facts. Inference engines can also include explanation and debugging abilities.
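
The split between knowledge base and inference engine can be sketched in a few lines. The rules and fact names are hypothetical, and forward chaining is only one of several possible inference strategies.

```python
def infer(facts, rules):
    """Forward chaining: apply if-then rules until no new facts can be deduced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known


# Knowledge base: hypothetical rules, each a (conditions, conclusion) pair.
rules = [
    (("has_fever", "has_cough"), "possible_flu"),
    (("possible_flu", "short_of_breath"), "see_doctor"),
]
observed = {"has_fever", "has_cough", "short_of_breath"}
derived = sorted(infer(observed, rules) - observed)
print(derived)  # ['possible_flu', 'see_doctor']
```

The second rule fires only after the first has added `possible_flu`, which shows how the engine deduces new facts from earlier deductions.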

Facial Recognition Technology

Face recognition is a method of identifying or verifying the identity of an individual using their face. Face recognition systems can be used to identify people in photos, video, or in real-time. Law enforcement may also use mobile devices to identify people during police stops.

Source: Electronic Frontier Foundation

Feature Classification

The grouping of features based on some criteria. Feature classification is sometimes also related to feature selection, which selects a subset of the extracted features that would optimize the machine learning algorithm and possibly reduce noise by removing unrelated features.

Source: Researchgate

Feature Extraction

The process of collecting discriminative information from a set of samples.

Source: Researchgate

Feedback Loop  

A system for improving a product, process, etc. by collecting and reacting to users’ comments.

Source: Cambridge dictionary

Fraud Detection

One of the chief uses of deep learning in the enterprise is fraud and anomaly detection. Anomaly detection is a broad term for identifying any set of unusual activities, including network security breaches, extraordinary transactions or even mechanical breakdowns. Any behavior that can be digitized or measured numerically, including machine performance, is subject to anomaly detection. Fraud detection is a good example of anomaly detection for many reasons, the first being that fraud is incredibly costly. Fraudulent transactions are estimated to cost U.S. banks up to $11 billion per year, so it’s a problem that a lot of people want to solve.
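
To show what flagging an unusual transaction can look like at its simplest, here is a sketch that uses a plain z-score rule. It is not a deep learning model; the threshold and the transaction amounts are made up for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(amounts, threshold=2.0):
    """Return amounts that lie more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(amounts), stdev(amounts)
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Made-up card transactions; one is far larger than the rest.
transactions = [20, 25, 22, 19, 24, 21, 23, 950]
suspicious = flag_anomalies(transactions)
print(suspicious)  # [950]
```

Real systems typically model many features per transaction, but the principle of measuring distance from normal behavior is the same.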


Game Theory 

Belongs to a family of theories often subsumed under the umbrella term Rational Choice Theory. All these theories (in particular, decision theory, game theory and social choice theory) discuss conditions under which agents’ actions, or at least their decision to act, can be said to be rational. Depending on how these conditions are interpreted, Rational Choice theory may have a positive or a normative function: it may contribute to the prediction and explanation of agent behavior, or it may contribute to advising agents what they should do. Many of the purported functions of Rational Choice theory are controversial; as a part of it, game theory is affected by these controversies, in particular its usefulness for the social sciences.

Source: Internet Encyclopedia of Philosophy

Goldilocks Principle 

The idea that there is an ideal amount of some measurable substance, an amount in the middle or mean of a continuum of amounts, and that this amount is “just right” for a life-supporting condition to exist. The analogy is based on the children’s story, The Three Bears, in which a little girl named Goldilocks tastes three different bowls of porridge and finds that she prefers the porridge that is neither too hot nor too cold but just the right temperature.


Heuristic

Generally speaking, a heuristic is a “rule of thumb,” or a good guide to follow when making decisions. In computer science, a heuristic has a similar meaning, but refers specifically to algorithms. When programming software, computer programmers aim to create the most efficient algorithms to accomplish various tasks. These may include simple processes like sorting numbers or complex functions such as processing images or video clips. Since these functions often accept a wide range of input, one algorithm may perform well in certain cases, while not very well in others.


Intelligent Agents 

A type of software application that searches, retrieves and presents information from the Internet. This application automates the process of extracting data from the Internet, such as information selected based on a predefined criterion, keywords or any specified information/entity to be searched. Intelligent agents are often used as Web browsers, news retrieval services and online shopping. An intelligent agent may also be called an agent or bot.


Junction Tree Algorithm

A general algorithmic framework that provides an understanding of the general concepts that underlie inference. The general problem here is to calculate the conditional probability of a node or a set of nodes, given the observed values of another set of nodes. Many individual inferential calculations in graphical models are special cases of this problem. The idea of the junction tree algorithm is to find ways to decompose a global calculation on a joint probability into a linked set of local computations. The key point of this approach is the concept of locality. A particular data structure – the junction tree – is introduced to make explicit the important relationship between graph-theoretic locality and efficient probabilistic inference.


Knowledge-based Systems

These are considered to be a major branch of artificial intelligence. They are capable of making decisions based on the knowledge residing in them, and can understand the context of the data that is being processed. Knowledge-based systems broadly consist of an inference engine and a knowledge base. The inference engine acts as the search engine, and the knowledge base acts as the knowledge repository. Learning is an essential component of knowledge-based systems, and simulation of learning helps improve the systems.

Knowledge-based systems can be broadly classified as case-based systems, intelligent tutoring systems, expert systems, hypertext manipulation systems and databases with intelligent user interfaces.

Compared to traditional computer-based information systems, knowledge-based systems have many advantages. They can provide efficient documentation and also handle large amounts of unstructured data in an intelligent fashion. Knowledge-based systems can aid in expert decision making and allow users to work at a higher level of expertise and promote productivity and consistency. These systems are considered very useful when expertise is unavailable, or when data needs to be stored for future usage or needs to be grouped with different expertise at a common platform, thus providing large-scale integration of knowledge. Finally, knowledge-based systems are capable of creating new knowledge by referring to the stored content. The limitations of knowledge-based systems are the abstract nature of the concerned knowledge, acquiring and manipulating large volumes of information or data, and the limitations of cognitive and other scientific techniques. 

Source: Techopedia

Labeled Data 

Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of that unlabeled data with meaningful tags that are informative. For example, labels might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, whether the dot in an x-ray is a tumor, etc.

Source: Wikipedia

Machine Learning 

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

Source: SAS

Natural Intelligence   

Natural intelligence (NI) is the opposite of artificial intelligence: it is all the systems of control that are not artifacts, but rather are present in biology. Normally when we think of NI we think about how animal or human brains function, but there is more to natural intelligence than neuroscience. Nature also demonstrates non-neural control in plants and protozoa, as well as distributed intelligence in colony species like ants, hyenas and humans. Our behaviour co-evolves with the rest of our bodies, and in response to our changing environment. Understanding natural intelligence requires understanding all of these influences on behaviour, and their interactions.


Natural Language Processing (NLP) 

Natural language refers to language that is spoken and written by people, and natural language processing (NLP) attempts to extract information from the spoken and written word using algorithms. NLP encompasses an active and a passive mode: natural language generation (NLG), or the ability to formulate phrases that humans might emit, and natural language understanding (NLU), or the ability to build a comprehension of a phrase, what the words in the phrase refer to, and its intent. In a conversational system, NLU and NLG alternate, as algorithms parse and comprehend a natural-language statement, and formulate a satisfactory response to it.


Nearest Neighbors   

The k-Nearest-Neighbors (kNN) method of classification is one of the simplest methods in machine learning, and is a great way to introduce yourself to machine learning and classification in general. At its most basic level, it is essentially classification by finding the most similar data points in the training data, and making an educated guess based on their classifications. Although very simple to understand and implement, this method has seen wide application in many domains, such as in recommendation systems, semantic searching, and anomaly detection.

Source: Towards Data Science
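
The method is short enough to sketch directly. The function name, the choice of Euclidean distance, the value k=3 and the sample points are assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_classify(training, query, k=3):
    """training: list of (point, label) pairs. Returns the majority label of the k nearest points."""
    nearest = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 2D training data with two classes.
training = [
    ((1, 1), "cat"), ((1, 2), "cat"), ((2, 1), "cat"),
    ((8, 8), "dog"), ((8, 9), "dog"), ((9, 8), "dog"),
]
print(knn_classify(training, (2, 2)))  # cat
print(knn_classify(training, (7, 9)))  # dog
```

There is no training step: the "model" is the stored data, and all the work happens when a query arrives.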

Neural Network 

Neural networks are computing systems with interconnected nodes that work much like neurons in the human brain. Using algorithms, they can recognize hidden patterns and correlations in raw data, cluster and classify it, and – over time – continuously learn and improve. 

There are different kinds of deep neural networks – and each has advantages and disadvantages, depending upon the use. Examples include:

  1. Convolutional neural networks (CNNs) contain five types of layers: input, convolution, pooling, fully connected and output. Each layer has a specific purpose, like summarizing, connecting or activating. Convolutional neural networks have popularized image classification and object detection. However, CNNs have also been applied to other areas, such as natural language processing and forecasting.
  2. Recurrent neural networks (RNNs) use sequential information such as time-stamped data from a sensor device or a spoken sentence, composed of a sequence of terms. Unlike in traditional neural networks, the inputs to a recurrent neural network are not independent of each other, and the output for each element depends on the computations of its preceding elements. RNNs are used in forecasting and time series applications, sentiment analysis and other text applications.
  3. Feedforward neural networks, in which each perceptron in one layer is connected to every perceptron from the next layer. Information is fed forward from one layer to the next in the forward direction only. There are no feedback loops.
  4. Autoencoder neural networks are used to create abstractions called encoders, created from a given set of inputs. Although similar to more traditional neural networks, autoencoders seek to model the inputs themselves, and therefore the method is considered unsupervised. The premise of autoencoders is to desensitize the irrelevant and sensitize the relevant. As layers are added, further abstractions are formulated at higher layers (layers closest to the point at which a decoder layer is introduced). These abstractions can then be used by linear or nonlinear classifiers.

Source: SAS
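
The forward pass of a small feedforward network can be sketched as follows. The sigmoid activation, the layer sizes and the weights are arbitrary assumptions; a real network would learn its weights from data.

```python
from math import exp


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: every input feeds every unit in the layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def feedforward(inputs, layers):
    """Pass the inputs through each layer in order; there are no feedback loops."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = feedforward([1.0, 1.0], layers)
print(output)
```

Information only moves from one layer to the next, which is the property that separates this design from a recurrent network.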

Occam’s Razor 

Also spelled Ockham’s razor and also called the law of economy or law of parsimony, this is the principle stated by the Scholastic philosopher William of Ockham (1285–1347/49) that pluralitas non est ponenda sine necessitate, “plurality should not be posited without necessity.” The principle gives precedence to simplicity: of two competing theories, the simpler explanation of an entity is to be preferred. The principle is also expressed as “Entities are not to be multiplied beyond necessity.”

Source: Britannica

Predictive Asset Maintenance

Predictive Maintenance (PdM) is the servicing of equipment when it is estimated that service will be required. Maintaining machinery and electronics is most cost-effective if done when it is needed, within a certain tolerance.

Source: Quora

Query Language

Refers to any computer programming language that requests and retrieves data from databases and information systems by sending queries. It works on structured, formal, command-based queries entered by the user to find and extract data from host databases.

Source: Techopedia
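
SQL is a widely used query language. The sketch below runs a query against an in-memory SQLite database through Python's standard `sqlite3` module; the table name, columns and rows are made up for illustration.

```python
import sqlite3

# An in-memory database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO terms VALUES (?, ?)",
    [
        ("Clustering", "unsupervised"),
        ("Classification", "supervised"),
        ("Regression", "supervised"),
    ],
)

# The query states what data is wanted, not how to find it.
rows = conn.execute(
    "SELECT name FROM terms WHERE category = ? ORDER BY name", ("supervised",)
).fetchall()
print(rows)  # [('Classification',), ('Regression',)]
conn.close()
```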

Reinforcement Learning 

Reinforcement learning analyzes and optimizes the behavior of an agent based on feedback from the environment. Machines try different scenarios to discover which actions yield the greatest reward, rather than being told which actions to take. Trial and error and delayed reward distinguish reinforcement learning from other techniques.

Source: SAS
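
Delayed reward can be sketched with tabular Q-learning on a tiny corridor in which only the last cell pays a reward. To keep the sketch deterministic it sweeps over every state and action instead of simulating an exploring agent, which is a simplification; the environment, function name and parameter values are all assumptions.

```python
def train_corridor(n_states=5, sweeps=200, alpha=0.5, gamma=0.9):
    """Tabular Q-learning on a corridor; reward 1 only for reaching the last state."""
    actions = (-1, +1)  # step left, step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(sweeps):
        # Systematic trial and error: try every action in every non-terminal state.
        for state in range(n_states - 1):
            for a, step in enumerate(actions):
                nxt = min(max(state + step, 0), n_states - 1)
                reached_goal = nxt == n_states - 1
                reward = 1.0 if reached_goal else 0.0
                future = 0.0 if reached_goal else max(q[nxt])
                # Move the estimate toward reward plus discounted future value.
                q[state][a] += alpha * (reward + gamma * future - q[state][a])
    return q


q = train_corridor()
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)  # ['right', 'right', 'right', 'right']
```

Cells far from the goal never receive a reward directly; they learn to move right only because value propagates backward from the goal through the discount factor `gamma`.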

Revenue Management 

The application of data and analytics aimed at defining how to sell a product to those who need it, at a reasonable cost at the right time and using the right channel. It’s based on the idea that customers perceive product value differently, so the price they are ready to pay for it depends on target groups they belong to and purchase time. Revenue management specialists make good use of AI to define destinations and adjust prices for specific markets, find efficient distribution channels, and manage seats to keep the airline simultaneously competitive and customer-friendly.

Source: AltexSoft

Supervised Learning  

In supervised learning, the machine is taught by example. The operator provides the machine learning algorithm with a known dataset that includes desired inputs and outputs, and the algorithm must find a method to determine how to arrive at those outputs from the inputs. While the operator knows the correct answers to the problem, the algorithm identifies patterns in data, learns from observations and makes predictions. The algorithm makes predictions and is corrected by the operator – and this process continues until the algorithm achieves a high level of accuracy/performance.

Under the umbrella of supervised learning fall: classification, regression and forecasting.

  • Classification: In classification tasks, the machine learning program must draw a conclusion from observed values and determine to what category new observations belong. For example, when filtering emails as spam or not spam, the program looks at existing observational data and filters the emails accordingly.
  • Regression: In regression tasks, the machine learning program must estimate – and understand – the relationships among variables. Regression analysis focuses on one dependent variable and a series of other changing variables – making it particularly useful for prediction and forecasting.
  • Forecasting: Forecasting is the process of making predictions about the future based on past and present data, and is commonly used to analyze trends.

Source: SAS
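
A regression task in its simplest form is fitting a straight line to known input and output pairs. The sketch below uses ordinary least squares with one input variable; the function name and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Known dataset: inputs with their desired outputs.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict the output for an unseen input
print(slope, intercept, prediction)  # 2.0 1.0 11.0
```

The labeled pairs play the role of the operator's correct answers, and the fitted line is then used to predict outputs for new inputs.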

Semi-supervised Learning 

The challenge with supervised learning is that labeling data can be expensive and time consuming. If labels are limited, you can use unlabeled examples to enhance supervised learning. Because the machine is not fully supervised in this case, we say the machine is semi-supervised. With semi-supervised learning, you use unlabeled data together with a small amount of labeled data to improve the learning accuracy.

Source: SAS

Simon’s Ant 

“An ant, viewed as a behaving system, is quite simple. The apparent complexity of its behavior over time is largely a reflection of the complexity of the environment in which it finds itself.” — Herbert Simon (Simon’s Law)

A complex problem often signals a simple solution’s existence because the problem contains all the complexity. A seemingly simple problem may indicate that only complex solutions exist. Simon’s Ant reminds us to always take the environment into consideration when analyzing a problem. In other words: understand both the content and context of any problem. Most people believe that complex problems require equally complex solutions. But the inverse is often more accurate when context is considered. Just because a system’s environment is complex does not mean that the systems operating within it must be complex as well.

Simon’s Law is about an ant on a beach looking for food. If you were to graph the ant’s path it would look swervy and complex. If you saw this line with no other context, you’d think to yourself: “some ant.” If however you had in your possession a corresponding picture of the beach, you would realize that there is nothing special about the ant at all.

Source: Sean Newman Maroni Blog


TensorFlow

A Python-friendly open source library for numerical computation that makes machine learning faster and easier.

Source: InfoWorld

Turing Test 

In artificial intelligence (AI), a Turing Test is a method of inquiry for determining whether or not a computer is capable of thinking like a human being. The test is named after Alan Turing, an English mathematician who pioneered machine learning during the 1940s and 1950s.

Source: Tech Target

Unsupervised Learning 

When performing unsupervised learning, the machine is presented with totally unlabeled data. It is asked to discover the intrinsic patterns that underlie the data, such as a clustering structure, a low-dimensional manifold, or a sparse tree and graph.

Clustering: Grouping a set of data examples so that examples in one group (or one cluster) are more similar (according to some criteria) than those in other groups. This is often used to segment the whole dataset into several groups. Analysis can be performed in each group to help users to find intrinsic patterns.

Dimension reduction: Reducing the number of variables under consideration. In many applications, the raw data have very high dimensional features and some features are redundant or irrelevant to the task. Reducing the dimensionality helps to find the true, latent relationship. 

Source: SAS

Voice Recognition 

Voice or speaker recognition is the ability of a machine or program to receive and interpret dictation or to understand and carry out spoken commands. Voice recognition has gained prominence and use with the rise of AI and intelligent assistants, such as Amazon’s Alexa, Apple’s Siri and Microsoft’s Cortana.

Source: Tech Target

Voice Search

Voice search is a speech recognition technology that allows users to search by saying terms aloud rather than typing them into a search field. The proliferation of smart phones and other small, Web-enabled mobile devices has spurred interest in voice search. Applications of voice search include:

  1. Making search engine queries.
  2. Clarifying specifics of the request.
  3. Requesting specific information, such as a stock quote or sports score.
  4. Launching programs and selecting options.
  5. Searching for content in audio or video files.
  6. Voice dialing.

Although voice search is usually built as a software application, it can also be built as a service. Voice search applications such as Google Mobile App with Voice and Vlingo for iPhone rely on speech recognition programs. The free voice search service ChaCha, however, uses another approach. ChaCha employs human beings, called guides, to look up queries and provide search results. According to a July 2009 study by MSearchGroove, the accuracy of search results from ChaCha’s guides was much higher than those from either speech recognition program.

Source: Tech Target

Volume, Variety, Velocity and Veracity

Known as the big four of big data.

Weak A.I.  

Our current level of A.I., which can do just one thing at a time, like play chess or recognize breeds of cats. The opposite would be strong A.I., also known as artificial general intelligence (A.G.I.), which would have the capability to do anything that most humans can do. 

Willingness to Pay 

Collecting and crunching data about customers, airlines understand passengers’ tastes and behavior well enough to offer them transportation options they prefer and, more important, are ready to spend money on. So, revenue managers start from measuring willingness to pay (WTP). Willingness to pay reveals “when” a customer is likely to pay “a maximum price” for a product or service, explains the data scientist. “It’s assumed that customers are ready to pay more when there is less time before departure time. And society finds this pricing fair. WTP in the airline industry, therefore, depends on the day before departure (DBD). In practice, specialists define median WTP — a price that 50 percent of customers would like to pay for a ticket on a specific DBD. Such WTP is equivalent to price elasticity (the number of passengers that would buy a ticket if a price drops by a certain percent) with some assumptions between market demand and supply.”

This metric is connected to dynamic pricing — the practice of pricing a product based on a specific customer’s willingness to pay. The calculation of WTP requires selecting data correctly. Revenue management can combine similar markets and, alternatively, distinguish high and low seasons, as well as holidays and weekends.

“Approaches to this type of statistical analysis were developed nearly 10 years ago. These days, it’s easier to conduct research and present its results thanks to the development of data science and visualization capabilities. Considering that each case is unique, it’s very important to choose the right amount of data to extract insights from,” concludes Konstantin. 

Source: AltexSoft


Yellowbrick Data

Yellowbrick Data, founded in 2014, has developed a system architecture that’s based on flash memory hardware and software developed to handle native flash memory queries. The appliance includes integrated CPU, storage and networking with data moving directly from flash memory to the CPU. The system’s modular design can scale up to handle petabytes of data by adding analytic nodes.

The system includes an analytic database designed for flash memory, able to handle high-volume data ingestion and processing, and capable of running mixed workloads of ad hoc queries, large batch queries, reporting, ETL (extract, transform and load) processes and ODBC inserts. The company says its system operates 140 times faster than conventional data warehouse systems for such tasks as retail and advertising analytics, security analysis and fraud detection, financial trading analysis, electronic health records processing and other applications.

The Yellowbrick Data Warehouse appliance occupies as little as 3 percent of the physical space of a legacy data warehouse, according to the company. It runs on premises, but also supports hybrid and private cloud environments, co-location and edge-computing networks.



ZooKeeper

An open source Apache project that provides centralized infrastructure and services that enable synchronization across an Apache Hadoop cluster. ZooKeeper maintains common objects needed in large cluster environments. Examples of these objects include configuration information, hierarchical naming space, and so on. Applications leverage these services to coordinate distributed processing across large clusters.

Source: IBM