What is Argmax in Machine Learning?

Introduction: What Is Argmax in Machine Learning?

In applied machine learning, you may encounter argmax as a mathematical function. Research papers that describe algorithms may write it as “argmax” or “arg max”, and your own implementation of an algorithm may need to call an argmax function.

You may be unfamiliar with the argmax function and wonder what it is and how it works.

  • Argmax is an operation that finds the argument that gives the maximum value from a target function.
  • In machine learning, argmax is commonly used to find the class with the highest predicted probability.
  • Argmax can be implemented manually, but in practice the NumPy argmax() function is preferred, particularly if NumPy is already part of your project.

What Is Argmax?

Argmax is a mathematical operation that is applied to another function that takes arguments. For example, given a function g() that takes the argument x, the argmax operation on that function is written as follows:

  • result = argmax(g(x))

Argmax returns the argument or arguments (arg) at which the target function returns its maximum value (max).

For instance, if g(x) is calculated as the square of the x value and x is limited to integers from 1 to 5:

  • g(1) = 1^2 = 1
  • g(2) = 2^2 = 4
  • g(3) = 3^2 = 9
  • g(4) = 4^2 = 16
  • g(5) = 5^2 = 25

We can intuitively see that the argmax for the function g(x) is 5.

That is, the argument x to the function g(x) that results in the largest value (25) is 5. Argmax is a shorthand for specifying this argument in an abstract way, without having to know what its value is in a specific case.

  • argmax(g(x)) = 5

Note that this is not the max() of the values returned from the function, which would be 25.

It is also not the max() of the arguments, although in this case the argmax and the max of the arguments happen to be the same, i.e. 5. The argmax is 5 because g returns its largest value (25) when given 5, not because 5 is the largest argument.
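
The difference between the argmax, the max of the returned values, and the max of the arguments can be checked with a short Python sketch. The names g, h, and domain are chosen here for illustration; h is a hypothetical second function, not part of the example above, included only to show a case where the argmax is not the largest argument.

```python
# Sketch of arg max over a small, finite set of candidate arguments.

def g(x):
    """Target function from the text: the square of x."""
    return x ** 2

def h(x):
    """Hypothetical function (an assumption for illustration) that peaks at x = 2."""
    return -(x - 2) ** 2

domain = range(1, 6)  # the integers 1 to 5

argmax_g = max(domain, key=g)            # argument that gives the largest g(x)
max_value_g = max(g(x) for x in domain)  # largest value returned by g
max_argument = max(domain)               # largest argument, regardless of g

print(argmax_g, max_value_g, max_argument)  # 5 25 5

argmax_h = max(domain, key=h)  # the argmax need not be the largest argument
print(argmax_h, h(argmax_h))   # 2 0
```

For g the argmax and the largest argument coincide, whereas for h the argmax is 2 even though the largest argument is still 5.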

“Argmax” is often written as two separate words, i.e. “arg max”. For example:

  • result = arg max(g(x))

The arg max operation is also commonly written without brackets around the target function. This is how it often appears in research papers and textbooks. For example:

  • result = arg max g(x)

You can also use argmin or “arg min” to find the arguments to the target function that result in the minimum value from the target function.
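
As a minimal sketch, argmin for the same squared function over the integers 1 to 5 could be computed in Python as follows. The names g and domain are illustrative choices rather than part of any library.

```python
# Sketch of arg min over a small, finite set of candidate arguments.

def g(x):
    """Target function from the earlier example: the square of x."""
    return x ** 2

domain = range(1, 6)  # the integers 1 to 5

argmin_g = min(domain, key=g)  # argument that gives the smallest g(x)
print(argmin_g, g(argmin_g))   # 1 1
```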

In machine learning, the values being compared are most often predicted probabilities.

How Is Argmax Used in Machine Learning?

The argmax function is widely used in mathematics and machine learning. There are, nevertheless, specific situations in applied machine learning where you will see argmax used and may need to implement it yourself.

In applied machine learning, the most common use of argmax is to find the index of the array element with the largest value. Recall that an array is a list or vector of numbers.
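
A manual implementation for a list of numbers might look like the sketch below. The function name argmax and the sample values in scores are assumptions made for illustration. When the largest value occurs more than once, this version returns the index of its first occurrence.

```python
# Sketch of a manual argmax over a list of numbers.

def argmax(values):
    """Return the index of the largest element in a non-empty sequence."""
    if len(values) == 0:
        raise ValueError("argmax of an empty sequence is undefined")
    best_index = 0
    for index, value in enumerate(values):
        # Strict comparison keeps the first index when values are tied.
        if value > values[best_index]:
            best_index = index
    return best_index

scores = [3.0, 7.5, 7.5, 1.2]  # hypothetical values containing a tie
print(argmax(scores))  # 1
```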

A multi-class classification model predicts a vector of probabilities (or probability-like values), with one probability for each class label. Each value represents the probability that a sample belongs to the corresponding class label.

The predicted probabilities are ordered so that the probability at index 0 belongs to the first class, the probability at index 1 belongs to the second class, and so forth. It is often necessary to make a single class label prediction for a multi-class classification problem from a set of predicted probabilities.

This conversion from a vector of predicted probabilities to a class label is most often described and implemented using the argmax function.

Let’s make this concrete with an example.

Consider a multi-class classification problem with three classes: “red”, “blue”, and “green”. The class labels are mapped to integer values for modeling, as follows:

  • red = 0
  • blue = 1
  • green = 2

Each class label's integer value maps to an index of a 3-element vector that a model may predict, specifying the likelihood that an example belongs to each class.

Suppose a model has made one prediction for an input sample and predicted the following vector of probabilities:

  • yhat = [0.4, 0.5, 0.1]

We can see that the example has a 40 percent probability of belonging to red, a 50 percent probability of belonging to blue, and a 10 percent probability of belonging to green.

We can apply the argmax function to the vector of probabilities. Here the vector plays the role of the function: the input to the function is an element index, and the output is the probability stored at that index.

  • arg max yhat

We can intuitively see that in this case, the argmax of the vector of predicted probabilities (yhat) is 1, as the probability at array index 1 is the largest value.

Note that this is not the max() of the probabilities, which would be 0.5. Also note that this is not the max of the arguments, which would be 2. Instead, it is the argument that results in the maximum value, i.e. index 1, which gives 0.5.

  • arg max yhat = 1

We can then map this integer value back to a class label, which would be “blue.”

  • arg max yhat = “blue”
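
The same steps can be sketched with the NumPy argmax() function. The list name labels is an assumption made here; its order follows the integer mapping given above (red = 0, blue = 1, green = 2).

```python
# Sketch: converting a vector of predicted probabilities to a class label.
import numpy as np

labels = ["red", "blue", "green"]  # position in the list matches the class integer
yhat = np.array([0.4, 0.5, 0.1])   # predicted probabilities from the example

index = int(np.argmax(yhat))  # index of the largest probability
print(index)                  # 1
print(float(yhat.max()))      # 0.5, the max of the probabilities, not the argmax
print(labels[index])          # blue
```

If several classes share the largest probability, np.argmax returns the index of the first one.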

In summary:

  • Argmax finds the argument that gives the maximum value from a target function.
  • In machine learning, argmax is commonly used to find the class with the largest predicted probability.
  • Argmax can be implemented manually, but the NumPy argmax() function is preferable in practice.