Top-N Accuracy Metrics

1. Introduction

In machine learning, we often measure the accuracy of our models. But are we measuring it in the way that best fits the problem?

In this tutorial, we’ll talk about the difference between Top-1 Accuracy and Top-N Accuracy, and why they’re important.

2. Top-1 Accuracy

Let’s say we have a model that tries to classify images of animals, and we show it an image of a cat. Top-1 Accuracy considers the prediction correct if and only if the class with the highest predicted probability is “cat”.

Let’s expand our example to several predictions:

Given this example, our model correctly predicted 3 of the 5 images, which gives an accuracy of 60%. As we can see, Top-1 Accuracy is simply what we usually mean when we talk about accuracy.
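To make the computation concrete, here is a minimal Python sketch of Top-1 Accuracy. The function name `top1_accuracy`, the class names, and the probabilities are all made up for illustration; they are not the values from the example above, only chosen so that the result also comes out to 3 of 5.

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose highest-probability class equals the true label."""
    correct = 0
    for probs, label in zip(predictions, labels):
        # Class with the highest predicted probability for this sample
        best = max(probs, key=probs.get)
        if best == label:
            correct += 1
    return correct / len(labels)


# Hypothetical model outputs: one dict of class -> probability per image
predictions = [
    {"cat": 0.70, "dog": 0.20, "bird": 0.06, "horse": 0.04},
    {"cat": 0.10, "dog": 0.75, "bird": 0.10, "horse": 0.05},
    {"cat": 0.40, "dog": 0.35, "bird": 0.15, "horse": 0.10},  # true label is dog: Top-1 miss
    {"cat": 0.05, "dog": 0.10, "bird": 0.80, "horse": 0.05},
    {"cat": 0.45, "dog": 0.30, "bird": 0.20, "horse": 0.05},  # true label is horse: Top-1 miss
]
labels = ["cat", "dog", "dog", "bird", "horse"]  # hypothetical true labels

print(top1_accuracy(predictions, labels))  # 0.6
```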

3. Top-N Accuracy

Top-N Accuracy takes the $N$ predictions with the highest probability. If any of them matches the true label, the prediction counts as correct. Top-1 Accuracy is the special case $N = 1$, in which only the highest-probability prediction is taken into account.

Let’s use the same example as before, assuming a Top-3 Accuracy:

Now, using the 3 most probable predictions for each image, we can see that the model correctly predicted 4 of the 5 images, which gives a Top-3 Accuracy of 80%.
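The same idea can be sketched in Python for an arbitrary $N$. As before, the function name `top_n_accuracy` and the data are hypothetical: the probabilities are invented for illustration and merely chosen so that the Top-3 result also comes out to 4 of 5.

```python
def top_n_accuracy(predictions, labels, n):
    """Fraction of samples whose true label is among the n most probable classes."""
    correct = 0
    for probs, label in zip(predictions, labels):
        # Classes sorted from most to least probable, truncated to the first n
        top_n = sorted(probs, key=probs.get, reverse=True)[:n]
        if label in top_n:
            correct += 1
    return correct / len(labels)


# Hypothetical model outputs: one dict of class -> probability per image
predictions = [
    {"cat": 0.70, "dog": 0.20, "bird": 0.06, "horse": 0.04},
    {"cat": 0.10, "dog": 0.75, "bird": 0.10, "horse": 0.05},
    {"cat": 0.40, "dog": 0.35, "bird": 0.15, "horse": 0.10},  # dog is 2nd: Top-3 hit
    {"cat": 0.05, "dog": 0.10, "bird": 0.80, "horse": 0.05},
    {"cat": 0.45, "dog": 0.30, "bird": 0.20, "horse": 0.05},  # horse is 4th: Top-3 miss
]
labels = ["cat", "dog", "dog", "bird", "horse"]  # hypothetical true labels

print(top_n_accuracy(predictions, labels, n=1))  # 0.6
print(top_n_accuracy(predictions, labels, n=3))  # 0.8
```

Calling the function with `n=1` reproduces plain accuracy, which matches the description of Top-1 Accuracy as a special case.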

Notice that, for $N > K$, Top-N Accuracy $\geq$ Top-K Accuracy. In other words, as $N$ grows, Top-N Accuracy can only increase or stay the same, because every prediction that counts as correct with the $K$ most probable classes is still correct when more classes are considered.

This gives us insight into how our model behaves. For example, if the Top-1 Accuracy is really low, we might think our model hasn’t learned much about the dataset. However, if the accuracy increases significantly as $N$ grows, the model is actually learning but still needs some fine-tuning. This can be especially helpful for classification problems with a large number of classes.

Depending on the problem, Top-N Accuracy might be a more appropriate way to evaluate the model than Top-1 Accuracy. Take recommendation systems, for example. Whether they recommend videos, music, or products in an online shop, we value novelty and diversity: as users, we’re looking for new and varied content. Therefore, we don’t aim to find the single most relevant recommendation, but a set of interesting ones. It can be more useful to have the best prediction appear somewhere in a set of interesting predictions than to have just one good prediction.
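Returning to the observation that Top-N Accuracy can’t decrease as $N$ grows, one way to see it is to compute, for each sample, the rank of the true label among the sorted predictions. Top-N Accuracy is then the fraction of samples whose rank is at most $N$. The sketch below assumes this rank-based formulation; the helper names `true_label_ranks` and `accuracy_curve` and all of the data are hypothetical.

```python
def true_label_ranks(predictions, labels):
    """1-based position of the true label when classes are sorted by probability."""
    ranks = []
    for probs, label in zip(predictions, labels):
        ordered = sorted(probs, key=probs.get, reverse=True)
        ranks.append(ordered.index(label) + 1)
    return ranks


def accuracy_curve(ranks, max_n):
    """Top-N Accuracy for N = 1..max_n: share of samples with rank <= N."""
    return [sum(r <= n for r in ranks) / len(ranks) for n in range(1, max_n + 1)]


# Hypothetical model outputs: one dict of class -> probability per image
predictions = [
    {"cat": 0.70, "dog": 0.20, "bird": 0.06, "horse": 0.04},
    {"cat": 0.10, "dog": 0.75, "bird": 0.10, "horse": 0.05},
    {"cat": 0.40, "dog": 0.35, "bird": 0.15, "horse": 0.10},
    {"cat": 0.05, "dog": 0.10, "bird": 0.80, "horse": 0.05},
    {"cat": 0.45, "dog": 0.30, "bird": 0.20, "horse": 0.05},
]
labels = ["cat", "dog", "dog", "bird", "horse"]  # hypothetical true labels

ranks = true_label_ranks(predictions, labels)
curve = accuracy_curve(ranks, max_n=4)

print(ranks)  # [1, 1, 2, 1, 4]
print(curve)  # [0.6, 0.8, 0.8, 1.0]
```

Since a sample with rank at most $K$ also has rank at most $N$ for any $N > K$, the curve is non-decreasing by construction. In this made-up data, the jump from $N = 1$ to $N = 2$ is the kind of signal described above: the true label is often close to the top even when it isn’t the first choice.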

4. Conclusion

There are several ways to measure how good a model is, and it’s important to choose the one that best fits the given problem. In this article, we looked at the difference between Top-1 Accuracy and Top-N Accuracy, and saw how comparing them can give us a better understanding of our model.
