In this tutorial, we’ll explain the definitions of several object recognition tasks and illustrate their main differences. Furthermore, we’ll present the state-of-the-art machine learning models designed to address these computer vision tasks.
2. What Is Object Recognition?
Object recognition is a generic term to indicate a set of computer vision tasks for identifying objects in digital images. We can distinguish several tasks that belong to the object recognition field: image classification, object localization, object detection, semantic segmentation, and instance segmentation. Nowadays, machine learning and deep learning techniques are the best way to perform object recognition tasks.
3. Object Recognition Tasks
Let’s now analyze in detail the main object recognition tasks.
3.1. Image Classification
Image classification deals with assigning a single label to an image. Hence, an image classification system takes as input an image with one or more objects and returns a single label. For example, given an image depicting a room, an image classification model would return one of the following labels: “kitchen”, “bathroom”, “living room”.
3.2. Object Localization
Object localization deals with localizing objects by indicating their bounding boxes. Hence, an object localization algorithm takes as input an image and returns a set of bounding boxes, each one representing the location of a detected object. The bounding box is generally defined by a point (the centre or the top-left coordinate), the width and the height of the rectangle. An object localization system doesn’t return any label related to the category of the object.
An example of object localization is shown in the following figure:
3.3. Object Detection
Object detection is the task of locating objects in an image with bounding boxes and assigning a class label for each detection. Hence, an object detection system takes as input an image, with one or more objects, and returns a set of bounding boxes and the corresponding class labels:
3.4. Semantic Segmentation
Semantic segmentation deals with assigning a class label for each pixel of an image. It is worth noting that semantic segmentation cannot separate different instances of the same class. In other words, if there are two or more objects of the same class in the image, the semantic segmentation system will return a single mask including all the objects of the same class:
3.5. Instance Segmentation
Instance segmentation deals with classifying each pixel of an image and, unlike semantic segmentation, it can distinguish different instances of the same class. An instance segmentation system generates a mask for each instance of a particular class. Hence, it can be considered as the combination of semantic segmentation and object detection:
In this article, we reviewed the main tasks of object recognition. We illustrated the differences among them and we mentioned the machine learning models designed to address these computer vision tasks.