Machine Learning: How to Format Images for Training

1. Introduction

In this tutorial, we’ll talk about formatting image data for machine learning. We’ll present some standard image formatting techniques and explain why we need them in the first place.

Image processing is converting an image to a specific digital format and extracting usable information from it. Its purpose is to facilitate learning when training machine-learning models using image data. For example, we may want to make images smaller to speed up training.

2. Formatting Techniques

Now, we’ll talk about the five most common image formatting techniques.

2.1. Resizing

Resizing an image means making it bigger or smaller without removing any content. The image’s dimensions change as we resize it, which affects its size and quality.

Resizing is the essential step for convolutional neural networks. Typically, the original images, such as those we get from a surveillance camera, are relatively large. Training the network model on them could be faster and more feasible. So, how do we make those large images fit our network?

One simple solution is to resize the original image to the network model input size. For example:

Resizing an image typically entails stretching and compressing it to a specific size. Height and width can be changed separately. The image will appear deformed if the height-to-weight ratio is different in the original and new dimensions, as in the example above.

We can also keep the ratio and resize only one dimension. The other will be automatically take the value that maintains the ratio.

2.2. Cropping

To clean up the picture, improve framing, adjust the aspect ratio, or emphasize or isolate the subject from the background, we typically remove parts of the peripheral areas. This is what we call cropping. It means keeping only the regions we are interested in. For instance:

2.3. Flipping

When flipping an image, we rotate it about its horizontal or vertical axes. The flipping occurs on the vertical axis in a horizontal flip and on the horizontal axis in a vertical flip:

Flipping produces extra data to feed to our machine learning model. When flipping images, we’re producing their multiple versions without going through the time-consuming process of gathering and labeling them.

2.4. Adjusting the Contrast

Brightness and contrast are two facets of the same concept. The former affects the dark areas of the image, while the latter setting affects the bright spots. We lose the fine details in bright photographs if we set the contrast too high. On the other hand, the entire image will look lifeless and flat if we set it too low. For instance:

2.5. Rotating

Random rotation is a popular method of image data formatting. The position of an object in the frame is altered by randomly rotating a source image for a certain number of degrees (clockwise or counter-clockwise):

However, it’s essential to notice the difference between image rotation and flipping.

When we rotate an object, it maintains the same face facing us while moving left or right around an axis. When we flip it vertically or horizontally, we get a mirror image.

3. Conclusion

In this article, we talked about resizing, cropping, flipping, rotating, and changing the contrast of an image. The formatting of image data is undoubtedly one of the common issues while working with machine learning projects.

We need these image formatting techniques for better model training. For example, we make images smaller to speed up training or diversify the dataset by producing variations of our images, which is usually faster than collecting the variations.

Learn Java Collections

Learn Spring

Learn Maven

View All Courses

Core Concepts

Operating Systems

Neural Networks

Graph Theory

Latex

Full Archive

About Baeldung