1. Introduction

In this tutorial, we’ll review pre-training in neural networks: what it is, how it’s accomplished, and where it’s used. Lastly, we’ll examine the benefits and drawbacks of pre-training neural networks.

2. Pre-training

In simple terms, pre-training a neural network means first training a model on one task or dataset, then using the parameters learned from that training to train another model on a different task or dataset. This gives the new model a head start instead of starting from scratch.

Suppose we want to classify a dataset of cats and dogs, and we build a machine learning model, ml, for this classification task. Once ml has finished training, we save it along with all its parameters. Now suppose we have another task to accomplish: object detection. Instead of training a new model from scratch, we initialize it with ml’s parameters before training it on the object detection dataset. Using ml this way is what we refer to as pre-training.

3. How To Pre-train?

The most crucial aspect of pre-training a neural network is the task at hand. Specifically, the task the model initially learns from must be similar to the task the model will be used for in the future. We can’t train a model on weather forecasting and then, later on, use it for object detection.

Pre-training a neural network entails four basic steps (see the sketch after the list):

  1. Start with a machine learning model m and two datasets, A and B
  2. Train m on dataset A
  3. Before training on dataset B, initialize a second model with some (or all) of the parameters m learned on A
  4. Train that model on B
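
As a concrete, deliberately minimal sketch of these four steps, here’s what they might look like in PyTorch. The architecture, the synthetic datasets, and the train helper are all placeholders rather than part of any specific recipe:

```python
import torch
import torch.nn as nn

def make_model(num_classes):
    # A placeholder architecture; any network works here
    return nn.Sequential(
        nn.Linear(32, 64),
        nn.ReLU(),
        nn.Linear(64, num_classes),
    )

def train(model, inputs, targets, epochs=5):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()

# Step 1: a model and two (synthetic) datasets A and B
x_a, y_a = torch.randn(128, 32), torch.randint(0, 10, (128,))
x_b, y_b = torch.randn(128, 32), torch.randint(0, 10, (128,))

# Step 2: train the model on dataset A and save its parameters
model_a = make_model(num_classes=10)
train(model_a, x_a, y_a)
torch.save(model_a.state_dict(), "pretrained.pt")

# Step 3: initialize a second model with the parameters learned on A
model_b = make_model(num_classes=10)
model_b.load_state_dict(torch.load("pretrained.pt"))

# Step 4: continue training (fine-tune) on dataset B
train(model_b, x_b, y_b)
```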

4. Applications of Pre-training

The applications of pre-training can be grouped into three categories: transfer learning, classification, and feature extraction.

4.1. Transfer Learning

Transfer learning refers to using the knowledge gained from one machine learning problem in another, for instance, using the knowledge gained from cat/dog classification to detect buildings. Its main component is a pre-trained model that gathers knowledge from one task so we can apply it to other tasks.

Most importantly, transfer learning is considered a giant leap for AI developers, as it allows us to develop applications faster and more efficiently:

[Figure: transfer learning to a different task]
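
A common pattern, sketched below with torchvision’s ResNet-50 (the two-class "building" head is a hypothetical stand-in for the new task), is to freeze a backbone pre-trained on ImageNet and train only a new classification layer:

```python
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet (torchvision >= 0.13 API)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pre-trained layers so their knowledge is kept intact
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a head for the new, hypothetical task
# (e.g., building vs. no building); this head is the only part we train
model.fc = nn.Linear(model.fc.in_features, 2)
```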

4.2. Classification

Alternatively, pre-trained models can be applied directly to classification tasks, e.g., image classification, the task of identifying what a given image represents. There are a number of models available today that were trained specifically for image classification on large image datasets, so they can often be reused for new image classification tasks with little or no retraining.
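
As a sketch, classifying a single image with a model pre-trained on ImageNet might look like this (the file name cat.jpg is a placeholder, and the torchvision >= 0.13 weights API is assumed):

```python
import torch
from PIL import Image
from torchvision import models

# Load a classifier pre-trained on ImageNet, in inference mode
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()

# Use the exact preprocessing the model was trained with
preprocess = weights.transforms()
img = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # add batch dimension

with torch.no_grad():
    logits = model(img)

# Map the highest-scoring logit back to a human-readable label
print(weights.meta["categories"][logits.argmax().item()])
```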

4.3. Feature Extraction

Finally, feature extraction entails using a pre-trained model to extract meaningful features from the data. The extracted features can then be used as input to another model.
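
A minimal sketch of this idea: replace ResNet-50’s classification head with an identity layer so the network outputs its 2048-dimensional embeddings, which can then feed another model (the random batch below stands in for real, preprocessed images):

```python
import torch
import torch.nn as nn
from torchvision import models

# Turn a pre-trained classifier into a feature extractor
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()  # drop the head; the model now returns features
backbone.eval()

batch = torch.randn(4, 3, 224, 224)  # placeholder for preprocessed images
with torch.no_grad():
    features = backbone(batch)

print(features.shape)  # torch.Size([4, 2048]) -- one vector per image
```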

5. Benefits and Drawbacks to Pre-training

Pre-training neural network models allows for efficient model development. Although mostly beneficial, using pre-trained models comes with some drawbacks. Let’s review the advantages and disadvantages in the following subsections.

5.1. Benefits

Considering the benefits first, the most notable one is ease of use. Suppose we’ve got a machine learning task to work on. All we need to do is find a pre-trained model that was trained on similar tasks and apply it to the task we’re working on; there’s no need to build a model from scratch.

Pre-training also allows models to be optimized quickly: a model that starts with parameters already known to give good results can reach optimal performance faster than one starting from scratch.

Furthermore, pre-trained models need less data than building a model from scratch. Most pre-trained models available today have been trained on extremely large datasets, so adapting such a model to a different task requires less data to converge.

5.2. Drawbacks

Although beneficial, pre-training must be applied with a bit of caution. Firstly, we won’t always match the results the pre-trained model achieved on its original task. Several factors can cause this: for instance, a dataset from a completely different domain might not yield the same results, and the network parameters, the train-test split ratio, and the hardware used for training are also deciding factors.

Furthermore, fine-tuning pre-trained models can be a difficult task: it takes time and compute resources to do effectively.

6. Examples of Pre-trained Models

Several pre-trained models are used in industry and academia today. Each achieves different performance levels and targets different tasks. Some well-known examples for computer vision are:

  • VGG-16
  • ResNet50
  • Inceptionv3
  • EfficientNet

Some popular pre-trained models for Natural Language Processing (NLP) tasks are:

  • GPT-3
  • BERT
  • ELMo
  • XLNet
  • ALBERT

It’s also important to note that most of these pre-trained models are available in popular machine learning libraries such as TensorFlow, Keras, and PyTorch.
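
For example, loading VGG-16 with its pre-trained ImageNet weights takes one line in Keras (TensorFlow is assumed to be installed; the first call downloads the weights):

```python
from tensorflow.keras.applications import VGG16

# Downloads and loads the pre-trained ImageNet weights on first use
model = VGG16(weights="imagenet")
model.summary()
```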

7. Conclusions

In this tutorial, we’ve reviewed pre-training in neural networks. A pre-trained neural network model is simply a model trained on one task and then reused on a different task. To pre-train a neural network, we need an initial model and a dataset to train it on. The three main applications of pre-trained models are transfer learning, classification, and feature extraction.

In conclusion, pre-trained models are a fast and efficient way to build AI applications but do not always guarantee the same performance on different tasks.
