1. Introduction

Bagging, boosting, and stacking belong to a class of machine learning algorithms known as ensemble learning algorithms. Ensemble learning involves combining the predictions of multiple models into one in an attempt to increase prediction performance.

In this tutorial, we’ll review the differences between bagging, boosting, and stacking.

2. Bagging

Bagging, also known as bootstrap aggregation, is an ensemble learning technique that combines bootstrapping and aggregation to yield a more stable model and improve prediction performance.

In bagging, we first sample equal-sized subsets of data from a dataset with bootstrapping, i.e., we sample with replacement. Then, we use those subsets to train several weak models independently. A weak model is one with low prediction accuracy. In contrast, strong models are very accurate. To get a strong model, we aggregate the predictions from all the weak models:

[Figure: steps in bagging, showing the training data split into three subsets, with a separate model trained on each subset in parallel]

So, there are three steps:

  1. Sample equal-sized subsets with replacement
  2. Train weak models on each of the subsets independently and in parallel
  3. Combine the results from each of the weak models by averaging or voting to get a final result

The results are aggregated by averaging the results for regression tasks or by picking the majority class in classification tasks.
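
To make these steps concrete, here's a minimal sketch of bagging for binary classification. It assumes scikit-learn's DecisionTreeClassifier as the weak model and a synthetic dataset; the number of models and the tree depth are illustrative choices, not from the original article:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary-classification dataset standing in for real training data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
n_models = 10
models = []

# Steps 1 and 2: sample equal-sized subsets with replacement and
# train a weak model (a shallow tree) on each subset independently
for _ in range(n_models):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    models.append(tree.fit(X[idx], y[idx]))

# Step 3: combine the weak models' predictions by majority vote
votes = np.stack([m.predict(X) for m in models])          # shape (n_models, n_samples)
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)   # majority class for 0/1 labels
print("Training accuracy of the bagged ensemble:", (ensemble_pred == y).mean())
```

For a regression task, the only change in step 3 would be averaging the weak models' numeric outputs instead of taking a majority vote.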

2.1. Algorithms That Use Bagging

The main idea behind bagging is to reduce the variance of the model, ensuring that it's robust and not overly influenced by specific samples in the dataset.

For this reason, bagging is mostly applied to high-variance, tree-based machine learning models such as decision trees, with random forests being the best-known algorithm built on bagging.
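
As a brief illustration (not from the original article), both variants are available in scikit-learn; the estimator counts below are arbitrary:

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Generic bagging of an arbitrary base model (here, a decision tree)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)

# Random forest: bagged decision trees that also randomize the features considered at each split
forest = RandomForestClassifier(n_estimators=100)
```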

2.2. Pros and Cons of Bagging

Here’s a quick summary of bagging:

Pros:
  - Reduces variance, so the ensemble is less prone to overfitting than a single model
  - The weak models are trained independently, so training can run in parallel
Cons:
  - Doesn't reduce bias, so it can't fix an underfitting model
  - An ensemble of many models is more expensive to train and harder to interpret than a single model

3. Boosting

In boosting, we train a sequence of models. Each model is trained on a weighted training set. We assign weights based on the errors of the previous models in the sequence. 

The main idea behind sequential training is to have each model correct the errors of its predecessor. This goes on until the predefined number of models has been trained or some other criteria are met.

During training, instances that are classified incorrectly are assigned higher weights so that the next model in the sequence gives them priority:

[Figure: steps in boosting, showing the weighted training data used to train three models in sequence]

Additionally, weaker models are assigned lower weights than strong models when combining their predictions into the final output.

So, we first initialize data weights to the same value and then perform the following steps iteratively:

  1. Train a model on all instances
  2. Calculate the error on model output over all instances
  3. Assign a weight to the model (high for good performance and vice versa)
  4. Update data weights: give higher weights to samples with high errors
  5. Repeat the previous steps until the performance is satisfactory or another stopping condition is met

Finally, we combine the weighted models into a single ensemble that we use for prediction.
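
The steps above correspond closely to AdaBoost's weighting scheme. Here's a minimal, illustrative sketch under those assumptions, using scikit-learn decision stumps as the weak learners and a synthetic dataset; the number of models and the small constant guarding the division are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y_signed = np.where(y == 1, 1, -1)             # AdaBoost-style labels in {-1, +1}

n_models = 20
sample_weights = np.full(len(X), 1 / len(X))    # initialize data weights to the same value
models, model_weights = [], []

for _ in range(n_models):
    stump = DecisionTreeClassifier(max_depth=1)              # weak learner
    stump.fit(X, y_signed, sample_weight=sample_weights)     # 1. train on weighted instances
    pred = stump.predict(X)
    err = sample_weights[pred != y_signed].sum()             # 2. weighted error over all instances
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))          # 3. model weight: high for good performance
    sample_weights *= np.exp(-alpha * y_signed * pred)       # 4. up-weight misclassified samples
    sample_weights /= sample_weights.sum()
    models.append(stump)
    model_weights.append(alpha)

# Combine the weighted models into a single prediction
scores = sum(a * m.predict(X) for a, m in zip(model_weights, models))
print("Training accuracy of the boosted ensemble:", (np.sign(scores) == y_signed).mean())
```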

3.1. Algorithms That Use Boosting

Boosting generally improves the accuracy of a machine learning model by improving the performance of weak learners. We typically use XGBoost, CatBoost, and AdaBoost.

These algorithms apply different boosting techniques and are most noted for achieving excellent performance.
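
For instance, AdaBoost ships with scikit-learn, while XGBoost and CatBoost are separate packages that expose a scikit-learn-compatible interface; the hyperparameter below is illustrative:

```python
from sklearn.ensemble import AdaBoostClassifier

# XGBoost and CatBoost are installed separately and provide similar estimators:
# from xgboost import XGBClassifier
# from catboost import CatBoostClassifier

ada = AdaBoostClassifier(n_estimators=100)  # reweights misclassified samples at each round
```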

3.2. Pros and Cons of Boosting

Boosting has many advantages but isn’t without shortcomings:

Pros:
  - Reduces bias and typically achieves high accuracy
  - Works well even when the individual learners are very weak
Cons:
  - Sensitive to noisy data and outliers, since misclassified samples keep receiving higher weights
  - Sequential training is slower and harder to parallelize than bagging

The decision to use boosting depends on how noisy the data are and on our computational capabilities.

4. Stacking

In stacking, the predictions of base models are fed as input to a meta-model (or meta-learner). The job of the meta-model is to take the predictions of the base models and make a final prediction:

[Figure: steps in stacking, showing the predictions of three models trained on the training data being aggregated by a meta-model]

The base and meta-models don’t have to be of the same type. For example, we can pair a decision tree with a support vector machine (SVM).

Here are the steps:

  1. Construct base models on different portions of the training data
  2. Train a meta-model on the predictions from the base models
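
As a minimal sketch of these two steps, assuming scikit-learn's StackingClassifier with a decision tree and an SVM as base models, a logistic regression meta-model, and a synthetic dataset (all of these choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 1: heterogeneous base models
base_models = [("tree", DecisionTreeClassifier(max_depth=3)), ("svm", SVC())]

# Step 2: the meta-model is trained on the base models' cross-validated predictions
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression(), cv=5)
stack.fit(X, y)
print("Training accuracy of the stacked ensemble:", stack.score(X, y))
```

Note that StackingClassifier trains the meta-model on cross-validated predictions, which helps it avoid simply memorizing the base models' training-set outputs.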

4.1. Pros and Cons of Stacking

We can summarize stacking as follows:

Pros:
  - Can combine heterogeneous models, exploiting the strengths of different algorithms
  - Can reduce both bias and variance, often improving overall performance
Cons:
  - Computationally expensive and harder to interpret than a single model
  - Prone to overfitting if the meta-model sees predictions the base models made on their own training data

5. Differences Between Bagging, Boosting, and Stacking

The main differences between bagging, boosting, and stacking are in the approach, base models, subset selection, goals, and model combination:

  - Bagging: weak, usually homogeneous models are trained independently and in parallel on bootstrapped subsets of the data, and their predictions are combined by averaging or voting; the goal is to reduce variance
  - Boosting: homogeneous weak models are trained sequentially on reweighted versions of the whole dataset, and their predictions are combined as a performance-weighted ensemble; the goal is to reduce bias
  - Stacking: base models that can be heterogeneous are trained on the training data, and a meta-model trained on their predictions produces the final output; the goal is to reduce both bias and variance and improve overall performance

The selection of the technique to use depends on the overall objective and task at hand. Bagging is best when the goal is to reduce variance, whereas boosting is the choice for reducing bias. If the goal is to reduce variance and bias and improve overall performance, we should use stacking.

6. Conclusions

In this article, we provided an overview of bagging, boosting, and stacking. Bagging trains multiple weak models in parallel. Boosting trains multiple homogeneous weak models in sequence, with each successor correcting its predecessor's errors. Stacking combines multiple, possibly heterogeneous, models by training a meta-model on their predictions.

The choice of ensemble technique depends on the goal and task, as all three techniques aim to improve the overall performance by combining the power of multiple models.
