Hard vs. Soft Voting Classifiers | Baeldung on Computer Science

1. Introduction

Ensemble methods in machine learning involve combining multiple classifiers to improve the accuracy of predictions.

In this tutorial, we’ll explain the difference between hard and soft voting, two popular ensemble methods.

2. Ensemble Classifiers

The traditional approach in machine learning is to train a single classifier on the available data. However, each classifier family makes assumptions about the data, and its performance depends on how well those assumptions hold. Additionally, training models from the same family on different subsets of the data can produce models of varying performance.

To address this issue, we can train multiple classifiers and combine their outputs when classifying new objects. This usually improves performance but at the cost of increased processing time.

3. Hard Voting

Let $f_1, f_2, \ldots, f_n$ be the various classifiers we trained using the same dataset or different subsets thereof. Each $f_i$ returns a class label when we feed it a new object $x$.

In hard voting, we combine the outputs by returning the mode, i.e., the most frequently occurring label among the base classifiers’ outputs.

For example, if $n=3$ and $f_1(x)=1$, $f_2(x)=1$, and $f_3(x)=0$, hard voting outputs 1 as it’s the mode.

The final output doesn’t need to be the majority label. In multiclass classification problems, it can happen that no label is returned by more than half of the classifiers. Hard voting then returns the label with the most votes, i.e., the plurality.
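As a minimal sketch of this rule, we can count the labels and return the most frequent one. The function name `hard_vote` and the second label list are our own illustrative choices, and ties are broken in favor of the label seen first, which is an arbitrary assumption rather than part of the definition:

```python
from collections import Counter


def hard_vote(labels):
    """Return the mode of the base classifiers' predicted labels."""
    counts = Counter(labels)
    # most_common(1) returns [(label, count)]. Among equally frequent
    # labels, the one encountered first wins (an arbitrary tie-break).
    return counts.most_common(1)[0][0]


# The three-classifier example from the text
print(hard_vote([1, 1, 0]))

# Hypothetical multiclass case: "a" wins with 3 of 7 votes,
# which is a plurality but not a majority
print(hard_vote(["a", "b", "c", "a", "b", "a", "c"]))
```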

4. Soft Voting

In soft voting, the base classifiers output probabilities or numerical scores.

4.1. Binary Classification

For instance, in binary classification, the output of logistic regression can be interpreted as the probability of the object belonging to class 1. Similarly, an SVM classifier’s score is the signed distance of the object being classified to the separating hyperplane.

A soft-voting ensemble calculates the average score (or probability) and compares it to a threshold value.

For example, let $f_1(x)=0.8$, $f_2(x)=0.51$, and $f_3(x)=0.1$ be the estimated probabilities that $x$ belongs to class 1. Soft voting outputs a mean probability lower than 0.5:

$\frac{0.8+0.51+0.1}{3}=\frac{1.41}{3}=0.47 < 0.5$

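This soft-voting ensemble would assign the label 0 to $x$. In contrast, hard voting over the same classifiers would output 1, since two of the three probabilities exceed 0.5, which gives the same labels as in the hard-voting example.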
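The following sketch reproduces this calculation. The function name `soft_vote_binary` and the default threshold of 0.5 are assumptions made for illustration; assigning label 1 when the mean equals the threshold is also an arbitrary convention:

```python
def soft_vote_binary(probs, threshold=0.5):
    """Average the class-1 probabilities and compare the mean to a threshold."""
    mean = sum(probs) / len(probs)
    label = 1 if mean >= threshold else 0
    return label, mean


probs = [0.8, 0.51, 0.1]

label, mean = soft_vote_binary(probs)
print(label, round(mean, 2))

# For comparison: thresholding each classifier first yields the
# labels 1, 1, 0, so hard voting would return 1 instead
votes = [1 if p >= 0.5 else 0 for p in probs]
print(votes)
```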

4.2. Do We Always Use Means?

So far, we’ve aggregated the results by averaging the base scores.

However, it’s also possible to use the median instead of the mean. The median is less sensitive to outliers, so it can represent the underlying set of outputs better when a few of them are extreme.

Still, that doesn’t imply that the median is always a better choice. For example, let’s say that $f_n$ and $f_{n-1}$ estimate near-zero probabilities that the input object $x$ is positive. The remaining $n-2$ classifiers return probabilities greater than 0.5, but none is as confident that $x$ is positive as $f_{n-1}$ and $f_{n}$ are that it isn’t. It may make sense to trust the two classifiers that are pretty confident over the rest. The rationale is that their evidence may be much stronger, which is why their probabilities are near zero.
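To make this scenario concrete, here’s a small sketch with five hypothetical probabilities (the values are invented for illustration). Two classifiers are nearly certain that $x$ is negative, while the other three are only mildly confident that it’s positive:

```python
from statistics import mean, median

# Hypothetical outputs: three mildly positive classifiers,
# then two that are almost certain the object is negative
probs = [0.6, 0.6, 0.6, 0.02, 0.01]

mean_score = mean(probs)      # pulled down by the two confident classifiers
median_score = median(probs)  # ignores how extreme those two outputs are

print("mean label:", 1 if mean_score >= 0.5 else 0)
print("median label:", 1 if median_score >= 0.5 else 0)
```

Here, the mean falls below 0.5, so it follows the two confident classifiers, whereas the median stays at 0.6 and sides with the other three. Which behavior is preferable depends on how much we trust extreme outputs.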

4.3. Multiclass Classification

In this scenario, each underlying classifier outputs a vector whose $i$-th coordinate is the estimated probability that the input object belongs to the $i$-th class.

For example:

$f_1(x) = \begin{bmatrix} 0.71 \\ 0.09 \\ 0.05 \\ 0.15 \end{bmatrix} \quad f_2(x) = \begin{bmatrix} 0.43 \\ 0.25 \\ 0.2 \\ 0.12 \end{bmatrix} \quad f_3(x) = \begin{bmatrix} 0.51 \\ 0.29 \\ 0.17 \\ 0.03 \end{bmatrix}$

To combine them, we average the vectors element-wise:

$\frac{1}{3} \begin{bmatrix} 0.71 &+& 0.43 &+& 0.51 \\ 0.09 &+& 0.25 &+& 0.29 \\ 0.05 &+& 0.2 &+& 0.17 \\ 0.15 &+& 0.12 &+& 0.03 \end{bmatrix} = \begin{bmatrix} \boldsymbol{0.55} \\ 0.21 \\ 0.14 \\ 0.1 \end{bmatrix}$

The first coordinate is the maximum, so we assign $x$ to the first class.

We can use the vector approach in binary classification as well. In that case, we’ll deal with two-dimensional vectors.
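The element-wise averaging can be sketched in plain Python as follows. The function name `soft_vote_multiclass` is our own, and the returned class index is zero-based, so index 0 corresponds to the first class:

```python
def soft_vote_multiclass(prob_vectors):
    """Average the probability vectors element-wise and pick the largest coordinate."""
    n = len(prob_vectors)
    # zip(*prob_vectors) groups the i-th coordinates of all vectors together
    avg = [sum(coords) / n for coords in zip(*prob_vectors)]
    # Index of the maximum averaged probability (zero-based)
    winner = max(range(len(avg)), key=avg.__getitem__)
    return winner, avg


outputs = [
    [0.71, 0.09, 0.05, 0.15],  # f_1(x)
    [0.43, 0.25, 0.20, 0.12],  # f_2(x)
    [0.51, 0.29, 0.17, 0.03],  # f_3(x)
]

winner, avg = soft_vote_multiclass(outputs)
print(winner, [round(p, 2) for p in avg])
```

The same function handles the binary case if each classifier returns a two-element vector.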

5. Conclusion

In this article, we talked about hard and soft voting.

Hard-voting ensembles output the mode of the base classifiers’ predictions, whereas soft-voting ensembles average predicted probabilities (or scores).
