1. Introduction

In this tutorial, we’ll show how to plot the decision boundary of a logistic regression classifier. We’ll focus on the binary case with two classes we usually call positive and negative. We’ll also assume that all the features are continuous.

2. Decision Boundary

Let \mathcal{X} be the space of objects we want to classify using machine learning. The decision boundary of a classifier is the subset of \boldsymbol{\mathcal{X}} containing the objects for which the classifier’s score is equal to its decision threshold.

In the case of logistic regression (LR), the score f(x) of an object x is the estimate of the probability that x is positive, and the decision threshold is 0.5:

    \[\{ x \in \mathcal{X} \mid f(x) = 0.5 \}\]

Visualizing the boundary helps us understand how our classifier works and compare it to other classification models.

3. Plotting the Boundary

To plot the boundary, we first have to find its equation.

3.1. The Boundary Equation

We can derive the equation of the decision boundary by plugging the LR formula into the condition f(x) = 0.5.

We’ll assume that x= [x_0, x_1, x_2, \ldots, x_n]^T is an (n+1) \times 1 vector with x_0=1 to make the LR equation more compact. In preprocessing, we can always prepend x_0=1 to any n-dimensional x.
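For instance, here’s a minimal NumPy sketch of that preprocessing step (the feature matrix X below is made up):

    import numpy as np

    # Made-up feature matrix with three objects and n = 2 features.
    X = np.array([[5.1, 3.5],
                  [6.2, 2.9],
                  [6.7, 3.0]])

    # Prepend x_0 = 1 to every object so that theta_0 acts as the intercept.
    X_augmented = np.hstack([np.ones((X.shape[0], 1)), X])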

Consequently, we have:

    \[f(x) = \frac{1}{1 + e^{-\theta^T x}} = \frac{1}{2}\]

where \theta = [\theta_0, \theta_1, \ldots, \theta_n]^T is the (n+1) \times 1 parameter vector of our LR model f. From there, we get:

    \[\begin{aligned} \frac{1}{1 + e^{-\theta^T x}} &= \frac{1}{2} \\ 1 + e^{-\theta^T x} &= 2 \\ e^{-\theta^T x} &= 1 \\ - \theta^T x &= 0 \\ \theta^T x &= 0 \\ \sum_{i=0}^{n}\theta_i x_i &= 0 \end{aligned}\]
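We can check this numerically: whenever \theta^T x = 0, the sigmoid returns exactly 0.5, so the point lies on the boundary. Here’s a quick sketch with made-up parameters:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Made-up parameters [theta_0, theta_1, theta_2] and a point x with x_0 = 1
    # chosen so that theta^T x = 0.
    theta = np.array([1.5, -2.0, 0.5])
    x = np.array([1.0, 1.0, 1.0])      # 1.5 - 2.0 + 0.5 = 0

    print(sigmoid(theta @ x))          # 0.5: x is on the decision boundary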

3.2. The Shape of the Boundary in Two and Three Dimensions

If n=2, the boundary equation becomes:

    \[\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0\]

That’s a line in the (x_1, x_2) plane. For example, if we use the iris dataset with x_1 and x_2 being the sepal length and width, and with versicolor and virginica classes blended into one, we’ll get a straight line:

Decision boundary as a straight line

We don’t have to consider the degenerate case where \theta_0=\theta_1=\theta_2=0 since it implies that no features of x are used, so we wouldn’t use such a model anyway. Let’s say \theta_2 \neq 0. The boundary’s explicit equation is then:

    \[x_2 = -\frac{\theta_1}{\theta_2}x_1 - \frac{\theta_0}{\theta_2}\]
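As an illustration, here’s a minimal sketch of how such a plot can be produced with scikit-learn and Matplotlib (the exact model settings behind the figure above may differ):

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Sepal length (x_1) and sepal width (x_2); versicolor and virginica merged into one class.
    iris = load_iris()
    X = iris.data[:, :2]
    y = (iris.target != 0).astype(int)

    model = LogisticRegression().fit(X, y)
    theta_0 = model.intercept_[0]
    theta_1, theta_2 = model.coef_[0]

    # Explicit form of the boundary: x_2 = -(theta_1/theta_2) x_1 - theta_0/theta_2
    x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
    x2 = -(theta_1 / theta_2) * x1 - theta_0 / theta_2

    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.plot(x1, x2, color="black")
    plt.xlabel("sepal length")
    plt.ylabel("sepal width")
    plt.show()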

If n=3, we have a plane given by:

    \[\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 = 0\]

Let \theta_3 \neq 0. Then, we can write the equation in the explicit form:

    \[x_3 = -\frac{\theta_2}{\theta_3} x_2  -\frac{\theta_1}{\theta_3}x_1 - \frac{\theta_0}{\theta_3}\]
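For example, here’s a minimal Matplotlib sketch of such a plane (the coefficients below are made up):

    import matplotlib.pyplot as plt
    import numpy as np

    # Made-up coefficients theta_0, ..., theta_3 with theta_3 != 0.
    theta_0, theta_1, theta_2, theta_3 = -1.0, 0.5, 1.5, 2.0

    # Grid over the independent features x_1 and x_2.
    x1, x2 = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))

    # Explicit form: x_3 = -(theta_2/theta_3) x_2 - (theta_1/theta_3) x_1 - theta_0/theta_3
    x3 = -(theta_2 / theta_3) * x2 - (theta_1 / theta_3) * x1 - theta_0 / theta_3

    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(x1, x2, x3, alpha=0.5)
    ax.set_xlabel("$x_1$")
    ax.set_ylabel("$x_2$")
    ax.set_zlabel("$x_3$")
    plt.show()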

3.3. Algorithm

We can use any plotting tool to visualize lines and planes corresponding to these equations. If we have to do it from scratch, we can iterate over the independent features in small increments and calculate the dependent feature using the explicit forms:

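For the two-dimensional case, a minimal Python sketch of this idea could look as follows (the coefficient values are made up, and l and u are the limits we choose):

    import numpy as np

    def boundary_points(theta_0, theta_1, theta_2, l, u, step=0.01):
        # Iterate over the independent feature x_1 in [l, u] in small increments
        # and compute the dependent feature x_2 from the explicit boundary equation.
        points = []
        x1 = l
        while x1 <= u:
            x2 = -(theta_1 / theta_2) * x1 - theta_0 / theta_2
            points.append((x1, x2))
            x1 += step
        return np.array(points)

    # Made-up coefficients and limits.
    boundary = boundary_points(theta_0=-1.0, theta_1=0.8, theta_2=1.2, l=0.0, u=10.0)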

The limits l and u determine the part of the boundary we want to focus on.

3.4. The Limits of Visualization

We have two questions at this point:

  • Can we visualize a boundary in multiple dimensions?
  • Is a boundary always a line or a plane?

Let’s find out.

4. Multiple Dimensions

If our objects have more than three features, we can visualize only the boundary’s projections onto the planes and spaces defined by pairs and triplets of features.

One way to deal with this is to choose the features for visualization and keep the others at constant values, such as their means or constants of interest we know from theory. Values that mean “the feature is absent or neutral” can also be helpful. In most, but not all, cases, that means setting those other features to zero.

4.1. Example

With 10 features x_1, x_2, \ldots, x_{10}, we have \binom{10}{2}=45 feature pairs. Let’s say we choose x_1 and x_2 for visualizing the boundary. In that case, we set x_3, x_4, \ldots, x_{10} to some constant values. Let them be x_3 = \chi_3, x_4 = \chi_4, \ldots, x_{10}=\chi_{10}.

Then, \sum_{i = 3}^{10} \theta_i \chi_i is another constant. We add it to \theta_0 and proceed as if x_1 and x_2 are the only two features of interest:

    \[x_2 = -\frac{\theta_1}{\theta_2}x_1 - \frac{1}{\theta_2}\left(\theta_0  + \sum_{i = 3}^{10} \theta_i \chi_i \right)\]
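Here’s a minimal sketch of this projection (the parameters and training data below are made up; we fix x_3, \ldots, x_{10} at their column means):

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up parameters: theta[0] is theta_0, theta[1..10] are the feature weights.
    theta = rng.normal(size=11)

    # Made-up training data with 10 features, used only to pick the constants chi_i.
    X_train = rng.normal(size=(100, 10))
    chi = X_train.mean(axis=0)              # chi_1, ..., chi_10 (columns 0..9)

    # Fold the fixed features x_3, ..., x_10 into the intercept.
    theta_0_adjusted = theta[0] + theta[3:] @ chi[2:]

    # Boundary in the (x_1, x_2) plane with the other features held constant.
    x1 = np.linspace(-3, 3, 100)
    x2 = -(theta[1] / theta[2]) * x1 - theta_0_adjusted / theta[2]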

This can work for any pair or triplet of x_1, x_2, \ldots, x_n.

However, a disadvantage of this approach is that the boundary depends on the chosen constants \chi_i.

5. Curvatures

We can introduce curvatures with feature engineering.

Let’s say that our original features are [x_1, x_2]. Before prepending x_0 = 1, we can add x_3=x_1^2. Then, the decision boundary becomes:

    \[\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 = 0 \implies x_2 = -\frac{\theta_0}{\theta_2} - \frac{\theta_1}{\theta_2}x_1 - \frac{\theta_3}{\theta_2}x_1^2\]

which is a curve in the original (x_1, x_2) space. For instance:

Decision boundary as a curve

 

However, the same boundary is a plane in the augmented (x_1, x_2, x_3) space.
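Here’s a minimal sketch of plotting such a quadratic boundary in the original space (the coefficients below are made up):

    import matplotlib.pyplot as plt
    import numpy as np

    # Made-up coefficients for the augmented features [1, x_1, x_2, x_1^2].
    theta_0, theta_1, theta_2, theta_3 = 2.0, -1.0, 1.0, -0.5

    # Explicit form: x_2 = -theta_0/theta_2 - (theta_1/theta_2) x_1 - (theta_3/theta_2) x_1^2
    x1 = np.linspace(-4, 4, 200)
    x2 = -theta_0 / theta_2 - (theta_1 / theta_2) * x1 - (theta_3 / theta_2) * x1 ** 2

    plt.plot(x1, x2, color="black")
    plt.xlabel("$x_1$")
    plt.ylabel("$x_2$")
    plt.show()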

6. Conclusion

In this article, we showed how to visualize the logistic regression’s decision boundary. Plotting it helps us understand how our logistic model works.
