Authors Top

If you have a few years of experience in Computer Science or research, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.

1. Introduction

One Class Support Vector Machines (SVMs) are a type of outlier detection method.

In this tutorial, we’ll explore how SVMs perform outlier detection and illustrate its utility with a simple example.

Let’s start with the basics.

2. Support Vector Machines

Usually, SVMs aim to generalize from some input data. With this, they can decide whether any additional data are outliers or not. 

Basic SVMs do this by separating data into several classes. From there, they can decide which class any subsequent data belongs to.

One Class SVMs are similar, but there is only one class. Therefore, a boundary is decided on using the available data. Any new data that lies outside that boundary is classed as an outlier.

To see how this is done, let’s dive into the formulation.

3. One Class SVM

There are two formulations of One Class SVMs. The first is by Schölkopf, which involves using a hyperplane (a plane in n-dimensions) for the decision boundary. The second is by Tax and Duin, which uses a hypersphere for the decision boundary. Let’s explore the latter.

The hypersphere is characterized by a center \mathbf{a} with a radius R>0. Also, let’s say we have n data points that are given by \mathbf{x}_i, where i = {1,2,\cdots,n}.

Therefore, the Euclidean distance from the hypersphere center and a given data point is \lvert \mathbf{x}_i - \mathbf{a} \rvert. We want to minimize a cost function R^2 with the constraint that every point lies within or on the hypersphere. Therefore, our constraint is \lvert \mathbf{x}_i - \mathbf{a} \rvert ^2\  \leq  \ R^2   \ \forall   i.

As it stands, outliers will greatly affect the tuning. Let’s address this by modifying the cost function to R^2 + C\sum_i \xi_i and the constraint to \lvert \mathbf{x}_i - \mathbf{a} \rvert ^2\   \leq \  R^2 + \xi_i \  \forall   i.

\xi_i are the positive weights associated with each data point. The higher it is, the less that particular data point affects the tuning of R. Also, C provides a trade-off between volume and classification errors.

Combining this with the method of Lagrange Multipliers, we are left with an optimization problem to solve:

(1)   \begin{equation*} \mathcal{L}(R, \mathbf{a}, \alpha_i, \gamma_i, \xi_i) = R^2 + C\sum_i \xi_i - \sum_i \alpha_i (R^2 + \xi_i - (\lvert \mathbf{x}_i \rvert ^2 - 2\mathbf{a \cdot x}_i + \lvert \mathbf{a} \rvert ^2 ) ) - \sum_i \gamma_i \xi_i, \end{equation*}

where \alpha_i and \gamma_i are non-negative Lagrange multipliers. \mathcal{L} should be maximized with respect to \alpha_i and \gamma_i, but minimized with respect to R, \mathbf{a} and \xi_i.

Now let’s apply this to a simple problem!

4. Simple Example

Suppose we train a One Class SVM on 100 data points from a two-dimensional Gaussian distribution \mathcal{N}(0,0.3).

After that, we add both 20 additional data points and 20 data points drawn from a Uniform distribution \mathcal{U}(-3,3). Our goal is to correctly classify the Gaussian data points while ignoring the outlier data from the Uniform distribution.

This is the result:

Figure 1

5. Conclusion

In this brief article, we showed how they differ from basic SVMs. Then, we outlined the relevant equations and applied them to a simple example.

Authors Bottom

If you have a few years of experience in Computer Science or research, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.

Comments are closed on this article!