1. Overview

A self-organizing map (SOM) is a special type of neural network we use for reducing the dimensionality of high-dimensional data.

SOMs belong to the family of unsupervised algorithms, alongside principal component analysis (PCA) and k-means clustering. They are also called Kohonen maps or Kohonen networks after their inventor, Teuvo Kohonen.

In this tutorial, we’ll learn about the motivation behind this algorithm, its procedure, and its applications.

2. Motivation

Most of the time, our data are high-dimensional, which makes them difficult to analyze and extract useful patterns from. In such cases, we usually apply unsupervised learning algorithms, e.g., k-means.

K-means breaks the data into a predefined number of clusters (k) and assigns each data point to one of them. However, we must use the correct number of clusters for k-means to work. Finding that number can be tricky and may require multiple guesses and runs, which takes time.

A better way would be to devise an algorithm that can guess the correct number of clusters on its own. This is where self-organizing maps come to the rescue. As we’ll see in the next section, a SOM is a lattice of neurons. Its size depends on the application.

3. Self-Organizing Maps (SOMs)

A SOM is a single-layer neural network. Its neurons are arranged in a k \times \ell lattice (we choose the values of k and \ell):

An example of a SOM with 9 neurons in a 3x3 lattice

The neurons in the lattice are vectors we initialize randomly and train using competitive learning. There’s no hard and fast rule for determining the lattice size. However, as a rule of thumb, we can use 5 \sqrt{N} neurons, where N is the number of data points in the training set. This is usually large enough to capture the dominant patterns in the data.
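
For example, here's a minimal sketch of how we might size and randomly initialize such a lattice (in Python with NumPy; the function name and the random seed are illustrative choices, not part of the original algorithm):

    import numpy as np

    def init_lattice(data):
        n_samples, n_features = data.shape
        # Rule of thumb: about 5 * sqrt(N) neurons in total, arranged in a square lattice
        side = int(np.ceil(np.sqrt(5 * np.sqrt(n_samples))))
        rng = np.random.default_rng(seed=42)
        # Each neuron is a random weight vector with the same dimensionality as the data
        return rng.uniform(data.min(axis=0), data.max(axis=0), size=(side, side, n_features))

    # For N = 1000 data points, this gives a 13 x 13 lattice of 169 neurons.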

In the case of clustering, for instance, similar data points will make specific regions of neurons denser and thus form clusters. Other regions, on the other hand, will remain sparse.

SOMs iteratively run three stages until convergence: competition, cooperation, and adaptation.

3.1. Competition

In this stage, we go over all the input data points. For each one, we measure the distance (e.g., the Euclidean distance) between it and each neuron in the lattice.

After calculating the distances, we find the neuron closest to the input vector, the so-called winning neuron. This stage is called competition because the neurons compete in similarity to our data point.
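
A minimal sketch of this step, assuming a NumPy lattice like the one initialized above and the Euclidean distance (the function name is illustrative):

    import numpy as np

    def find_winner(lattice, x):
        # Euclidean distance between the input vector x and every neuron in the lattice
        distances = np.linalg.norm(lattice - x, axis=-1)
        # (row, column) index of the closest neuron, i.e., the winner
        return np.unravel_index(np.argmin(distances), distances.shape)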

3.2. Cooperation

The cooperation phase begins after finding the winning neuron. In it, we find the winning neuron’s neighbors.

The neighborhood is defined through the neighborhood function h. It quantifies the degree to which a neuron can be considered a neighbor of the winning neuron. While doing so, h follows two rules:

  • it should be symmetrical about the winning neuron
  • its value should decrease with the distance to the winning neuron

3.3. Adaptation

Here, we update the neurons according to \boldsymbol{h}:

    \[ w_{j}(n+1) = w_{j}(n) + \alpha \, h\bigl(d(i, j)\bigr) \bigl(x - w_{j}(n)\bigr)\]

where:

  • x is the input vector
  • w_{j}(n) is the vector representation of neuron j at iteration n (w_i(n) is the winning neuron)
  • \alpha is the learning rate
  • d(i, j) is the distance in the lattice between neuron j and the winning neuron i
  • h is the neighborhood function from the cooperation stage

For instance, we can use the Gaussian function to focus the winner’s influence on the neurons close to it:

    \[ h(d(i, j)) = e^{-\frac{d^2(i, j)}{2\sigma^2}}\]

where d(i, j) is the Euclidean distance between neighbor j and the winning neuron i, and \sigma is the standard deviation of the Gaussian function.

In some variations of the algorithm, \sigma decreases with time.
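
Putting the update rule and the Gaussian neighborhood together, one adaptation step could look like the following sketch (it assumes the lattice and the find_winner helper from the earlier sketches; the function names and default hyperparameters are illustrative):

    import numpy as np

    def gaussian_neighborhood(winner, lattice_shape, sigma):
        # Lattice coordinates of every neuron
        rows, cols = np.indices(lattice_shape)
        # Distance d(i, j) in the lattice from each neuron j to the winner i
        d = np.sqrt((rows - winner[0]) ** 2 + (cols - winner[1]) ** 2)
        return np.exp(-d ** 2 / (2 * sigma ** 2))

    def adapt(lattice, x, winner, alpha=0.5, sigma=1.0):
        h = gaussian_neighborhood(winner, lattice.shape[:2], sigma)
        # w_j(n+1) = w_j(n) + alpha * h(d(i, j)) * (x - w_j(n))
        return lattice + alpha * h[..., np.newaxis] * (x - lattice)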

3.4. Algorithm

In addition to the training data, the algorithm also requires several hyperparameters, such as the size of the lattice k \times \ell, the neighborhood function h, and the number of iterations M.

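Tying the three stages together, a minimal end-to-end training loop might look like this sketch (it reuses the init_lattice, find_winner, and adapt helpers from the earlier sketches; the decay schedule is just one common choice, not the only one):

    import numpy as np

    def train_som(data, lattice, n_iterations=100, alpha=0.5, sigma=1.0):
        rng = np.random.default_rng(seed=0)
        for n in range(n_iterations):
            # Let the learning rate and neighborhood width shrink over time
            decay = np.exp(-n / n_iterations)
            for x in rng.permutation(data):
                winner = find_winner(lattice, x)        # competition
                lattice = adapt(lattice, x, winner,     # cooperation + adaptation
                                alpha * decay, sigma * decay)
            # In practice, we'd also check for convergence here (see below)
        return lattice

    # Usage with hypothetical data:
    # data = np.random.rand(1000, 3)
    # som = train_som(data, init_lattice(data))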

Convergence occurs when there's no noticeable change in the map. In other words, if the inputs map to the same winning neurons as in the previous iteration, the network has converged to a solution.
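
For instance, one simple convergence check compares the winning neurons between consecutive passes over the data (a sketch assuming the find_winner helper above; the function name is illustrative):

    import numpy as np

    def has_converged(lattice, data, previous_winners):
        # Map every input to its current winning neuron
        winners = [tuple(map(int, find_winner(lattice, x))) for x in data]
        # Converged if no input changed its winning neuron since the last pass
        return winners == previous_winners, winners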

The cooperation and adaptation stages are intertwined. Therefore, we can consider them one phase in the computational sense.

4. SOMs vs. K-means

SOMs and K-means can both be used for clustering data. In fact, we can think of SOMs as a constrained version of K-means: the neurons in a SOM aren't as free to move as the centroids in K-means are.

On the other hand, in K-means, we have to specify the number of clusters beforehand, while SOMs don't require that. The reason is that the initial lattice we choose can be large, allowing the neurons to form clusters on their own. We can see this in a diagram of how SOMs work:

A SOM in action

The nearby neurons move together and reveal clusters in the data. We don’t need to specify the number of clusters beforehand since the neurons will learn it from the data.

Another drawback of K-means is that it's more sensitive to outliers: they can pull the centroids toward themselves and away from the actual clusters in the data. This is not the case with SOMs.

Finally, SOMs preserve the topological structure of the data, whereas K-means doesn't. In K-means, each centroid moves freely without regard for the others, while in a SOM, a neuron's shift depends on its distance to the winning neuron, so neighboring neurons move together. As a result, SOMs keep the underlying topology of the data, as the above diagram shows.

5. Conclusion

In this article, we learned about self-organizing maps (SOMs). We can use them to reduce data dimensionality and visualize the data structure while preserving its topology.
