1. Introduction

In this tutorial, we’ll review two concepts: Pearson and Spearman Correlation. We’ll also discuss how to choose between the two.

2. Correlations

Before we delve into the intricacies of Pearson and Spearman correlations, let’s first define correlation. Correlation refers to how variables are related: it quantifies the degree to which they move together. The relationship can be linear, where a change in one variable is accompanied by a proportional change in the other. Alternatively, it can be non-linear, where the variables are still related but the change in one is not proportional to the change in the other.

For example, suppose we observe that the amount of money spent in winter increases with the age of a person. We can assume that there is a correlation between these two variables. The correlation here is that the amount of money spent increases with age.

3. Pearson Correlation

Pearson correlation measures the strength and direction of the linear relationship between two variables. It is calculated as the ratio of the covariance of the two variables to the product of their standard deviations. For two variables X and Y, the Pearson correlation is calculated using this formula:

    \[\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} \]

Here, \operatorname{cov}(X,Y) is the covariance between X and Y, and \sigma_X and \sigma_Y are the standard deviations of X and Y, respectively. The Pearson correlation lies in the interval [-1, 1], where 1 indicates a perfect positive linear correlation: when X increases, Y increases. In contrast, -1 indicates a perfect negative linear correlation: when X increases, Y decreases. A correlation close to 0 means that there is no linear relationship, although a non-linear one may still exist.
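
The formula can be turned into a few lines of code. The following is a minimal sketch in plain Python, not a reference implementation; the function name `pearson` and the sample lists are illustrative choices that do not come from the article.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and of the two standard deviations
    # cancel, so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, decreasing
```

The two calls sit at the extremes of the [-1, 1] range, which makes them a quick sanity check for the sign convention.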

4. Spearman Correlation

Spearman correlation measures the strength and direction of monotonic relationships between variables by considering the ranks of their values. A relationship is monotonic when, as one variable increases, the other consistently moves in a single direction, either never decreasing or never increasing, though not necessarily at a constant rate. It is computed in the same way as Pearson’s correlation, but on the ranks of the values rather than the values themselves:

    \[ r_s = \rho_{\text{R}(X),\text{R}(Y)} = \frac{\text{cov}(\text{R}(X),\text{R}(Y))}{\sigma_{\text{R}(X)} \sigma_{\text{R}(Y)}} \]

Here, \text{R}(X) and \text{R}(Y) are the ranks of X and Y. Like Pearson’s, the Spearman correlation lies in the interval [-1, 1], where -1 indicates a perfect negative monotonic correlation: as one variable increases, the other decreases. In contrast, 1 indicates a perfect positive monotonic correlation, while 0 indicates no monotonic relationship.
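
As a rough illustration of "Pearson on the ranks", the sketch below ranks each variable and then applies the Pearson formula to those ranks. The helper names (`average_ranks`, `spearman`) and the choice to give tied values the mean of their positions are assumptions made for this example; the article itself does not say how ties are handled.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def average_ranks(values):
    """Rank values from 1 (smallest) to n; ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]   # monotonic in xs, but not linear
print(spearman(xs, cubes))     # close to 1: the ordering is preserved exactly
print(pearson(xs, cubes))      # below 1: the relationship is not a straight line
```

The cubic example shows the practical difference between the two measures: the ranks of `xs` and `cubes` are identical, so the rank-based coefficient is at its maximum even though the raw values do not lie on a line.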

5. An Example

Suppose we have two variables, X and Y, where X is the age of participants and Y is the amount of money spent:

X Y
26 500
56 780
78 200
18 20
50 300

Computing the Pearson correlation over these two variables gives

    \[\rho_{X,Y} \approx 0.20\]

For the Spearman correlation, we first rank each variable from smallest to largest, which gives \text{R}(X) = (2, 4, 5, 1, 3) and \text{R}(Y) = (4, 5, 2, 1, 3). The correlation of these ranks is

    \[ r_s = \rho_{\text{R}(X),\text{R}(Y)} = 0.30 \]

Both coefficients are small and positive, so on these five data points there is only a weak linear correlation between X and Y, and only a weak positive monotonic relationship.
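
A small script is a convenient way to work through an example like this one. The sketch below is plain Python and uses the table values directly; the variable and function names are illustrative. For the Spearman coefficient it uses the standard shortcut r_s = 1 - 6 Σd² / (n(n² - 1)), where d is the difference between the two ranks of a data point. That shortcut is not part of the article and is only valid when there are no tied values, which is the case for this table.

```python
import math

ages = [26, 56, 78, 18, 50]        # X
spent = [500, 780, 200, 20, 300]   # Y

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    # 1 = smallest value; assumes no ties, which holds for this table.
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

def spearman_no_ties(xs, ys):
    # Shortcut formula based on squared rank differences (no ties allowed).
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

print("rank(X): ", ranks(ages))
print("rank(Y): ", ranks(spent))
print("Pearson: ", round(pearson(ages, spent), 2))
print("Spearman:", round(spearman_no_ties(ages, spent), 2))
```

Printing the ranks alongside the coefficients makes it easy to see which data points disagree in ordering and therefore pull the Spearman value down.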

6. Which One Should You Choose?

The choice between Pearson and Spearman correlation depends on the characteristics of the data and the task at hand. For instance, choose:

  • Pearson’s correlation for linear relationships, Spearman’s correlation for monotonic relationships that are not necessarily linear
  • Spearman’s correlation for ranked (ordinal) data, Pearson’s correlation otherwise

7. Conclusions

In this article, we provided an overview of Pearson and Spearman Correlation.

Pearson correlation quantifies linear relationships, while Spearman correlation measures the degree of monotonic correlations between variables. Which one to choose depends on the data’s characteristics and the goal of the analysis.
