1. Introduction
In this tutorial, we’ll study how to choose a linear scale for a chart that represents a distribution.
2. Representing Distributions
In our article on drawing charts in LaTeX, we studied the general techniques for representing distributions. In our tutorial on the auto-layout of graphs, we also discussed how to generate representations that help our readers understand the data. In that context, we noted that not all representations are equal and that some work better than others.
In this article, instead, we focus on determining the correct linear scale for a chart. Specifically, we study a procedure for identifying the lower and upper bounds of the chart, as well as the positions of its ticks.
We’ll first look at an example that clarifies the nature of the problem, and then discuss its solution.
3. A Wrong Representation
Let’s start by plotting a distribution and assigning five ticks to the axis:
Intuitively, we can tell that there’s something wrong with the scale that we use for the axis. In fact, we expect the scatterplot to cover most of the plane, not just the top portion.
We can also notice that the ticks are denser in an area of the chart that has no observations. Conversely, in an area dense with observations, such as the interval , there are barely any ticks.
Further, one tick is evidently different from all the others: it uses two-decimal precision, whereas the others are rounded to the nearest integer.
From these considerations, we can conclude that the criterion used to select the scale of is probably wrong.
4. A Better Representation
Let’s compare the previous chart with a new one that contains the same observations and the same number of ticks but uses a different scale:
This looks much better. At a glance, we can tell which observations are higher than which others, and by approximately how much. The ticks are nicely rounded, and they all hold integer values that are evenly spaced across the range of the distribution.
5. Criteria for Choosing a Linear Scale
This suggests that there exist criteria for finding the optimal linear scale for a chart, given a distribution and a number of ticks.
These criteria are:
- the axis should extend from slightly below the lower bound of the distribution to slightly above its upper bound
- the ticks should be evenly spaced between the lower and the upper bound of the axis
- round values should be preferred for the ticks
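To make these criteria a little more concrete, here is a minimal Python sketch that checks a list of tick positions against them. The function name `satisfies_criteria` and its `rel_tol` parameter are hypothetical, and two simplifications are ours: “slightly below/above” becomes “at or beyond” the data bounds, and “round” becomes “every tick is a whole multiple of the tick spacing”.

```python
import math


def satisfies_criteria(ticks, x_min, x_max, rel_tol=1e-9):
    """Check ticks against the three criteria (hypothetical, simplified helper).

    Simplifications (our assumptions, not part of the original criteria):
    - "slightly below/above" is relaxed to "at or beyond" the data bounds;
    - "round" is approximated by "each tick is a whole multiple of the spacing".
    """
    if len(ticks) < 2:
        return False
    spacing = ticks[1] - ticks[0]
    if spacing <= 0:
        return False

    # Criterion 1: the axis covers the whole distribution.
    covers = ticks[0] <= x_min and ticks[-1] >= x_max

    # Criterion 2: consecutive ticks are evenly spaced.
    even = all(
        math.isclose(b - a, spacing, rel_tol=rel_tol)
        for a, b in zip(ticks, ticks[1:])
    )

    # Criterion 3 (simplified): every tick is a whole multiple of the spacing.
    aligned = all(
        math.isclose(t / spacing, round(t / spacing), abs_tol=1e-9)
        for t in ticks
    )

    return covers and even and aligned
```

For data spanning 3 to 47, the ticks 0, 10, ..., 50 pass all three checks, whereas for data spanning 3 to 30 the ticks 3, 13, 23, 33 are evenly spaced but fail the roundness check.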
The first chart above violates all three criteria, while the second one satisfies them; as a consequence, the second one looks better.
6. Procedure for Identifying Scale and Ticks
We can formalize the procedure for assigning ticks to the axis of a distribution as a series of steps.
First, we take the lower and upper bounds of the distribution and compute its range:
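Writing $x_{\min}$ and $x_{\max}$ for the smallest and largest observations (this notation is ours), the range is simply their difference:

```latex
R = x_{\max} - x_{\min}
```

For instance, observations spanning 3 to 47 give $R = 44$.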
Then, we divide the range of the distribution by the desired number of ticks to obtain the tick range:
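With $R = x_{\max} - x_{\min}$ the range and $n$ the desired number of ticks (symbols are ours), the raw tick range $t$ is:

```latex
t = \frac{R}{n}
```

For example, a range of 44 divided into five ticks gives $t = 8.8$, which is exactly the kind of inconvenient value the next step rounds.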
If the tick range is an inconvenient value, say 6.7, we can round it up to the nearest nice round value. What counts as “nice” is, of course, largely subjective. As a general rule, nice values are the multiples of 25, 10, 5, 2, or 1, in this order of preference.
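One way to make the “nice” rounding concrete, and the reading we assume here, is to treat 1, 2, 2.5, 5, and 10, each scaled by a power of ten, as the nice values, with 2.5 standing in for the multiples of 25. The sketch below, under the hypothetical name `nice_ceil`, rounds a positive tick range up to the smallest such value that is not below it. Note that it simply picks the closest nice value above the input, rather than encoding the order of preference literally.

```python
import math

# Nice mantissas assumed in this sketch; each is scaled by a power of ten.
NICE_MANTISSAS = (1.0, 2.0, 2.5, 5.0, 10.0)


def nice_ceil(value):
    """Round a positive tick range up to the smallest nice value >= value."""
    if value <= 0:
        raise ValueError("value must be positive")
    exponent = math.floor(math.log10(value))
    magnitude = 10 ** exponent      # e.g. 1 for 6.7, 10 for 23, 0.1 for 0.3
    fraction = value / magnitude    # lies in [1, 10), up to float rounding
    for mantissa in NICE_MANTISSAS:
        if fraction <= mantissa:
            return mantissa * magnitude
    # Only reachable through floating-point error; fall back to the next decade.
    return 10 * magnitude
```

Under this rule, 6.7 becomes 10, 2.2 becomes 2.5, and 23 becomes 25. Restricting the mantissas this way keeps tick labels short, at the cost of sometimes widening the tick range noticeably (6.7 to 10, for instance).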
Next, we compute a new lower and upper bound from the rounded tick range we’ve just calculated. The lower bound is:
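Writing $t^{*}$ for the rounded tick range and $x_{\min}$ for the smallest observation (our notation), we read this step as snapping $x_{\min}$ down to the nearest multiple of $t^{*}$:

```latex
L = \left\lfloor \frac{x_{\min}}{t^{*}} \right\rfloor \cdot t^{*}
```

With $x_{\min} = 3$ and $t^{*} = 10$, this gives $L = 0$.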
We compute the upper bound analogously:
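By analogy with the lower bound, and reading the note on the ceiling operator in the next paragraph as adding 1 to the quotient (an interpretation on our part), the upper bound $U$ for the largest observation $x_{\max}$ and rounded tick range $t^{*}$ is:

```latex
U = \left\lceil 1 + \frac{x_{\max}}{t^{*}} \right\rceil \cdot t^{*}
```

Under this reading, $U \ge x_{\max} + t^{*}$, so the upper bound always lies strictly above the lower bound. With $x_{\max} = 47$ and $t^{*} = 10$, we get $U = \lceil 5.7 \rceil \cdot 10 = 60$.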
Notice that we add 1 inside the ceiling operator to avoid the edge case in which the lower and upper bounds coincide.
Finally, we calculate the position of each tick: we start from the lower bound and iteratively add the rounded tick range to it.
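The last steps, computing the bounds and placing the ticks, can be sketched in Python as follows, assuming the tick range `step` has already been rounded to a nice value. The function name `scale_from_step` is hypothetical, and the upper bound uses our reading of the “+1 inside the ceiling” rule, `ceil(1 + x_max / step) * step`.

```python
import math


def scale_from_step(x_min, x_max, step):
    """Return (lower, upper, ticks) for an already-rounded tick range `step`.

    Hypothetical helper: the upper bound follows one reading of the
    "+1 inside the ceiling" rule, i.e. ceil(1 + x_max / step) * step.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # Snap the lower bound down to a multiple of the tick range.
    lower = math.floor(x_min / step) * step
    # Snap the upper bound up, adding 1 inside the ceiling.
    upper = math.ceil(1 + x_max / step) * step
    # Generate ticks as lower + k * step instead of summing `step` repeatedly,
    # so that floating-point error does not accumulate across ticks.
    count = round((upper - lower) / step) + 1
    ticks = [lower + k * step for k in range(count)]
    return lower, upper, ticks
```

For example, data spanning 3 to 47 with five requested ticks has a range of 44 and a raw tick range of 8.8, which rounds up to 10; `scale_from_step(3, 47, 10)` then returns the bounds 0 and 60 and the seven ticks 0, 10, ..., 60. As this shows, the final number of ticks can differ from the requested one.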
7. Conclusion
In this tutorial, we studied how to choose a linear scale for the axis of a chart, with bounds that enclose the distribution and round, evenly spaced ticks.