 ## 1. Introduction

In this article, we’ll give an introduction to the mathematics of big-O notation, as well as show an example of a big-O proof.

## 2. Formal Definition

Definition: f(x) = O(g(x)) means that there exist two positive constants, x1 and c, such that 0 ≤ f(x) ≤ cg(x) for all x ≥ x1.

## 3. The First Positive Constant: x1

When saying f(x) = O(g(x)), we sayf of x is big-O g of x“.

Don’t let the equals sign fool you: f(x) is a function and O(g(x)) is a set. You can’t have a function equalling a set. That’s like saying that the Earth equals the Solar System. It’s more like, the Earth belongs in (the set of planets which comprise) the Solar System. Similarly, f(x) belongs in a set of functions called O(g(x)) (big-O g of x).

We have two functions now – f(x) and g(x). Functions grow at different speeds. For example, a quadratic function will grow quicker than a linear function. But sometimes, it takes some time for the quadratic to catch up!

For example, if we have a(x) = 2x2 + 2x + 1 and b(x) = 10x, we’ll have a(1) = 5 and b(1) = 10. But say now we choose x = 15. Now we have a(15) = 450 + 30 + 1 = 481 and b(x) = 150. What’s more is that for any value greater than x = 15, say x = 20, a(x) will be bigger than b(x). (Now is a good time to refer back to the formal definition; this is the x ≥ x1 part).

Importantly for the case of understanding big-O formally, we don’t particularly care at which point a(x) began outgrowing b(x). Just that it did, and that from that point onwards, it continues to be larger than b(x).

## 4. The Second Positive Constant: c

We saw above how it took a while for a(x) to catch up to b(x). Eventually, it did, but it took a while.

The reason it eventually grows faster is because of the 2x2 term. Or, more precisely, the x2 part of it. This tells us that a(x) = O(x2). Using the symbols of the original definition: f(x) = a(x) and g(x) = x2.

For big-O, we don’t care about the other terms in a(x), or the constant 2 in the 2x2 term.

Looking again at our definition from section 2, this is where the constant c comes in. We can scale g(x) by any positive constant – as long as f(x) stays smaller than the scaled version (past a certain point, x ≥ x1).

If we take d(x) = 3x3 + 2x + 10, we’ll need to find values of c and x1, such that d(x) ≤ cx3 for all values greater than x1. This is exactly what we do in section 4.

## 5. Putting the Pieces Together

In order to prove f(x) = O(g(x)), we need to find two positive constants, c and x1, such that 0 ≤ f(x) ≤ cg(x) for all x ≥ x1. We need to find values for c and x1 such that the inequality holds.

What it means, is that past a certain point, a scaled version of g(x) will always be bigger than f(x).

## 6. Example

Let d(x) = 3x3 + 2x + 10.

Suppose we wish to prove that d(x) = O(x3). This means we need to find two positive integers, c and x1 such that 0 ≤ d(x) ≤ cx3  for all x ≥ x1.

Well, we know that
d(x) = 3x3 + 2x + 10 ≤ 3x3 + 2x3 + 10x3 = 15x3

so,

d(x) ≤ 15x3.

So if we set x1 = 1 and c = 15, then we have that for any x ≥ x1, 0 ≤ d(x) ≤ cx3. So d(x) = O(x3).

We could have found other values for x1 and c which satisfy the above condition. All that matters is that there exist values of x1 and c which satisfy the condition.

## 7. Conclusion

In this article, we focused on an introduction to the theory of big-O notation.

A more practical look at this topic can be found here.

Subscribe
Notify of Inline Feedbacks Alnir
1 month ago

Hello everybody. How to learn CS core concepts/algorithms from baeldung web-site? Is there a sequence of articles that I should read? I have a CS degree but I would like to learn CS concepts/algorithms better. Thank you in advance Loredana Crusoveanu
17 days ago