1. Introduction

In this article, we’ll give an introduction to the mathematics of big-O notation, as well as show an example of a big-O proof.

2. Formal Definition

Definition: f(x) = O(g(x)) means that there exist two positive constants, x1 and c, such that 0 ≤ f(x) ≤ cg(x) for all x ≥ x1.

3. The First Positive Constant: x1

When saying f(x) = O(g(x)), we sayf of x is big-O g of x“.

Don’t let the equals sign fool you: f(x) is a function and O(g(x)) is a set. You can’t have a function equalling a set. That’s like saying that the Earth equals the Solar System. It’s more like, the Earth belongs in (the set of planets which comprise) the Solar System. Similarly, f(x) belongs in a set of functions called O(g(x)) (big-O g of x).

We have two functions now – f(x) and g(x). Functions grow at different speeds. For example, a quadratic function will grow quicker than a linear function. But sometimes, it takes some time for the quadratic to catch up!

For example, if we have a(x) = 2x2 + 2x + 1 and b(x) = 10x, we’ll have a(1) = 5 and b(1) = 10. But say now we choose x = 15. Now we have a(15) = 450 + 30 + 1 = 481 and b(x) = 150. What’s more is that for any value greater than x = 15, say x = 20, a(x) will be bigger than b(x). (Now is a good time to refer back to the formal definition; this is the x ≥ x1 part).

Importantly for the case of understanding big-O formally, we don’t particularly care at which point a(x) began outgrowing b(x). Just that it did, and that from that point onwards, it continues to be larger than b(x).

4. The Second Positive Constant: c  

We saw above how it took a while for a(x) to catch up to b(x). Eventually, it did, but it took a while.

The reason it eventually grows faster is because of the 2x2 term. Or, more precisely, the x2 part of it. This tells us that a(x) = O(x2). Using the symbols of the original definition: f(x) = a(x) and g(x) = x2.

For big-O, we don’t care about the other terms in a(x), or the constant 2 in the 2x2 term.

Looking again at our definition from section 2, this is where the constant c comes in. We can scale g(x) by any positive constant – as long as f(x) stays smaller than the scaled version (past a certain point, x ≥ x1).

If we take d(x) = 3x3 + 2x + 10, we’ll need to find values of c and x1, such that d(x) ≤ cx3 for all values greater than x1. This is exactly what we do in section 4.

5. Putting the Pieces Together

In order to prove f(x) = O(g(x)), we need to find two positive constants, c and x1, such that 0 ≤ f(x) ≤ cg(x) for all x ≥ x1. We need to find values for c and x1 such that the inequality holds.

What it means, is that past a certain point, a scaled version of g(x) will always be bigger than f(x).

6. Example

Let d(x) = 3x3 + 2x + 10.

Suppose we wish to prove that d(x) = O(x3). This means we need to find two positive integers, c and x1 such that 0 ≤ d(x) ≤ cx3  for all x ≥ x1.

Well, we know that
d(x) = 3x3 + 2x + 10 ≤ 3x3 + 2x3 + 10x3 = 15x3


d(x) ≤ 15x3.

So if we set x1 = 1 and c = 15, then we have that for any x ≥ x1, 0 ≤ d(x) ≤ cx3. So d(x) = O(x3).

We could have found other values for x1 and c which satisfy the above condition. All that matters is that there exist values of x1 and c which satisfy the condition.

7. Conclusion

In this article, we focused on an introduction to the theory of big-O notation.

A more practical look at this topic can be found here.

Inline Feedbacks
View all comments
1 year ago

Hello everybody. How to learn CS core concepts/algorithms from baeldung web-site? Is there a sequence of articles that I should read? I have a CS degree but I would like to learn CS concepts/algorithms better. Thank you in advance

Loredana Crusoveanu
Loredana Crusoveanu
1 year ago
Reply to  Alnir

Hi Alnir,
You can find our Computer Science Tutorials and Guides in the menu above. For example, here’s the one for Algorithms.

Comments are closed on this article!