In this tutorial, we’ll examine the N+1 Select Problem in ORM frameworks. We’ll start by providing a conceptual overview of the problem. We’ll also discuss possible solutions in theory and how they are implemented in practice.
2. Problem Theory
Let us start from the beginning and understand why we have this problem.
We know that ORM is there to solve the problem of mapping entities in our heap to the schema objects in the relational database. In a relational database, we can have foreign keys, which represent a reference to the foreign table. It allows us to model a relationship between tables in the RDBMS. The example that the equipment table may have foreign key reference to the manufacturer and equipment_items tables:
Now, we also want that relationship to be in our code somehow. So we have the ManyToOne and OneToMany logical relationship here, respectively:
Now, let’s say we want to query the equipment from the database by id. We can do:
SELECT e.* FROM equipment e WHERE e.id = ?
And let’s say that our ORM does the mapping for us. But anyway, we only fetched the equipment itself. We cannot possibly have the information from the manufacturer table for our equipment in the application because we have not fetched it. If we need this information, then the only thing left is to use SELECT to load the corresponding manufacturer table row later once we need it in the application:
FROM manufacturer m
INNER JOIN equipment e ON e.manufacturer_id = m.id
WHERE e.id = ?
That is called lazy loading, another feature of some ORM frameworks. We’ll not discuss it here, but this feature allows us to first only partially load the entity and some dependencies of it later in case we need it further. Thus, if we would want to get equipment_items along with the equipment later as well, we would also have to issue another SELECT:
SELECT ei.* FROM equipment_item ei WHERE ei.equipment_id = ?
This way, we finally loaded the entire equipment entity. But the problem is – we could’ve done in 1 simple query:
FROM equipment e
LEFT JOIN manufacturer m ON e.manufacturer_id = m.id
LEFT JOIN equipment_item ei ON ei.equipment_id = e.id
WHERE e.id = ?
This way, we have all of the data we need. Instead of 3 queries, we have just one. Even considering the complexity of this one query, it is significantly faster to issue one query with 2 JOINs than 3 separate queries. This is because fetching the result comes with the inevitable overhead of layers of abstraction:
- Application code
- JDBC driver
- Database query parsing, scheduling, and execution
- And then all of this chain way back…
We just reproduced the “N + 1 Select Problem”. This problem basically represents a situation when we load an entity from a database using 1 Select to fetch the entity itself, and then N Selects more to fetch its dependencies. So, in our case, we fetched the equipment entity and then issued 2 more SELECT statements to fetch 2 relationships. This problem often leads to poor application performance.
3. Problem Solution
It is important to understand that the problem is not because of lazy loading per se. The problem is that sometimes we need to load equipment, which would be enough. Thus, a simple SELECT statement for fetching equipment without any JOIN is sufficient. Lazy loading helps us with that case. But we also sometimes need equipment with some or all of its relationships. And this is when lazy loading hampers us.
Theoretically, the solution could be as simple as issuing different SQL queries for each scenario. For instance, if we need just equipment, then we just query the equipment table. If we need it with the manufacturer, then we issue SELECT with manufacturer table JOIN. Thereby, we still benefit from lazy loading, and we also do not suffer from the N + 1 problem. Luckily, the ORM frameworks typically also support this kind of solution. For instance, in Hibernate, since JPA 2.1, we can use an EntityGraph to solve this problem. In C# Entity Framework, we can use an Include to fetch some lazy associations eagerly where we need them.
In this article, we’ve explored the “N + 1 Select Problem”. This problem is a condition when the entity’s relationships are loaded via separate SQL queries. This is usually due to lazy loading, which is itself a good feature, but sometimes it creates such complications. To overcome this, we often need to instruct the ORM framework to fetch specific parts of an entity on a query basis. Most ORM frameworks, like Hibernate or Entity Framework, allow us to do exactly that.