When to Avoid Using SELECT * in SQL Queries? Baeldung on SQL

1. Introduction

Efficient querying is vital for application performance and data security. SELECT * misuse can cause inefficiencies, exposure of sensitive data, and database schema complexities. As databases grow, it’s important to understand when to avoid this practice.

In this tutorial, we’ll show how SELECT * in SQL queries affects performance, security, and code readability.

2. Understanding SELECT *

SELECT * in SQL gets every column from a table. When we use it, the RDBMS returns all columns from the specified table.

This works well for data exploration or ad-hoc queries in development.

3. Performance Considerations

Using SELECT * retrieves all columns from a table, even if they are unnecessary for the task. This increases data retrieval and processing time. To fix this problem, we should specify only the columns we need. That way, our database system will work better and run more smoothly.

We’ll consider our university database with its Student table. This table has several columns. Those are id, name, national_id, birth_date, enrollment_date, graduation_date and GPA, e.g.:

id	name	national_id	birth_date	enrollment_date	graduation_date	GPA
1001	John Liu	123345566	2001-04-05	2020-01-15	2024-06-15	4
1003	Rita Ora	132345166	2001-01-14	2020-01-15	2024-06-15	4.2

To retrieve the list of students, we can execute the following command:

SELECT * FROM Student;

It will show all the columns related to each student. However, what if we only need name, enrollment_date, and GPA for a specific task? SELECT * will return more data than we need, causing an unnecessary data transfer. To avoid this, we should specify only the columns we’ll work with:

SELECT name, enrollment_date, GPA FROM Student;

That way, we transfer less data.

4. Security Concerns

Using SELECT * in SQL queries can be harmful because it might expose private data. With this method, all columns of a table are shown, which could include secret information like passwords or financial details. This mistake makes it easier for fraudsters to access user files and sensitive data.

To prevent this from happening, developers should select only the required columns in the queries. This ensures that only necessary information gets shared with users.

5. Unpredictable Schema Changes

The utilization of SELECT * in SQL queries is associated with a major drawback, namely vulnerability to schema changes. Such alterations can result in various issues related to compatibility and maintenance. A table schema can undergo modifications through column additions, removals, or renaming. So, there may be unexpected changes in the query results.

If we add a new column to an existing table, any SELECT * query will also include this new column. This can cause problems for applications that are not well-equipped to handle additional data.

Moreover, there is another issue to consider. Some applications might expect some columns that aren’t there anymore. They may also access the returned data by column indices, which have now changed.

We’ll consider the table with columns for student details. To access the birth_date column of a row by index, we might use row[4] in our code. If we remove one column (like national_id), row[4] will now refer to enrollment_date. This can break the application logic.

Even when using SELECT *, the code that uses the results should be robust to schema changes.

6. Code Readability

Using SELECT * in SQL queries can make the code hard to read. This lack of clarity isn’t good for applications that rely on databases. Developers may not know what data is being used when columns aren’t clearly stated. This can cause mistakes or confusion among team members who work on shared projects.

Specifying the column list in SQL queries makes them easier to read and understand. For example, if we want to access id, name, and national_id from the Student table, we should include those column names in our query. That way, it will be clear which data the query returns. This will also make debugging easier because developers will know columns are involved and won’t have to keep checking back on a table schema.

7. Some Considerations and Exceptions

Though not always advisable, SELECT * can have benefits in specific cases:

When developing and testing a new product, SELECT * can be used to get all the data from a table. This helps to test different features without listing every column.
Data analysts sometimes use SELECT * to explore data. This helps them learn about the information without selecting certain columns for each query.
SELECT * can be used for small tables with stable schema and no performance concerns.
If we really want all the columns, SELECT * is shorter than listing all the columns but can still lack clarity.

8. Conclusion

In this article, we explained how specifying necessary columns in SQL queries improves efficiency, security, and code readability.

SELECT * can be used for development, testing, ad-hoc queries, data exploration, and small tables.

Although SELECT * is sometimes convenient, it has limitations and risks. It can negatively affect the performance, security, and maintainability of the applications relying on its results.

Full Archive

About Baeldung