Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: October 29, 2024
In database management and data processing, we often find ourselves needing to manipulate string data. One common operation is extracting a specific number of characters from the beginning of a string. This technique can be invaluable for various tasks, such as data normalization, generating unique identifiers, or simplifying complex string data for reporting purposes.
In this tutorial, we’ll explore different methods to extract the first N characters of a value in SQL. We’ll use the University database schema for our examples. The techniques we’ll see work on the MySQL, PostgreSQL, and Microsoft SQL Server database management systems.
The SUBSTRING() function is a versatile tool for extracting a portion of a string. It’s available in most SQL databases, including MySQL, PostgreSQL, and Microsoft SQL Server. The SUBSTRING() function is particularly useful when we need to create abbreviated versions of longer text fields.
For instance, we could use it to generate short codes for courses based on their names:
SELECT id, name, SUBSTRING(name, 1, 3) AS course_code
FROM Course
LIMIT 5;
+-------+----------------------------------------+-------------+
| id | name | course_code |
+-------+----------------------------------------+-------------+
| CE111 | Introduction to Structural Engineering | Int |
| CE121 | Geotechnical Engineering-I | Geo |
| CE131 | Mechanics of Solids-I | Mec |
| CE141 | Mechanics of Fluids-I | Mec |
| CE151 | Modeling Tools for Civil Engineers | Mod |
+-------+----------------------------------------+-------------+
5 rows in set (0.01 sec)
This query creates a three-letter course_code from the first three characters of each Course name. Such abbreviated codes can be handy for creating unique identifiers or simplifying complex data for reporting purposes.
Another function we can use to accomplish the same task is LEFT(). It’s a bit more useful when we’re specifically extracting characters from the beginning of a string. Let’s use LEFT() to extract the first 10 characters of Course textbook titles:
SELECT id, LEFT(textbook, 10) AS textbook_start
FROM Course
WHERE textbook IS NOT NULL
LIMIT 5;
+-------+----------------+
| id | textbook_start |
+-------+----------------+
| CE111 | Structures |
| CE121 | Introducti |
| CE131 | Mechanics |
| CE141 | Mechanics |
| CE151 | Modeling T |
+-------+----------------+
5 rows in set (0.00 sec)
This query retrieves the course id and the first 10 characters of the textbook title for up to five courses. We use the WHERE clause to exclude any courses with null textbook entries, ensuring our results are meaningful. The LEFT() function takes two arguments: the column name (textbook) and the number of characters to extract (10).
While SUBSTRING() and LEFT() can often be used interchangeably for this task, LEFT() is generally more readable when we’re always starting from the first character. However, SUBSTRING() offers more flexibility if we need to start from a different position in the future.
When working with real-world data, we often encounter null values. It’s crucial to handle these gracefully to avoid errors or unexpected results. The COALESCE() function comes in handy here:
SELECT id, COALESCE(LEFT(name, 5), 'N/A') AS first_five_chars
FROM Student
LIMIT 5;
+------+------------------+
| id | first_five_chars |
+------+------------------+
| 1001 | John |
| 1003 | Rita |
| 1007 | Phili |
| 1010 | Saman |
| 1011 | Vikas |
+------+------------------+
5 rows in set (0.02 sec)
In this example, if a student’s name is null, we’ll see the ‘N/A’ instead of a null value in our results. This approach ensures our queries remain robust even when dealing with incomplete data.
Sometimes, we need to extract characters based on certain conditions. Let’s say we want to get the first three characters of Faculty names, but only for those in the Computer Science department:
SELECT id, LEFT(name, 3) AS name_start, department_id
FROM Faculty
WHERE department_id = (
SELECT id
FROM Department
WHERE name = 'Computer Science'
)
LIMIT 5;
+-----+------------+---------------+
| id | name_start | department_id |
+-----+------------+---------------+
| 111 | AV | 1 |
| 121 | Ris | 1 |
| 122 | Ris | 1 |
| 131 | Wll | 1 |
| 132 | Wll | 1 |
+-----+------------+---------------+
5 rows in set (0.03 sec)
This query demonstrates how we can combine string extraction with other SQL clauses to get more specific results. We use a subquery to find the Computer Science department_id, and then use that to filter our faculty members.
In this article, we’ve explored various techniques for extracting the first N characters of a value in SQL. We’ve seen how functions like SUBSTRING() and LEFT() can be powerful tools for string manipulation, and how we can handle null values and apply conditional logic to our extractions.
These techniques are more than just query tricks — they’re essential tools for data cleaning, normalization, and presentation. This ability to extract specific portions of strings is a valuable skill in any SQL developer’s kit.