Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: September 10, 2024
In database management, dealing with inconsistent data formats is a common challenge in our day-to-day activities. One such issue is having unnecessary leading zeros in our data which can cause various issues, especially interfering with sorting.
This tutorial explores several methods of removing leading zeros across different SQL databases. We’ll focus on two primary techniques and their performance implications, using the Baeldung University database schema to illustrate these concepts.
First, it’s important to understand how the leading zeros can affect our string or text-based columns.
Numeric data types in databases don’t store leading zeros. However, when numeric data is stored as text, leading zeros can be present.
To illustrate this, we’ll create a secondary id column with leading zeros to the Student table. This will contain a copy of the id values to which we’ll add some leading zeroes.
Let’s add the new column to the table with the ALTER function.
The query below works with MySQL and PostgreSQL:
ALTER TABLE Student
ADD COLUMN formatted_id VARCHAR(10);
UPDATE Student
SET formatted_id = CONCAT('0', id)
WHERE id <= 2000;
UPDATE Student
SET formatted_id = id
WHERE formatted_id IS NULL;
We added a new column called formatted_id to the Student table. Furthermore, we added a leading zero for IDs at 2,000 and below. Finally, we strictly copied any IDs above 2,000 to simulate mixed data with and without leading zeros.
Let’s verify our setup by querying the modified table:
SELECT name, formatted_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
+-----------------+--------------+
| name | formatted_id |
+-----------------+--------------+
| Philip Lose | 01007 |
| Samantha Prabhu | 01010 |
| Vikram Kohli | 01111 |
| Peter Liu | 01607 |
| Julia Roberts | 2008 |
| Potu Singh | 2017 |
+-----------------+--------------+
As noted, we have a mix of IDs in our formatted_id column with some having leading zeros and others without.
We can achieve the same thing in SQL Server using a slightly different syntax. The primary difference is that SQL Server doesn’t require the keyword COLUMN when adding a new column to a table:
ALTER TABLE Student
ADD formatted_id VARCHAR(10);
UPDATE Student
SET formatted_id = '0' + CAST(id AS VARCHAR(7))
WHERE id <= 2000;
UPDATE Student
SET formatted_id = CAST(id AS VARCHAR(10))
WHERE formatted_id IS NULL;
Now, let’s also verify that this data was inserted correctly:
SELECT name, formatted_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
name formatted_id
------------------------------------------------------------ ------------
Philip Lose 01007
Samantha Prabhu 01010
Vikram Kohli 01111
Peter Liu 01607
Julia Roberts 2008
Potu Singh 2017
This returns the same result as we had for MySQL and PostgreSQL. Thus, we’ll use this dataset to learn how to remove leading zeros from the column.
The CAST function transforms data types from one form to another. When we cast a string with leading zeros to a number type, the leading zeros are automatically removed.
It’s important to note that we typically get an error when using CAST on a non-numeric data column. The exception is MySQL, which will return zero instead.
With MySQL, we can use the following query to select student names, and their formatted IDs, and cast the formatted IDs to unsigned integers:
SELECT name, formatted_id,
CAST(formatted_id AS UNSIGNED) AS cleaned_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
+-----------------+--------------+------------+
| name | formatted_id | cleaned_id |
+-----------------+--------------+------------+
| Philip Lose | 01007 | 1007 |
| Samantha Prabhu | 01010 | 1010 |
| Vikram Kohli | 01111 | 1111 |
| Peter Liu | 01607 | 1607 |
| Julia Roberts | 2008 | 2008 |
| Potu Singh | 2017 | 2017 |
+-----------------+--------------+------------+
As a result, the query returns with the leading zeros removed in the cleaned_id column.
Similarly, when working with SQL Server, we can achieve the same result by casting to an INT:
SELECT name, formatted_id,
CAST(formatted_id AS INT) AS cleaned_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
name formatted_id cleaned_id
------------------------------------------------------------ ------------ -----------
Philip Lose 01007 1007
Samantha Prabhu 01010 1010
Vikram Kohli 01111 1111
Peter Liu 01607 1607
Julia Roberts 2008 2008
Potu Singh 2017 2017
The result in SQL Server maintains the same structure as MySQL effectively removing the leading zeros.
We also have a slightly different query for PostgreSQL, where we’ll cast as an INTEGER:
SELECT name, formatted_id,
CAST(formatted_id AS INTEGER) AS cleaned_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
+-----------------+--------------+------------+
| name | formatted_id | cleaned_id |
+-----------------+--------------+------------+
| Philip Lose | 01007 | 1007 |
| Samantha Prabhu | 01010 | 1010 |
| Vikram Kohli | 01111 | 1111 |
| Peter Liu | 01607 | 1607 |
| Julia Roberts | 2008 | 2008 |
| Potu Singh | 2017 | 2017 |
+-----------------+--------------+------------+
Hence, we can see that using the CAST function works with MySQL, PostgreSQL, and SQL Server databases.
Note that all three databases support CAST, but the number type varies between them. MySQL uses UNSIGNED, SQL Server uses INT, and PostgreSQL uses INTEGER.
Alternatively, we can use the CONVERT function in all three databases. The examples are similar to CAST, but we need to use the right numeric type based on the database.
In MySQL, we can use the CONVERT function to achieve the same results as CAST:
SELECT name, formatted_id,
CONVERT(formatted_id, UNSIGNED) AS cleaned_id
FROM Student
WHERE id IN (1607,1007,2017,1010,1111,2008);
+-----------------+--------------+------------+
| name | formatted_id | cleaned_id |
+-----------------+--------------+------------+
| Philip Lose | 01007 | 1007 |
| Samantha Prabhu | 01010 | 1010 |
| Vikram Kohli | 01111 | 1111 |
| Peter Liu | 01607 | 1607 |
| Julia Roberts | 2008 | 2008 |
| Potu Singh | 2017 | 2017 |
+-----------------+--------------+------------+
This output shows how the CONVERT function effectively removes leading zeros in MySQL.
For SQL Server, the following query accomplishes the same task:
SELECT name, formatted_id,
CONVERT(INT, formatted_id) AS cleaned_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
name formatted_id cleaned_id
------------------------------------------------------------ ------------ -----------
Philip Lose 01007 1007
Samantha Prabhu 01010 1010
Vikram Kohli 01111 1111
Peter Liu 01607 1607
Julia Roberts 2008 2008
Potu Singh 2017 2017
We can see that SQL Server’s CONVERT function successfully removes the leading zeros, similar to MySQL’s result.
PostgreSQL offers a concise syntax for typecasting:
SELECT name, formatted_id,
formatted_id::INTEGER AS cleaned_id
FROM Student
WHERE id IN (1607,1007,2017,1010,1111,2008);
+-----------------+--------------+------------+
| name | formatted_id | cleaned_id |
+-----------------+--------------+------------+
| Philip Lose | 01007 | 1007 |
| Samantha Prabhu | 01010 | 1010 |
| Vikram Kohli | 01111 | 1111 |
| Peter Liu | 01607 | 1607 |
| Julia Roberts | 2008 | 2008 |
| Potu Singh | 2017 | 2017 |
+-----------------+--------------+------------+
Although we adjusted each query according to their database documentation, we can see that all three queries produced the same result.
We can also trim the strings to strip leading zeroes from our data.
Just as there were differences between databases for the CAST and CONVERT functions, each database implements trimming differently.
MySQL syntax is simple and direct using the TRIM function in combination with the LEADING keyword:
SELECT name, formatted_id,
TRIM(LEADING '0' FROM formatted_id) AS cleaned_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
+-----------------+--------------+------------+
| name | formatted_id | cleaned_id |
+-----------------+--------------+------------+
| Philip Lose | 01007 | 1007 |
| Samantha Prabhu | 01010 | 1010 |
| Vikram Kohli | 01111 | 1111 |
| Peter Liu | 01607 | 1607 |
| Julia Roberts | 2008 | 2008 |
| Potu Singh | 2017 | 2017 |
+-----------------+--------------+------------+
This method directly removes leading zeros from a string without any intermediate steps. Moreover, the LEADING keyword specifically instructs the TRIM function to remove the specified character (a literal zero in this case) from the beginning of the string.
Conversely, SQL Server requires a more complex approach:
SELECT name, formatted_id,
SUBSTRING(formatted_id,
PATINDEX('%[^0]%', formatted_id),
LEN(formatted_id)
) AS cleaned_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
name formatted_id cleaned_id
------------------------------------------------------------ ------------ -----------
Philip Lose 01007 1007
Samantha Prabhu 01010 1010
Vikram Kohli 01111 1111
Peter Liu 01607 1607
Julia Roberts 2008 2008
Potu Singh 2017 2017
We can deduce some important things from this query:
As a result, this query successfully removes leading zero as well.
Finally, PostgreSQL offers a concise solution using the LTRIM function:
SELECT name, formatted_id,
LTRIM(formatted_id, '0') AS cleaned_id
FROM Student
WHERE id in (1607,1007,2017,1010,1111,2008);
+-----------------+--------------+------------+
| name | formatted_id | cleaned_id |
+-----------------+--------------+------------+
| Philip Lose | 01007 | 1007 |
| Samantha Prabhu | 01010 | 1010 |
| Vikram Kohli | 01111 | 1111 |
| Peter Liu | 01607 | 1607 |
| Julia Roberts | 2008 | 2008 |
| Potu Singh | 2017 | 2017 |
+-----------------+--------------+------------+
The LTRIM function in PostgreSQL takes two parameters:
This method directly removes the specified leading character, in this case, ‘0’.
Thus, all the queries from the three databases above produce the same result.
In this article, we explored several methods for removing leading zeros in SQL across different database systems. This included the CAST and CONVERT functions and various trimming techniques, which can all remove leading zeros from our data. It’s important to remember that while the concept works for each database technology, the specific syntax varies by database system and function.