Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

In SQL, it’s common to encounter situations where a single column in a table contains multiple pieces of information separated using delimiters such as commas, hyphens, or spaces.

While this approach reduces redundancy and saves space, extracting individual elements from the combined values is hard. Splitting the data into multiple columns makes it easier to work with and analyze.

In this tutorial, we’ll explore how to split single-column values into multiple columns in different SQL databases, such as MySQL, PostgreSQL, and SQL Server.

2. Understanding the Problem

We’ll use the Student table from the Baeldung University database schema to demonstrate:

SELECT * FROM Student;
+------+-----------------+-------------+------------+-----------------+-----------------+------+
| id   | name            | national_id | birth_date | enrollment_date | graduation_date | gpa  |
+------+-----------------+-------------+------------+-----------------+-----------------+------+
| 1001 | John Liu        |   123345566 | 2001-04-05 | 2020-01-15      | 2024-06-15      |    4 |
| 1003 | Rita Ora        |   132345166 | 2001-01-14 | 2020-01-15      | 2024-06-15      |  4.2 |
| 1007 | Philip Lose     |   321345566 | 2001-06-15 | 2020-01-15      | 2024-06-15      |  3.8 |
| 1010 | Samantha Prabhu |  3217165566 | 2001-03-21 | 2020-01-15      | 2024-06-15      |  4.9 |
| 1011 | Vikas Jain      |   321345662 | 2001-07-18 | 2020-01-15      | NULL            |  3.3 |

Here, we’ll focus on the name column, containing each student’s name. We’ll split the values of the name column into two separate columns, first_name and last_name.

3. Splitting Column Values in MySQL

MySQL provides various string functions for splitting a single column values into multiple columns. Some of these string functions include SUBSTRING_INDEX, LOCATE, and REPLACE.

3.1. Using the SUBSTRING_INDEX Function

The SUBSTRING_INDEX function allows us to split a string based on a specified delimiter.

It follows a basic syntax:

SUBSTRING_INDEX(string, delimiter, count)
  • string – represents the input string from which we want to extract a substring
  • delimiter – represents the delimiter that separates the parts of the string
  • count – represents an integer that determines which part of the string to return

Now, let’s split the values in the name column:

SELECT 
    SUBSTRING_INDEX(name, ' ', 1) AS first_name,
    SUBSTRING_INDEX(name, ' ', -1) AS last_name
FROM 
    Student;

This query extracts the values from the name column by splitting the values at the space delimiter. We then assign the first value to the first_name column and the second value to the last_name column.

3.2. Handling More Complex Scenarios

The SUBSTRING_INDEX function works well for simple cases like splitting first and last names. However, if the names contain more than two values such as a first, middle, and last name, we can use a combination of SUBSTRING and LOCATE functions to split them:

SELECT
    SUBSTRING_INDEX(name, ' ', 1) AS first_name,
    SUBSTRING(name, LOCATE(' ', name) + 1,
              LOCATE(' ', name, LOCATE(' ', name) + 1) - LOCATE(' ', name) - 1) AS middle_name,
    SUBSTRING_INDEX(name, ' ', -1) AS last_name
FROM Student;

Above, we use the SUBSTRING and LOCATE functions to extract the part of the name between the first and second spaces, which is the middle name. We then split the values into multiple columns, namely first_name, middle_name, and last_name.

3.3. Handling Extra Spaces and Multiple Delimiters

Extra spaces can cause issues when we’re splitting a string. To handle them, we can make use of the TRIM and REPLACE functions:

SELECT 
    SUBSTRING_INDEX(TRIM(REPLACE(name, '  ', ' ')), ' ', 1) AS first_name,
    SUBSTRING_INDEX(TRIM(REPLACE(name, '  ', ' ')), ' ', -1) AS last_name
FROM Student;

The above query uses the TRIM function to remove any leading or trailing spaces, while the REPLACE function replaces all double spaces with a single space.

Furthermore, if different delimiters separate the values in the name column, we can use REPLACE to standardize the delimiters before splitting the values:

SELECT
    SUBSTRING_INDEX(REPLACE(REPLACE(name, '-', ' '), '_', ' '), ' ', 1) AS first_name,
    SUBSTRING_INDEX(REPLACE(REPLACE(name, '-', ' '), '_', ' '), ' ', -1) AS last_name
FROM Student;

The query above handles names with different delimiters (hyphens and underscores) by converting them into spaces and then splitting them into first_name and last_name columns.

4. Splitting Values in PostgreSQL

PostgreSQL supports string manipulation, which makes it easier to split column values into multiple columns. We’ll explore using the SPLIT_PART and STRING_TO_ARRAY functions to achieve this.

4.1. Using SPLIT_PART Function

The SPLIT_PART function allows us to extract a substring from a string based on a specified delimiter and the position of the desired substring. Here, we’ll use it to split the values of a specific column into multiple columns:

SELECT 
    SPLIT_PART(name, ' ', 1) AS first_name,
    SPLIT_PART(name, ' ', 2) AS last_name
FROM
    Student;

Above, we split the values of the name column into two separate columns, first_name and last_name.

4.2. Using STRING_TO_ARRAY Function

The STRING_TO_ARRAY function splits a string into an array of substrings based on a delimiter we specify. We then use array indexing to assign these array values to different columns:

WITH name_array AS (
    SELECT
        STRING_TO_ARRAY(name, ' ') AS name_parts
    FROM Student
)
SELECT
    name_parts[1] AS first_name,
    name_parts[2] AS last_name
FROM name_array;

Using a common table expression, the above query selects the name column from the Student table and splits it into an array of strings using the STRING_TO_ARRAY function. We then store the array in a column named name_parts.

Next, we extract the first and second elements from the name_parts array and assign them to the first_name and last_name columns.

4.3. Handling More Complex Scenarios

Sometimes, the data stored in the name column may contain a middle name. To handle them, we can use a combination of STRING_TO_ARRAY and conditional logic:

WITH name_array AS (
    SELECT
        STRING_TO_ARRAY(name, ' ') AS name_parts
    FROM Student
)
SELECT
    name_parts[1] AS first_name,
    CASE WHEN array_length(name_parts, 1) = 3 THEN name_parts[2] END AS middle_name,
    CASE WHEN array_length(name_parts, 1) = 3 THEN name_parts[3]
         WHEN array_length(name_parts, 1) = 2 THEN name_parts[2]
    END AS last_name
FROM name_array;

In this query, we handle names with and without middle names by checking the length of the name_parts array. We start by extracting the first element in the array as first_name.

Next, if the array has three elements, we assign the second element to middle_name. Lastly, if the array has three elements, we assign the third element to last_name. Additionally, if the array has only two elements, we assign the second name to last_name.

5. Splitting Values in SQL Server

SQL Server offers a range of string manipulation functions that make it possible to split column values into multiple columns. Here, we’ll discuss using the CHARINDEX and SUBSTRING functions.

5.1. Using SUBSTRING and CHARINDEX Functions

Combining the SUBSTRING and CHARINDEX functions makes splitting single-column values into multiple columns easier. Here, the CHARINDEX function helps locate the position of spaces within the string, while SUBSTRING extracts the actual text based on those positions:

SELECT 
    SUBSTRING(name, 1, CHARINDEX(' ', name + ' ') - 1) AS first_name,
    SUBSTRING(name, LEN(name) - CHARINDEX(' ', REVERSE(name)) + 2, LEN(name)) AS last_name
FROM Student;

The above query extracts the values in the name column by splitting them at the space delimiter. We then assign the values to two separate columns named first_name and last_name.

5.2. Handling More Complex Scenarios

In some cases, the name column might contain a middle name or more than one space. To handle this, we can make use of a CASE statement to check the number of spaces in the name:

SELECT
  SUBSTRING(name, 1, CHARINDEX(' ', name + ' ') - 1) AS first_name,
  
  CASE
    WHEN LEN(name) - LEN(REPLACE(name, ' ', '')) < 2
    THEN NULL
    ELSE SUBSTRING(name, CHARINDEX(' ', name) + 1, 
                   CHARINDEX(' ', name + ' ', CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name) - 1)
  END AS middle_name,

  SUBSTRING(name, LEN(name) - CHARINDEX(' ', REVERSE(name)) + 2, LEN(name)) AS last_name
FROM Student;

In this query, we split the values in the name column into separate columns named first_name, middle_name, and last_name. The CASE statement checks if there is a middle name by counting the number of spaces in the name column.

To clarify, if there is only one space, we return NULL for the middle_name; otherwise, we extract the middle_name.

6. Conclusion

In this article, we discussed splitting single-column values into multiple-columns in SQL, focusing on the MySQL, PostgreSQL, and SQL Server databases.

In MySQL, we looked at the SUBSTRING_INDEX, SUBSTRING, LOCATE, and TRIM functions. Next, we discussed using the SPLIT_PART and STRING_TO_ARRAY functions in PostgreSQL. Lastly, we explored using the SUBSTRING and CHARINDEX functions in SQL Server.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.