1. Overview

As Linux users, we often perform various operations on our files. One of the more common operations is delimiter conversion. For example, we may wish to convert a tab-delimited file to Comma Separated Values (CSV) in order to use it with an application that needs that format.

In this tutorial, we’ll look at various ways to accomplish this using bash.

2. Setting up an Example

Let’s create a sample file input.txt with tabs in it:

str1   str2
str3       str4
str5               str6
str7           str8
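
Since TAB characters are easily lost when text is copied and pasted, one way to make sure the file really contains them is to generate it with printf. This is only a sketch of one possible approach, assuming a POSIX-style shell in which printf expands \t in its format string:

```shell
# Write the four sample rows; printf expands each \t to a real TAB character.
printf 'str1\tstr2\n'       >  input.txt
printf 'str3\t\tstr4\n'     >> input.txt
printf 'str5\t\t\t\tstr6\n' >> input.txt
printf 'str7\t\t\tstr8\n'   >> input.txt
```

The first redirection uses > to start a fresh file, and the remaining ones use >> to append to it.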

And then let’s check that this file has the right content:

$ cat --show-tabs input.txt
str1^Istr2
str3^I^Istr4
str5^I^I^I^Istr6
str7^I^I^Istr8

In the above example, we’ve used the --show-tabs option of the cat command. This option displays the TAB character as ^I.

3. Using the tr Command

We can use tr when we want to translate or delete characters in a file. Let’s use it to convert the TAB characters to commas:

$ tr -s '\t' ',' < input.txt > output.txt

In this example, -s is the squeeze-repeats option. After translating, tr squeezes each run of repeated commas into a single comma, so several consecutive TAB characters produce just one comma.

Let’s check the result:

$ cat output.txt
str1,str2
str3,str4
str5,str6
str7,str8

We should note that although some lines had several consecutive TAB delimiters, tr has converted each run to a single comma.
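
If we’d rather keep the empty columns, a possible variation is to drop the -s option. The sketch below recreates the sample file so that it runs on its own; the output file name output_tr.txt is just an illustrative choice:

```shell
# Recreate the sample file so this snippet is self-contained.
printf 'str1\tstr2\nstr3\t\tstr4\nstr5\t\t\t\tstr6\nstr7\t\t\tstr8\n' > input.txt

# Without -s, every TAB becomes its own comma, so empty columns survive.
tr '\t' ',' < input.txt > output_tr.txt
cat output_tr.txt
```

With this variation, a line such as str3^I^Istr4 becomes str3,,str4 rather than str3,str4.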

4. Using the awk Command

The awk command is an interpreter for the AWK programming language. It allows us to perform complex text processing using concise code. We can use its string manipulation functions to achieve the desired results:

$ awk '{ gsub(/[\t]/,","); print }' input.txt > output.txt
$ cat output.txt 
str1,str2
str3,,str4
str5,,,,str6
str7,,,str8

In the above example, we used a regular expression with the gsub function. This converted each TAB into a separate comma, which preserves the empty columns. If we’d rather collapse consecutive TAB characters into a single comma, we can use the expression gsub(/[\t]+/,","); instead.
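
As a quick sketch of that alternative, the snippet below applies the + quantifier so that each run of TABs is replaced as a whole. It recreates the sample file first, and the output file name output_awk.txt is only an illustrative choice:

```shell
# Recreate the sample file so this snippet is self-contained.
printf 'str1\tstr2\nstr3\t\tstr4\nstr5\t\t\t\tstr6\nstr7\t\t\tstr8\n' > input.txt

# The + quantifier matches a whole run of TABs, so each run becomes one comma.
awk '{ gsub(/\t+/, ","); print }' input.txt > output_awk.txt
cat output_awk.txt
```

Here the result should match what we got from tr -s, with a single comma between the two fields on every line.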

5. Using the sed Command

sed is a stream editor for filtering and transforming text. It allows us to perform text processing in a non-interactive way. We can use its substitute command for converting TABs to commas:

$ sed 's/\t\+/,/g' input.txt > output.txt
$ cat output.txt 
str1,str2
str3,str4
str5,str6
str7,str8

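In this example, we used a regular expression with the sed command. Here we’ve chosen to replace each run of consecutive TABs with a single comma using the \t\+ regular expression. Both \t and \+ are GNU sed extensions, so other sed implementations may not accept them.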
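
If we want sed to convert each TAB individually instead, one possible approach is to drop the \+ quantifier. The sketch below also avoids relying on \t inside the sed expression by keeping a literal TAB in a shell variable; the variable name tab and the file name output_sed.txt are illustrative assumptions:

```shell
# Recreate the sample file so this snippet is self-contained.
printf 'str1\tstr2\nstr3\t\tstr4\nstr5\t\t\t\tstr6\nstr7\t\t\tstr8\n' > input.txt

# Store a literal TAB so the pattern does not depend on sed understanding \t.
tab=$(printf '\t')

# Replace every single TAB with a comma; empty columns are preserved.
sed "s/${tab}/,/g" input.txt > output_sed.txt
cat output_sed.txt
```

Because the pattern matches one TAB at a time, a line with three consecutive TABs ends up with three consecutive commas.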

6. Conclusion

In this article, we discussed some of the common ways of converting tab-delimited files to CSV.

First, we used the tr command. Then we saw how to use awk and sed with regular expressions. Along the way, we looked at the choice between collapsing consecutive TAB characters into a single comma and converting each TAB character individually to preserve the blank columns.