1. Overview

As Linux users, we often perform various operations on our files. One of the more common operations is delimiter conversion. For example, we may wish to convert a tab-delimited file to Comma Separated Values (CSV) in order to use it with an application that needs that format.

In this tutorial, we’ll look at various ways to accomplish this using bash.

2. Setting up an Example

Let’s create a sample file input.txt whose fields are separated by one or more TAB characters (the gaps below are TABs, not spaces):

str1   str2
str3       str4
str5               str6
str7           str8

And then let’s check that this file has the right content:

$ cat --show-tabs input.txt
str1^Istr2
str3^I^Istr4
str5^I^I^I^Istr6
str7^I^I^Istr8

In the above example, we’ve used the --show-tabs option (-T for short) of the GNU cat command. This option displays each TAB character as ^I, so we can see exactly how many TABs separate the fields on each line.
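
Copying the listing from a web page can silently turn the TABs into spaces. As a minimal sketch, assuming a POSIX shell, we could generate the same file with printf, which expands \t and \n itself. The TAB count at the end is only an illustrative sanity check:

```shell
# Build the sample file with real TAB characters (printf expands \t and \n).
printf 'str1\tstr2\nstr3\t\tstr4\nstr5\t\t\t\tstr6\nstr7\t\t\tstr8\n' > input.txt

# Delete everything except TABs, then count what is left.
tr -cd '\t' < input.txt | wc -c
```

The file contains 1 + 2 + 4 + 3 TABs, so the count should be 10.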

3. Using the tr Command

We can use tr when we want to translate or delete characters in a file. Let’s use it to convert the TAB characters to commas:

$ tr -s '\t' ',' < input.txt > output.txt

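In this example, -s is the squeeze-repeats option. When it's combined with translation, tr first turns every TAB into a comma and then collapses each run of repeated commas into a single one. As a result, several consecutive TABs end up as one comma. We should be aware that this also collapses any runs of commas that were already in the input.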

Let’s check the result:

$ cat output.txt
str1,str2
str3,str4
str5,str6
str7,str8

We should note that although some lines had several consecutive TABs between the fields, tr has converted each run to a single comma. This means that any empty columns in the input are dropped.
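
If we'd rather keep the empty columns, one option is to leave out -s. The following is a small sketch on a two-line version of the sample file, so that it can run on its own:

```shell
# Two-line sample: one TAB on the first line, two TABs on the second.
printf 'str1\tstr2\nstr3\t\tstr4\n' > input.txt

# Without -s, every TAB becomes its own comma, so empty columns survive.
tr '\t' ',' < input.txt > output.txt
cat output.txt
```

Here the second line comes out as str3,,str4 rather than str3,str4.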

4. Using the awk Command

The awk command is an interpreter for the AWK programming language. It allows us to perform complex text processing using concise code. We can use its string manipulation functions to achieve the desired results:

$ awk '{ gsub(/[\t]/,","); print }' input.txt > output.txt
$ cat output.txt 
str1,str2
str3,,str4
str5,,,,str6
str7,,,str8

In the above example, we used a regular expression with the gsub function. This converted each TAB into its own comma, which preserves empty columns. If we prefer to collapse each run of TABs into a single comma instead, we can use the expression gsub(/[\t]+/,",").
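
The collapsing variant can be sketched in two ways: with gsub and the + quantifier, or by letting awk split the fields on runs of TABs and rejoin them with commas. The file names output_gsub.txt and output_fs.txt are chosen here only for illustration:

```shell
# Two-line sample: one TAB on the first line, two TABs on the second.
printf 'str1\tstr2\nstr3\t\tstr4\n' > input.txt

# Variant 1: gsub with + replaces each run of TABs with one comma.
awk '{ gsub(/\t+/, ","); print }' input.txt > output_gsub.txt

# Variant 2: split fields on runs of TABs and rejoin them with commas.
# Assigning $1 to itself makes awk rebuild the record using OFS.
awk 'BEGIN { FS = "\t+"; OFS = "," } { $1 = $1; print }' input.txt > output_fs.txt

cat output_gsub.txt
```

Both variants should produce str1,str2 and str3,str4 for this input. The second one is convenient when we also want to reorder or drop columns, since the fields are already available as $1, $2, and so on.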

5. Using the sed Command

sed is a stream editor for filtering and transforming text. It allows us to perform text processing in a non-interactive way. We can use its substitute command for converting TABs to commas:

$ sed 's/\t\+/,/g' input.txt > output.txt
$ cat output.txt 
str1,str2
str3,str4
str5,str6
str7,str8

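In this example, we used a regular expression with the substitute command of sed. The expression \t\+ matches one or more consecutive TABs, and the g flag applies the substitution to every match on the line, so each run of TABs becomes a single comma. Both \t and \+ are GNU sed extensions, so this command may not behave the same way with other sed implementations.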
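
For a sed that doesn't understand \t, a possible workaround is to pass a literal TAB into the expression. The sketch below assumes a POSIX shell, uses a variable named TAB purely for illustration, and converts each TAB individually so that empty columns are kept:

```shell
# Two-line sample: one TAB on the first line, two TABs on the second.
printf 'str1\tstr2\nstr3\t\tstr4\n' > input.txt

# Store a literal TAB character; command substitution keeps it intact.
TAB=$(printf '\t')

# Replace every single TAB with a comma, without relying on \t or \+.
sed "s/${TAB}/,/g" input.txt > output.txt
cat output.txt
```

With this input, the second line becomes str3,,str4.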

6. Conclusion

In this article, we discussed some of the common ways of converting tab-delimited files to CSV.

First, we used the tr command. Then we saw how to use awk and sed with regular expressions. Along the way, we looked at two behaviors: collapsing each run of TAB characters into a single comma, or converting each TAB character individually to preserve the blank columns.
