
Last updated: March 18, 2024
Computers use character encodings to map characters to binary numbers so that text can be stored as data. Examples of character encodings include ASCII, UTF-8, UTF-16, and UTF-32.
In this tutorial, we’ll learn how to find the encoding of a file in Linux.
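To make the idea of mapping characters to binary numbers concrete, here is a minimal sketch that encodes the letter A in three encodings and prints the resulting bytes in hexadecimal. It assumes the iconv and od utilities are available, which this tutorial does not otherwise rely on, and the hex helper is a hypothetical name introduced only for this illustration:

```shell
# hex: hypothetical helper that dumps stdin as a continuous hex string
hex() { od -An -tx1 | tr -d ' \n'; }

# Encode the same character in three encodings.
# The explicit big-endian variants are used so iconv does not prepend a byte-order mark.
utf8_hex=$(printf 'A' | iconv -f UTF-8 -t UTF-8 | hex)
utf16_hex=$(printf 'A' | iconv -f UTF-8 -t UTF-16BE | hex)
utf32_hex=$(printf 'A' | iconv -f UTF-8 -t UTF-32BE | hex)

echo "UTF-8:  $utf8_hex"    # 41
echo "UTF-16: $utf16_hex"   # 0041
echo "UTF-32: $utf32_hex"   # 00000041
```

The same character occupies one, two, or four bytes depending on the encoding, which is why a program has to know a file's encoding in order to read it correctly.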
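One way to find the encoding of a file is to use the file command. The -b option omits the filename from the output, and -i prints the MIME type together with the character set: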
$ file -bi text1.txt
text/plain; charset=us-ascii
$ file -bi text2.txt
text/plain; charset=utf-8
In the above snippet, text1.txt is a plain text file encoded in US-ASCII, while text2.txt is a plain text file encoded in UTF-8. Note that file infers the encoding by inspecting the file's contents, so the result is a best guess.
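If we only need the character set, for example in a script, the file command also accepts the --mime-encoding option. The following sketch creates two sample files and captures the detected encoding of each in a variable. The file names, the temporary directory, and the variable names are assumptions made for illustration:

```shell
# Create two small sample files (hypothetical names) in a temporary directory
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/ascii.txt"
printf 'caf\303\251\n' > "$tmpdir/utf8.txt"   # "café" written as raw UTF-8 bytes

# -b omits the filename; --mime-encoding prints only the character set
ascii_enc=$(file -b --mime-encoding "$tmpdir/ascii.txt")
utf8_enc=$(file -b --mime-encoding "$tmpdir/utf8.txt")

echo "$ascii_enc"   # us-ascii
echo "$utf8_enc"    # utf-8

# Clean up the sample files
rm -r "$tmpdir"
```

Capturing only the character set avoids having to parse the text/plain; charset=... string that -i produces.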
Another way to find a file's encoding is to use enca. However, enca is usually not installed by default, so we need to install it first. On Debian-based distributions such as Ubuntu, we can use apt:
$ sudo apt update
$ sudo apt install enca
Now we can use enca:
$ enca -L none text1.txt
7bit ASCII characters
$ enca -L none text2.txt
Universal transformation format 8 bits; UTF-8
In the snippet above, the -L option specifies the language of the input file. Since enca has no language model for English, we pass none, which limits detection to language-independent encodings such as ASCII and UTF-8.
Here, text1.txt uses 7-bit ASCII, also known as US-ASCII, while text2.txt uses UTF-8.
In this brief article, we discussed two methods to find the character encoding of a file in Linux: the file command and enca.