1. Overview

Computers use a character encoding to map characters to binary numbers (bytes) so that they can store text data. Common character encodings include ASCII, UTF-8, UTF-16, and UTF-32.
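
To see this mapping in practice, we can print the bytes that a character turns into. This short sketch assumes that the terminal (and the script file) use UTF-8. In that case, the ASCII letter A is stored as a single byte, while é needs two:

```shell
#!/bin/sh
# Print each character's encoded bytes in hexadecimal; -An drops od's offset column.
# Assumes the terminal and this script use UTF-8.
printf 'A' | od -An -tx1   # ASCII range: UTF-8 stores "A" as the single byte 41
printf 'é' | od -An -tx1   # U+00E9: UTF-8 stores "é" as the two bytes c3 a9
```

In a different encoding, the same é would map to different bytes. That is why a program reading a file needs to know which encoding the file uses.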

In this tutorial, we’ll learn how to find the encoding of a file in Linux.

2. Using file

One way to find the encoding of a file is to use the file command:

$ file -bi text1.txt
text/plain; charset=us-ascii
$ file -bi text2.txt
text/plain; charset=utf-8

In the above snippet:

  • -b (brief) tells file not to prepend the file name to the output
  • -i tells file to print MIME-type strings, which include the media type and the character encoding (charset) of the file

So, text1.txt is a plain text file encoded in US-ASCII, while text2.txt is a plain text file encoded in UTF-8. Keep in mind that file infers the encoding from the file's contents, so its answer is a best guess rather than a guarantee.
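
A file that file reports as us-ascii consists only of bytes in the 7-bit range (0 to 127). As a rough sketch of that single property, and not a description of how file is actually implemented, we can count the bytes outside that range with standard tools. The function name is_ascii and the sample files are hypothetical, created only for this demonstration:

```shell
#!/bin/sh
# Sketch: succeed if a file contains only 7-bit (ASCII-range) bytes.
is_ascii() {
    # Delete every byte in octal range 000-177 (0-127) and count what remains.
    non_ascii=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c)
    [ "$non_ascii" -eq 0 ]
}

# Hypothetical sample files
printf 'Hello, world\n' > sample-ascii.txt
printf 'Caf\303\251\n' > sample-utf8.txt   # \303\251 is "é" encoded as UTF-8

for f in sample-ascii.txt sample-utf8.txt; do
    if is_ascii "$f"; then
        echo "$f: only 7-bit ASCII bytes"
    else
        echo "$f: contains non-ASCII bytes"
    fi
done
```

The second file fails the check because of the two bytes that encode é. Telling which encoding those extra bytes belong to is the harder part that tools like file handle for us.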

3. Using enca

Another way to find the encoding of a file is the enca command. Since enca usually isn't installed by default, we need to install it first, here on a Debian-based distribution using apt:

$ sudo apt update
$ sudo apt install enca

Now we can use enca:

$ enca -L none text1.txt 
7bit ASCII characters
$ enca -L none text2.txt
Universal transformation format 8 bits; UTF-8

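In the snippet above, -L sets the language of the input file, which enca uses to choose among language-specific encodings. enca has no language setting for English, so for English text we pass none. In that case, enca only looks for encodings it can recognize without language knowledge, such as 7-bit ASCII and UTF-8.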

So, text1.txt uses 7-bit ASCII (also known as US-ASCII), and text2.txt uses UTF-8.
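
If we want to double-check a UTF-8 verdict, one option (an extra tool, not part of enca) is to let iconv convert the file from UTF-8 to UTF-8 and discard the output, because iconv exits with an error on invalid byte sequences. This is a minimal sketch that assumes GNU (glibc) iconv. The function name is_valid_utf8 and the sample files are hypothetical:

```shell
#!/bin/sh
# Sketch: succeed if a file's bytes form valid UTF-8 (assumes GNU iconv).
is_valid_utf8() {
    # iconv returns a non-zero status when it hits an invalid byte sequence.
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical sample files
printf 'Caf\303\251\n' > sample-utf8.txt     # "é" encoded as UTF-8 (0xc3 0xa9)
printf 'Caf\351\n' > sample-latin1.txt       # "é" encoded as ISO-8859-1 (0xe9)

for f in sample-utf8.txt sample-latin1.txt; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

Since every ASCII file is also valid UTF-8, this check alone can't tell the two apart. It only rules out files whose bytes aren't valid UTF-8, such as most ISO-8859-1 text containing accented letters.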

4. Conclusion

In this brief article, we looked at two commands for finding the character encoding of a file in Linux: file and enca.