Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

Sometimes, we need to read raw text from files and clean up messy content by removing line breaks.

In this tutorial, we’ll explore various approaches for removing line breaks from files in Java.

2. A Word About Line Breaks

Before we dive into the code for reading from files and removing line breaks, let’s quickly understand the target objects we want to remove: the line breaks.

At first glance, it’s pretty straightforward: a line break is a character, or a short character sequence, that ends a line. However, there are different kinds of line breaks, and we may run into pitfalls if we don’t treat them properly. An example can explain it quickly.

Let’s say we have two text files, multiple-line-1.txt and multiple-line-2.txt. Let’s call them file1 and file2. If we open them in an IDE’s editor, for example, IntelliJ, both files look the same:

A,
 B,
 C,
 D,
 E,
 F

As we can see, each file has six lines, and there is a leading space character on each line from the second line on. So, we might believe file1 and file2 contain exactly the same text.

However, now let’s print the file content using the cat command with the -n (show line numbers) and -e (show non-printing characters and mark each line end with ‘$’) options:

$ cat -ne multiple-line-1.txt
     1  A,$
     2   B,$
     3   C,$
     4   D,$
     5   E,$
     6   F$

file1’s output is the same as we saw in the IntelliJ editor. But file2 looks quite different:

$ cat -ne multiple-line-2.txt
     1  A,^M B,$
     2   C,$
     3   D,^M E,$
     4   F$

This is because there are three different line breaks:

  • ‘\r’ – CR (Carriage Return), the line break in classic Mac OS (before Mac OS X)
  • ‘\n’ – LF (Line Feed), the line break in *nix systems, including macOS
  • ‘\r\n’ – CRLF (CR followed by LF), the line break in Windows

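cat -e displays each CR character as ‘^M’ and marks each LF with ‘$’. So, we see that file2 contains CR characters in addition to LF line breaks. Possibly, the file was created or edited on a different operating system. Depending on requirements, we may want to remove all kinds of line breaks or only the line breaks of the current system.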
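
To make the distinction concrete, here is a minimal sketch that counts each kind of line break in a string, roughly what cat -e lets us spot by eye. The class name LineBreakCounter and the sample text are illustrative assumptions, not part of the article's code:

```java
public class LineBreakCounter {

    /** Returns the counts as {crlf, loneCr, loneLf} for the given text. */
    public static int[] count(String text) {
        int crlf = 0;
        int loneCr = 0;
        int loneLf = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this CRLF pair
                } else {
                    loneCr++;
                }
            } else if (c == '\n') {
                loneLf++;
            }
        }
        return new int[] {crlf, loneCr, loneLf};
    }

    public static void main(String[] args) {
        // Hypothetical sample containing one line break of each kind
        int[] counts = count("A,\r\n B,\r C,\n D");
        System.out.println("CRLF=" + counts[0] + ", CR=" + counts[1] + ", LF=" + counts[2]);
        // prints "CRLF=1, CR=1, LF=1"
    }
}
```

Running such a check on a file's content before cleaning it can help us decide whether a system-specific or a system-independent removal is appropriate.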

Next, we’ll take these two files as examples to see how to read content from them and remove line breaks. For simplicity, we’ll create two helper methods to return each file’s Path:

Path file1Path() throws Exception {
    return Paths.get(this.getClass().getClassLoader().getResource("multiple-line-1.txt").toURI());
} 

Path file2Path() throws Exception {
    return Paths.get(this.getClass().getClassLoader().getResource("multiple-line-2.txt").toURI());
}

Note that the approaches used in this article read the whole text into memory, so we should be careful with very large files.
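
For files too large to hold in memory, one possible alternative is to copy the content in fixed-size chunks and drop line-break characters on the way. The following is only a sketch under that assumption; StreamingLineBreakRemover and copyWithoutLineBreaks() are hypothetical names, and in-memory readers stand in for real files:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class StreamingLineBreakRemover {

    /** Copies characters from in to out, dropping every '\r' and '\n'. */
    public static void copyWithoutLineBreaks(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            int kept = 0;
            for (int i = 0; i < read; i++) {
                char c = buffer[i];
                if (c != '\r' && c != '\n') {
                    buffer[kept++] = c; // compact the kept characters to the front
                }
            }
            out.write(buffer, 0, kept);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        copyWithoutLineBreaks(new StringReader("A,\r\n B,\n C"), out);
        System.out.println(out); // prints "A, B, C"
    }
}
```

With real files, we could pass the results of Files.newBufferedReader() and Files.newBufferedWriter() instead of the in-memory reader and writer, so only one buffer's worth of text is held at a time.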

3. Replacing line.separator With an Empty String

The system property line.separator stores the line separator that is specific to the current operating system. Therefore, if we only want to remove line breaks particular to the current system, we can replace line.separator with an empty string. For example, this approach removes all line breaks from file1 on a Linux box:

String content = Files.readString(file1Path(), StandardCharsets.UTF_8);

String result = content.replace(System.getProperty("line.separator"), "");
assertEquals("A, B, C, D, E, F", result);

We use the Files class’s readString() method, available since Java 11, to load the file content into a string. Then, we apply the replacement using replace().

However, on the same Linux box, this approach won’t remove all line breaks from file2, as it also contains ‘\r’ characters, which the system’s line separator (‘\n’) doesn’t cover:

String content = Files.readString(file2Path(), StandardCharsets.UTF_8);

String result = content.replace(System.getProperty("line.separator"), "");
assertNotEquals("A, B, C, D, E, F", result); // <-- NOT equals assertion!
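
To see why the result depends on the platform, the sketch below applies the same replace() call with the separators of two systems to one mixed sample. The class SystemSeparatorDemo, the method removeSeparator(), and the sample string are made up for illustration:

```java
public class SystemSeparatorDemo {

    /** Removes only occurrences of the given separator, like the replace() call above. */
    public static String removeSeparator(String text, String separator) {
        return text.replace(separator, "");
    }

    public static void main(String[] args) {
        // Hypothetical sample with a lone CR, a lone LF, and a CRLF pair
        String mixed = "A,\r B,\n C,\r\n D";

        // Simulating a *nix separator: the CR characters survive
        String unixStyle = removeSeparator(mixed, "\n");
        System.out.println(unixStyle.equals("A,\r B, C,\r D")); // prints "true"

        // Simulating the Windows separator: lone CR and lone LF survive
        String windowsStyle = removeSeparator(mixed, "\r\n");
        System.out.println(windowsStyle.equals("A,\r B,\n C, D")); // prints "true"

        // System.lineSeparator() returns the initial value of the line.separator property
        String currentSystem = removeSeparator(mixed, System.lineSeparator());
        System.out.println(currentSystem.length() < mixed.length()); // prints "true" on *nix and Windows
    }
}
```

In both simulated cases, at least one kind of line break is left behind, which matches the assertNotEquals() result for file2.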

Next, let’s see if we can remove all line breaks system-independently.

4. Replacing “\n” and “\r” With Empty Strings

We’ve learned that all three line breaks consist only of “\n” and “\r” characters. Therefore, if we want to remove all line breaks system-independently, we can replace “\n” and “\r” with empty strings:

String content1 = Files.readString(file1Path(), StandardCharsets.UTF_8);

// file2 also contains '\r' characters
String content2 = Files.readString(file2Path(), StandardCharsets.UTF_8);

String result1 = content1.replace("\r", "").replace("\n", "");
String result2 = content2.replace("\r", "").replace("\n", "");

assertEquals("A, B, C, D, E, F", result1);
assertEquals("A, B, C, D, E, F", result2);

Of course, we can also use the regex-based replaceAll() method to achieve the same goal. Let’s take file2 as an example to see how it works:

String resultReplaceAll = content2.replaceAll("[\\n\\r]", "");
assertEquals("A, B, C, D, E, F", resultReplaceAll);
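
As a variation, Java's regex engine (since Java 8) also offers the \R linebreak matcher. Note that it's broader than “[\\n\\r]”: besides CR, LF, and CRLF, it also matches characters such as the vertical tab, form feed, and the Unicode line and paragraph separators, so whether it fits depends on the requirement. A minimal sketch, with LinebreakMatcherDemo as an illustrative name:

```java
import java.util.regex.Pattern;

public class LinebreakMatcherDemo {

    // Compiled once so that repeated calls don't recompile the regex
    private static final Pattern LINE_BREAK = Pattern.compile("\\R");

    /** Removes every sequence that the \R linebreak matcher recognizes. */
    public static String removeLineBreaks(String text) {
        return LINE_BREAK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeLineBreaks("A,\r\n B,\r C,\n D")); // prints "A, B, C, D"
    }
}
```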

5. Using readAllLines() and Then join()

Let’s recall the two approaches we’ve learned so far. We first read the entire content from a file, then replace the line.separator system property or the “\n” and “\r” characters with empty strings. One commonality between these approaches is that we handle the line breaks ourselves.

The Files class offers readAllLines() to read the file content line by line and return a list of strings. It’s worth noting that readAllLines() recognizes all three line breaks mentioned above as line terminators. In other words, the returned lines no longer contain any line breaks. All we need to do is join the elements in the returned list.

The String.join() method makes it convenient to join a list or an array of strings:

List<String> lines1 = Files.readAllLines(file1Path(), StandardCharsets.UTF_8);

// file2 also contains '\r' characters
List<String> lines2 = Files.readAllLines(file2Path(), StandardCharsets.UTF_8);

String result1 = String.join("", lines1);
String result2 = String.join("", lines2);

assertEquals("A, B, C, D, E, F", result1);
assertEquals("A, B, C, D, E, F", result2);
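
A closely related option is Files.lines(), which returns the lines as a lazily populated stream and recognizes the same line terminators as readAllLines(). The sketch below joins that stream with Collectors.joining(). LazyLineJoiner and joinLines() are hypothetical names, and a temporary file stands in for the article's resource files:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyLineJoiner {

    /** Reads the file as a stream of lines and concatenates them without separators. */
    public static String joinLines(Path path) throws IOException {
        // try-with-resources closes the underlying file when the stream is done
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.collect(Collectors.joining());
        }
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("multiple-line", ".txt");
        try {
            Files.write(temp, "A,\r\n B,\r C,\n D".getBytes(StandardCharsets.UTF_8));
            System.out.println(joinLines(temp)); // prints "A, B, C, D"
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```

This avoids building the intermediate list, although the joined result itself still ends up in memory as a single string.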

6. Conclusion

In this article, we first discussed the different kinds of line breaks. Then, we explored various approaches to removing line breaks from a file.

As always, the complete source code for the examples is available over on GitHub.
