1. Overview
Simply put, a CSV (Comma-Separated Values) file stores tabular data as plain text: each line holds one record, and the values within a record are separated by a comma delimiter.
In this tutorial, we’ll look into different options to read a CSV file into a Java array.
2. Sample CSV File
Let’s use a sample CSV file, book.csv:
Mary Kom,Unbreakable
Kapil Isapuari,Farishta
First, let's define the comma delimiter that our Java application will use to split each line into distinct values:
public static final String COMMA_DELIMITER = ",";
It's important to note that no single delimiter character or regular expression can correctly parse every kind of value. This applies not just to values with embedded commas, but to values with other embedded characters as well.
When a preexisting CSV file has the delimiter character or regular expression embedded in one or more values, and we can't use any of the methods discussed in this tutorial, we should change the delimiter character or regular expression.
When we create a new CSV file, we should choose a delimiter character or regular expression that doesn't appear in any of the values.
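To see why a single delimiter can't parse every value, here's a minimal sketch that naively splits a line whose first value contains an embedded comma. The class name SplitPitfall and the sample line are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

class SplitPitfall {
    static final String COMMA_DELIMITER = ",";

    // Splits a CSV line on every comma, whether or not it sits inside quotes
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(COMMA_DELIMITER));
    }

    public static void main(String[] args) {
        List<String> tokens = naiveSplit("\"Kom, Mary\",Unbreakable");
        // The embedded comma is treated as a delimiter,
        // so two logical values come back as three tokens
        System.out.println(tokens.size()); // prints 3
        for (String token : tokens) {
            System.out.println("[" + token + "]");
        }
    }
}
```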
3. BufferedReader in java.io
First, let’s read the records line by line using readLine() in BufferedReader and then split each line into tokens based on the comma delimiter:
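import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

List<List<String>> records = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new FileReader("book.csv"))) {
    String line;
    // readLine() returns null once the end of the file is reached
    while ((line = br.readLine()) != null) {
        String[] values = line.split(COMMA_DELIMITER);
        records.add(Arrays.asList(values));
    }
}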
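One detail worth knowing about line.split(COMMA_DELIMITER): the single-argument String.split() discards trailing empty strings, so a record whose last fields are empty comes back shorter than expected. The sketch below reads from an in-memory StringReader instead of book.csv so it runs standalone; the class name, the keepTrailingEmpty flag, and the sample data are hypothetical. Passing a negative limit, as in split(COMMA_DELIMITER, -1), keeps the empty values:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BufferedReaderCsv {
    static final String COMMA_DELIMITER = ",";

    // Reads every line and splits it on commas.
    // A limit of -1 keeps trailing empty values; the default split drops them.
    static List<List<String>> readRecords(Reader source, boolean keepTrailingEmpty) throws IOException {
        List<List<String>> records = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] values = keepTrailingEmpty
                    ? line.split(COMMA_DELIMITER, -1)
                    : line.split(COMMA_DELIMITER);
                records.add(Arrays.asList(values));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String csv = "Mary Kom,Unbreakable\nKapil Isapuari,\n";
        System.out.println(readRecords(new StringReader(csv), false));
        // prints [[Mary Kom, Unbreakable], [Kapil Isapuari]]
        System.out.println(readRecords(new StringReader(csv), true));
        // prints [[Mary Kom, Unbreakable], [Kapil Isapuari, ]]
    }
}
```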
4. Scanner in java.util
Next, let’s use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

List<List<String>> records = new ArrayList<>();
try (Scanner scanner = new Scanner(new File("book.csv"))) {
    while (scanner.hasNextLine()) {
        records.add(getRecordFromLine(scanner.nextLine()));
    }
}
Then, let's parse each line into a List of values:
private List<String> getRecordFromLine(String line) {
    List<String> values = new ArrayList<>();
    try (Scanner rowScanner = new Scanner(line)) {
        rowScanner.useDelimiter(COMMA_DELIMITER);
        while (rowScanner.hasNext()) {
            values.add(rowScanner.next());
        }
    }
    return values;
}
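Because useDelimiter() accepts a regular expression rather than a fixed string, the row scanner can also absorb optional whitespace after each comma. Here's a minimal, self-contained sketch that assumes input lines such as Mary Kom, Unbreakable with a space after the comma. It scans a String instead of book.csv so it runs standalone, and the class name ScannerCsv and the constant COMMA_WITH_SPACES are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerCsv {
    // Regex delimiter: a comma followed by any amount of whitespace (assumed input format)
    static final String COMMA_WITH_SPACES = ",\\s*";

    // Same idea as getRecordFromLine(), but with a regex delimiter
    static List<String> getRecordFromLine(String line) {
        List<String> values = new ArrayList<>();
        try (Scanner rowScanner = new Scanner(line)) {
            rowScanner.useDelimiter(COMMA_WITH_SPACES);
            while (rowScanner.hasNext()) {
                values.add(rowScanner.next());
            }
        }
        return values;
    }

    // new Scanner(String) scans the string's contents, so no file is needed here
    static List<List<String>> readRecords(String csv) {
        List<List<String>> records = new ArrayList<>();
        try (Scanner scanner = new Scanner(csv)) {
            while (scanner.hasNextLine()) {
                records.add(getRecordFromLine(scanner.nextLine()));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(readRecords("Mary Kom, Unbreakable\nKapil Isapuari,Farishta"));
        // prints [[Mary Kom, Unbreakable], [Kapil Isapuari, Farishta]]
    }
}
```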
5. Using Files Utility Class
Alternatively, we can use the Files class to achieve the same objective. This utility class consists of several static methods that operate on files and directories. So, let’s see how to use it in practice.
5.1. Using Files#lines
The lines() method is one of the enhancements introduced in Java 8. It allows us to read all lines of a given file as a stream. So, let’s see it in action:
try (Stream<String> lines = Files.lines(Paths.get(CSV_FILE))) {
    List<List<String>> records = lines.map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
        .collect(Collectors.toList());
}
Here, Paths.get(CSV_FILE) returns a Path instance that denotes the path to the CSV file. Then, we use map() to convert each line of the file into a list of strings. Note that we use a try-with-resources statement: the stream returned by lines() keeps the file open until the stream is closed, so this ensures that the file is closed automatically at the end.
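As a self-contained variant, the sketch below writes the sample data to a temporary file (standing in for CSV_FILE, which is assumed to hold the path to book.csv), passes an explicit charset to the Files.lines(Path, Charset) overload, and returns the records from inside the try block so they stay usable after the stream is closed. The class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilesLinesCsv {
    static final String COMMA_DELIMITER = ",";

    // Streams the file lazily; try-with-resources closes the underlying reader
    static List<List<String>> readRecords(Path csvFile) throws IOException {
        try (Stream<String> lines = Files.lines(csvFile, StandardCharsets.UTF_8)) {
            return lines.map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Write the sample data to a temporary file so the example runs anywhere
        Path csvFile = Files.createTempFile("book", ".csv");
        Files.write(csvFile, Arrays.asList("Mary Kom,Unbreakable", "Kapil Isapuari,Farishta"));
        try {
            System.out.println(readRecords(csvFile));
            // prints [[Mary Kom, Unbreakable], [Kapil Isapuari, Farishta]]
        } finally {
            Files.delete(csvFile);
        }
    }
}
```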
5.2. Using Files#readAllLines
Similarly, Files offers the readAllLines() method as another way to achieve the same outcome. Like lines(), this method accepts a Path object as a parameter, but it directly returns a List containing every line of the specified CSV file:
List<List<String>> records = Files.readAllLines(Paths.get(CSV_FILE))
    .stream()
    .map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
    .collect(Collectors.toList());
Here, we use the Stream API to convert the list of lines into a List<List<String>>. An important caveat is that readAllLines() loads the whole file into memory at once, so we should avoid it for large files.
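The single-argument readAllLines(Path) decodes the file as UTF-8 and throws an IOException if it meets a byte sequence that isn't valid UTF-8. If a CSV file uses another encoding, the readAllLines(Path, Charset) overload lets us say so. Here's a minimal sketch, assuming a hypothetical ISO-8859-1 encoded file written to a temporary location; the class name and sample line are made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReadAllLinesCsv {
    static final String COMMA_DELIMITER = ",";

    // Loads every line into memory first, decoding with the given charset
    static List<List<String>> readRecords(Path csvFile, Charset charset) throws IOException {
        return Files.readAllLines(csvFile, charset)
            .stream()
            .map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: a Latin-1 encoded line containing a non-ASCII character
        Path csvFile = Files.createTempFile("book-latin1", ".csv");
        Files.write(csvFile, Arrays.asList("Zo\u00EB,Example"), StandardCharsets.ISO_8859_1);
        try {
            List<List<String>> records = readRecords(csvFile, StandardCharsets.ISO_8859_1);
            System.out.println(records.size() + " record(s) read");
        } finally {
            Files.delete(csvFile);
        }
    }
}
```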
5.3. Using Files#newBufferedReader
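Another option is the newBufferedReader() method, which opens the file and returns a BufferedReader for it. A BufferedReader buffers the characters it reads from the file, and its lines() method supplies lines lazily, so it doesn't need to load the entire file up front.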
Next, let’s learn how to use this method through an example:
try (BufferedReader reader = Files.newBufferedReader(Paths.get(CSV_FILE))) {
    List<List<String>> records = reader.lines()
        .map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
        .collect(Collectors.toList());
}
As shown above, we apply the same logic as before to read the CSV file. Unlike readAllLines(), this approach reads the file lazily, which makes it a better fit for large files; however, collecting every record into a list, as we do here, still keeps all of the records in memory.
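To actually benefit from the lazy reading on large files, we can handle each record as it's read instead of collecting them all. Here's a minimal sketch that passes every record to a Consumer; the class name StreamingCsv, the method forEachRecord(), and the temporary sample file are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class StreamingCsv {
    static final String COMMA_DELIMITER = ",";

    // Passes each record to the handler as soon as its line is read,
    // instead of collecting all records into a list first
    static void forEachRecord(Path csvFile, Consumer<List<String>> handler) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(csvFile)) {
            reader.lines()
                .map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
                .forEach(handler);
        }
    }

    public static void main(String[] args) throws IOException {
        Path csvFile = Files.createTempFile("book", ".csv");
        Files.write(csvFile, Arrays.asList("Mary Kom,Unbreakable", "Kapil Isapuari,Farishta"));
        try {
            // Print only the second value (the book title) of each record
            forEachRecord(csvFile, record -> System.out.println(record.get(1)));
            // prints Unbreakable, then Farishta, on separate lines
        } finally {
            Files.delete(csvFile);
        }
    }
}
```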
6. Reading Values With Embedded Commas
With more sophisticated CSV files that include commas within values, we can't use a single, unmodified comma (,) as the delimiter with BufferedReader, Scanner, or the Files utility class. These approaches split each line on every comma, so they can't distinguish between a comma embedded within a value and a comma that separates two values; as a result, they split the values themselves.
To parse the comma-containing values as distinct values, we have several options:
- Pad the comma delimiter with whitespace to distinguish it from a comma embedded within a value
- Use a custom CSV parser
- Use an alternative delimiter
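As an illustration of the first option, suppose (hypothetically) that a file pads every delimiter as " , " while embedded commas are only followed by a space. Splitting on the padded sequence then keeps "Kom, Mary" intact, but only as long as no value itself contains " , ". The class name and constant below are made up for this sketch:

```java
import java.util.Arrays;
import java.util.List;

class PaddedDelimiterCsv {
    // Hypothetical convention: values are separated by a comma with a space on each side
    static final String PADDED_DELIMITER = " , ";

    static List<String> parseLine(String line) {
        return Arrays.asList(line.split(PADDED_DELIMITER));
    }

    public static void main(String[] args) {
        List<String> values = parseLine("Kom, Mary , Unbreakable");
        System.out.println(values.size()); // prints 2
        System.out.println(values.get(0)); // prints Kom, Mary
    }
}
```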
Let’s explore some of these alternatives.
6.1. Using a Custom CSV Parser
One option is a custom CSV parser that reads the file line by line and uses a StringBuilder to build up each value. The advantage of this approach is that we can keep a comma delimiter even when values contain embedded commas, as long as those values are enclosed in double quotes:
"Kom, Mary",Unbreakable
"Isapuari, Kapil",Farishta
Let’s learn how to use this method through an example:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

List<List<String>> records = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new FileReader(CSV_FILE))) {
    String line;
    while ((line = br.readLine()) != null) {
        records.add(parseLine(line));
    }
}
Next, let’s parse each line and store it in a List<String>:
private static List<String> parseLine(String line) {
    List<String> values = new ArrayList<>();
    boolean inQuotes = false;
    StringBuilder currentValue = new StringBuilder();
    for (char c : line.toCharArray()) {
        if (c == '"') {
            // Toggle quoted mode; the quote characters themselves are dropped
            inQuotes = !inQuotes;
        } else if (c == ',' && !inQuotes) {
            // An unquoted comma ends the current value
            values.add(currentValue.toString());
            currentValue = new StringBuilder();
        } else {
            currentValue.append(c);
        }
    }
    // Add the last value, which isn't followed by a comma
    values.add(currentValue.toString());
    return values;
}
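The parseLine() method above doesn't handle a double quote inside a quoted value. RFC 4180 escapes such a quote by doubling it (""), which parseLine() would treat as two toggles and silently drop. Here's a sketch of one way to extend the parser, assuming RFC 4180-style escaping and that no quoted value spans multiple lines; the class name QuoteAwareCsvParser is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareCsvParser {
    // Parses one CSV line, honoring quoted values and doubled quotes ("") inside them
    static List<String> parseLine(String line) {
        List<String> values = new ArrayList<>();
        StringBuilder currentValue = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    // A doubled quote inside a quoted value stands for one literal quote
                    currentValue.append('"');
                    i++;
                } else {
                    // Otherwise the quote opens or closes a quoted value
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                values.add(currentValue.toString());
                currentValue.setLength(0);
            } else {
                currentValue.append(c);
            }
        }
        values.add(currentValue.toString());
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"Kom, Mary\",Unbreakable").get(0));
        // prints Kom, Mary
        System.out.println(parseLine("\"He said \"\"hi\"\"\",Greeting").get(0));
        // prints He said "hi"
    }
}
```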
6.2. Using an Alternative Delimiter
We can use a delimiter other than a comma, such as ‘|’, ‘/’, ‘\’, or ‘;’. For example, let's enclose the comma-containing values in double quotes and use the pipe character “|” as the delimiter:
"Kom, Mary"|Unbreakable
"Isapuari, Kapil"|Farishta
To parse this file, we need to set the delimiter in our code to match the one in the CSV file:
public static final String COMMA_DELIMITER = "\\|";
Note that the pipe is a special character in Java regular expressions, so we need to escape it. With this regular expression, we can use BufferedReader, Scanner, or the Files utility class to parse the sample CSV file into a list whose records are [“Kom, Mary”,Unbreakable] and [“Isapuari, Kapil”,Farishta]; the double quotes remain part of the first value because none of these approaches strips them.
We have the flexibility to choose a different delimiter regular expression depending on how commas and other characters are embedded within values. Even so, this approach has its limitations: it can't be used if the chosen delimiter, such as the pipe character “|” in this example, also appears in any of the values.
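Instead of escaping the pipe by hand, we can let java.util.regex.Pattern.quote() build a literal-match pattern from any delimiter string. An unescaped "|" would be read as an empty alternation and split the line between every character. The sketch below, with the hypothetical class name AlternativeDelimiterCsv, also shows that the double quotes stay in the first value:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class AlternativeDelimiterCsv {
    // Pattern.quote() turns "|" into a literal-match regex, so no manual escaping is needed
    static final String PIPE_DELIMITER = Pattern.quote("|");

    static List<String> parseLine(String line) {
        return Arrays.asList(line.split(PIPE_DELIMITER));
    }

    public static void main(String[] args) {
        List<String> values = parseLine("\"Kom, Mary\"|Unbreakable");
        System.out.println(values.size()); // prints 2
        System.out.println(values.get(0)); // prints "Kom, Mary" (quotes included)
    }
}
```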
7. OpenCSV
Let’s explore a more resilient approach to reading a CSV file. OpenCSV is a third-party library that provides an API to work with CSV files.
We’ll use the readNext() method in CSVReader to read the records in the file:
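import com.opencsv.CSVReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

List<List<String>> records = new ArrayList<>();
try (CSVReader csvReader = new CSVReader(new FileReader("book.csv"))) {
    String[] values;
    // readNext() returns null once there are no more records
    while ((values = csvReader.readNext()) != null) {
        records.add(Arrays.asList(values));
    }
}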
By default, OpenCSV can read a CSV file with commas embedded within quoted values, without requiring an alternative delimiter:
"Kom, Mary",Unbreakable
"Isapuari, Kapil",Farishta
To dig deeper and learn more about OpenCSV, check out our OpenCSV tutorial.
8. Conclusion
In this quick article, we explored different ways to read CSV files into an array. We also looked at options for reading a CSV file when values contain embedded commas.
The code backing this article is available on GitHub.