Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

When it comes to Microsoft Excel files, reading values from different cells can be a little tricky. Excel files are spreadsheets organized in rows and cells which can contain String, Numeric, Date, Boolean, and even Formula values. Apache POI is a library offering a full suite of tools to handle different excel files and value types.

In this tutorial, we’ll focus on learning how to handle excel files, iterate through rows and cells, and use the proper way to read each cell value type.

2. Maven Dependency

Let’s start by adding the Apache POI dependency to pom.xml:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.5</version>
</dependency>

The latest versions of poi-ooxml can be found at Maven Central.

3. Apache POI Overview

The hierarchy starts with the workbook, which represents the whole Excel File. Each file can contain one or more worksheets, which are collections of rows and cells. Depending on the version of the excel file HSSF is the prefix of the classes representing the old Excel files (.xls), whereas the XSSF is used for the newest versions (.xlsx). Therefore we have:

  • XSSFWorkbook and HSSFWorkbook classes represent the Excel workbook
  • Sheet interface represents Excel worksheets
  • The Row interface represents rows
  • The Cell interface represents cells

3.1. Handling Excel Files

First, we open the file we want to read and convert it into a FileInputStream for further processing. FileInputStream constructor throws a java.io.FileNotFoundException so we need to wrap it around a try-catch block and close the stream at the end:

public static void readExcel(String filePath) {
    File file = new File(filePath);
    try {
        FileInputStream inputStream = new FileInputStream(file);
        ...
        inputStream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

3.2. Iterating Through the Excel File

After we successfully open the InputStream it’s time to create the XSSFWorkbook and iterate through the rows and cells of each sheet. In case we know the exact number of sheets or the name of a specific sheet, we can use the getSheetAt(int index) and getSheet(String sheetName) methods of XSSFWorkbook, respectively.

Since we want to read through any kind of Excel file, we’ll iterate through all the sheets using three nested for loops, one for the sheets, one for the rows of each sheet, and finally one for the cells of each sheet.

For the sake of this tutorial we will only print the data to the console:

FileInputStream inputStream = new FileInputStream(file);
Workbook baeuldungWorkBook = new XSSFWorkbook(inputStream);
for (Sheet sheet : baeuldungWorkBook) {
...
}

Then, in order to iterate through the rows of a sheet, we need to find the index of the first row and the last row which we get from the sheet object:

int firstRow = sheet.getFirstRowNum();
int lastRow = sheet.getLastRowNum();
for (int index = firstRow + 1; index <= lastRow; index++) {
    Row row = sheet.getRow(index);
}

Finally, we do the same for the cells. Also, while accessing each cell we can optionally pass down a MissingCellPolicy which basically tells the POI what to return when a cell value is empty or null. The MissingCellPolicy enum contains three enumerated values:

  • RETURN_NULL_AND_BLANK
  • RETURN_BLANK_AS_NULL
  • CREATE_NULL_AS_BLANK;

The code for the cell iteration is as follows:

for (int cellIndex = row.getFirstCellNum(); cellIndex < row.getLastCellNum(); cellIndex++) {
    Cell cell = row.getCell(cellIndex, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
    ...
}

3.3. Reading Cell Values in Excel

As we mentioned before, Microsoft Excel’s cells can contain different value types, so it’s important to be able to distinguish one cell value type from another and use the appropriate method to extract the value. Below there’s a list of all the value types:

  • NONE
  • NUMERIC
  • STRING
  • FORMULA
  • BLANK
  • BOOLEAN
  • ERROR

We’ll focus on four main cell value types: Numeric, String, Boolean, and Formula, where the last one contains a calculated value that is of the first three types.

Let’s create a helper method that basically will check for each value type and based on that it’ll use the appropriate method to access the value. It’s also possible to treat the cell value as a String and retrieve it with the corresponding method.

There are two important things worth noting. First, Date values are stored as Numeric values, and also if the cell’s value type is FORMULA we need to use the getCachedFormulaResultType() instead of the getCellType() method to check the result of Formula’s calculation:

public static void printCellValue(Cell cell) {
    CellType cellType = cell.getCellType().equals(CellType.FORMULA)
      ? cell.getCachedFormulaResultType() : cell.getCellType();
    if (cellType.equals(CellType.STRING)) {
        System.out.print(cell.getStringCellValue() + " | ");
    }
    if (cellType.equals(CellType.NUMERIC)) {
        if (DateUtil.isCellDateFormatted(cell)) {
            System.out.print(cell.getDateCellValue() + " | ");
        } else {
            System.out.print(cell.getNumericCellValue() + " | ");
        }
    }
    if (cellType.equals(CellType.BOOLEAN)) {
        System.out.print(cell.getBooleanCellValue() + " | ");
    }
}

Now, all we need to do is call the printCellValue method inside the cell loop and we are done. Here’s a snippet of the full code:

...
for (int cellIndex = row.getFirstCellNum(); cellIndex < row.getLastCellNum(); cellIndex++) {
    Cell cell = row.getCell(cellIndex, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
    printCellValue(cell);
}
...

4. Conclusion

In this article, we have shown an example project for reading Excel files and accessing different cell values using Apache POI.

The full source code can be found over on GitHub.

Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE
res – REST with Spring (eBook) (everywhere)
Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.