
1. Introduction

Document conversion, once a complex and niche task, is now commonly supported by dedicated tools, by libraries, and even by the native functionality of some programming languages.

In this tutorial, we’ll learn how to convert a Word document into an HTML page that a browser can render. Specifically, we’ll look at two ways to convert documents programmatically with Apache POI: first for modern docx files, and then for the legacy doc format. Such conversions are common in enterprise applications.

2. Differences Between doc and docx

Until Word 2007, Microsoft Word used the legacy doc format, which relies on a binary representation. As a consequence, exchanging documents and preserving their formatting across different tools was harder.

Starting with Word 2007, Microsoft moved to the docx format, which is based on Office Open XML. This format is structured, standardized, and usually much easier to process programmatically.

Because of that, converting Word documents requires a different approach depending on the format. We’ll start with docx and then cover doc for backward compatibility.
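Since the two formats need different code paths, an application that accepts uploaded files may want to choose the path by inspecting the first bytes of a file rather than trusting its extension. The sketch below uses a hypothetical WordFormatDetector helper and relies on two well-known signatures: legacy doc files are OLE2 compound documents starting with D0 CF 11 E0 A1 B1 1A E1, while docx files are ZIP archives starting with PK\3\4. Keep in mind that a ZIP signature alone doesn’t prove the file is a Word document, since xlsx and pptx files share it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Apache POI
public class WordFormatDetector {

    enum WordFormat { DOCX, DOC, UNKNOWN }

    // ZIP local file header signature "PK\3\4", used by docx (OOXML) packages
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 compound file signature, used by legacy binary doc files
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Classifies a file header; a ZIP match only means "some ZIP archive"
    static WordFormat detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return WordFormat.DOC;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return WordFormat.DOCX;
        }
        return WordFormat.UNKNOWN;
    }

    // Reads at most the first 8 bytes of the file and classifies them
    static WordFormat detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Apache POI also ships a FileMagic class in org.apache.poi.poifs.filesystem for similar checks, which may be preferable in a real project.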

3. Maven Dependencies

To support both formats, we need two Apache POI modules, poi-ooxml for docx and poi-scratchpad for the legacy doc format, plus the XHTML converter provided by XDocReport:

<dependency>  
    <groupId>org.apache.poi</groupId>  
    <artifactId>poi-ooxml</artifactId>  
    <version>5.5.1</version>  
</dependency>  
<dependency>  
    <groupId>org.apache.poi</groupId>  
    <artifactId>poi-scratchpad</artifactId>  
    <version>5.5.1</version>  
</dependency>  
<dependency>  
    <groupId>fr.opensagres.xdocreport</groupId>  
    <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>  
    <version>2.1.0</version>  
</dependency>

The XDocReport converter can also be configured with an ImageManager so that embedded images are written to disk and referenced from the generated HTML.

4. Converting docx Documents

A docx file is essentially a ZIP archive containing XML parts. Apache POI hides that complexity behind the XWPFDocument API, which gives us a much cleaner way to work with Word content.
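To see that structure without POI, we can list the entries of a docx file with the JDK’s java.util.zip package. This is only an illustrative sketch: DocxPartLister is a hypothetical helper, and real conversion code should go through XWPFDocument rather than parsing the XML parts by hand.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only
public class DocxPartLister {

    // Returns the names of the parts stored in a docx package, in archive order
    static List<String> listParts(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

For a typical docx, the result includes parts such as [Content_Types].xml, _rels/.rels, and word/document.xml, which holds the main body, while embedded images live under word/media/.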

4.1. Using Apache POI to Convert Documents

Apache POI represents docx files with the XWPFDocument class.

First, let’s load the document from a file:

public XWPFDocument loadDocxFromPath(String path) {
    Path file = Paths.get(path);
    if (!Files.exists(file)) {
        throw new UncheckedIOException(new FileNotFoundException("File not found: " + path));
    }
    // try-with-resources closes the input stream even if parsing fails
    try (InputStream in = Files.newInputStream(file)) {
        XWPFDocument document = new XWPFDocument(in);
        boolean hasParagraphs = !document.getParagraphs().isEmpty();
        boolean hasTables = !document.getTables().isEmpty();
        if (!hasParagraphs && !hasTables) {
            // Release the document before rejecting it
            document.close();
            throw new IllegalArgumentException("Document is empty: " + path);
        }
        return document;
    } catch (IOException ex) {
        throw new UncheckedIOException("Cannot load document: " + path, ex);
    }
}

In the code above, we load the docx file from a path and reject documents that contain neither paragraphs nor tables.

Next, we configure XHTMLOptions for the generated HTML. XDocReport’s ImageManager extracts embedded images into an images subdirectory of the directory that holds the HTML output:

private XHTMLOptions configureHtmlOptions(Path outputDir) {  
    XHTMLOptions options = XHTMLOptions.create();  
    options.setImageManager(new ImageManager(outputDir.toFile(), "images"));  
    return options;  
}

Now, we can convert the document and save the HTML file next to the input document:

public void convertDocxToHtml(String docxPath) throws IOException {
    // An absolute path guarantees that output.getParent() below isn't null
    Path input = Paths.get(docxPath).toAbsolutePath();
    // Strip the last extension (e.g. ".docx") and append ".html"
    String htmlFileName = input.getFileName().toString().replaceFirst("\\.[^.]+$", "") + ".html";
    Path output = input.resolveSibling(htmlFileName);
    try (XWPFDocument document = loadDocxFromPath(docxPath);
      OutputStream out = Files.newOutputStream(output)) {
        // Images end up in an "images" subdirectory next to the HTML file
        XHTMLConverter.getInstance().convert(document, out, configureHtmlOptions(output.getParent()));
    }
}
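The output file name comes from a small regular expression that strips only the last extension. The following standalone sketch isolates that naming logic in a hypothetical htmlSiblingOf helper so that we can check its edge cases, such as names with several dots or without any extension.

```java
import java.nio.file.Path;

// Hypothetical helper mirroring the naming logic of convertDocxToHtml
public class HtmlOutputPaths {

    // "report.v2.docx" -> "report.v2.html"; names without an extension just gain ".html"
    static Path htmlSiblingOf(Path input) {
        String baseName = input.getFileName().toString().replaceFirst("\\.[^.]+$", "");
        return input.resolveSibling(baseName + ".html");
    }
}
```

Note that for a bare relative name such as README, the resulting path has no parent, so calling getParent() on it returns null; resolving the input with toAbsolutePath() first avoids that.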

Next, let’s write a test to verify the conversion:

@Test
void givenSimpleDocx_whenConverting_thenHtmlFileIsCreated() throws Exception {
    DocxToHtml converter = new DocxToHtml();
    // toURI() avoids problems with URL-encoded characters (e.g. spaces) in the resource path
    Path docx = Paths.get(this.getClass().getResource("/sample.docx").toURI());
    converter.convertDocxToHtml(docx.toString());
    Path html = docx.resolveSibling("sample.html");
    assertTrue(Files.exists(html));
    // readAllBytes opens and closes the file in a single call
    String content = new String(Files.readAllBytes(html), StandardCharsets.UTF_8);
    assertTrue(content.contains("<html"));
}

The test reads the generated file as UTF-8 and only checks for an <html> element, which is enough to confirm that the conversion produced an HTML page. However, real-world documents raise a few more concerns.

4.2. Handling Large Documents

For large documents, resource management matters. XWPFDocument keeps the whole document in memory, so using try-with-resources to release streams and document data as soon as the conversion finishes is important.

If needed, we can run conversions asynchronously so that large files don’t block request threads.
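A minimal sketch of that idea uses CompletableFuture with a small dedicated thread pool, so conversions run off the request threads and a burst of large files can’t occupy more than a fixed number of workers. The Conversion interface and the AsyncConversionService class are hypothetical; in practice, the conversion would wrap a call such as convertDocxToHtml and return the path of the generated HTML file.

```java
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical service for illustration; not part of Apache POI or XDocReport
public class AsyncConversionService implements AutoCloseable {

    // Stand-in for a blocking conversion such as convertDocxToHtml;
    // returns the path of the generated HTML file
    @FunctionalInterface
    interface Conversion {
        Path convert(Path input) throws Exception;
    }

    private final Conversion conversion;
    // Bounded pool: at most `threads` documents are converted (and held in memory) at once
    private final ExecutorService pool;

    AsyncConversionService(Conversion conversion, int threads) {
        this.conversion = conversion;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Schedules the conversion on the pool and returns immediately
    CompletableFuture<Path> submit(Path input) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return conversion.convert(input);
            } catch (Exception e) {
                // Keeps the original exception available as the cause for callers
                throw new CompletionException(e);
            }
        }, pool);
    }

    // Stops accepting new work; already submitted conversions still finish
    @Override
    public void close() {
        pool.shutdown();
    }
}
```

A caller can then return right away and react when the future completes, for example with thenAccept, while failures surface from join() as a CompletionException whose cause is the original exception.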

4.3. Handling Nested Tables and Complex Formats

Lastly, our example doesn’t account for nested tables or complex layouts, so the generated HTML may not always match the original Word document visually. The converter works best with regular paragraphs, tables, and basic formatting. For example, the sample docx file contains a chart that doesn’t appear in the HTML output, and to keep things simple, we don’t attempt to convert it here.

Still, in production systems, it’s a good idea to add regression tests with real sample documents that reflect the layouts that the application might have to support.
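One lightweight way to build such regression tests is to keep, for each sample document, a list of text fragments that must survive the conversion and to report any that are missing from the generated HTML. The hypothetical HtmlRegressionCheck helper below sketches that check with plain string matching; a test would then assert that the returned list is empty.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only
public class HtmlRegressionCheck {

    // Returns the expected fragments that don't occur in the generated HTML;
    // an empty list means every fragment survived the conversion
    static List<String> missingFragments(String html, List<String> expectedFragments) {
        return expectedFragments.stream()
            .filter(fragment -> !html.contains(fragment))
            .collect(Collectors.toList());
    }
}
```

Plain substring matching is deliberately crude: a converter may split a sentence across several inline elements when the underlying Word runs differ in formatting, so the fragments should be chosen with that in mind.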

5. Legacy doc Conversion

The older doc format is handled by POI’s HWPF APIs from the poi-scratchpad module rather than by XWPF. Apache POI provides WordToHtmlConverter for this use case:

public void convertDocToHtml(String docPath) throws Exception {
    Path input = Paths.get(docPath).toAbsolutePath();
    String htmlFileName = input.getFileName().toString().replaceFirst("\\.[^.]+$", "") + ".html";
    Path output = input.resolveSibling(htmlFileName);
    Path imagesDir = input.resolveSibling("images");
    Files.createDirectories(imagesDir);

    // Closes the input stream, the HWPF document, and the output stream when done
    try (InputStream in = Files.newInputStream(input);
      HWPFDocumentCore document = WordToHtmlUtils.loadDoc(in);
      OutputStream out = Files.newOutputStream(output)) {
        Document htmlDocument = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();

        WordToHtmlConverter converter = new WordToHtmlConverter(htmlDocument);
        // Writes each embedded picture to the images directory and returns its relative path
        converter.setPicturesManager((content, pictureType, suggestedName, widthInches, heightInches) -> {
            Path imageFile = imagesDir.resolve(suggestedName);
            try {
                Files.write(imageFile, content);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return "images/" + suggestedName;
        });

        converter.processDocument(document);

        // Serializes the resulting DOM as HTML with an explicit UTF-8 encoding
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.transform(new DOMSource(converter.getDocument()), new StreamResult(out));
    }
}

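This flow is different internally, but the overall idea is the same: load the Word document, convert it, and write the HTML to storage. Unlike with docx, we serialize the resulting DOM ourselves, so we set the output encoding on the Transformer explicitly.

The image handling also takes a little more setup: we register a PicturesManager that writes each picture to the images directory and returns the relative path that the generated HTML should reference.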

6. Conclusion

In this tutorial, we learned how to convert Word documents to HTML using Apache POI.

We covered modern docx files with XWPFDocument and XHTMLConverter, then looked at legacy doc files with WordToHtmlConverter.

The code backing this article is available on GitHub.