1. Introduction

In this tutorial, we’ll explore how to extract the schema from an Apache Avro file in Java. Furthermore, we’ll cover how to read data from Avro files. This is a common requirement in big data processing systems.

Apache Avro is a data serialization framework that provides a compact, fast binary data format. As such, it’s popular in the big data ecosystem, particularly with Apache Hadoop. Therefore, understanding how to work with Avro files is crucial for tasks involving data processing.

2. Maven Dependencies

To get Avro up and running in Java, we need to add the Avro core library to our Maven project:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.12.0</version>
</dependency>

For testing purposes, we’ll use JUnit Jupiter. If the project already includes the Spring Boot Starter Test dependency, we don’t need to add JUnit separately, since the starter brings it in transitively, along with the Mockito framework.

Otherwise, let’s add the JUnit Jupiter dependency ourselves:

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-api</artifactId>
    <version>5.11.2</version>
    <scope>test</scope>
</dependency>

Whenever we start a new project, it’s good to make sure we’re using the latest stable versions of the respective dependencies.

3. Understanding and Extracting Avro Schema

Before we dive into the code for extracting schemas, let’s briefly recap the structure of an Avro file:

  • File header – contains the magic bytes, the file metadata (including the schema), and a 16-byte sync marker.
  • Data blocks – hold the actual serialized data; each block ends with the file’s sync marker.
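As a quick illustration of the header, the Avro specification defines the first four bytes of an object container file as the characters O, b, j followed by the byte 1. The following is a minimal sketch, using only the JDK, that checks for this marker; the class name AvroMagicCheck and the method looksLikeAvroFile() are hypothetical helpers introduced for this example, not part of the Avro library:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of the Avro API
final class AvroMagicCheck {

    // Magic bytes of an Avro object container file: 'O', 'b', 'j', 1
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    private AvroMagicCheck() {
    }

    // Returns true if the file starts with the Avro container magic bytes
    static boolean looksLikeAvroFile(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            // readNBytes returns fewer bytes if the file is shorter than the marker
            byte[] firstBytes = in.readNBytes(AVRO_MAGIC.length);
            return Arrays.equals(AVRO_MAGIC, firstBytes);
        }
    }
}
```

A check like this only confirms the marker; it doesn’t validate the metadata or the data blocks that follow.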

The schema of an Avro file describes the structure of the data inside it. The schema itself is stored in JSON format and lists the fields with their names and data types, while the records are stored in a compact binary encoding.

Now, let’s write a method to extract the schema from an Avro file:

public static Schema extractSchema(String avroFilePath) throws IOException {
    File avroFile = new File(avroFilePath);
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(avroFile, datumReader)) {
        return dataFileReader.getSchema();
    }
}

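First, we create a File object representing the Avro file. Next, we instantiate a GenericDatumReader. We don’t pass a schema to it, because the DataFileReader reads the writer’s schema from the file header and hands it to the datum reader. This is what allows the same code to read any Avro file.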

Next, we create a DataFileReader using the Avro file and the GenericDatumReader as arguments.

We use the getSchema() method of DataFileReader to extract the schema. The DataFileReader is wrapped in a try-with-resources block to ensure proper resource management.

This approach allows us to extract the schema without needing to know its structure beforehand. This way, it’s a versatile option for working with various Avro files.
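Once we have a Schema object, we can inspect it programmatically. The following is a minimal sketch, assuming the extracted schema is a record schema; the class SchemaFieldLister and the method describeFields() are hypothetical names introduced for illustration, while getFields(), name(), schema(), getType(), and getName() come from the Avro Schema API:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;

// Hypothetical helper for illustration
final class SchemaFieldLister {

    private SchemaFieldLister() {
    }

    // Builds one "fieldName: type" entry per field.
    // getFields() throws an AvroRuntimeException if the schema is not a record.
    static List<String> describeFields(Schema recordSchema) {
        List<String> descriptions = new ArrayList<>();
        for (Schema.Field field : recordSchema.getFields()) {
            descriptions.add(field.name() + ": " + field.schema().getType().getName());
        }
        return descriptions;
    }
}
```

For example, we could pass the result of extractSchema() to describeFields() to get a quick overview of a file we haven’t seen before.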

4. Reading Data from Avro File

Once we have obtained the schema, we can read the data from the Avro file.

Let’s write a reading method:

public static List<GenericRecord> readAvroData(String avroFilePath) throws IOException {
    File avroFile = new File(avroFilePath);
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
    List<GenericRecord> records = new ArrayList<>();

    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(avroFile, datumReader)) {
        while (dataFileReader.hasNext()) {
            // next() without a reuse argument returns a new record on every call,
            // so each list entry keeps its own data
            records.add(dataFileReader.next());
        }
    }
    return records;
}

First, we create a File from the avroFilePath. Next, we create a GenericDatumReader object, which is used to read Avro data. By creating it without specifying a schema, it can read any Avro file without knowing the schema in advance.

Then, we create a DataFileReader, which is the main tool we’ll use to extract information from the Avro file. Finally, we iterate through the file using the hasNext() and next() methods and add the records to the list.

It’s worth noting that DataFileReader also offers a next(reuse) overload, which refills the record we pass in instead of allocating a new one. This reduces object creation and garbage collection overhead, but it’s only safe when we finish processing each record before reading the next one. Since we keep every record in a list, we call next() without an argument. Otherwise, all list entries would point to the same, repeatedly overwritten object.
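Object reuse is safe when each record is fully processed before the next one is read, because next(reuse) may refill the same object. The following is a minimal sketch of that streaming style; the class AvroRecordStreamer and the method forEachRecord() are hypothetical names introduced for this example:

```java
import java.io.File;
import java.io.IOException;
import java.util.function.Consumer;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

// Hypothetical helper for illustration
final class AvroRecordStreamer {

    private AvroRecordStreamer() {
    }

    // Passes every record to the handler and returns the number of records read.
    // The same GenericRecord instance may be handed to the handler repeatedly,
    // so the handler must copy any values it wants to keep.
    static long forEachRecord(File avroFile, Consumer<GenericRecord> handler) throws IOException {
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        long count = 0;
        try (DataFileReader<GenericRecord> reader = new DataFileReader<>(avroFile, datumReader)) {
            GenericRecord reused = null;
            while (reader.hasNext()) {
                reused = reader.next(reused);
                handler.accept(reused);
                count++;
            }
        }
        return count;
    }
}
```

A handler such as record -> names.add(record.get("name").toString()) extracts the value right away, so it isn’t affected when the record is overwritten on the next iteration.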

5. Testing

To make sure our code works correctly, let’s write some unit tests. To start with our setup, let’s create a tempDir. When we annotate a field with @TempDir, JUnit automatically creates a temporary directory for use in tests.

This is useful for creating temporary files during tests without worrying about cleanup. JUnit creates the directory before the tests run and deletes it afterward:

@TempDir
Path tempDir;

private File avroFile;
private Schema schema;

Next, we’re going to set up some things before each test:

@BeforeEach
void setUp() throws IOException {
    schema = new Schema.Parser().parse("""
                                    {
                                        "type": "record",
                                        "name": "User",
                                        "fields": [
                                            {"name": "name", "type": "string"},
                                            {"name": "age", "type": "int"}
                                        ]
                                    }
                                    """);
    avroFile = tempDir.resolve("test.avro").toFile();

    GenericRecord user1 = new GenericData.Record(schema);
    user1.put("name", "John Doe");
    user1.put("age", 30);

    try (DataFileWriter<GenericRecord> dataFileWriter = 
      new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
        dataFileWriter.create(schema, avroFile);
        dataFileWriter.append(user1);
    }
}

Finally, let’s test our functionality:

@Test
void whenSchemaIsExistent_thenItIsExtractedCorrectly() throws IOException {
    Schema extractedSchema = AvroSchemaExtractor.extractSchema(avroFile.getPath());

    assertEquals(schema, extractedSchema);
}

@Test
void whenAvroFileHasContent_thenItIsReadCorrectly() throws IOException {
    List<GenericRecord> records = AvroSchemaExtractor.readAvroData(avroFile.getPath());

    assertEquals("John Doe", records.get(0).get(0).toString());
}

These tests create a temporary Avro file with a sample schema and data. Then, they verify that our methods correctly extract the schema and read the data.

6. Conclusion

In this article, we’ve explored how to extract the schema from an Avro file and read its data using Java. In addition, we’ve demonstrated how to use GenericDatumReader and DataFileReader to handle Avro files without prior knowledge of the schema.

These techniques are useful in Java applications that work with Avro, such as data analytics or big data processing, because they let us handle files whose schema we don’t know in advance.

Finally, we should remember to handle exceptions and to close readers and writers properly, for example with try-with-resources, as we did throughout this article.

The code backing this article is available on GitHub.