How to Serialize and Deserialize Dates in Avro
Last updated: April 15, 2025
1. Introduction
In this tutorial, we’ll explore different approaches to serializing and deserializing Date objects in Java, using Apache Avro. This framework is a data serialization system that provides a compact, fast, binary data format along with schema-based data definition.
When working with dates in Avro, we face challenges because Avro doesn’t natively support the Java Date class in its type structure. Now, let’s look at the challenge with Date serialization in more detail.
2. The Challenge With Date Serialization
Before we get started, let’s add the Avro dependency to our Maven project:
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.12.0</version>
</dependency>
Avro’s type system consists of primitive types: null, boolean, int, long, float, double, bytes, and string. In addition, the supported complex types are: record, enum, array, map, union, and fixed.
Now, let’s look at an example of why date serialization can be problematic in Avro:
public class DateContainer {
    private Date date;

    // Constructors, getters, and setters
}
When we try to serialize this class directly using Avro’s reflection-based serialization, the default behavior internally converts the Date object to a long value (milliseconds since epoch).
Unfortunately, this process can lead to precision issues. For example, the deserialized value could be off by a few milliseconds from the original counterpart.
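To make the precision concern concrete, here’s a minimal sketch that uses only java.time and doesn’t involve Avro at all. It assumes a value is reduced to a long of epoch milliseconds, as described above. Since java.util.Date itself only holds millisecond precision, the sketch starts from an Instant that carries nanosecond digits. The class and method names are ours, purely for illustration:

```java
import java.time.Instant;

class MillisPrecisionDemo {

    // Reduces an Instant to epoch milliseconds and rebuilds it,
    // mimicking a round trip through a millisecond-based long value
    static Instant roundTripThroughMillis(Instant original) {
        long millis = original.toEpochMilli(); // sub-millisecond digits are dropped here
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2025-04-15T10:15:30.123456789Z");
        Instant restored = roundTripThroughMillis(original);

        System.out.println("original: " + original);
        System.out.println("restored: " + restored);
        System.out.println("equal:    " + original.equals(restored));
    }
}
```

Here, the restored value keeps only the millisecond part of the fraction (.123), so it’s no longer equal to the original Instant.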
3. Implementing Date Serialization
Next, we’ll implement Date serialization and deserialization using two methods: using logical types with GenericRecord and using Avro’s Conversion API.
3.1. Using Logical Types With GenericRecord
Since Avro 1.8, the framework has provided logical types. These attach additional semantic meaning to the underlying primitive types.
As such, for dates we have three logical types:
- date: represents a date without a time component, stored as an int (days since epoch)
- timestamp-millis: represents a timestamp with millisecond precision, stored as a long (milliseconds since epoch)
- timestamp-micros: represents a timestamp with microsecond precision, stored as a long (microseconds since epoch)
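To get a feel for the raw numbers behind these three logical types, let’s look at a small sketch that computes them with java.time alone. It doesn’t use Avro, and the class and method names are hypothetical helpers we introduce only for this illustration:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class LogicalTypeValuesDemo {

    // date: whole days since 1970-01-01 (Avro stores this value as an int)
    static long dateValue(LocalDate date) {
        return date.toEpochDay();
    }

    // timestamp-millis: milliseconds since 1970-01-01T00:00:00Z
    static long timestampMillisValue(Instant instant) {
        return instant.toEpochMilli();
    }

    // timestamp-micros: microseconds since 1970-01-01T00:00:00Z
    static long timestampMicrosValue(Instant instant) {
        return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2025-04-15T10:15:30.123456789Z");

        System.out.println("date:             " + dateValue(LocalDate.of(2025, 4, 15)));
        System.out.println("timestamp-millis: " + timestampMillisValue(instant));
        System.out.println("timestamp-micros: " + timestampMicrosValue(instant));
    }
}
```

As we can see, the microsecond value carries three more digits of the fraction than the millisecond value, while anything below a microsecond is dropped in both cases.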
Now, let’s see how to use these logical types in an Avro schema:
public static Schema createDateSchema() {
    String schemaJson =
      "{"
        + "\"type\": \"record\","
        + "\"name\": \"DateRecord\","
        + "\"fields\": ["
        + "  {\"name\": \"date\", \"type\": {\"type\": \"int\", \"logicalType\": \"date\"}},"
        + "  {\"name\": \"timestamp\", \"type\": {\"type\": \"long\", \"logicalType\": \"timestamp-millis\"}}"
        + "]"
        + "}";
    return new Schema.Parser().parse(schemaJson);
}
Notably, we’ve applied the logical type to the underlying primitive type, not directly to the field.
Now, let’s look at how we can implement Date serialization using logical types:
public static byte[] serializeDateWithLogicalType(LocalDate date, Instant timestamp) throws IOException {
    Schema schema = createDateSchema();
    GenericRecord record = new GenericData.Record(schema);

    // Store the raw values that the logical types expect
    record.put("date", (int) date.toEpochDay());
    record.put("timestamp", timestamp.toEpochMilli());

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
    Encoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
    datumWriter.write(record, encoder);
    encoder.flush();
    return baos.toByteArray();
}
Let’s go over the above logic. We convert the LocalDate to days since epoch and the timestamp to milliseconds since epoch. This way, we’re able to use the logical types.
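One detail worth noting is the narrowing from long to int: toEpochDay() returns a long, and a plain (int) cast silently wraps around if the value is out of range. As a hedged alternative, the sketch below uses Math.toIntExact() from the JDK, which throws an ArithmeticException instead. The helper name toEpochDayInt is our own and isn’t part of Avro:

```java
import java.time.LocalDate;

class EpochDayNarrowingDemo {

    // Converts a LocalDate to an int day count, failing fast
    // rather than silently overflowing for out-of-range dates
    static int toEpochDayInt(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        System.out.println(toEpochDayInt(LocalDate.of(2025, 4, 15)));

        try {
            // LocalDate.MAX is far beyond what fits in an int of days
            toEpochDayInt(LocalDate.MAX);
        } catch (ArithmeticException e) {
            System.out.println("Out of range for an int: " + e.getMessage());
        }
    }
}
```

For everyday dates, both variants produce the same value, so the choice only matters when we can’t rule out extreme inputs.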
Now, let’s implement the method that handles the deserialization:
public static Pair<LocalDate, Instant> deserializeDateWithLogicalType(byte[] bytes) throws IOException {
    Schema schema = createDateSchema();
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
    Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    GenericRecord record = datumReader.read(null, decoder);

    // Rebuild the java.time values from the raw int and long
    LocalDate date = LocalDate.ofEpochDay((int) record.get("date"));
    Instant timestamp = Instant.ofEpochMilli((long) record.get("timestamp"));
    return Pair.of(date, timestamp);
}
Finally, let’s test our implementation:
@Test
void whenSerializingDateWithLogicalType_thenDeserializesCorrectly() throws IOException {
    LocalDate expectedDate = LocalDate.now();
    Instant expectedTimestamp = Instant.now();

    byte[] serialized = serializeDateWithLogicalType(expectedDate, expectedTimestamp);
    Pair<LocalDate, Instant> deserialized = deserializeDateWithLogicalType(serialized);

    assertEquals(expectedDate, deserialized.getLeft());
    assertEquals(expectedTimestamp.toEpochMilli(), deserialized.getRight().toEpochMilli(),
      "Timestamps should match exactly at millisecond precision");
}
As we can see from the test, the timestamp-millis logical type maintains millisecond precision, and the timestamps match as expected. Furthermore, using logical types makes our data format explicit in the schema definition, which is valuable for schema evolution and documentation.
3.2. Using Avro’s Conversion API
Avro provides a conversion API that can handle logical types automatically. This API isn’t a separate approach. In fact, it’s built on top of logical types and simplifies the conversion code.
As such, it saves us from manually converting between Java types and Avro’s internal representation. Furthermore, it adds type safety to the conversion process.
Now, let’s implement the solution that handles logical types automatically:
public static byte[] serializeWithConversionApi(LocalDate date, Instant timestamp) throws IOException {
    Schema schema = createDateSchema();
    GenericRecord record = new GenericData.Record(schema);

    Conversion<LocalDate> dateConversion = new org.apache.avro.data.TimeConversions.DateConversion();
    LogicalTypes.date().addToSchema(schema.getField("date").schema());
    Conversion<Instant> timestampConversion =
      new org.apache.avro.data.TimeConversions.TimestampMillisConversion();
    LogicalTypes.timestampMillis().addToSchema(schema.getField("timestamp").schema());

    // Let the conversions produce the raw int and long values
    record.put("date", dateConversion.toInt(date,
      schema.getField("date").schema(),
      LogicalTypes.date()));
    record.put("timestamp",
      timestampConversion.toLong(timestamp, schema.getField("timestamp").schema(),
        LogicalTypes.timestampMillis()));

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
    Encoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
    datumWriter.write(record, encoder);
    encoder.flush();
    return baos.toByteArray();
}
Unlike the previous approach, this time we let DateConversion and TimestampMillisConversion perform the conversion, passing LogicalTypes.date() and LogicalTypes.timestampMillis() along with each field’s schema.
Next, let’s implement the method that handles the deserialization:
public static Pair<LocalDate, Instant> deserializeWithConversionApi(byte[] bytes) throws IOException {
    Schema schema = createDateSchema();
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
    Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    GenericRecord record = datumReader.read(null, decoder);

    Conversion<LocalDate> dateConversion = new DateConversion();
    LogicalTypes.date().addToSchema(schema.getField("date").schema());
    Conversion<Instant> timestampConversion = new TimestampMillisConversion();
    LogicalTypes.timestampMillis().addToSchema(schema.getField("timestamp").schema());

    // Read the raw values, then convert them back to java.time types
    int daysSinceEpoch = (int) record.get("date");
    long millisSinceEpoch = (long) record.get("timestamp");
    LocalDate date = dateConversion.fromInt(
      daysSinceEpoch,
      schema.getField("date").schema(),
      LogicalTypes.date()
    );
    Instant timestamp = timestampConversion.fromLong(
      millisSinceEpoch,
      schema.getField("timestamp").schema(),
      LogicalTypes.timestampMillis()
    );
    return Pair.of(date, timestamp);
}
Finally, let’s verify the implementation:
@Test
void whenSerializingWithConversionApi_thenDeserializesCorrectly() throws IOException {
    LocalDate expectedDate = LocalDate.now();
    Instant expectedTimestamp = Instant.now();

    byte[] serialized = serializeWithConversionApi(expectedDate, expectedTimestamp);
    Pair<LocalDate, Instant> deserialized = deserializeWithConversionApi(serialized);

    assertEquals(expectedDate, deserialized.getLeft());
    assertEquals(expectedTimestamp.toEpochMilli(), deserialized.getRight().toEpochMilli(),
      "Timestamps should match at millisecond precision");
}
4. Handling Legacy Code That Uses Date
Currently, many existing Java applications still use the legacy java.util.Date class. For such codebases, we’ll need a strategy to handle these objects when serializing with Avro.
A good approach is to convert legacy dates to the modern Java time API before we serialize the information:
public static byte[] serializeLegacyDateAsModern(Date legacyDate) throws IOException {
    // Bridge the legacy Date to java.time before serializing
    Instant instant = legacyDate.toInstant();
    LocalDate localDate = instant.atZone(ZoneId.systemDefault()).toLocalDate();
    return serializeDateWithLogicalType(localDate, instant);
}
Then, we can serialize the date using one of the previous methods. This approach allows us to take advantage of Avro’s logical types while still working with legacy Date objects.
Let’s also test our implementation:
@Test
void whenSerializingLegacyDate_thenConvertsCorrectly() throws IOException {
    Date legacyDate = new Date();
    LocalDate expectedLocalDate = legacyDate.toInstant()
      .atZone(ZoneId.systemDefault())
      .toLocalDate();

    byte[] serialized = serializeLegacyDateAsModern(legacyDate);
    LocalDate deserialized = deserializeDateWithLogicalType(serialized).getLeft();

    assertEquals(expectedLocalDate, deserialized);
}
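It’s worth keeping in mind that the conversion above depends on ZoneId.systemDefault(), so the resulting LocalDate can differ between machines in different time zones. The following sketch, which uses only the JDK, shows the effect by resolving the same legacy Date in two zones. The class name, the helper method, and the choice of a +09:00 offset are assumptions made for illustration:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.Date;

class LegacyDateZoneDemo {

    // Resolves a legacy Date to a calendar date in an explicitly chosen zone
    static LocalDate toLocalDate(Date legacyDate, ZoneId zone) {
        return legacyDate.toInstant().atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        // 23:30 UTC is already the next calendar day at UTC+9
        Date legacyDate = Date.from(Instant.parse("2025-04-15T23:30:00Z"));

        System.out.println("UTC:    " + toLocalDate(legacyDate, ZoneOffset.UTC));
        System.out.println("+09:00: " + toLocalDate(legacyDate, ZoneOffset.ofHours(9)));
    }
}
```

If the serialized date has to mean the same thing everywhere, passing a fixed zone such as ZoneOffset.UTC on both the writing and reading side is one option to consider.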
5. Conclusion
In this article, we’ve explored different ways to serialize Date objects using Avro. We’ve learned how to use Avro’s logical types to properly represent date and timestamp values.
For most modern applications, using Avro’s Conversion API to handle logical types with java.time classes provides the best approach. Through this combination, we obtain type safety, maintain proper semantics, and stay compatible with Avro’s schema evolution capabilities.
The code backing this article is available on GitHub.















