Serialization and deserialization are two important concepts in programming that allow objects to be easily stored, transmitted, and reconstructed. They’re used in various scenarios, such as storing objects in a database, sending objects over a network, or caching objects in memory.
In this tutorial, we’ll discuss both of those concepts.
An object has three primary characteristics: identity, state, and behavior. The state represents the value or data of the object.
Serialization is the process of converting an object’s state to a byte stream. This byte stream can then be saved to a file, sent over a network, or stored in a database. The byte stream represents the object’s state, which can later be reconstructed to create a new copy of the object.
Serialization allows us to save the data associated with an object and recreate the object in a new location.
Here’s a depiction of the serialization process:
To serialize an object, the programmer must first decide on a format and then use the appropriate tools to convert the object to that format.
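As a minimal sketch of these two steps in Java, assume we pick Java's built-in binary format and use the standard `ObjectOutputStream` as the tool. The `Person` class and the `SerializeSketch` helper are hypothetical names used only for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical example class; implementing Serializable opts in to Java's built-in format.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class SerializeSketch {
    // Converts the object's state to a byte stream held in memory.
    static byte[] serialize(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Person("Alice", 30));
        System.out.println("Serialized to " + bytes.length + " bytes");
    }
}
```

The resulting byte array could then be written to a file, a socket, or a database column.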
2.1. Serialization Formats
Many different formats can be used for serialization, including JSON, XML, and a range of binary formats. JSON and XML are popular because they are human-readable and easily parsed by other systems. Binary formats are often chosen for performance reasons, as they’re typically more compact and faster to read and write than text-based formats.
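To make the trade-off concrete, the sketch below encodes the same two values once as JSON text and once as fixed-width binary fields. The `FormatComparison` class and its fields are hypothetical, and the JSON is assembled by hand only to keep the example dependency-free; a real application would more likely use a JSON library:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FormatComparison {
    // Text format: a hand-built JSON document, readable by people and other systems.
    // No escaping is done here, so this only suits simple numeric values.
    static byte[] toJson(int id, double price) {
        String json = "{\"id\":" + id + ",\"price\":" + price + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Binary format: fixed-width fields with no field names, compact but opaque.
    static byte[] toBinary(int id, double price) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(id);       // 4 bytes
            out.writeDouble(price); // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] json = toJson(1001, 19.99);
        byte[] binary = toBinary(1001, 19.99);
        System.out.println(new String(json, StandardCharsets.UTF_8) + " takes " + json.length + " bytes");
        System.out.println("The binary encoding takes " + binary.length + " bytes");
    }
}
```

The binary form is smaller here, but a reader needs to know the field order and types in advance, whereas the JSON form describes itself.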
Deserialization is the reverse of serialization: it takes a byte stream and converts it back into an object, using tools that match the format to parse the stream and create the new object. In Java, for example, the ObjectInputStream class can deserialize Java’s built-in binary format, and the Jackson library can parse JSON.
Here’s a view of what deserialization looks like:
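In code, a rough sketch of the binary case in Java might look like the following round trip. The `Coordinate` class and the `RoundTripSketch` helpers are hypothetical names chosen for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class Coordinate implements Serializable {
    private static final long serialVersionUID = 1L;

    final int x;
    final int y;

    Coordinate(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class RoundTripSketch {
    static byte[] serialize(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Parses the byte stream and builds a new object with the same state.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize", e);
        }
    }

    public static void main(String[] args) {
        Coordinate original = new Coordinate(3, 4);
        Coordinate copy = (Coordinate) deserialize(serialize(original));
        // The copy has the same state but is a distinct object.
        System.out.println(copy.x + "," + copy.y + " sameInstance=" + (copy == original));
    }
}
```

Note that the reconstructed object shares the original's state but not its identity.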
4. Storage and Transmission
Serialization and deserialization are important in programming because they allow objects to be easily stored and transmitted between different systems. This is especially useful in distributed systems where objects must be transmitted between different machines or in web applications where objects must be sent between a web server and a web browser.
For example, consider a web application that allows users to create and save documents. When a user saves a document, the application needs to store the document’s state in a database.
To do this, the application first serializes the document object to a byte stream and then stores the byte stream in the database. Later, when the user wants to open the document, the application retrieves the byte stream from the database, deserializes it back into a document object, and displays the document to the user.
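The save-and-open flow can be sketched in Java as follows. This is only an illustration under stated assumptions: the `TextDocument` and `DocumentStore` classes are hypothetical, and an in-memory map stands in for the database:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class TextDocument implements Serializable {
    private static final long serialVersionUID = 1L;

    final String title;
    final String body;

    TextDocument(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

class DocumentStore {
    // Stand-in for a database table with a binary column (an assumption for this sketch).
    private final Map<String, byte[]> rows = new HashMap<>();

    // Serializes the document and stores the resulting bytes under the given id.
    void save(String id, TextDocument document) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(document);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        rows.put(id, buffer.toByteArray());
    }

    // Retrieves the stored bytes and deserializes them back into a document.
    TextDocument open(String id) {
        byte[] bytes = rows.get(id);
        if (bytes == null) {
            throw new IllegalArgumentException("No document with id " + id);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (TextDocument) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Stored document could not be read", e);
        }
    }
}
```

A caller would invoke `save("doc-1", document)` when the user saves and `open("doc-1")` when the user reopens the document.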
Serialization and deserialization can be computationally expensive, especially for large or complex objects. Converting objects to bytes and back can take a significant amount of time and resources, which can impact the system’s performance.
5.2. Platform and Language Dependencies
Serialization and deserialization can be platform- and language-dependent. Different programming languages and platforms may have their own implementations of serialization and deserialization, which can lead to compatibility issues when transmitting data between different systems.
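One concrete source of such incompatibility is byte order: a multi-byte number written in one byte order is misread if the receiver assumes the other. The sketch below illustrates this with a hypothetical `ByteOrderSketch` class built on the standard `ByteBuffer` API:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderSketch {
    // Encodes an int as 4 bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decodes 4 bytes into an int, assuming the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] wire = encode(1, ByteOrder.LITTLE_ENDIAN);
        // Reader and writer agree on the byte order: the value is recovered.
        System.out.println(decode(wire, ByteOrder.LITTLE_ENDIAN)); // prints 1
        // Reader assumes the other byte order: the value is silently wrong.
        System.out.println(decode(wire, ByteOrder.BIG_ENDIAN));    // prints 16777216
    }
}
```

Formats intended for exchange between systems usually fix such details explicitly so that every implementation reads the same bytes the same way.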
If the format of the serialized data changes over time, it can lead to versioning issues. For example, if a new version of an application changes the data format, older versions may not be able to deserialize the data correctly, which can lead to errors or data loss.
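One common way to soften this problem is to write a version number at the start of the data so that a reader can tell which layout follows. The sketch below assumes a hypothetical user record whose second format version adds an email field; the `VersionedUserCodec` class and its layout are illustrative, not a standard:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VersionedUserCodec {
    static final int CURRENT_VERSION = 2;

    // Version 1 layout: version number, then the name.
    static byte[] writeV1(String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(1);
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Version 2 layout: version number, then the name, then the email.
    static byte[] writeV2(String name, String email) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(2);
            out.writeUTF(name);
            out.writeUTF(email);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads either version; returns {name, email}, with an empty email for version 1 data.
    static String[] read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int version = in.readInt();
            if (version < 1 || version > CURRENT_VERSION) {
                throw new IllegalArgumentException("Unsupported format version: " + version);
            }
            String name = in.readUTF();
            String email = version >= 2 ? in.readUTF() : "";
            return new String[] {name, email};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this approach, a newer reader can still load older data, and a reader that meets an unknown version fails with a clear error instead of misinterpreting the bytes.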
5.4. Unserializable Objects
It’s also worth noting that not all objects can be serialized. Some objects may contain resources that cannot be serialized, like file handles or network sockets.
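In Java's built-in serialization, for instance, writing an object that holds a non-serializable field fails with a `NotSerializableException`, while marking the field `transient` tells the runtime to skip it. The sketch below uses a `Thread` as the live resource; the `BrokenSession`, `SafeSession`, and `TransientSketch` classes are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class BrokenSession implements Serializable {
    private static final long serialVersionUID = 1L;

    // Thread is not Serializable and the field is not transient, so writing this object fails.
    final Thread worker = new Thread(() -> { });
}

class SafeSession implements Serializable {
    private static final long serialVersionUID = 1L;

    final String user;
    // transient: skipped during serialization, so it is null after deserialization.
    transient Thread worker = new Thread(() -> { });

    SafeSession(String user) {
        this.user = user;
    }
}

class TransientSketch {
    // Returns the serialized bytes, or null if the object graph contains something unserializable.
    static byte[] trySerialize(Object object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (NotSerializableException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("BrokenSession serializable: " + (trySerialize(new BrokenSession()) != null));
        SafeSession copy = (SafeSession) deserialize(trySerialize(new SafeSession("alice")));
        System.out.println("user=" + copy.user + ", worker=" + copy.worker);
    }
}
```

After deserialization, the application has to recreate any skipped resource itself, for example by reopening the file or reconnecting the socket.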
It’s important to be aware of the security risks associated with serialization and deserialization, as maliciously crafted byte streams can be used to exploit vulnerabilities in a program.
To mitigate this risk, it’s best practice to use a well-vetted library for serialization and deserialization and to deserialize data only from trusted sources.
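As an additional safeguard in Java (version 9 and later), an `ObjectInputFilter` can restrict which classes a stream is allowed to contain. The following is a minimal sketch, not a complete defense; the `Greeting` class, the `FilteredReader` helper, and the contents of the allow-list are assumptions made for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Set;

class Greeting implements Serializable {
    private static final long serialVersionUID = 1L;

    final String text;

    Greeting(String text) {
        this.text = text;
    }
}

class FilteredReader {
    // Hypothetical allow-list: only these classes may appear in the stream.
    private static final Set<Class<?>> ALLOWED = Set.of(Greeting.class, String.class);

    // Rejects every class that is not on the allow-list.
    private static ObjectInputFilter.Status check(ObjectInputFilter.FilterInfo info) {
        Class<?> type = info.serialClass();
        if (type == null) {
            // Not a class check (for example, a depth or size update).
            return ObjectInputFilter.Status.UNDECIDED;
        }
        return ALLOWED.contains(type)
                ? ObjectInputFilter.Status.ALLOWED
                : ObjectInputFilter.Status.REJECTED;
    }

    static byte[] write(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads a Greeting, failing if the stream contains any class outside the allow-list.
    static Greeting read(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(FilteredReader::check);
            return (Greeting) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Stream rejected or unreadable", e);
        }
    }
}
```

Here, a stream containing a `Greeting` is accepted, while a stream containing any other class, such as an `ArrayList`, is rejected before that object is created.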
5.6. Limited Human-Readability
Serialized data, particularly in binary formats, is often not human-readable, making it difficult to troubleshoot or debug issues arising during the transmission or storage of data.
We’ve looked at serialization and deserialization and how they work. We saw that they are useful in distributed systems, web applications, and caching, as well as for storing objects in databases or other data stores.
However, it’s important to know the security risks of serialization and deserialization. Remember to use well-vetted libraries and to deserialize data only from trusted sources.