
Change Data Capture (CDC) enables real-time data synchronization across distributed systems by capturing database changes as they occur. Maxwell is an open-source CDC tool that captures data changes from MySQL databases.
In this tutorial, we’ll learn about the concept of change data capture. Then, we’ll demonstrate a working example of capturing data and schema changes from a MySQL database using Maxwell.
Many databases maintain sequential logs that track every modification made to data and schemas before the changes are actually applied. These logs serve as the source of truth for every mutation of the database's state and ultimately ensure that the system can recover reliably after a failure. For example, PostgreSQL implements transaction logging through its write-ahead log (WAL). Similarly, MySQL has a related mechanism known as the binary log (binlog).
Change Data Capture (CDC) is a design pattern that leverages this same transaction log to capture data changes in real time. Traditionally, to obtain the delta, we set up a periodic job that queries the table and compares the latest state against a snapshot from the previous run. This approach often introduces significant latency and load on the source database. CDC solves the problem by emitting the delta to interested systems, eliminating the need for polling.
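To make the contrast concrete, here's a sketch of the traditional polling approach, assuming a hypothetical products table with an updated_at column and a remembered last-poll timestamp:
-- Hypothetical polling query: fetch rows changed since the last poll
SELECT id, name, price, category
FROM products
WHERE updated_at > '2025-08-02 08:00:00';
Such a query misses hard deletes entirely and repeatedly scans the table, which is precisely the overhead CDC avoids.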
Maxwell is a CDC tool purpose-built for MySQL databases. At a high level, Maxwell reads and parses the MySQL database's binlog for any changes to the database tables. Then, it transforms the delta into a JSON message and sends it over to any configured destinations. These destinations are known as producers in Maxwell terminology, and they can include platforms like Apache Kafka, RabbitMQ, or simple key-value datastores like Redis.
Before we proceed, it’s important to understand the content of the MySQL binlog. Essentially, the MySQL binlog records all data modifications and schema changes that happen on the database. These logs capture both DML operations (INSERT, UPDATE, DELETE) and DDL statements (CREATE TABLE, ALTER TABLE, DROP TABLE).
Importantly, the log stores the changes in a binary format that requires a purpose-built parser to read. Maxwell abstracts this away by reading and parsing the binary-formatted logs, and then translates them into a much more widely understood JSON format.
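To appreciate what Maxwell handles for us, we can try decoding a binlog file manually with the mysqlbinlog utility that ships with MySQL (the file name mysql-bin.000001 is just an example):
$ mysqlbinlog --base64-output=decode-rows --verbose mysql-bin.000001
Even with verbose decoding, the output is low-level and tied to binlog internals, whereas Maxwell's JSON is immediately consumable.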
Maxwell maintains schema metadata in the maxwell database, tracking table structures and column definitions to correctly interpret binlog events. When DDL statements occur, Maxwell updates its schema representation and continues processing data changes with the new structure. For DML, Maxwell translates the binary log and publishes the changes to any connected producers.
For example, when adding a new column to a table, Maxwell processes the ALTER TABLE statement first, then correctly interprets subsequent operations that include the new column.
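As a sketch, suppose we add a hypothetical discount column:
ALTER TABLE products ADD COLUMN discount DECIMAL(4,2) DEFAULT 0.00;
Maxwell records this DDL in its schema store, so every subsequent insert or update event on the table includes the discount field in its data object.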
Let’s see an example of the JSON output Maxwell generates for a new record inserted into the products table within the ecommerce database:
{
    "database": "ecommerce",
    "table": "products",
    "type": "insert",
    "ts": 1691234567,
    "data": {
        "id": 1001,
        "name": "Wireless Headphones",
        "price": 99.99,
        "category": "Electronics"
    }
}
The type field records the modification type: insert for newly created rows, update for modified rows, and delete for removed rows.
Besides that, the ts field records the timestamp of the change in epoch seconds.
For the update operation, Maxwell produces a JSON message that includes the previous value of the field that got changed:
{
    "database": "ecommerce",
    "table": "products",
    "type": "update",
    "ts": 1691234890,
    "data": {
        "id": 1001,
        "name": "Wireless Headphones",
        "price": 89.99,
        "category": "Electronics"
    },
    "old": {
        "price": 99.99
    }
}
The example above shows the JSON message produced by Maxwell when updating the price of the record id=1001 to 89.99. Notably, we see the previous value under the old object, which is separated from the data object that carries the latest state.
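For completeness, a delete operation produces an event whose data object carries the removed row; the following is an illustrative sketch:
{
    "database": "ecommerce",
    "table": "products",
    "type": "delete",
    "ts": 1691234999,
    "data": {
        "id": 1001,
        "name": "Wireless Headphones",
        "price": 89.99,
        "category": "Electronics"
    }
}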
Now that we understand the idea and functionality of Maxwell, we can go through a real-life example by setting up a MySQL database that acts as the source. Then, we install and run Maxwell to capture data changes from MySQL’s binlog. Later, we set up a Kafka instance as the producer to observe the change. Finally, we trigger some changes in the MySQL database and observe the data produced by the Kafka producer.
To begin with, we need to create a custom configuration file to set the binlog configuration for the MySQL container to support the CDC use case. Concretely, we create a custom mysql.cnf to enable row-based binary logging:
$ cat mysql.cnf
[mysqld]
server-id = 1
log-bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
Importantly, we must set the binlog_format to ROW, and binlog_row_image to FULL for Maxwell to capture changes.
Then, we start the MySQL container using the docker run command:
$ docker run -d \
--name mysql-cdc \
--network host \
-e MYSQL_ROOT_PASSWORD=password \
-e MYSQL_DATABASE=ecommerce \
-v $(pwd)/mysql.cnf:/etc/mysql/conf.d/mysql.cnf \
mysql:8.0
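Once the container is up, we can verify that our binlog settings took effect:
$ docker exec -it mysql-cdc mysql -u root -ppassword \
  -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format', 'binlog_row_image');"
We expect log_bin to be ON, binlog_format to be ROW, and binlog_row_image to be FULL.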
After the container is healthy and running, we populate the ecommerce database with a products table:
$ docker exec -i mysql-cdc mysql -u root -ppassword <<EOF
USE ecommerce;
CREATE TABLE products (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    category VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
EOF
In the command above, we use docker exec to run the mysql CLI inside the container and feed it the CREATE TABLE DDL via a heredoc, which creates the products table in the ecommerce database.
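As a quick check, we can confirm the table structure:
$ docker exec -it mysql-cdc mysql -u root -ppassword -e "DESCRIBE ecommerce.products;"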
Before setting up Maxwell, we need to create a configuration file. The configuration file contains the connection string and credentials to the source MySQL database.
Additionally, we configure the producers in the same configuration file:
$ cat config.properties
# MySQL connection
host=localhost
port=3306
user=root
password=password
producer=kafka
# Kafka configuration
kafka.bootstrap.servers=localhost:9092
kafka_topic=maxwell
kafka_partition_by=database
kafka_key_format=hash
# Output configuration
output_commit_info=true
For brevity and convenience, we use the root user to connect to the MySQL database. However, as a good security measure, we should always create a separate database user to be used solely by Maxwell.
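Based on Maxwell's documented permission requirements, a dedicated user needs replication privileges plus full access to its own maxwell metadata database; a minimal sketch (with a placeholder password) looks like this:
CREATE USER 'maxwell'@'%' IDENTIFIED BY 'changeme';
GRANT ALL ON maxwell.* TO 'maxwell'@'%';
GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'maxwell'@'%';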
Notably, we set the partitioning strategy for the database through the kafka_partition_by configuration key. This strategy routes the change events to different partitions using the source database as the key.
Then, we start the Maxwell Docker container using the docker run command:
$ docker run -it -d --rm \
--name maxwell \
--network host \
-v $(pwd)/config.properties:/app/config.properties \
zendesk/maxwell:v1.40.2 \
bin/maxwell --config=/app/config.properties
At this point, we have both a MySQL and a Maxwell container running.
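We can inspect Maxwell's logs to confirm it connected to MySQL and started tailing the binlog:
$ docker logs maxwell
If the output shows errors about reaching Kafka, that's expected until we start the broker in the next step.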
Lastly, we want to set up a Kafka node to serve as the producer.
We start a Kafka Docker container in KRaft mode, so we don't need a separate ZooKeeper instance:
$ docker run -d \
--name=kafka \
--network=host \
-e KAFKA_NODE_ID=1 \
-e CLUSTER_ID=uXke6Yw_Q6u4T0S6b1zVzw \
-e KAFKA_PROCESS_ROLES=broker,controller \
-e KAFKA_CONTROLLER_QUORUM_VOTERS=1@localhost:9093 \
-e KAFKA_LISTENERS=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \
-e KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT \
confluentinc/cp-server:8.0.0
Once Kafka is up and running, we’re ready to test out the CDC.
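As a sanity check, we can list the topics on the broker; the maxwell topic appears once Maxwell publishes its first event (depending on the broker's auto-topic-creation setting, we may need to create it manually first):
$ docker exec -it kafka kafka-topics --bootstrap-server localhost:9092 --list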
We can now send some DML statements to the MySQL database and observe the output produced in Kafka.
For instance, let’s create records in the products table using the INSERT statement:
$ docker exec -it mysql-cdc mysql -u root -ppassword -e "USE ecommerce; INSERT INTO products
(name, price, category) VALUES ('Wireless Mouse', 29.99, 'Electronics');"
The command above uses docker exec to run the INSERT statement through the mysql CLI, inserting a record into the products table.
Based on what we learned, this insertion should be captured by Maxwell and then published to the maxwell topic in the Kafka node.
We can verify the existence of the change event in Kafka by running the kafka-console-consumer CLI in the container:
$ docker exec -it kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic maxwell --from-beginning
{"database":"ecommerce","table":"products","type":"insert","ts":1754123961,"xid":322,"commit":true,"data":{"id":1,"name":"Wireless Mouse","price":29.99,"category":"Electronics","created_at":"2025-08-02 08:39:21","updated_at":"2025-08-02 08:39:21"}}
As expected, there’s an insert event in the maxwell Kafka topic.
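Similarly, we can issue an UPDATE and watch the corresponding event, complete with the old object, arrive on the same topic:
$ docker exec -it mysql-cdc mysql -u root -ppassword \
  -e "USE ecommerce; UPDATE products SET price = 24.99 WHERE id = 1;"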
In this article, we learned that Maxwell is open-source software for CDC on MySQL databases. Specifically, we saw that Maxwell leverages MySQL's binary logging mechanism to capture changes with minimal performance impact on the source database.
Subsequently, we demonstrated an actual example of streaming deltas from MySQL to Kafka using Maxwell. We set up the three components as Docker containers and then triggered data changes on the MySQL database. Thus, we demonstrated that Maxwell can capture the delta, translate it into a JSON message, and publish it to a Kafka topic.