Guide to Simple Binary Encoding

1. Introduction

Efficiency and performance are two important aspects of modern data services, especially when we stream large amounts of data. Reducing the message size with a performant encoding is key to achieving both.

However, in-house encoding/decoding algorithms can be cumbersome and fragile, which makes them hard to maintain in the long run.

Luckily, Simple Binary Encoding can help us implement and maintain a tailor-made encoding/decoding system in a practical way.

In this tutorial, we'll discuss what Simple Binary Encoding (SBE) is for and how to use it, with code samples along the way.

2. What Is SBE?

SBE is a binary representation for encoding/decoding messages to support low-latency streaming. It’s also the reference implementation of the FIX SBE standard, which is a standard for the encoding of financial data.

2.1. The Message Structure

To preserve streaming semantics, a message must be readable and writable sequentially, without backtracking. This eliminates extra operations, such as dereferencing, handling location pointers, and managing additional state, and makes better use of hardware support to maximize performance and efficiency.

Let’s have a peek at how the message is structured in SBE:

Header: It contains mandatory fields like the version of the message. It can also contain more fields when necessary.
Root Fields: Static fields of the message. Their block size is predefined and cannot be changed. They can also be defined as optional.
Repeating Groups: These represent collections. Groups can contain fields and also inner groups, which lets them represent more complex structures.
Variable Data Fields: These are fields whose sizes we can't determine in advance. String and Blob data types are two examples. They're placed at the end of the message.
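
To make this layout more concrete, here is a minimal sketch that writes and reads a message strictly front to back using only java.nio.ByteBuffer. It doesn't use SBE or its generated code; the class name, the fields (version, orderId, quantity, note), and their sizes are assumptions chosen for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative only: a hand-rolled layout, not SBE-generated code.
class SequentialLayoutSketch {

    // Writes header, fixed-size root fields, then variable-length data, in that order.
    static ByteBuffer write(int version, long orderId, int quantity, String note) {
        byte[] noteBytes = note.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(2 + 8 + 4 + 2 + noteBytes.length)
            .order(ByteOrder.LITTLE_ENDIAN);
        buffer.putShort((short) version);          // header (hypothetical: version only)
        buffer.putLong(orderId);                   // root field, always 8 bytes
        buffer.putInt(quantity);                   // root field, always 4 bytes
        buffer.putShort((short) noteBytes.length); // length prefix of the variable field
        buffer.put(noteBytes);                     // variable-length data goes last
        buffer.flip();
        return buffer;
    }

    // Reads the same fields back in a single forward pass.
    static String read(ByteBuffer buffer) {
        int version = buffer.getShort() & 0xFFFF;
        long orderId = buffer.getLong();
        int quantity = buffer.getInt();
        byte[] noteBytes = new byte[buffer.getShort() & 0xFFFF];
        buffer.get(noteBytes);
        String note = new String(noteBytes, StandardCharsets.UTF_8);
        return version + "|" + orderId + "|" + quantity + "|" + note;
    }

    public static void main(String[] args) {
        System.out.println(read(write(1, 42L, 100, "hello"))); // prints 1|42|100|hello
    }
}
```

Only the last field needs a length prefix. Everything before it sits at a position known in advance, which is the property the structure above is designed to preserve.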

Next, we'll see why this message structure is important.

2.2. When Is SBE (Not) Useful?

The power of SBE originates from its message structure. It’s optimized for sequential access to data. Hence, SBE is well suited for fixed-size data like numbers, bitsets, enums, and arrays.

A common use case for SBE is financial data streaming, which mostly contains numbers and enums and is what SBE was specifically designed for.

On the other hand, SBE isn't well suited for variable-length data types like string and blob, because we most likely don't know the exact data size in advance. As a result, extra calculations are needed at streaming time to detect the boundaries of the data in a message. Not surprisingly, this can hurt our business when latency is measured in milliseconds.

SBE still supports String and Blob data types, but they're always placed at the end of the message to keep the impact of variable-length calculations to a minimum.

3. Setting Up the Library

To use the SBE library, let’s add the following Maven dependency to our pom.xml file:

<dependency>
    <groupId>uk.co.real-logic</groupId>
    <artifactId>sbe-all</artifactId>
    <version>1.27.0</version>
</dependency>

4. Generating Java Stubs

Before we generate our Java stubs, we first need to define our message schema. SBE lets us define schemas in XML.

Next, we’ll see how to define a schema for our message, which transfers sample market trade data.

4.1. Creating the Message Schema

Our schema will be an XML file based on a dedicated XSD from the FIX protocol. It defines our message format.

So, let’s create our schema file:

<?xml version="1.0" encoding="UTF-8"?>
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
  package="com.baeldung.sbe.stub" id="1" version="0" semanticVersion="5.2"
  description="A schema represents stock market data.">
    <types>
        <composite name="messageHeader" 
          description="Message identifiers and length of message root.">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
        <enum name="Market" encodingType="uint8">
            <validValue name="NYSE" description="New York Stock Exchange">0</validValue>
            <validValue name="NASDAQ" 
              description="National Association of Securities Dealers Automated Quotations">1</validValue>
        </enum>
        <type name="Symbol" primitiveType="char" length="4" characterEncoding="ASCII" 
          description="Stock symbol"/>
        <composite name="Decimal">
            <type name="mantissa" primitiveType="uint64" minValue="0"/>
            <type name="exponent" primitiveType="int8"/>
        </composite>
        <enum name="Currency" encodingType="uint8">
            <validValue name="USD" description="US Dollar">0</validValue>
            <validValue name="EUR" description="Euro">1</validValue>
        </enum>
        <composite name="Quote" 
          description="A quote represents the price of a stock in a market">
            <ref name="market" type="Market"/>
            <ref name="symbol" type="Symbol"/>
            <ref name="price" type="Decimal"/>
            <ref name="currency" type="Currency"/>
        </composite>
    </types>
    <sbe:message name="TradeData" id="1" description="Represents a quote and amount of trade">
        <field name="quote" id="1" type="Quote"/>
        <field name="amount" id="2" type="uint16"/>
    </sbe:message>
</sbe:messageSchema>

If we look at the schema in detail, we'll notice that it has two main parts, <types> and <sbe:message>. We'll start with <types>.

As our first type, we create the messageHeader. It's mandatory and has four mandatory fields:

<composite name="messageHeader" description="Message identifiers and length of message root.">
    <type name="blockLength" primitiveType="uint16"/>
    <type name="templateId" primitiveType="uint16"/>
    <type name="schemaId" primitiveType="uint16"/>
    <type name="version" primitiveType="uint16"/>
</composite>

blockLength: the total space reserved for the root fields of a message. It doesn't count repeating groups or variable-length fields, such as string and blob.
templateId: an identifier for the message template.
schemaId: an identifier for the message schema. A schema always contains a template.
version: the version of the message schema when we define the message.
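
As a rough illustration of what these four fields look like on the wire, the sketch below lays out an 8-byte header by hand with java.nio.ByteBuffer. It isn't the generated MessageHeaderEncoder; the class name is hypothetical, the sample values are arbitrary, and the little-endian byte order is an assumption for a schema that doesn't specify a byte order:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: mimics the messageHeader layout by hand.
class MessageHeaderSketch {

    static final int HEADER_LENGTH = 8; // four uint16 fields, 2 bytes each

    // Writes each field at its fixed offset inside the header.
    static ByteBuffer encode(int blockLength, int templateId, int schemaId, int version) {
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putShort(0, (short) blockLength);
        buffer.putShort(2, (short) templateId);
        buffer.putShort(4, (short) schemaId);
        buffer.putShort(6, (short) version);
        return buffer;
    }

    // Java shorts are signed, so a uint16 is recovered by masking.
    static int uint16At(ByteBuffer buffer, int offset) {
        return buffer.getShort(offset) & 0xFFFF;
    }

    public static void main(String[] args) {
        ByteBuffer header = encode(60000, 1, 1, 0);
        System.out.println("blockLength=" + uint16At(header, 0)); // prints blockLength=60000
        System.out.println("templateId=" + uint16At(header, 2));  // prints templateId=1
    }
}
```

Because every field has a fixed offset, a reader can pick out templateId or version without scanning anything that comes before it.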

Next, we define an enumeration, Market:

<enum name="Market" encodingType="uint8">
    <validValue name="NYSE" description="New York Stock Exchange">0</validValue>
    <validValue name="NASDAQ" 
      description="National Association of Securities Dealers Automated Quotations">1</validValue>
</enum>

We aim to hold some well-known exchange names, which we can hard-code in the schema file. They rarely change or grow in number. Therefore, the <enum> type is a good fit here.

By setting encodingType="uint8", we reserve 8 bits for storing the market value in a single message. An unsigned 8-bit integer can hold 2^8 = 256 distinct values (0 to 255). Since SBE reserves 255 as the null value for uint8, that leaves room for up to 255 different markets.
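
The mapping between an enum constant and its single byte can be sketched in plain Java as follows. This is a hand-written stand-in, not the Market class that SBE generates, and the name MarketSketch is hypothetical:

```java
// Illustrative stand-in for an enum backed by a uint8; not SBE's generated code.
enum MarketSketch {
    NYSE(0),
    NASDAQ(1);

    private final int value; // 0..255 fits in one unsigned byte

    MarketSketch(int value) {
        this.value = value;
    }

    // Encodes the constant as the single byte stored in the message.
    byte toByte() {
        return (byte) value;
    }

    // Decodes one byte, masking so that values above 127 stay positive.
    static MarketSketch fromByte(byte raw) {
        int unsigned = raw & 0xFF;
        for (MarketSketch market : values()) {
            if (market.value == unsigned) {
                return market;
            }
        }
        throw new IllegalArgumentException("Unknown market value: " + unsigned);
    }

    public static void main(String[] args) {
        byte encoded = NASDAQ.toByte();
        System.out.println(fromByte(encoded)); // prints NASDAQ
    }
}
```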

Right after, we define another type, Symbol. This will be a three- or four-character string that identifies a financial instrument like AAPL (Apple), MSFT (Microsoft), etc.:

<type name="Symbol" primitiveType="char" length="4" characterEncoding="ASCII" description="Stock symbol"/>

As we can see, we restrict the characters with characterEncoding="ASCII" (a 7-bit character set with 128 characters), and we cap the field with length="4" so that a symbol can't exceed four characters. Since each char occupies one byte, the field always takes exactly 4 bytes in the message.
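
A fixed-length character field like this can be pictured as a 4-byte slot that shorter values only partly fill. The sketch below shows one way to pad and trim such a slot in plain Java. It assumes zero bytes as padding, and the SymbolSketch class is hypothetical rather than part of the generated code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: a fixed 4-byte ASCII field, padded with zero bytes.
class SymbolSketch {

    static final int LENGTH = 4;

    // Encodes the symbol into exactly LENGTH bytes, rejecting longer input.
    static byte[] encode(String symbol) {
        byte[] ascii = symbol.getBytes(StandardCharsets.US_ASCII);
        if (ascii.length > LENGTH) {
            throw new IllegalArgumentException("Symbol longer than " + LENGTH + " characters: " + symbol);
        }
        return Arrays.copyOf(ascii, LENGTH); // unused trailing bytes stay 0
    }

    // Decodes the field, stopping at the first padding byte.
    static String decode(byte[] field) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] field = encode("IBM");
        System.out.println(field.length);  // prints 4
        System.out.println(decode(field)); // prints IBM
    }
}
```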

After that, we need a composite type for price data. So, we create the type Decimal:

<composite name="Decimal">
    <type name="mantissa" primitiveType="uint64" minValue="0"/>
    <type name="exponent" primitiveType="int8"/>
</composite>

Decimal is composed of two types:

mantissa: the significant digits of a decimal number
exponent: the scale of a decimal number

For example, the values mantissa=98765 and exponent=-3 represent the number 98.765.
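
The arithmetic behind this representation is value = mantissa × 10^exponent. A minimal sketch with java.math.BigDecimal, using a hypothetical helper class, looks like this:

```java
import java.math.BigDecimal;

// Illustrative helper; the class and method names are hypothetical.
class DecimalSketch {

    // Rebuilds the decimal number as mantissa * 10^exponent.
    static BigDecimal toBigDecimal(long mantissa, byte exponent) {
        return BigDecimal.valueOf(mantissa).scaleByPowerOfTen(exponent);
    }

    public static void main(String[] args) {
        System.out.println(toBigDecimal(98765L, (byte) -3).toPlainString()); // prints 98.765
        System.out.println(toBigDecimal(5L, (byte) 2).toPlainString());      // prints 500
    }
}
```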

Next, very similar to Market, we create another <enum> to represent Currency whose values are mapped as uint8:

<enum name="Currency" encodingType="uint8">
    <validValue name="USD" description="US Dollar">0</validValue>
    <validValue name="EUR" description="Euro">1</validValue>
</enum>

Lastly, we define Quote by composing the other types we created before:

<composite name="Quote" description="A quote represents the price of a stock in a market">
    <ref name="market" type="Market"/>
    <ref name="symbol" type="Symbol"/>
    <ref name="price" type="Decimal"/>
    <ref name="currency" type="Currency"/>
</composite>

This completes the type definitions.

However, we still need to define a message. So, let's define our message, TradeData:

<sbe:message name="TradeData" id="1" description="Represents a quote and amount of trade">
    <field name="quote" id="1" type="Quote"/>
    <field name="amount" id="2" type="uint16"/>
</sbe:message>
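
As a rough cross-check of the schema, the sketch below adds up the encoded size of each field in TradeData. It assumes the fields are packed back to back in declaration order with no padding, which, as far as we understand, is how fields are laid out when the schema gives no explicit offsets. The class and constant names are hypothetical and aren't part of the generated code:

```java
// Illustrative only: sizes derived by hand from the schema's primitive types.
class TradeDataLayoutSketch {

    static final int HEADER_SIZE = 4 * 2; // messageHeader: four uint16 fields
    static final int MARKET_SIZE = 1;     // enum encoded as uint8
    static final int SYMBOL_SIZE = 4;     // char array with length="4"
    static final int MANTISSA_SIZE = 8;   // uint64
    static final int EXPONENT_SIZE = 1;   // int8
    static final int CURRENCY_SIZE = 1;   // enum encoded as uint8
    static final int AMOUNT_SIZE = 2;     // uint16

    // Size of the Quote composite: market, symbol, price (Decimal), currency.
    static int quoteSize() {
        return MARKET_SIZE + SYMBOL_SIZE + MANTISSA_SIZE + EXPONENT_SIZE + CURRENCY_SIZE;
    }

    // Size of the root block, i.e. the value we'd expect in blockLength.
    static int blockLength() {
        return quoteSize() + AMOUNT_SIZE;
    }

    // Total bytes for one TradeData message including its header.
    static int encodedLength() {
        return HEADER_SIZE + blockLength();
    }

    public static void main(String[] args) {
        System.out.println("quote=" + quoteSize());       // prints quote=15
        System.out.println("block=" + blockLength());     // prints block=17
        System.out.println("total=" + encodedLength());   // prints total=25
    }
}
```

Under these assumptions, a complete TradeData message fits in 25 bytes, all at fixed offsets.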

The SBE specification describes the available types in more detail.

In the next two sections, we'll discuss how to use our schema to generate the Java code that we'll eventually use to encode/decode our messages.

4.2. Using SbeTool

A straightforward way to generate Java stubs is to run the SBE jar file, which invokes the utility class SbeTool:

java -Dsbe.output.dir=target/generated-sources/java \
  -jar <local-maven-directory>/repository/uk/co/real-logic/sbe-all/1.27.0/sbe-all-1.27.0.jar \
  src/main/resources/schema.xml

Before running the command, we must replace the placeholder <local-maven-directory> with the path to our local Maven directory.

After successful generation, we’ll see the generated Java code in the folder target/generated-sources/java.

4.3. Using SbeTool With Maven

Using SbeTool directly is easy enough, but we can make it even more practical by integrating it into Maven.

So, let's add the following Maven plugins to our pom.xml:

<build>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.1.0</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>java</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <includeProjectDependencies>false</includeProjectDependencies>
                <includePluginDependencies>true</includePluginDependencies>
                <mainClass>uk.co.real_logic.sbe.SbeTool</mainClass>
                <systemProperties>
                    <systemProperty>
                        <key>sbe.output.dir</key>
                        <value>${project.build.directory}/generated-sources/java</value>
                    </systemProperty>
                </systemProperties>
                <arguments>
                    <argument>${project.basedir}/src/main/resources/schema.xml</argument>
                </arguments>
                <workingDirectory>${project.build.directory}/generated-sources/java</workingDirectory>
            </configuration>
            <dependencies>
                <dependency>
                    <groupId>uk.co.real-logic</groupId>
                    <artifactId>sbe-tool</artifactId>
                    <version>1.27.0</version>
                </dependency>
            </dependencies>
        </plugin>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <id>add-source</id>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>add-source</goal>
                    </goals>
                    <configuration>
                        <sources>
                            <source>${project.build.directory}/generated-sources/java/</source>
                        </sources>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

As a result, running a typical mvn clean install command generates our Java stubs automatically.

Additionally, we can always have a look at SBE's Maven documentation for more configuration options.

5. Basic Messaging

Now that we have our Java stubs ready, let's have a look at how to use them.

First of all, we need some data for testing. So, we create a class, MarketData:

public class MarketData {

    private int amount;
    private double price;
    private Market market;
    private Currency currency;
    private String symbol;

    // Constructor, getters and setters
}

We should notice that our MarketData composes the Market and Currency classes that SBE generated for us.

Next, let’s define a MarketData object to use in our unit test later on:

private MarketData marketData;

@BeforeEach
public void setup() {
    marketData = new MarketData(2, 128.99, Market.NYSE, Currency.USD, "IBM");
}

Now that we have a MarketData instance ready, we'll see how to write it into a TradeData message and read it back in the next sections.

5.1. Writing a Message

Typically, we'd like to write our data into a byte buffer. So, we create Agrona's UnsafeBuffer, which wraps a ByteBuffer with a capacity of 128 bytes, alongside our generated encoders, MessageHeaderEncoder and TradeDataEncoder:

@Test
public void givenMarketData_whenEncode_thenDecodedValuesMatch() {
    // our buffer to write encoded data, initial cap. 128 bytes
    UnsafeBuffer buffer = new UnsafeBuffer(ByteBuffer.allocate(128));
    MessageHeaderEncoder headerEncoder = new MessageHeaderEncoder();
    TradeDataEncoder dataEncoder = new TradeDataEncoder();
    
    // we'll write the rest of the code here
}

Before writing the data, we need to split our price into two parts, mantissa and exponent:

BigDecimal priceDecimal = BigDecimal.valueOf(marketData.getPrice());
// mantissa is a uint64 in the schema, so we keep it as a long
long priceMantissa = priceDecimal.scaleByPowerOfTen(priceDecimal.scale()).longValue();
int priceExponent = priceDecimal.scale() * -1;

Note that we used BigDecimal for this conversion. It's good practice to use BigDecimal when dealing with monetary values because we don't want to lose precision.
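
To see the split in isolation, here is a small self-contained sketch. The PriceSplitSketch class and its method names are hypothetical. It also contrasts BigDecimal.valueOf(double), which goes through the double's canonical string form, with the new BigDecimal(double) constructor, which keeps the exact binary approximation:

```java
import java.math.BigDecimal;

// Illustrative helper; the class and method names are hypothetical.
class PriceSplitSketch {

    // Significant digits of the price as a whole number.
    static long mantissa(double price) {
        BigDecimal decimal = BigDecimal.valueOf(price);
        return decimal.scaleByPowerOfTen(decimal.scale()).longValueExact();
    }

    // Power of ten that scales the mantissa back to the price.
    static int exponent(double price) {
        return -BigDecimal.valueOf(price).scale();
    }

    public static void main(String[] args) {
        System.out.println(mantissa(128.99) + " x 10^" + exponent(128.99)); // prints 12899 x 10^-2
        // The double constructor exposes the binary approximation of 128.99 instead.
        System.out.println(new BigDecimal(128.99).equals(new BigDecimal("128.99"))); // prints false
    }
}
```

This is why the conversion starts from BigDecimal.valueOf rather than the constructor: the scale then matches the number of decimal places we'd expect from the printed value.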

Finally, let’s encode and write our TradeData:

TradeDataEncoder encoder = dataEncoder.wrapAndApplyHeader(buffer, 0, headerEncoder);
encoder.amount(marketData.getAmount());
encoder.quote()
  .market(marketData.getMarket())
  .currency(marketData.getCurrency())
  .symbol(marketData.getSymbol())
  .price()
    .mantissa(priceMantissa)
    .exponent((byte) priceExponent);

5.2. Reading a Message

To read a message, we'll use the same buffer instance to which we wrote the data. However, this time we need the decoders, MessageHeaderDecoder and TradeDataDecoder:

MessageHeaderDecoder headerDecoder = new MessageHeaderDecoder();
TradeDataDecoder dataDecoder = new TradeDataDecoder();

Next, we decode our TradeData:

dataDecoder.wrapAndApplyHeader(buffer, 0, headerDecoder);

Similarly, we need to combine the two parts of our price, mantissa and exponent, to get it back as a double value. We make use of BigDecimal again:

double price = BigDecimal.valueOf(dataDecoder.quote().price().mantissa())
  .scaleByPowerOfTen(dataDecoder.quote().price().exponent())
  .doubleValue();

Finally, let’s ensure our decoded values match the original ones:

Assertions.assertEquals(2, dataDecoder.amount());
Assertions.assertEquals("IBM", dataDecoder.quote().symbol());
Assertions.assertEquals(Market.NYSE, dataDecoder.quote().market());
Assertions.assertEquals(Currency.USD, dataDecoder.quote().currency());
Assertions.assertEquals(128.99, price);