## 1. Overview

In this tutorial, we’re going to see how we can useÂ *BitSet*s to represent a vector of bits.

First, we’ll start with the rationale behind not using theÂ *boolean[]*. Then after getting familiar with theÂ *BitSetÂ *internals, we’ll take a closer look at its API.

## 2. Array of Bits

To store and manipulate arrays of bits, one might argue that we should useÂ *boolean[]Â *as our data structure. At first glance, that might seem a reasonable suggestion.

**However,Â eachÂ ***booleanÂ *member in aÂ *boolean[]Â *usually consumes one byte instead of just one bit. So when we have tight memory requirements, or we’re just aiming for a reduced memory footprint, *boolean[]Â *are far from being ideal.

To make matters more concrete, let’s see how much space aÂ *boolean[]Â *with 1024 elements consumes:

```
boolean[] bits = new boolean[1024];
System.out.println(ClassLayout.parseInstance(bits).toPrintable());
```

Ideally, we expect a 1024-bit memory footprint from this array. However, the Java Object Layout (JOL) reveals an entirely different reality:

```
[Z object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 7b 12 07 00 (01111011 00010010 00000111 00000000) (463483)
12 4 (object header) 00 04 00 00 (00000000 00000100 00000000 00000000) (1024)
16 1024 boolean [Z. N/A
Instance size: 1040 bytes
```

If we ignore the overhead of object header, the array elements are consuming 1024 bytes, instead of the expected 1024 bits. That’s 700% more memory than what we expected.

**The**Â **addressability issues and word tearing are the main reasons why ***boolean*sÂ are more than just one single bit.

To solve this problem, we can use a combination of numeric data types (such as *long*) and bit-wise operations. That’s where the *BitSetÂ *comes in.

## 3. HowÂ *BitSetÂ *Works

As we mentioned earlier, to achieve the one bit per flag memory usage, theÂ *BitSetÂ *API uses a combination of basic numeric data types and bit-wise operations.

For the sake of simplicity, let’s suppose we’re going to represent eight flags with one *byte*. At first, we initialize all bits of this single *byte* with zero:

Now if we want to set the bit at position three toÂ *true*, **we should first left-shift the number 1 by three:**

**And thenÂ ***orÂ *its result with the current *byte* value:

The same process will happen if decide to set the bit at the index seven:

As shown above, we perform a left-shift by seven bits and combine the result with the previous *byte* value usingÂ theÂ *orÂ *operator.

### 3.1. Getting a Bit Index

**To check if a particular bit index is set toÂ ***trueÂ *or not, we’ll use theÂ *andÂ *operator. For instance, here’s how we check if index three is set:

- Performing a left-shift by three bits on the value one
*AndingÂ *the result with the current *byte* value
- If the result is greater than zero, then we found a match, and that bit index is actually set. Otherwise, the requested index is clear or is equal toÂ
*false*

The above diagram shows the get operation steps for index three. If we inquire about a clear index, however, the result will be different:

Since theÂ *andÂ *result is equal to zero, index four is clear.

### 3.2. Growing the Storage

Currently, we can only store a vector of 8 bits. To go beyond this limitation, **we just have to use an array ofÂ ***byte*s, instead of a singleÂ *byte*, that’s it!

Now, every time we need to set, get, or clear a specific index, we should find the corresponding array element, first. For instance, let’s suppose we’re going to set index 14:

As shown in the above diagram, after finding the right array element, we did set the appropriate index.

Also, if we want to set an index beyond 15 here, theÂ *BitSetÂ *will expand its internal array, first. Only after expanding the array and copying the elements will it set the requested bit. This is somewhat similar to how *ArrayListÂ *works internally.

So far, we used theÂ *byteÂ *data typeÂ for the sake of simplicity. **TheÂ ***BitSetÂ *API, however, is usingÂ an array ofÂ *longÂ *values internally.

## 4. TheÂ *BitSetÂ *API

Now that we know enough about the theory, it’s time to see what theÂ *BitSetÂ *API looks like.

For starters, let’s compare the memory footprint of aÂ *BitSetÂ *instance with 1024 bits with theÂ *boolean[]Â *we saw earlier:

```
BitSet bitSet = new BitSet(1024);
System.out.println(GraphLayout.parseInstance(bitSet).toPrintable());
```

This will print both the shallow size of theÂ *BitSetÂ *instance and the size of its internal array:

```
java.util.BitSet@75412c2fd object externals:
ADDRESS SIZE TYPE PATH VALUE
70f97d208 24 java.util.BitSet (object)
70f97d220 144 [J .words [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

As shown above, it uses aÂ *long[]Â *with 16 elements (16 * 64 bits = 1024 bits) internally. Anyway, **this instance is using 168 bytes in total, while theÂ ***boolean[]Â *were using 1024 bytes.

The more bits we have, the more the footprint difference increases. For example, to store 1024 * 1024 bits, the *boolean[]Â *consumes 1 MB, and theÂ *BitSetÂ *instance consumes around 130 KB.

### 4.1. Constructing *BitSet*s

The simplest way to create aÂ *BitSetÂ *instance is to use the no-arg constructor:

`BitSet bitSet = new BitSet();`

**This will create aÂ ***BitSetÂ *instance with aÂ *long[]Â *of size one. Of course, it can automatically grow this array if needed.

It’s also possible to create aÂ *BitSetÂ *with an initial number of bits:

`BitSet bitSet = new BitSet(100_000);`

Here, the internal array will have enough elements to hold 100,000 bits. This constructor comes in handy when we already have a reasonable estimate on the number of bits to store. In such use cases,** it can prevent or decrease the unnecessary copying of array elements while growing it**.

It’s even possible to create aÂ *BitSetÂ *from an existingÂ *long[]*,Â *byte[]*,Â *LongBuffer*, andÂ *ByteBuffer*. For instance, here we’re creating aÂ *BitSetÂ *instance from a givenÂ *long[]*:

`BitSet bitSet = BitSet.valueOf(new long[] { 42, 12 });`

There are three more overloaded versions of theÂ *valueOf()Â *static factory method to support the other mentioned types.

### 4.2. Setting Bits

We can set the value of a particular index toÂ *trueÂ *using theÂ *set(index)Â *method:

```
BitSet bitSet = new BitSet();
bitSet.set(10);
assertThat(bitSet.get(10)).isTrue();
```

As usual, the indices are zero-based.** It’s even possible to set a range of bits toÂ ***trueÂ *using theÂ *set(fromInclusive, toExclusive)Â *method:

```
bitSet.set(20, 30);
for (int i = 20; i <= 29; i++) {
assertThat(bitSet.get(i)).isTrue();
}
assertThat(bitSet.get(30)).isFalse();
```

As is evident from the method signature, the beginning index is inclusive, and the ending one is exclusive.

When we say setting an index, we usually mean setting it toÂ *true*. Despite this terminology, we can set a particular bit index toÂ *falseÂ *using theÂ *set(index, boolean)Â *method:

```
bitSet.set(10, false);
assertThat(bitSet.get(10)).isFalse();
```

This version also supports setting a range of values:

```
bitSet.set(20, 30, false);
for (int i = 20; i <= 29; i++) {
assertThat(bitSet.get(i)).isFalse();
}
```

### 4.3. Clearing Bits

Instead of setting a specific bit index toÂ *false*, we can simply clear it using theÂ *clear(index)Â *method:

```
bitSet.set(42);
assertThat(bitSet.get(42)).isTrue();
bitSet.clear(42);
assertThat(bitSet.get(42)).isFalse();
```

Moreover, we can also clear a range of bits with theÂ *clear(fromInclusive, toExclusive)Â *overloaded version:

```
bitSet.set(10, 20);
for (int i = 10; i < 20; i++) {
assertThat(bitSet.get(i)).isTrue();
}
bitSet.clear(10, 20);
for (int i = 10; i < 20; i++) {
assertThat(bitSet.get(i)).isFalse();
}
```

Interestingly, **if we call this method without passing any arguments, it’ll clear all the set bits**:

```
bitSet.set(10, 20);
bitSet.clear();
for (int i = 0; i < 100; i++) {
assertThat(bitSet.get(i)).isFalse();
}
```

As shown above, after calling theÂ *clear()Â *method, all bits are set to zero.

### 4.4. Getting Bits

So far, we used theÂ *get(index)Â *method quite extensively. **When the requested bit index is set, then this method will returnÂ ***true*. Otherwise, it’ll returnÂ *false*:

```
bitSet.set(42);
assertThat(bitSet.get(42)).isTrue();
assertThat(bitSet.get(43)).isFalse();
```

Similar toÂ *setÂ *andÂ *clear*, we can get a range of bit indices using theÂ *get(fromInclusive, toExclusive)Â *method:

```
bitSet.set(10, 20);
BitSet newBitSet = bitSet.get(10, 20);
for (int i = 0; i < 10; i++) {
assertThat(newBitSet.get(i)).isTrue();
}
```

As shown above, this method returns anotherÂ *BitSet* in the [20, 30) range of the current one. That is, index 20 of the *bitSetÂ *variable is equivalent to index zero of theÂ *newBitSetÂ *variable.

### 4.5. Flipping Bits

**To negate the current bit index value, we can use theÂ ***flip(index)Â *method. That is, it’ll turnÂ *trueÂ *values toÂ *falseÂ *and vice versa:

```
bitSet.set(42);
bitSet.flip(42);
assertThat(bitSet.get(42)).isFalse();
bitSet.flip(12);
assertThat(bitSet.get(12)).isTrue();
```

Similarly, we can achieve the same thing for a range of values using theÂ *flip(fromInclusive, toExclusive)Â *method:

```
bitSet.flip(30, 40);
for (int i = 30; i < 40; i++) {
assertThat(bitSet.get(i)).isTrue();
}
```

### 4.6. Length

There are three length-like methods for aÂ *BitSet*. **TheÂ ***size()Â *method returns the number of bits the internal array can represent. For instance, since the no-arg constructor allocates aÂ *long[]Â *array with one element, then theÂ *size()Â *willÂ return 64 for it:

```
BitSet defaultBitSet = new BitSet();
assertThat(defaultBitSet.size()).isEqualTo(64);
```

With one 64-bit number, we can only represent 64 bits. Of course, this will change if we pass the number of bits explicitly:

```
BitSet bitSet = new BitSet(1024);
assertThat(bitSet.size()).isEqualTo(1024);
```

**Moreover, theÂ ***cardinality()Â *method represents the number of set bits in aÂ *BitSet*:

```
assertThat(bitSet.cardinality()).isEqualTo(0);
bitSet.set(10, 30);
assertThat(bitSet.cardinality()).isEqualTo(30 - 10);
```

At first, this method returns zero as all bits areÂ *false*. After setting the [10, 30) range toÂ *true*, then theÂ *cardinality()Â *method callÂ returns 20.

**Also, theÂ ***length()Â *method returns the one index after the index of the last set bit:

```
assertThat(bitSet.length()).isEqualTo(30);
bitSet.set(100);
assertThat(bitSet.length()).isEqualTo(101);
```

At first, the last set index is 29, so this method returns 30. When we set the index 100 to true, then theÂ *length()Â *method returns 101. **It’s also worth mentioning that this method will return zero if all bits are clear**.

Finally, theÂ *isEmpty()Â *method returnsÂ *falseÂ *when there is at least one set bit in theÂ *BitSet*. Otherwise, it’ll returnÂ *true*:

```
assertThat(bitSet.isEmpty()).isFalse();
bitSet.clear();
assertThat(bitSet.isEmpty()).isTrue();
```

### 4.7. Combining With OtherÂ *BitSet*s

**TheÂ ***intersects(BitSet)Â *method takes anotherÂ *BitSetÂ *and returnsÂ *trueÂ *when twoÂ *BitSet*s have something in common. That is, they have at least one set bit in the same index:

```
BitSet first = new BitSet();
first.set(5, 10);
BitSet second = new BitSet();
second.set(7, 15);
assertThat(first.intersects(second)).isTrue();
```

The [7, 9] range is set in bothÂ *BitSet*s, so this method returnsÂ *true*.

**It’s also possible to perform the logicalÂ ***andÂ *operation on twoÂ *BitSet*s:

```
first.and(second);
assertThat(first.get(7)).isTrue();
assertThat(first.get(8)).isTrue();
assertThat(first.get(9)).isTrue();
assertThat(first.get(10)).isFalse();
```

This will perform a logicalÂ *andÂ *between the two *BitSet*sÂ and modifies theÂ *firstÂ *variable with the result. Similarly, we can perform a logicalÂ *xorÂ *on twoÂ *BitSet*s, too:

```
first.clear();
first.set(5, 10);
first.xor(second);
for (int i = 5; i < 7; i++) {
assertThat(first.get(i)).isTrue();
}
for (int i = 10; i < 15; i++) {
assertThat(first.get(i)).isTrue();
}
```

There are other methods such as theÂ *andNot(BitSet) *or theÂ *or(BitSet),**Â *which can perform other logical operations on twoÂ *BitSet*s.

### 4.8. Miscellaneous

As of Java 8, **there is aÂ ***stream()Â *method to stream all set bits of aÂ *BitSet*. For instance:

```
BitSet bitSet = new BitSet();
bitSet.set(15, 25);
bitSet.stream().forEach(System.out::println);
```

This will print all set bits to the console. Since this will return anÂ *IntStream*, we can perform common numerical operations such as summation, average, counting, and so on. For instance, here we’re counting the number of set bits:

`assertThat(bitSet.stream().count()).isEqualTo(10);`

Also, **theÂ ***nextSetBit(fromIndex)Â *method will return the next set bit index starting from theÂ *fromIndex*:

`assertThat(bitSet.nextSetBit(13)).isEqualTo(15);`

TheÂ *fromIndexÂ *itself is included in this calculation.Â When there isn’t any *trueÂ *bitÂ left in theÂ *BitSet*, it’ll return -1:

`assertThat(bitSet.nextSetBit(25)).isEqualTo(-1);`

Similarly, **theÂ ***nextClearBit(fromIndex)Â *returns the next clear index starting from theÂ *fromIndex*:

`assertThat(bitSet.nextClearBit(23)).isEqualTo(25);`

On the other hand, theÂ *previousClearBit(fromIndex)Â *returns the index of the nearest clear index in the opposite direction:

`assertThat(bitSet.previousClearBit(24)).isEqualTo(14);`

Same is true forÂ *previousSetBit(fromIndex)*:

```
assertThat(bitSet.previousSetBit(29)).isEqualTo(24);
assertThat(bitSet.previousSetBit(14)).isEqualTo(-1);
```

Moreover, we can convert aÂ *BitSetÂ *to aÂ *byte[]Â *or aÂ *long[]Â *using theÂ *toByteArray()Â *orÂ *toLongArray()Â *methods, respectively:

```
byte[] bytes = bitSet.toByteArray();
long[] longs = bitSet.toLongArray();
```

## 5. Conclusion

In this tutorial, we saw how we can useÂ *BitSet*s to represent a vector of bits.

At first, we got familiar with the rationale behind not usingÂ *boolean[]* to represent a vector of bits. Then we saw how aÂ *BitSetÂ *works internally and what its API looks like.

As usual, all the examples are available over on GitHub.