Bloom Filter in Java using Guava

Azure Spring Apps is a fully managed service from Microsoft (built in collaboration with VMware), focused on building and deploying Spring Boot applications on Azure Cloud without worrying about Kubernetes.

And, the Enterprise plan comes with some interesting features, such as commercial Spring runtime support, a 99.95% SLA and some deep discounts (up to 47%) when you are ready for production.

>> Learn more and deploy your first Spring Boot app to Azure.

You can also ask questions and leave feedback on the Azure Spring Apps GitHub page.

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

The Jet Profiler was built for MySQL only, so it can do things like real-time query performance, focus on most used tables or most frequent queries, quickly identify performance issues and basically help you optimize your queries.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

Accelerate Your Jakarta EE Development with Payara Server!

With best-in-class guides and documentation, Payara essentially simplifies deployment to diverse infrastructures.

Beyond that, it provides intelligent insights and actions to optimize Jakarta EE applications.

The goal is to apply an opinionated approach to get to what's essential for mission-critical applications - really solid scalability, availability, security, and long-term support:

>> Download and Explore the Guide (to learn more)

The AI Assistant to boost Boost your productivity writing unit tests - Machinet AI.

AI is all the rage these days, but for very good reason. The highly practical coding companion, you'll get the power of AI-assisted coding and automated unit test generation.
Machinet's Unit Test AI Agent utilizes your own project context to create meaningful unit tests that intelligently aligns with the behavior of the code.
And, the AI Chat crafts code and fixes errors with ease, like a helpful sidekick.

Simplify Your Coding Journey with Machinet AI:

>> Install Machinet AI in your IntelliJ

Looking for the ideal Linux distro for running modern Spring apps in the cloud?

Meet Alpaquita Linux: lightweight, secure, and powerful enough to handle heavy workloads.

This distro is specifically designed for running Java apps. It builds upon Alpine and features significant enhancements to excel in high-density container environments while meeting enterprise-grade security standards.

Specifically, the container image size is ~30% smaller than standard options, and it consumes up to 30% less RAM:

>> Try Alpaquita Containers now.

DbSchema is a super-flexible database designer, which can take you from designing the DB with your team all the way to safely deploying the schema.

The way it does all of that is by using a design model, a database-independent image of the schema, which can be shared in a team using GIT and compared or deployed on to any database.

And, of course, it can be heavily visual, allowing you to interact with the database using diagrams, visually compose queries, explore the data, generate random data, import data or build HTML5 database reports.

>> Take a look at DBSchema

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

1. Overview

In this article, we’ll be looking at the Bloom filter construct from the Guava library. A Bloom filter is a memory-efficient, probabilistic data structure that we can use to answer the question of whether or not a given element is in a set.

There are no false negatives with a Bloom filter, so when it returns false, we can be 100% certain that the element is not in the set.

However, a Bloom filter can return false positives, so when it returns true, there is a high probability that the element is in the set, but we can not be 100% sure.

For a more in-depth analysis of how a Bloom filter works, you can go through this tutorial.

2. Maven Dependency

We will be using Guava’s implementation of the Bloom filter, so let’s add the guava dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.3-jre</version>
</dependency>

The latest version can be found on Maven Central.

3. Why Use Bloom Filter?

The Bloom filter is designed to be space-efficient and fast. When using it, we can specify the probability of false positive responses which we can accept and, according to that configuration, the Bloom filter will occupy as little memory as it can.

Due to this space-efficiency, the Bloom filter will easily fit in memory even for huge numbers of elements. Some databases, including Cassandra and Oracle, use this filter as the first check before going to disk or cache, for example, when a request for a specific ID comes in.

If the filter returns that the ID is not present, the database can stop further processing of the request and return to the client. Otherwise, it goes to the disk and returns the element if it is found on disk.

4. Creating a Bloom Filter

Suppose we want to create a Bloom filter for up to 500 Integers and that we can tolerate a one-percent (0.01) probability of false positives.

We can use the BloomFilter class from the Guava library to achieve this. We need to pass the number of elements that we expect to be inserted into the filter and the desired false positive probability:

BloomFilter<Integer> filter = BloomFilter.create(
  Funnels.integerFunnel(),
  500,
  0.01);

Now let’s add some numbers to the filter:

filter.put(1);
filter.put(2);
filter.put(3);

We added only three elements, and we defined that the maximum number of insertions will be 500, so our Bloom filter should yield very accurate results. Let’s test it using the mightContain() method:

assertThat(filter.mightContain(1)).isTrue();
assertThat(filter.mightContain(2)).isTrue();
assertThat(filter.mightContain(3)).isTrue();

assertThat(filter.mightContain(100)).isFalse();

As the name suggests, we cannot be 100% sure that a given element is actually in the filter when the method returns true.

When mightContain() returns true in our example, we can be 99% certain that the element is in the filter, and there is a one-percent probability that the result is a false positive. When the filter returns false, we can be 100% certain that the element is not present.

5. Over-Saturated Bloom Filter

When we design our Bloom filter, it is important that we provide a reasonably accurate value for the expected number of elements. Otherwise, our filter will return false positives at a much higher rate than desired. Let’s see an example.

Suppose that we created a filter with a desired false-positive probability of one percent and an expected some elements equal to five, but then we inserted 100,000 elements:

BloomFilter<Integer> filter = BloomFilter.create(
  Funnels.integerFunnel(),
  5,
  0.01);

IntStream.range(0, 100_000).forEach(filter::put);

Because the expected number of elements is so small, the filter will occupy very little memory.

However, as we add more items than expected, the filter becomes over-saturated and has a much higher probability of returning false positive results than the desired one percent:

assertThat(filter.mightContain(1)).isTrue();
assertThat(filter.mightContain(2)).isTrue();
assertThat(filter.mightContain(3)).isTrue();
assertThat(filter.mightContain(1_000_000)).isTrue();

Note that the mightContatin() method returned true even for a value that we didn’t insert into the filter previously.

6. Conclusion

In this quick tutorial, we looked at the probabilistic nature of the Bloom filter data structure – making use of the Guava implementation.

You can find the implementation of all these examples and code snippets in the GitHub project.

This is a Maven project, so it should be easy to import and run as it is.

Bloom Filter in Java using Guava

Get started with Spring and Spring Boot, through the Learn Spring course:

1. Overview

2. Maven Dependency

3. Why Use Bloom Filter?

4. Creating a Bloom Filter

5. Over-Saturated Bloom Filter

6. Conclusion

Get started with Spring and Spring Boot, through the Learn Spring course:

REST with Spring

Learn Spring Security ▼▲

Learn Spring Security Core

Learn Spring Security OAuth

Learn Spring

Learn Spring Data JPA

Persistence

REST

Security

Full Archive

Baeldung Ebooks

About Baeldung

Write for Baeldung

Get started with Spring and Spring Boot, through the Learn Spring course:

1. Overview

2. Maven Dependency

3. Why Use Bloom Filter?

4. Creating a Bloom Filter

5. Over-Saturated Bloom Filter

6. Conclusion

Get started with Spring and Spring Boot, through the Learn Spring course: