1. Introduction

In this tutorial, we’re going to walk through the theoretical background of the ABA problem in concurrent programming. We’ll see the root cause of it as well as a solution.

2. Compare and Swap

To understand the root cause, let’s briefly review the concept of Compare and Swap.

Compare and Swap (CAS) is a common technique in lock-free algorithms to ensure that updates to shared memory by one thread fail if another thread has modified the same space in the meantime.

We achieve this by using two pieces of information in each update: The updated value as well as the original value. Then Compare and Swap will first compare the existing value to the original. If equal, it then swaps the existing value with the updated one.

Of course, this scenario can happen with references as well.

3. The ABA Problem

Now, the ABA problem is an anomaly where Compare and Swap alone fails us.

Say, for example, that one activity reads some shared memory (A), in preparation for updating it. Then, another activity temporarily modifies that shared memory (B) and then restores it (A). Following that, once the first activity performs Compare and Swap, it will appear as if no change has been made, invalidating the integrity of the check.

While in many scenarios this doesn’t cause a problem, at times, A is not as equal to A as we might think. Let’s see how this looks in practice.

3.1. An Example Domain

To demonstrate the problem via a practical example, let’s consider a simple bank account class, where an integer variable holds the amount of the actual balance. We also have two functions: one for withdrawals and one for deposits. These operations use CAS to decrease and increase the balance of the account.

3.2. Where Is the Problem?

Let’s think about a multithreaded scenario when Thread 1 and Thread 2 are operating on the same bank account.

When Thread 1 wants to withdraw some money, it reads the actual balance to use that value for comparing the amount in the CAS operation later. However, for some reason, Thread 1 is a bit slow — maybe it’s blocked.

In the meantime, Thread 2 performs two operations on the account using the same mechanism while Thread 1 is suspended. First, it changes the original value, which has already been read by Thread 1, but then, it changes it back to the original value.

Once Thread 1 resumes, it will appear as if nothing has changed, and CAS will succeed:

4. Java Example

To better visualize this, let’s look at some code. Here, we’ll use Java, but the problem itself is not language-specific.

4.1. The Account

First, our Account class holds the balance amount in an AtomicInteger, which gives us CAS for integers in Java. Also, there’s another AtomicInteger to count the number of succeeded transactions. Finally, we have a ThreadLocal variable to capture the CAS operation failures for a given thread.

public class Account {
    private AtomicInteger balance;
    private AtomicInteger transactionCount;
    private ThreadLocal<Integer> currentThreadCASFailureCount;
    ...
}

4.2. Deposits

Next, we can implement the deposit method for our Account class:

public boolean deposit(int amount) {
    int current = balance.get();
    boolean result = balance.compareAndSet(current, current + amount);
    if (result) {
        transactionCount.incrementAndGet();
    } else {
        int currentCASFailureCount = currentThreadCASFailureCount.get();
        currentThreadCASFailureCount.set(currentCASFailureCount + 1);
    }
    return result;
}

Note, the AtomicInteger.compareAndSet(…) is nothing but a wrapper around the AtomicInteger.compareAndSwap() method to reflect the boolean result of the CAS operation.

4.3. Withdrawals

Similarly, the withdraw method can be created as:

public boolean withdraw(int amount) {
    int current = getBalance();
    maybeWait();
    boolean result = balance.compareAndSet(current, current - amount);
    if (result) {
        transactionCount.incrementAndGet();
    } else {
        int currentCASFailureCount = currentThreadCASFailureCount.get();
        currentThreadCASFailureCount.set(currentCASFailureCount + 1);
    }
    return result;
}

To be able to demonstrate the ABA problem, we created a maybeWait() method to hold some time-consuming operation that provides some extra time frame for the other thread to perform modifications on the balance.

Now, we just suspend Thread 1 for two seconds:

private void maybeWait() {
    if ("thread1".equals(Thread.currentThread().getName())) {
        TimeUnit.SECONDS.sleep(2);
    }
}

4.4. The ABA Scenario

Finally, we can write a unit test to check whether the ABA problem is possible or not.

What we’ll do is have two threads, our Thread 1 and Thread 2 from before. Thread 1 will read the balance and get delayed. Thread 2, while Thread 1 is sleeping, will change the balance, and then change it back.

Once Thread 1 wakes up, it’ll be none the wiser, and its operation will still succeed.

After some initialization, we can create Thread 1 that will need some extra time to perform the CAS operation. After finishing that, it won’t realize the internal state has been changed, thus the CAS failure count will be zero instead of the expected 1 in an ABA scenario:

@Test
public void abaProblemTest() {
    // ...
    Thread thread1 = new Thread(() -> {
        assertTrue(account.withdraw(amountToWithdrawByThread1));

        assertTrue(account.getCurrentThreadCASFailureCount() > 0); // test will fail!
    }, "thread1");
    // ...
}

Similarly, we can create Thread 2 that will finish before Thread 1 and change the balance of the account and change it back to the original value. In this case, we are not expecting any CAS failure.

@Test
public void abaProblemTest() {
    // ...
    Thread thread2 = new Thread(() -> {
        assertTrue(account.deposit(amountToDepositByThread2));
        assertEquals(defaultBalance + amountToDepositByThread2, account.getBalance());
        assertTrue(account.withdraw(amountToWithdrawByThread2));

        assertEquals(defaultBalance, account.getBalance());

        assertEquals(0, account.getCurrentThreadCASFailureCount());
    }, "thread2");
    // ...
}

After running the threads, Thread 1 will get the expected balance, although it is not expecting for the extra two transactions from Thread 2:

@Test
public void abaProblemTest() {
    // ...

    assertEquals(defaultBalance - amountToWithdrawByThread1, account.getBalance());
    assertEquals(4, account.getTransactionCount());
}

5. Value- vs. Reference-Based Scenarios

In the above example, we can spot an important fact — the AtomicInteger that we got back at the end of the scenario is entirely the same as the one we started with. Other than failing to capture the two extra transactions made by Thread 2, no anomaly happened in this particular example.

The reason behind this is the fact that we basically used a value type rather than a reference type.

5.1. Reference-Based Anomaly

We might encounter the ABA problem using reference types with the purpose of reusing them. In this case, at the end of the ABA scenario, we get the matching reference back, so the CAS operation succeeds, however, the reference might point to a different object than it did originally. This can lead to ambiguity.

6. Solutions

Now that we’ve got a good picture of the problem, let’s dig into some possible solutions.

6.1. Garbage Collection

For reference types, Garbage Collection (GC) can protect us against the ABA problem most of the time.

When Thread 1 has an object reference at the given memory address that we are using, then nothing that Thread 2 does can cause another object to use that same address. The object is still alive, and its address won’t be reused until no references to it remain.

And while this works for reference types, the issue is when we rely on GC in lock-free data structures.

Of course, it’s also true that some languages don’t offer GC.

6.2. Hazard Pointers

Hazard pointers are somewhat connected to the previous one – we can use them in languages that don’t have an automatic garbage collection mechanism.

Simply put, threads keep track of the questioned pointers in a shared data structure. In this way, each thread knows that the object on the given memory address defined by the pointer might have been modified by another thread.

So now, let’s look at a couple of hardier solutions.

6.3. Immutability

Of course, the usage of immutable objects solves this problem, as we don’t reuse objects across the application. Whenever something changes, a new object is created, so the CAS will fail for sure.

However, our final solution also allows for mutable objects.

6.4. Double Compare and Swap

The idea behind the double compare and swap method is to keep track of one more variable, which is the version number, then use that in the comparison as well. In this case, the CAS operation will fail if we have the old version number, which is only possible when another thread modified our variable in the meantime.

In Java, AtomicStampedReference and AtomicMarkableReference are the standard implementations of this method.

7. Conclusion

In this article, we learned about the ABA problem and some techniques to prevent it.

As always, the full source code, as well as more examples, are all available over on GitHub.

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments