Java Top

The early-bird price of the new Learn Spring Security OAuth course packages will increase by $50 on Wednesday:

>> CHECK OUT THE COURSE

1. Overview

In this tutorial, we're going to explore the Caesar cipher, an encryption method that shifts letters of a message to produce another, less readable one.

First of all, we'll go through the ciphering method and see how to implement it in Java.

Then, we'll see how to decipher an encrypted message, provided we know the offset used to encrypt it.

And finally, we'll learn how to break such a cipher and thus retrieving the original message from the encrypted one without knowing the offset used.

2. Caesar Cipher

2.1. Explanation

First of all, let's define what a cipher is. A cipher is a method for encrypting a message, intending to make it less readable. As for the Caesar cipher, it's a substitution cipher that transforms a message by shifting its letters by a given offset.

Let's say we want to shift the alphabet by 3, then letter A would be transformed to letter D, B to E, C to F, and so on.

Here is the complete matching between original and transformed letters for an offset of 3:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

As we can see, once the transformation goes beyond the letter Z, we go back to the start of the alphabet, so that X, Y and Z are transformed into A, B and C, respectively.

Therefore, if we choose an offset greater or equal to 26, we loop, at least one time, over the entire alphabet. Let's imagine we shift a message by 28, that really means we're shifting it by 2. Indeed, after shifting by 26, all letters are matching themselves.

Really, we can transform any offset into a simpler offset by performing a modulo 26 operation on it:

offset = offset % 26

2.2. Algorithm in Java

Now, let's see how to implement the Caesar cipher in Java.

First, let's create a class CaesarCipher that will hold a cipher() method taking a message and an offset as parameters:

public class CaesarCipher {
    String cipher(String message, int offset) {}
}

That method will encrypt the message using the Caesar cipher.

We'll suppose here that offsets are positive and messages only contain lower case letters and spaces. Then, what we want is to shift all alphabetic characters by the given offset:

StringBuilder result = new StringBuilder();
for (char character : message.toCharArray()) {
    if (character != ' ') {
        int originalAlphabetPosition = character - 'a';
        int newAlphabetPosition = (originalAlphabetPosition + offset) % 26;
        char newCharacter = (char) ('a' + newAlphabetPosition);
        result.append(newCharacter);
    } else {
        result.append(character);
    }
}
return result;

As we can see, we rely on the ASCII codes of the alphabet letters to achieve our goal.

First, we compute the position of the current letter in the alphabet, and for that, we take its ASCII code and subtract the ASCII code of letter a from it. Then we apply the offset to this position, carefully using the modulo to remain in the alphabet range. And finally, we retrieve the new character by adding the new position to the ASCII code of letter a.

Now, let's try this implementation on the message “he told me i could never teach a llama to drive” with an offset of 3:

CaesarCipher cipher = new CaesarCipher();

String cipheredMessage = cipher.cipher("he told me i could never teach a llama to drive", 3);

assertThat(cipheredMessage)
  .isEqualTo("kh wrog ph l frxog qhyhu whdfk d oodpd wr gulyh");

As we can see, the ciphered message respects the matching defined earlier for an offset of 3.

Now, this particular example has the specificity not to exceed the letter z during the transformation, therefore not having to go back to the start of the alphabet. Thus, let's try again with an offset of 10 so that some letters will be mapped to letters at the beginning of the alphabet, like t which will be mapped to d:

String cipheredMessage = cipher.cipher("he told me i could never teach a llama to drive", 10);

assertThat(cipheredMessage)
  .isEqualTo("ro dyvn wo s myevn xofob dokmr k vvkwk dy nbsfo");

It works as expected, thanks to the modulo operation. That operation also takes care of larger offsets. Let's say we want to use 36 as offset, which is equivalent to 10, the modulo operation ensures that the transformation will give the same result.

3. Decipher

3.1. Explanation

Now, let's see how to decipher such a message when we know the offset used to encrypt it.

As a matter of fact, deciphering a message encrypted with Caesar cipher can be seen as ciphering it with a negative offset, or also ciphering it with a complementary offset.

So, let's say we have a message encrypted with an offset of 3. Then, we can either encrypt it with an offset of -3 or encrypt it with an offset of 23. Either way, we retrieve the original message.

Unfortunately, our algorithm doesn't handle negative offset out of the box. We'll have problems converting letters looping back to the end of the alphabet (for example transforming the letter a into the letter z with an offset of -1). But, we can compute the complementary offset, which is positive, and then use our algorithm.

So, how to obtain this complementary offset? The naïve way of doing this would be to subtract the original offset from 26. Of course, this will work for offsets between 0 and 26 but will give negative results otherwise.

That's where we'll make use of the modulo operator again, directly on the original offset, before doing the subtraction. That way, we ensure always returning a positive offset.

3.2. Algorithm in Java

Let's now implement it in Java. First, we'll add a decipher() method to our class:

String decipher(String message, int offset) {}

Then, let's call the cipher() method with our calculated complementary offset:

return cipher(message, 26 - (offset % 26));

That's it, our deciphering algorithm is set up. Let's try it on the example with offset 36:

String decipheredSentence = cipher.decipher("ro dyvn wo s myevn xofob dokmr k vvkwk dy nbsfo", 36);
assertThat(decipheredSentence)
  .isEqualTo("he told me i could never teach a llama to drive");

As we can see, we retrieve our original message.

4. Breaking the Ceasar Cipher

4.1. Explanation

Now that we've covered ciphering and deciphering messages using the Caesar cipher, we can dive into how to break it. That is, decipher a ciphered message without knowing the used offset at first.

To do that, we'll make use of the probabilities to find English letters in a text. The idea will be to decipher the message using offsets 0 to 25 and check what shift presents a letter distribution similar to that of English texts.

In order to determine the similarity of two distributions, we'll use the Chi-squared statistic.

The Chi-squared statistic will provide a number telling us whether two distributions are similar or not. The smaller the number, the more similar they are.

So, we'll compute the Chi-square for every offset and then return the one with the smallest Chi-square. This should give us the offset used to cipher the message.

However, we must keep in mind that this technique is not bulletproof and should the message be too short or using words unfortunately non-representative of a standard English text, it could return a wrong offset.

4.2. Define the Base Letters Distribution

Let's now see how to implement the breaking algorithm in Java.

First of all, let's create a breakCipher() method in our CaesarCipher class, which will return the offset used to encrypt a message:

int breakCipher(String message) {}

Then, let's define an array containing the probabilities to find a certain letter in an English text:

double[] englishLettersProbabilities = {0.073, 0.009, 0.030, 0.044, 0.130, 0.028, 0.016, 0.035, 0.074,
  0.002, 0.003, 0.035, 0.025, 0.078, 0.074, 0.027, 0.003,
  0.077, 0.063, 0.093, 0.027, 0.013, 0.016, 0.005, 0.019, 0.001};

From this array, we'll be able to calculate the expected frequencies of the letters in a given message, by multiplying the probabilities by the length of the message:

double[] expectedLettersFrequencies = Arrays.stream(englishLettersProbabilities)
  .map(probability -> probability * message.getLength())
  .toArray();

For example, in a message of length 100, we should expect the letter a to appear 7.3 times, and the letter e to appear 13 times.

4.3. Calculate the Chi-squares

Now, we're going to calculate the Chi-squares of deciphered message letters distribution and standard English letters distribution.

To achieve that, we'll need to import the Apache Commons Math3 library that contains a utility class to compute Chi-squares:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-math3</artifactId>
    <version>3.6.1</version>
</dependency>

What we need to do now is to create an array that'll contain the calculated Chi-squares for each offset between 0 and 25.

Thus, we'll decipher the encrypted message using each offset, and then count the letters in that message.

Finally, we'll use the ChiSquareTest#chiSquare method to calculate the Chi-square between the expected and observed letters distribution:

double[] chiSquares = new double[26];

for (int offset = 0; offset < chiSquares.length; offset++) {
    String decipheredMessage = decipher(message, offset);
    long[] lettersFrequencies = observedLettersFrequencies(decipheredMessage);
    double chiSquare = new ChiSquareTest().chiSquare(expectedLettersFrequencies, lettersFrequencies);
    chiSquares[offset] = chiSquare;
}

return chiSquares;

The observedLettersFrequencies() method simply realizes a count of letters a to z in the passed message:

long[] observedLettersFrequencies(String message) {
    return IntStream.rangeClosed('a', 'z')
      .mapToLong(letter -> countLetter((char) letter, message))
      .toArray();
}

long countLetter(char letter, String message) {
    return message.chars()
      .filter(character -> character == letter)
      .count();
}

4.4. Find the Most Probable Offset

Once all the Chi-squares calculated, we can return the offset matching the smallest Chi-square:

int probableOffset = 0;
for (int offset = 0; offset < chiSquares.length; offset++) {
    System.out.println(String.format("Chi-Square for offset %d: %.2f", offset, chiSquares[offset]));
    if (chiSquares[offset] < chiSquares[probableOffset]) {
        probableOffset = offset;
    }
}

return probableOffset;

Although it's not necessary to enter the loop with offset 0 as we consider it to be the minimum before starting the loop, we do it to print its Chi-square value.

Let's try this algorithm on the message encrypted using offset 10:

int offset = algorithm.breakCipher("ro dyvn wo s myevn xofob dokmr k vvkwk dy nbsfo");
assertThat(offset).isEqualTo(10);

assertThat(algorithm.decipher("ro dyvn wo s myevn xofob dokmr k vvkwk dy nbsfo", offset))
  .isEqualTo("he told me i could never teach a llama to drive");

As we can see, the method retrieves the correct offset, which can then be used to decipher the message an retrieve the original one.

Here are the different Chi-squares calculated for this particular break:

Chi-Square for offset 0: 210.69
Chi-Square for offset 1: 327.65
Chi-Square for offset 2: 255.22
Chi-Square for offset 3: 187.12
Chi-Square for offset 4: 734.16
Chi-Square for offset 5: 673.68
Chi-Square for offset 6: 223.35
Chi-Square for offset 7: 111.13
Chi-Square for offset 8: 270.11
Chi-Square for offset 9: 153.26
Chi-Square for offset 10: 23.74
Chi-Square for offset 11: 643.14
Chi-Square for offset 12: 328.83
Chi-Square for offset 13: 434.19
Chi-Square for offset 14: 384.80
Chi-Square for offset 15: 1206.47
Chi-Square for offset 16: 138.08
Chi-Square for offset 17: 262.66
Chi-Square for offset 18: 253.28
Chi-Square for offset 19: 280.83
Chi-Square for offset 20: 365.77
Chi-Square for offset 21: 107.08
Chi-Square for offset 22: 548.81
Chi-Square for offset 23: 255.12
Chi-Square for offset 24: 458.72
Chi-Square for offset 25: 325.45

As we can see, the one for offset 10 is clearly smaller than the others.

5. Conclusion

In this article, we covered the Caesar cipher. We learned how to cipher and decipher a message by shifting its letters by a given offset. We also learned how to break the cipher. And we saw all the Java implementations that allow us to do that.

The code of this article can be found over on GitHub.

Java bottom

The early-bird price of the new Learn Spring Security OAuth course packages will increase by $50 on Wednesday:

>> CHECK OUT THE COURSE

Leave a Reply

avatar
  Subscribe  
Notify of