MD5 vs. SHA Algorithms | Baeldung on Computer Science

1. Introduction

In this article, we’ll elaborate on two cryptographic algorithms, namely MD5 (message-digest algorithm) and SHA (Secure Hash Algorithm). We’ll discuss them in detail, and after that, we’ll compare them.

2. Cryptographic Hash Functions

To begin with, let’s define a cryptographic hash function, a fundamental element of both mentioned algorithms. A cryptographic hash function takes a variable-length input and produces fixed-size output called a hash. In other words, it maps an arbitrarily large input into a fixed-size array of bits (hash).
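The fixed-size property is easy to observe in practice. The following minimal sketch uses Python's standard `hashlib` module with SHA-256 as an arbitrary example algorithm; the helper name `digest_bits` is ours, not part of any library:

```python
import hashlib


def digest_bits(data: bytes, algorithm: str = "sha256") -> int:
    """Return the length, in bits, of the hash produced for `data`."""
    return len(hashlib.new(algorithm, data).digest()) * 8


# Inputs of very different lengths: empty, one byte, one million bytes.
inputs = [b"", b"a", b"a" * 1_000_000]
sizes = [digest_bits(d) for d in inputs]
print(sizes)  # every entry is 256, regardless of the input length
```

Whatever the input size, the output length depends only on the chosen algorithm.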

A cryptographic hash function should be a one-way operation: recovering the input from its hash should be computationally infeasible. More generally, one shouldn’t be able to guess or retrieve any useful information about the input from the hash, so the output should look pseudorandom. Moreover, a cryptographic hash function needs to be collision-resistant. Since arbitrarily many inputs map to a fixed number of outputs, collisions necessarily exist, but finding two different messages that produce the same hash should be computationally infeasible.

Cryptographic hash functions are often used to check data integrity and identify files, since it’s easier and faster to compare hashes than to compare the data itself. They are also used for authentication, for storing confidential data (e.g., passwords) in databases, and for password verification. As we can see, cryptographic hash functions are closely tied to application and data security. Therefore, they should be secure and reliable.

3. MD5

MD5 is a cryptographic hash function that takes arbitrarily long data and produces a 128-bit hash. Although it’s considered cryptographically broken, it’s still widely used for some purposes. One of the most common is validating the integrity of publicly shared files. The MD5 algorithm processes data in 512-bit chunks, each split into 16 words of 32 bits.
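To make the chunk layout concrete, here is a small sketch that splits one 512-bit (64-byte) chunk into 16 words of 32 bits. It only illustrates the layout and isn't an MD5 implementation. The little-endian byte order comes from the MD5 specification (RFC 1321) rather than from this article, and the helper name `split_into_words` is hypothetical:

```python
import struct

BLOCK_BYTES = 64  # 512 bits


def split_into_words(block: bytes) -> tuple:
    """Split one 64-byte chunk into 16 unsigned 32-bit words (little-endian)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a chunk must be exactly 64 bytes (512 bits)")
    # "<16I" means: little-endian, sixteen unsigned 32-bit integers.
    return struct.unpack("<16I", block)


block = bytes(range(64))  # sample chunk with bytes 0x00 .. 0x3f
words = split_into_words(block)
print(len(words), hex(words[0]))
```

A real implementation also pads the message so that its length is a multiple of 512 bits before this splitting step.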

Let’s see the MD5 hashing in practice. Consider the following example:

MD5("The grass is always greener on the other side of the fence.") = d78298e359ac826549e3030104241a57

Just a small change in the input (replacing the period with an exclamation mark) produces an entirely different hash:

MD5("The grass is always greener on the other side of the fence!") = 2e51f2f8daec292839411955bd77183d

Such a property is called an avalanche effect.
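One way to quantify the avalanche effect is to count how many output bits differ between the two hashes. The sketch below does this with Python's standard `hashlib`; the helper name `differing_bits` is ours. For a well-behaved hash function, we'd expect roughly half of the bits to flip:

```python
import hashlib


def differing_bits(a: bytes, b: bytes, algorithm: str = "md5") -> int:
    """Count the bit positions in which the hashes of `a` and `b` differ."""
    da = int.from_bytes(hashlib.new(algorithm, a).digest(), "big")
    db = int.from_bytes(hashlib.new(algorithm, b).digest(), "big")
    # XOR leaves a 1 in every position where the two hashes disagree.
    return bin(da ^ db).count("1")


msg1 = b"The grass is always greener on the other side of the fence."
msg2 = b"The grass is always greener on the other side of the fence!"
print(differing_bits(msg1, msg2), "of 128 bits differ")
```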

As we mentioned earlier, MD5 is considered cryptographically broken. Let’s look at its security in more detail.

3.1. Security

Let’s recall one of the most essential attributes of a cryptographic hash function: it needs to be collision-resistant. In simple words, it should be infeasible to find two different inputs that produce the same hash.

In 2011, the Internet Engineering Task Force (IETF) published RFC 6151, describing known attacks on MD5. Some attacks could generate collisions in less than a minute on an average computer. The RFC states that:

the aforementioned results have provided sufficient reason to eliminate MD5 usage in applications where collision resistance is required such as digital signatures.

Thus, MD5 is no longer recommended for solutions requiring a high level of security. However, as we mentioned earlier, it’s still widely used as a checksum for files. Let’s consider an example. An indie developer publishes a game free of charge, along with the hash value of the game file. We download the game from a third-party site and compute the hash of the downloaded file. If it differs from the published one, the file isn’t the original. It may have been modified (e.g., infected with a virus) or damaged while downloading (e.g., due to network issues).
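The check from this example can be sketched in a few lines of Python. The function names `file_digest` and `matches_published_hash` are hypothetical, and the sketch assumes the published hash is a hexadecimal string obtained from a source we trust (e.g., the developer's own site):

```python
import hashlib
import hmac


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Hash a file piece by piece so large files don't have to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size pieces until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_hash(path: str, published_hex: str, algorithm: str = "md5") -> bool:
    """Return True if the file's hash equals the published hexadecimal hash."""
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

Passing `algorithm="sha256"` switches the same check to SHA-256 when the publisher provides that hash instead.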

To sum up, the MD5 algorithm has security vulnerabilities, and it’s considered cryptographically broken. Nowadays, there are more secure algorithms like SHA-2. Let’s introduce it.

4. SHA-2

SHA is a widely used family of hash algorithms. There are currently three main versions, namely SHA-1, SHA-2, and SHA-3. In this article, we’ll focus on the popular SHA-2 algorithm. SHA-2 consists of several variants that share the same overall structure but differ in parameters such as word size, initial values, and output length. They produce outputs of different lengths, e.g., 224, 256, or 512 bits, and are often referred to as SHA-224, SHA-256, SHA-512, etc., although they are all variants of SHA-2. Let’s use the examples from the MD5 section and see SHA-256 in practice:

SHA256("The grass is always greener on the other side of the fence.") = d017bcafd6aa208df913d92796f670df44cb8d7f7b548d6f9eddcccf214ac08a

SHA256("The grass is always greener on the other side of the fence!") = a8c655db7f4d0a3a0b34209f3b89d4466332bbf2745e759e01567ac74b23a349
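To see the different output lengths of the SHA-2 variants side by side, we can hash the same message with each of them. This is a minimal sketch using the algorithm names that Python's standard `hashlib` exposes; the helper name `sha2_output_bits` is ours:

```python
import hashlib

SHA2_VARIANTS = ("sha224", "sha256", "sha384", "sha512")


def sha2_output_bits(message: bytes) -> dict:
    """Map each SHA-2 variant name to the bit length of its hash of `message`."""
    return {
        name: len(hashlib.new(name, message).digest()) * 8
        for name in SHA2_VARIANTS
    }


message = b"The grass is always greener on the other side of the fence."
for name, bits in sha2_output_bits(message).items():
    print(f"{name}: {bits} bits")
```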

SHA-2 is known for its security. It’s used for multiple purposes, such as cryptocurrencies, TLS, SSL, SSH, password hashing, and digital signature verification. Moreover, SHA-2 is required by law in some U.S. government applications, primarily to protect confidential data.

4.1. Security

Let’s analyze the security of the SHA-256 algorithm. It’s one of the most secure and popular hashing algorithms. First of all, it’s a one-way operation. Therefore, it’s computationally infeasible to reconstruct the input from the hash. Theoretically, a brute-force attack would need on the order of $2^{256}$ attempts to achieve this.

Secondly, SHA-256 is collision-resistant. There are $2^{256}$ possible hash values, so even a generic birthday attack, which needs a number of attempts on the order of the square root of the output space, would require about $2^{128}$ hash computations. Therefore, there is almost no chance of finding a collision in practice.

Finally, SHA-256 exhibits the avalanche effect. A small change in the input produces a completely different hash.
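The link between the number of possible hash values and the chance of a collision can be estimated with the standard birthday approximation $p \approx 1 - e^{-n^2 / 2^{b+1}}$ for $n$ random hashes of $b$ bits. The sketch below evaluates it; the function name `collision_probability` is ours, and the formula only models generic collisions between random outputs, not attacks that exploit weaknesses of a specific algorithm:

```python
import math


def collision_probability(num_hashes: int, hash_bits: int) -> float:
    """Approximate probability that any two of `num_hashes` random hashes collide."""
    exponent = num_hashes * num_hashes / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


# 2^64 random 128-bit hashes already collide with a noticeable probability...
print(collision_probability(2**64, 128))
# ...while the same number of 256-bit hashes practically never collides.
print(collision_probability(2**64, 256))
```

Under this model, reaching a comparable collision probability for a 256-bit hash takes about $2^{128}$ hashes.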

To sum up, SHA-256 meets all of the important requirements of the cryptographic hash function. Thus, it’s very often used in applications requiring a high level of security.

5. MD5 vs. SHA-2

Now we know the fundamentals of MD5 and SHA-2. Let’s compare them. First of all, MD5 produces 128-bit hashes. SHA-2 has variants that produce hashes of different lengths. The most common is SHA-256, which produces 256-bit hashes.

Secondly, SHA-2 is more secure than MD5, especially in terms of collision resistance. Therefore, MD5 isn’t recommended for high-security purposes. On the other hand, SHA-2 is used for high-security purposes, e.g., digital signatures or the SSL handshake. Moreover, there are fewer reported attacks on SHA-2 than on MD5, and no practical collision attack on SHA-2 is known. MD5, in contrast, is considered cryptographically broken, and collisions can be generated on an average computer.

In terms of speed, MD5 is slightly faster than SHA-2. That’s one reason why MD5 is still often used as a checksum for verifying file integrity.
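The speed difference is easy to measure on our own machine. The following rough sketch times a single pass over an 8 MiB buffer and keeps the best of several runs. The helper name `time_hash`, the buffer size, and the number of repeats are arbitrary choices, and the results depend heavily on the hardware and on how the underlying crypto library was built:

```python
import hashlib
import time


def time_hash(algorithm: str, data: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time (in seconds) to hash `data` once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return best


data = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes
for name in ("md5", "sha256", "sha512"):
    print(f"{name}: {time_hash(name, data) * 1000:.1f} ms")
```

Since the ranking can change between platforms, it's worth measuring before choosing an algorithm purely for speed.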

To sum up, in most cases, SHA-2 is the better choice. It’s more secure, more reliable, and less likely to be broken. The fact that SHA-2 is slightly slower than MD5 rarely matters unless speed is the main criterion. SHA-2 has variants that produce hashes of different lengths. A longer hash costs more to store and transmit and is often, though not always, slower to compute. Thus, SHA-256 usually offers a good balance between security and speed.

6. Conclusion

In this article, we discussed the MD5 and SHA-2 algorithms in detail. Then, we compared them. The conclusion is that SHA-2 does better than MD5 in most cases, especially regarding security. On the other hand, MD5 can be used in solutions that don’t require a high level of security and where speed is the main criterion.
