security issue behind the use of the MD5 algorithm for passwords

Note: this was a module assessment, thus expect formal language and research paper-like formatting. Oh, and this post is extremely n00b friendly 🙂

Disclaimer: the information contained in this page is for educational purposes only.

Abstract

Cryptographic hash functions are commonly used to convert passwords to a fixed size string that cannot be reverted. This paper analyses and explains the reasons why the MD5 hashing algorithm must not be used for password hashing. To understand that, a brief background of storing passwords and basics of cryptographic hashing is provided. Furthermore, common cracking techniques against the MD5 password hashes are briefly explained. A dictionary attack is provided as a practical example to demonstrate the main flaw of the MD5 hashing algorithm. The dictionary attack is performed using the password recovery tool hashcat and the RockYou wordlist. The last section contains countermeasures such as adding salt and using bcrypt hashing algorithm for hashing passwords.

Contents

1 Introduction

1.1 Background

With the rapid development and implementation of computer technology in modern life, human-computer interaction is certainly frequent. User authentication is usually conducted in almost any human-computer interaction. Commonly, a combination of a username and a password is used to authenticate a user. However, in consideration of implementing passwords into applications, storing passwords is a prevalent problem in computer security. In such systems, the input password is compared to the password stored in a private database, and when matched, the user is authenticated (See Figure 1). Although the process appears to be straightforward, it is highly likely for such an application to be insecure.

Figure 1: Basic password-based user authentication

There are several methods applied for storing passwords in databases, certain techniques being considerably more secure than the others. The simplest, however, the most insecure approach is storing passwords as plain text. This means that passwords are kept in a human-readable format inside a database. If the database is ever compromised, user passwords would become easily obtainable and abused which is what makes this technique highly unsafe.

1.2 The Hashing

A cryptographic hash function is a one-way function, meaning that the converted data cannot be reverted (See Figure 2).

Figure 2: One-way hashing process cannot be reverted

One-way hash function f(x) is a function that turns a string of data x (a pre-image) to a fixed-length output y (a hash or a hash value).

There are numerous applications of cryptographic hash functions in Computer Security. For instance, hash functions are implemented in digital signatures, file integrity verification, password protection and in many other protocols.

Traditionally, a secure cryptographic hash function has at least these properties:

  1. Pre-image resistance. It must be infeasible to reverse a string of data x from a fixed-size hash value y.
  2. Collision attack resistance. It must be infeasible to produce the same hash value from two distinct strings of data x.

1.3 Hashing Passwords

A much safer approach is to use a secure one-way cryptographic function to hash the password and save the representation, i.e. the hash of the password in the database instead of the password. Consequently, hashing passwords make it more complicated for criminals to obtain the users’ passwords if the database is compromised. During the user authentication, the input password is hashed and then compared to a hash stored inside of a database. The process of user authentication in a hash-based system (See Figure 3):

  1. The user creates an account with a username and a password.
  2. Their password is hashed and the hash x is stored in the database.
  3. When attempting to log in, the input password is hashed, then the produced hash y is compared to the hash x stored in the database.
  4. If the hashes x and y match, the user is authenticated.
Figure 3: User authentication in a hash-based system

There are countless of different hash functions being used for hashing passwords. However, specific algorithms are considered safer and more practical to use. It is fundamental to understand that the security of the password database depends heavily on the speed of the selected cryptographic hashing algorithm. For instance, a fast hash function is executed faster when running on modern hardware. Thus, it increases the speed of the brute-forcing which is explained later in the paper. Nevertheless, both, broken and unsuitable algorithms for password hashing are still being used — one of them being the MD5 (Message-Digest Algorithm 5) hashing algorithm.

1.4 The MD5 Hashing Algorithm

This project is focused on the security of the implementation of the MD5 hash function for passwords. Therefore, it is necessary to provide a brief description of the MD5 hash algorithm.

The 128-bit MD5 hash function was developed in 1992 by Ronald Rivest. It was designed to be fast and implemented in digital signature applications (Rivest, 1992). However, MD5 has also been used for password hashing until this day. Since 1992, MD5 has been extensively analysed, and cryptographic weaknesses were found.

It is considered that the MD5 algorithm is simply too fast to be used for hashing passwords. With modern technology and available computational power, it takes milliseconds to break a password hash. A hashing algorithm must be computationally expensive for the purpose of lengthening the time of the hash breaking process. Despite that, the MD5 hash function is continued to be used improperly for hashing passwords.

1.5 Breaking The Hash

The most popular methods used to crack hashed passwords are various types of brute-force and rainbow table attacks. It is important to note that these password attacks are commonly performed against the MD5 hashes. Therefore, it is fundamental to have a basic knowledge of password security attacks.

A brute-force attack essentially repeatedly attempts to guess the password. A basic brute-force attack is to generate every possible combination of a password, hash that string and try matching it with the password hash (See Figure 4: Basic brute-force attack). However, such method requires a substantial amount of time and computational power. Therefore, a basic brute-force attack is impractical, and other attacks are more common. For instance, it is more feasible for a perpetrator to perform a rule-based brute-force attack, i.e. set up rules to generate various strings and try to match them with the password hash.

Figure 4: Basic brute-force attack

There are several methods that are used for breaking password hashes. Two of the most common techniques are dictionary attacks and rainbow tables.

  1. A dictionary attack is one of the more widely used techniques to crack hashes. Basically, it is a type of brute-force attack where an attacker instead of generating a wordlist, can use a text file already filled with common passwords. A wordlist or a dictionary file is a text file filled with plaintext words and phrases that are commonly used as a password. In most situations, a dictionary attack is considered a far more effective approach than generating random strings of data because users tend to choose memorable keywords rather than random strings of characters.
  2. It is fundamental to have basic knowledge of hash tables for the purpose of understanding rainbow tables. Hash table attack is a lookup attack that uses a pre-computed table containing keys that map to values. The hash table attack is considered faster than a brute-force attack because it removes a computationally sophisticated part of the traditional brute-force attack – the process of generating hashes. However, hash tables usually use a large amount of disk space and there is a time memory-trade off when using them, meaning that memory is traded for time in an effort to speed up the password breaking process.
  3. Rainbow tables store the same information as hash tables but use less disk space. However, they are much more complex and require not only a hash function but also a reduction function. The fundamental part of a rainbow table is the reduction function. A hash function maps a plaintext to a hash (See Figure 6: One-way hashing process. In comparison, a reduction function maps a hash to a plaintext. It is important to understand that a reduction function does not revert the hash as it is non-revertible (See Figure 2: One-way hashing process cannot be reverted); rather, it returns a plaintext that produces the same hash. In other words, a reduction function does not always return the original plaintext value. For instance, a simple reduction function is to take the first six letters from a produced hash (See Figure 6: Reduction process)
Figure 5: One-way hashing process
 
Figure 6: Reduction process

A rainbow table contains multiple chains of a certain length. A chain consists of an arbitrary plaintext and ends with a hash. For instance, to produce a 50,000-length chain, pick an arbitrary plaintext and iterate these steps 50,000 times:

  1. Hash the plaintext;
  2. Reduce the produced hash;
  3. Another plaintext is returned.

The chain only stores the starting plaintext and the last hash rather than a complete table of password hashes. Despite that, each link of a hash and a plaintext is retained.

As mentioned, rainbow tables contain multiple chains. Consider that a password of hash x is wanted. In order to discover the password (plaintext that produced x), it is needed to repeat these steps until the hash is found:

  1. Look up for hash x in a table of the last hashes from every chain, return if found;
  2. Otherwise, apply reduction function to the hash to produce another plaintext;
  3. Hash the produced plaintext.

If the hash x matches any chain’s last hash, that chain contains the original hash. By applying hash and reduction functions to that chain, it is now feasible to find a plaintext that produces the wanted hash x.

In comparison to hash tables, rainbow tables store a pair of keys rather than a complete table of password hashes. Besides that, rainbow tables retain each plaintext and hash, albeit it takes up much less space than hash tables. However, it is important to note that rainbow tables require more computational power for hashing and reducing.

1.6 Aim

The aim of the project is to explain the reasons why it is insecure to use the MD5 algorithm for hashing passwords. To achieve that, a background of storing passwords and basics of hashing was provided in the introduction section. Moreover, the MD5 hashing algorithm was briefly explained and its crucial flaw was also addressed. Finally, the most common password attacks against the MD5 hashes were explained in the last section of the introduction.

To demonstrate the main weakness of the MD5 hashes, this paper contains a practical example of breaking the MD5 hash algorithm by performing a dictionary attack. In the countermeasures section, better techniques for securely hashing passwords are given.

In order to meet the aim, it is necessary to achieve the following objectives:

  1. Briefly address the issue of storing passwords.
  2. Explain the basics of one-way hashing and password hashing.
  3. Provide a concise introduction of the MD5 algorithm and its main weakness for password hashing.
  4. Describe the process of breaking password hashes.
  5. Demonstrate the cracking of the MD5 hashing algorithm using password cracking software.
  6. Provide better alternatives for securely hashing passwords.

2 Procedure

2.1 Overview of Procedure

The purpose of the procedure is to demonstrate the insecurity of the MD5 hashing algorithm for password hashing. As explained earlier in the paper, the speed of the MD5 algorithm is too high for securely hashing passwords. In this section, a dictionary attack is performed as a practical example. For the demonstration purposes, the selected different strength passwords are insecure. The selected passwords are hashed using the MD5 hashing algorithm.

A password cracking tool hashcat (v5.1.0) and the RockYou wordlist are going to be used to perform a dictionary attack against the MD5 hashes. Hashcat is a password recovery tool, commonly used for performing various password attacks. RockYou is one of the most popular wordlists that contains a huge collection of leaked passwords. Both, cracking tool hashcat and the RockYou wordlist are publicly available online.

The hardware used in the example is Intel i7-2600k 4 GHz Processor using 4 cores out of 8.

2.2 Procedure

  1. 10 selected passwords are saved in a file called “PlaintextPasswords”.
Figure 7: Selected password list
  1. Using a simple shell script, hash each password from the “PlaintextPasswords” wordlist using the MD5 hashing algorithm and save it in a file “MD5hashes”.
file://c:\users\pc\appdata\local\temp\tmpo_2wt4\1.png
Figure 8: Shell script “PlaintextToMD5Hash”

By using the for loop, the command below is executed the number of words in the wordlist.

 do echo -n "$i"| md5sum | tr -d " -" >> MD5hashes 

Explanation of the command:

do echo -n "$i"| md5sum | tr -d " -" >> MD5hashes
  • The -n part of the command removes the new line added after a word; it is important to remove the new line because it is not a part of the actual password.
  • The “$i” accesses the value of the word in the wordlist.
  • The command md5sum performs the hashing.
  • The tr -d ” -” removes unwanted characters such as spaces and hyphens.
Figure 9: Hashes of selected passwords
  1. The following command is entered to start the dictionary attack:
hashcat -m 0 MD5hashes /usr/share/wordlists/rockyou.txt -o CrackedPasswords

Explanation:

  • The option -m 0 specifies to use MD5 as the hash type.
  • The part -o CrackedPasswords commands to save the recovered hashes in a file “CrackedPasswords”.

2.3 Results

  1. The dictionary attack was successful and hashcat cracked 10 out of total 10 passwords. It only took about 6 seconds to crack every selected password in the list. More impressively, 6 seconds was enough to go through 11,489,280 different passwords.
file://c:\users\pc\appdata\local\temp\tmpo_2wt4\1.png
Figure 10: Cracking process finished
  1. The cracked passwords are saved in a hash:password format as it can be seen from Figure 11.
Figure 11: List of cracked passwords and password hashes

3 Discussion

3.1 General Discussion

The practical example demonstrates how simple and effortless it is to break MD5 password hashes, especially if the wanted password is in a wordlist of leaked passwords. The main weakness of the MD5 is that it is a fast and memory-conserving hashing algorithm. The speed of a hashing algorithm has a significant impact on how secure the stored passwords are. It means that faster hashing leads to faster brute-force and rainbow table attacks. As seen from the practical example, with modern CPUs and GPUs it is feasible to compute millions, or even billions, MD5 hashes per second. For these reasons, it is insecure to use the MD5 algorithm for hashing and storing users’ passwords, albeit some companies continue to implement it to their databases. In conclusion, the conducted research approves that the MD5 hashing algorithm must not be used for password storing.

3.2 Countermeasures

As explained in the paper, the MD5 hashing algorithm is considered broken and must not be used for hashing passwords. Furthermore, there are several ways to improve the security of storing passwords. In this section, two common methods are explained.

3.2.1 Adding Salt

When hashing passwords, if two users use the same password, their password hash is identical (See Table 1: Alice and Bob use the same password, as a result, both users’ hash is identical). There are a few reasons why hashing passwords without salt is a bad practice. In such a system, a perpetrator can compare hash x with each hash in the database and if matched, there is a great chance that the found password hash x is going to reoccur again because users tend to use the same passwords. Moreover, hash tables and rainbow tables are effective because each password is hashed the exact same way.

Table 1: Alice and Bob use the same password, as a result, both users’ hash is identical

However, it is imperative to note that adding salt to a fast hashing algorithm such as the MD5 does not make it a great option for password hashing because of its high-speed. Nevertheless, it is important to understand the basics of salting, as it is recommended to add salt to any hash function.

A great way to minimise the threat of password attacks is to add a unique string of characters, called salt, to each user’s password. By doing that, none of the users will share the same hash. Consequently, hash and rainbow tables become useless. Adding salt makes pre-computing rainbow tables highly impractical, as adding salt forces an attacker to recompute hashes for every keyword. Moreover, using salt leads to brute-force attacks becoming infeasible. Normally, to speed up a brute-force attack, a perpetrator guesses a password and then tries matching the found password hash to each hash in the list of wanted password hashes. However, with added salt, it forces an attacker to guess each password which makes the brute-force attacks more time and power consuming.

Fortunately, adding salt to the hashing process does not complicate human-computer interaction. The user does not have to remember the salt, instead, the system adds the salt to the password and then hashes it (See Figure 12: Basic password-based user authentication with implementation of salt).

Figure 12: Basic password-based user authentication with implementation of salt

The salt could be stored in plain text, as the system looks up the salt every time a user tries to log in. In summary, it is great practice to use a unique salt for each user; that way it is impossible for users to share the same hash.

3.2.2 Benefits of Bcrypt

As explained and demonstrated in the paper, the MD5 hashing algorithm was designed to be computationally fast. Previously mentioned, with modern computer hardware it is feasible to compute millions of MD5 hashes per second. Thus, even if salt is added to the MD5 hashing process, it is too fast for modern hardware. Consequently, a slow hash function should be used to make the attacks impractical and exceedingly slow. As stated earlier in the paper, the safety of the password highly depends on the speed of the hash function. Therefore, a secure hashing algorithm should have a feature to alter it to make the hashing slower. By being able to control the speed of the hash function, it can continually be adapted in password security system despite the evolution of computer hardware.

Bcrypt – password hashing algorithm based on Blowfish cipher – was designed by Niels Provos and David Mazières. The main benefit of bcrypt is that the hashing process can be slowed down. Bcrypt uses a key element of the Blowfish cipher which is a very slow key exchange. Because of that, bcrypt has a feature to alter the computational power cost to calculate a hash, called a work or a cost factor. To clarify, a work factor is the number of rounds the hashing is performed. For instance, a work factor of 10 means that the data is going to be hashed 10 times. This feature allows to continue using the same hashing algorithm alongside the advancement of computer hardware. Therefore, by using bcrypt it is possible to slow down password attacks despite the evolving hardware. Besides the crucial cost factor feature, another key part of bcrypt is its requirement to add salt by default. Implementing bcrypt in a password system results in a requirement of additional computational power. However, the payoff is certainly worth it because the password attacks will also require a substantial amount of time and computational power.

3.3 Future Work

In this paper, a security issue of MD5 hashing algorithm for password hashing was discussed. Unfortunately, there are more hashing algorithms that should not be adapted in a modern password security platform. Thus, it would be useful to research and analyse other insecure cryptographic hashing algorithms that are used for hashing passwords.

As demonstrated in the paper, performed dictionary attack was successful and cracked 10 out of total 10 passwords. However, the selected passwords were weak and insecure for demonstration purposes. It would be interesting to look at the results if strong passwords were selected.

As mentioned earlier, the speed of the hash function has a significant impact on how secure the hashed passwords are. Moreover, it would be interesting to carry out a research on the speed of other password hashing algorithms with modern high-end hardware.

Lastly, in comparison to the MD5 hashing algorithm, it would be beneficial to analyse secure password hashing algorithms such as PBKDF2, bcrypt and scrypt.

References

Abadi, M., Lomas, T.M. and Needham, R. (1997). ‘Strenghtening Passwords.’ Digital Equipment Corporation Systems Research Center [SRC].

Ferguson, N., Schneier, B. and Kohno, T. (2010) ‘Cryptography Engineering – Design Principles and Practical Applications’. Indianapolis: Wiley Publishing.

H. Kumar (2013) ‘Rainbow table to crack password using MD5 hashing algorithm’. doi:10.1109/CICT.2013.6558135.

Horalek, J. (2016) ‘Analysis of the use of Rainbow Tables to break hash’, Analysis of the use of Rainbow Tables to break hash. doi:10.3233/JIFS-169147.

Kim, P. (2015) ‘The Hacker Playbook 2: Practical Guide To Penetration Testing’. North Charleston: Secure Planet LLC.

Kioon, M. (2013) ‘Security Analysis of MD5 algorithm in Password Storage’.

Manber, U. (1996). ‘A Simple Scheme to Make Passwords Based on One-Way Functions Much Harder to Crack’, pp. 171-176. doi:10.1109/TIT.1980.1056220.

Oechslin, P. (2003) ‘Making a Faster Cryptanalytic Time-Memory Trade-Off’. doi:0.1007/978-3-540-45146-4_36.

Provos, N., Mazières, D. (1999) ‘A Future-Adaptable Password Scheme’. Available at: https://www.usenix.org/legacy/publications/library/proceedings/usenix99/provos/provos.pdf (Accessed 19 March 2019).

Rivest, R. (1992) ‘The MD5 Message-Digest Algorithm’. doi:10.17487/RFC1321.

Schneier, B. (2004) ‘Cryptanalysis of MD5 and SHA: Time for a New Standard’. Available at: https://www.schneier.com/essays/archives/2004/08/cryptanalysis_of_md5.html (Accessed 8 March 2019).

Schneier, B. (2015) ‘Applied Cryptography, Second Edition: Protocols, Algorithms, and Source Code in C, 20th Anniversary Edition’. Chichester: Wiley Publishing.

Z. Yong-Xia (2010) ‘MD5 Research’. doi:10.1109/MMIT.2010.186.