The most surprising thing about password hashing is that the goal isn’t to make it impossible to crack passwords, but to make cracking them impractically expensive for an attacker.
Imagine a user registers for your service. They type their password, "P@$$wOrd123!", into a form. Your web server receives this plaintext password. If you stored it as-is and your database were later compromised, the attacker would have that user’s password outright. So, instead of storing the password directly, you run it through a one-way function called a "hash function." This function produces a fixed-length string of characters that looks random, like `$2a$12$Xg50R9gVl29j/s4v7T9hVe7g31Y3jF5G/kP8j9k2b0a2l9z8q5.5`. This is the "hash."
Here’s a quick look at a user login flow. Let’s say a user tries to log in with their username and password.
- User Enters Credentials: The user submits their username and password through the login form.
- Server Receives: Your backend server gets the username and the plaintext password.
- Fetch User Record: The server queries the database for the user record associated with the provided username. It retrieves the stored password hash.
- Hash Input Password: The server takes the plaintext password the user just entered and runs it through the same hashing algorithm with the same parameters (like salt and cost factor) that were used when the password was originally set.
- Compare Hashes: The server compares the newly generated hash with the hash stored in the database.
- Authentication: If the hashes match, the user is authenticated. If not, they’re denied access.
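The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the in-memory `USERS` dictionary, the function names, and the parameter values are all assumptions made for the example, and scrypt from the standard library stands in for whichever algorithm you choose.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory "database": username -> (salt, stored_hash).
USERS: dict[str, tuple[bytes, bytes]] = {}

# Illustrative scrypt parameters (assumed values; tune for your hardware).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a fixed-length hash from the password and its salt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        dklen=32,
        maxmem=64 * 1024 * 1024,
        **SCRYPT_PARAMS,
    )


def register(username: str, password: str) -> None:
    """Store a fresh random salt and the resulting hash, never the password."""
    salt = secrets.token_bytes(16)
    USERS[username] = (salt, hash_password(password, salt))


def login(username: str, password: str) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    record = USERS.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored_hash)


register("alice", "P@$$wOrd123!")
print(login("alice", "P@$$wOrd123!"))
print(login("alice", "wrong-password"))
```

Note that the plaintext password only exists in memory for the duration of the request; what gets stored is the salt and the derived hash.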
The key is that even if an attacker gets the database, they only get the hashes, not the original passwords. Reversing the hash function is computationally infeasible by design, so the attacker’s realistic option is to guess candidate passwords, hash each one, and compare the results against the stolen hashes.
Now, why do we need different hashing algorithms like bcrypt, Argon2, and scrypt? Because the "computationally infeasible" part is a moving target. Attackers have powerful hardware, including GPUs and ASICs, that are very good at brute-forcing simple hashes. Password hashing algorithms are designed to be slow. They incorporate a "cost factor" (often called "rounds" or "iterations") that makes them take a noticeable amount of time to compute, even on fast hardware. This cost factor is the primary lever you have to control the security.
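To see why slowness matters, it helps to do the arithmetic. The sketch below compares how long it would take to try every 8-character password made of lowercase letters and digits at two guess rates. Both rates are hypothetical round numbers picked only to illustrate the calculation; they are not benchmarks of any real hardware or algorithm.

```python
# Candidate space: every 8-character password drawn from a-z and 0-9.
candidates = 36 ** 8

# Hypothetical guess rates, chosen only to illustrate the arithmetic.
FAST_HASH_GUESSES_PER_SEC = 10_000_000_000  # assumed: a fast general-purpose hash
SLOW_HASH_GUESSES_PER_SEC = 10_000          # assumed: a tuned password hash

SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_to_exhaust(space: int, guesses_per_sec: int) -> float:
    """Time needed to try every candidate at a fixed guess rate."""
    return space / guesses_per_sec


fast_seconds = seconds_to_exhaust(candidates, FAST_HASH_GUESSES_PER_SEC)
slow_seconds = seconds_to_exhaust(candidates, SLOW_HASH_GUESSES_PER_SEC)

print(f"fast hash: {fast_seconds / 60:.1f} minutes")
print(f"slow hash: {slow_seconds / SECONDS_PER_YEAR:.1f} years")
```

Under these assumed rates the same search goes from minutes to years. The cost factor is what lets you keep that gap wide as attacker hardware improves.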
Let’s look at the contenders:
bcrypt

This is a classic and still very solid choice. It’s based on the Blowfish cipher and has been around for a long time, meaning it’s well-tested and understood. Its main strength is its resistance to GPU acceleration.
- How it works: bcrypt uses a salt (random data added to the password before hashing) and a cost factor (number of rounds). The cost factor is expressed as a power of 2, like `10` for 2^10 rounds. A higher cost factor means more computation: each increment doubles the work.
- Example: When you hash a password, you might specify a cost of 12.

  ```
  # Hypothetical 'bcrypt' command-line tool, for demonstration only (flags are illustrative)
  # Single quotes keep the shell from expanding $$ and !
  echo -n 'P@$$wOrd123!' | bcrypt -a 2a -b 12 -o
  ```

  This would output something like:

  ```
  $2a$12$Xg50R9gVl29j/s4v7T9hVe7g31Y3jF5G/kP8j9k2b0a2l9z8q5.5
  ```

  The `$2a$12$` part indicates the algorithm (2a) and the cost factor (12); the salt is embedded within the hash itself.
- Why it’s good: It’s designed to be slow and makes it difficult for attackers to use specialized hardware like GPUs to speed up cracking significantly. The cost factor can be increased over time as hardware gets faster.
- Common mistake: Using a cost factor that’s too low, like 4 or 6. For modern systems, a cost factor of `10` or `12` is a good starting point, and you should aim to increase it over time.
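Because the version, cost, and salt are all embedded in the hash string, you can read them back without any extra bookkeeping. The sketch below splits a bcrypt-style string into its parts and checks whether its cost is below a target. The names `BcryptParts`, `parse_bcrypt`, and `needs_upgrade` are hypothetical helpers written for this example, and the input is the illustrative hash string from this article rather than real output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BcryptParts:
    version: str
    cost: int
    salt: str
    digest: str

    @property
    def rounds(self) -> int:
        # The cost is a base-2 exponent: cost 12 means 2**12 rounds.
        return 2 ** self.cost


def parse_bcrypt(encoded: str) -> BcryptParts:
    """Split '$<version>$<cost>$<salt><digest>' into its fields."""
    parts = encoded.split("$")
    if len(parts) != 4 or parts[0] != "":
        raise ValueError("not a bcrypt-style hash string")
    _, version, cost, salt_and_digest = parts
    # The first 22 characters after the last '$' encode the salt.
    return BcryptParts(version, int(cost), salt_and_digest[:22], salt_and_digest[22:])


def needs_upgrade(encoded: str, target_cost: int) -> bool:
    """True when the stored hash was created with a lower cost than the target."""
    return parse_bcrypt(encoded).cost < target_cost


# Illustrative string from the text above, not the output of a real bcrypt run.
example = "$2a$12$Xg50R9gVl29j/s4v7T9hVe7g31Y3jF5G/kP8j9k2b0a2l9z8q5.5"
parsed = parse_bcrypt(example)
print(parsed.version, parsed.cost, parsed.rounds)
print(needs_upgrade(example, target_cost=13))
```

A check like `needs_upgrade` is one way to raise the cost over time: when a user logs in successfully with an older hash, you can re-hash their password at the new cost and store the result.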
scrypt

scrypt was designed to be even more resistant to hardware acceleration than bcrypt. It does this by using a lot of memory in addition to CPU time.
- How it works: scrypt has three main parameters: `N` (CPU/memory cost factor), `r` (block size), and `p` (parallelization factor). Increasing `N` makes it more computationally expensive and memory-intensive.
- Example: A common configuration might be `N=16384`, `r=8`, `p=1`.

  ```
  # Using a hypothetical scrypt CLI tool (actual tools may vary)
  # scrypt -N 16384 -r 8 -p 1 -o 'P@$$wOrd123!'
  ```

  The output would be a hash string containing these parameters and the salt.
- Why it’s good: The memory-hard nature of scrypt makes it very expensive for attackers to build specialized hardware (like ASICs) that can perform many scrypt operations in parallel, as such hardware would require a massive amount of memory.
- Common mistake: Not understanding the trade-offs between `N`, `r`, and `p`. Increasing `N` significantly increases memory usage, which can be a concern for servers with limited RAM. `N=16384` is a common starting point.
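To make the memory trade-off concrete, here is a short sketch using Python’s standard-library `hashlib.scrypt` with the configuration mentioned above. The helper `scrypt_memory_bytes` is a hypothetical name; it uses the usual approximation that scrypt’s main working buffer takes about 128 × N × r bytes. The `maxmem` value is an assumption chosen to leave headroom above that estimate.

```python
import hashlib
import os


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's main working buffer: 128 * N * r bytes."""
    return 128 * n * r


N, R, P = 16384, 8, 1

estimate = scrypt_memory_bytes(N, R)
print(f"approximate memory per hash: {estimate // (1024 * 1024)} MiB")

# A fresh random salt per password; store it alongside the derived key.
salt = os.urandom(16)

key = hashlib.scrypt(
    b"P@$$wOrd123!",
    salt=salt,
    n=N,
    r=R,
    p=P,
    maxmem=64 * 1024 * 1024,  # assumed ceiling, above the estimate
    dklen=64,
)
print(f"derived key length: {len(key)} bytes")
```

Doubling `N` doubles that memory estimate, so it is worth multiplying the figure by the number of logins you expect to process concurrently before raising it.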
Argon2

Argon2 is the winner of the Password Hashing Competition and is generally considered the current state-of-the-art. It’s highly configurable and offers excellent resistance to various attack vectors, including GPU and ASIC acceleration. It comes in three variants: Argon2d, Argon2i, and Argon2id.
- How it works: Argon2 has parameters for memory cost (`m`), time cost (`t`, the number of iterations), and parallelism (`p`).
  - Argon2d: Data-dependent, best against GPU cracking.
  - Argon2i: Data-independent, best against side-channel attacks.
  - Argon2id: A hybrid approach, recommended for general use as it offers resistance to both GPU and side-channel attacks.
- Example (Argon2id): Let’s say `m=65536` (64 MiB, since `m` is counted in KiB), `t=3` (iterations), `p=4` (parallelism).

  ```
  # Using a hypothetical Argon2 CLI tool (actual tools may vary)
  # argon2id -m 65536 -t 3 -p 4 -o 'P@$$wOrd123!'
  ```

  The output would be an Argon2 hash string.
- Why it’s good: It’s designed to be tunable for different hardware and threat models. Its memory-hard and computationally intensive nature makes it very difficult and expensive for attackers to crack passwords at scale.
- Common mistake: Choosing Argon2d or Argon2i when Argon2id is generally preferred for most web applications due to its balanced security. Also, setting the memory or time costs too low. For a typical web server, `m=65536` (64 MiB), `t=3`, and `p=4` are good starting points, but these should be tuned based on your server’s capabilities and security requirements.
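Argon2 libraries typically emit an encoded string of the form `$argon2id$v=19$m=65536,t=3,p=4$<salt>$<hash>`, which carries the variant and all three parameters. The sketch below parses such a string with the standard library only. `parse_argon2` and `b64_nopad` are hypothetical helpers, and the salt and hash fields are placeholders built from zero bytes, not the output of a real Argon2 computation.

```python
import base64


def b64_nopad(data: bytes) -> str:
    """Base64 without '=' padding, as used in Argon2 encoded strings."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def parse_argon2(encoded: str) -> dict:
    """Split '$<variant>$v=<n>$m=..,t=..,p=..$<salt>$<hash>' into fields."""
    parts = encoded.split("$")
    if len(parts) != 6 or parts[0] != "":
        raise ValueError("not an Argon2-style encoded string")
    _, variant, version, params, salt, digest = parts
    fields = dict(item.split("=") for item in params.split(","))
    return {
        "variant": variant,
        "version": int(version.split("=")[1]),
        "memory_kib": int(fields["m"]),
        "time_cost": int(fields["t"]),
        "parallelism": int(fields["p"]),
        "salt": salt,
        "digest": digest,
    }


# Placeholder string: zero bytes stand in for a real salt and hash.
placeholder = (
    "$argon2id$v=19$m=65536,t=3,p=4$"
    + b64_nopad(bytes(16))
    + "$"
    + b64_nopad(bytes(32))
)

info = parse_argon2(placeholder)
memory_mib = info["memory_kib"] // 1024
print(info["variant"], f"{memory_mib} MiB", info["time_cost"], info["parallelism"])
```

Reading the parameters back this way makes it straightforward to spot stored hashes that were created with a different variant or with lower costs than your current settings.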
When implementing password hashing, you should store the salt and the algorithm parameters along with the hash itself. This is because when a user logs in, you need to know how to re-hash their entered password to compare it to the stored hash. All modern libraries and formats (like the ones shown in the examples) embed this information directly into the resulting hash string.
The next hurdle you’ll face is managing password policies, such as enforcing complexity requirements and handling password resets securely.