The most surprising truth about password hashing is that it’s not about hiding passwords the way encryption does, where a key turns the scrambled data back into the original. It’s about making passwords impractical to recover, even for someone who holds the hashed version.
Imagine you’re building a login system. A user provides a username and password. To verify them, you need to store something that represents their password. Storing it in plain text is obviously a terrible idea – if your database gets breached, every user’s password is immediately exposed.
So, you hash it. Hashing is a one-way mathematical function that takes an input (your password) and produces a fixed-size output (the hash). Crucially, it’s designed to be computationally infeasible to reverse.
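To make the "fixed-size, one-way" description concrete, here is a minimal sketch using SHA-256 from Python’s standard hashlib module. SHA-256 appears here only because it is short to demonstrate; it is a fast general-purpose hash and is not, on its own, a suitable way to store passwords. The helper name fingerprint is hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = fingerprint("a")
long_digest = fingerprint("a" * 10_000)

# Fixed-size output: every SHA-256 digest is 32 bytes (64 hex characters),
# regardless of how long the input is.
print(len(short_digest), len(long_digest))

# Deterministic: the same input always gives the same digest.
print(fingerprint("s3cr3tP@ssw0rd") == fingerprint("s3cr3tP@ssw0rd"))

# Changing a single character gives a different digest.
print(fingerprint("s3cr3tP@ssw0rd") == fingerprint("s3cr3tP@ssw0rE"))
```

Nothing in the digest lets you run the function backwards; the only general way to find an input is to guess candidates and hash each one.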
Here’s a simplified look at what happens when a user logs in:
- User Enters Password: Alice wants to log in. She types "s3cr3tP@ssw0rd" into the login form.
- System Looks Up the Stored Hash: Your system finds Alice’s record in the database. The stored hash for her account might be \$2b\$12\$abcdefghijklmnopqrstuv.wxyzABCDEFGH (a made-up placeholder, not real bcrypt output).
- System Hashes Input: Your server-side code takes "s3cr3tP@ssw0rd" and runs it through a hashing algorithm like bcrypt, using the same salt and cost settings that produced the stored hash. Hashing the input with fresh settings would give a different result every time.
- Verification: The system compares the newly generated hash from her input with the stored hash. If they match, Alice is authenticated.
The "Criminal" Part: Why Plain Text is a Death Sentence
If you stored "s3cr3tP@ssw0rd" directly and your database was compromised, attackers would have Alice’s password. They could then try to log in as her on your site, or more dangerously, try that same password against other sites she might use (password reuse is rampant).
When you store only the hash, even if an attacker steals the database, they don’t have the original password. They have \$2b\$12\$abcdefghijklmnopqrstuv.wxyzABCDEFGH. This string looks like gibberish, but it’s the result of a specific, deterministic process.
How Hashing Actually Works (The Good Stuff)
Modern password hashing is more than a single pass through a fast hash function. It involves several key components to make it robust:
- Salt: This is a unique, random value generated for each password. The salt is combined with the password before hashing, so Alice’s password "s3cr3tP@ssw0rd" might be combined with a salt like aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789. The result is then hashed, and the salt is stored alongside the hash in the database. The salt does not need to be secret.
  - Why it matters: If two users have the same password (e.g., "password123"), they will produce completely different hashes because their salts are different. This prevents attackers from using precomputed "rainbow tables" (large databases of common passwords and their hashes) to crack many passwords at once, and it forces them to attack each hash separately.
- Work Factor (Cost/Rounds): Hashing algorithms like bcrypt, scrypt, and Argon2 are designed to be computationally expensive. This means they take a noticeable amount of time to compute (e.g., 100-500 milliseconds). This "slowness" is controlled by a configurable parameter, often called the "cost" or "rounds."
  - Why it matters: For an attacker trying to brute-force passwords from a stolen hash database, slow hashing is the enemy. Every guess costs one full hash computation. At 500ms per guess, testing a million candidate passwords against a single hash takes about 500,000 seconds, or nearly six days, on one core. Attackers can run guesses in parallel, but the cost still scales with every guess and every account. For your legitimate users, a delay of a fraction of a second at login is barely noticeable. You can raise this work factor over time as hardware gets faster.
- Algorithm Choice: You should use modern, well-vetted algorithms.
  - MD5 and SHA-1: DO NOT USE THESE. They are designed to be fast, so modern hardware can test enormous numbers of guesses per second against them, and both have known cryptographic weaknesses.
  - bcrypt: A long-standing, robust choice. It is built on the Blowfish cipher and includes built-in salting.
  - scrypt: Designed to be memory-hard, which makes it more resistant than bcrypt to attacks on GPUs and custom hardware.
  - Argon2: The winner of the Password Hashing Competition. Its time, memory, and parallelism costs can each be configured separately.
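The salt and work-factor points above can be sketched with scrypt, which ships in Python’s standard hashlib module (on builds linked against a recent OpenSSL). This is an illustration under assumptions, not a production recipe: the helper name slow_hash and the parameter values (n, r=8, p=1, a 16-byte salt) are choices made for the example.

```python
import hashlib
import secrets
import time


def slow_hash(password: str, salt: bytes, n: int = 2**14) -> bytes:
    """Derive a 32-byte hash with scrypt; `n` is the CPU/memory cost."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=n, r=8, p=1, dklen=32
    )


# Two users pick the same password but get independent random salts.
salt_alice = secrets.token_bytes(16)
salt_bob = secrets.token_bytes(16)
hash_alice = slow_hash("password123", salt_alice)
hash_bob = slow_hash("password123", salt_bob)

# Different salts give different hashes for the same password.
print(hash_alice != hash_bob)

# Raising the cost parameter makes each hash more expensive to compute.
for n in (2**10, 2**14):
    start = time.perf_counter()
    slow_hash("password123", salt_alice, n=n)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"n={n}: {elapsed_ms:.1f} ms")
```

The timings depend on the machine, which is the point of making the cost configurable: you pick the largest value your login path can tolerate.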
Putting It Together: A Real-World Example (using bcrypt)
Let’s say Alice’s password is "pa$$w0rd!@#".
- Generate Salt: The system generates a random salt and prefixes it with the algorithm parameters, for example \$2b\$12\$thisisareallylongsaltingredients. The \$2b\$ indicates the bcrypt version, and 12 is the cost factor (log base 2 of the number of iterations, so 2^12 = 4,096 iterations).
- Combine and Hash: The password pa$$w0rd!@# and the salt thisisareallylongsaltingredients are fed into the bcrypt algorithm together at cost factor 12.
- Store: The resulting hash might look like: \$2b\$12\$thisisareallylongsaltingredients.andhereistheactualhashstringpart. Notice how the algorithm’s version, cost factor, and the salt are all embedded within the final hash string itself. This is a feature of bcrypt. The strings here are readable placeholders; in real bcrypt output, a 22-character salt is followed directly by a 31-character hash.
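Because everything needed for verification is packed into one string, a stored bcrypt hash can be taken apart with plain string handling. The sketch below assumes the standard 60-character layout (version, cost, 22-character salt, 31-character hash); the names BcryptParts and split_bcrypt_hash are hypothetical, and the example string is a placeholder with the right shape rather than real bcrypt output.

```python
from typing import NamedTuple


class BcryptParts(NamedTuple):
    version: str
    cost: int
    salt: str    # 22 characters in bcrypt's own base64 alphabet
    digest: str  # 31 characters


def split_bcrypt_hash(stored: str) -> BcryptParts:
    """Split a bcrypt hash string into its embedded fields."""
    # Layout: $<version>$<cost>$<22-char salt><31-char digest>
    _, version, cost, rest = stored.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt + 31 digest characters")
    return BcryptParts(version, int(cost), rest[:22], rest[22:])


# Placeholder with the right shape, NOT real bcrypt output.
example = "$2b$12$" + "A" * 22 + "B" * 31
parts = split_bcrypt_hash(example)

print(parts.version, parts.cost)
print("iterations:", 2 ** parts.cost)
print(len(parts.salt), len(parts.digest))
```

In practice a bcrypt library does this parsing for you during verification; the sketch only shows why no separate salt column is needed.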
When Alice logs in again with "pa$$w0rd!@#":
- The system extracts the salt and cost factor from the stored hash (\$2b\$12\$thisisareallylongsaltingredients.andhereistheactualhashstringpart).
- It takes Alice’s entered password ("pa$$w0rd!@#") and the extracted salt.
- It re-hashes them using bcrypt with the same cost factor (12, meaning 2^12 iterations).
- It compares the newly generated hash to the stored hash. If they match, authentication succeeds.
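The same store-then-verify flow can be sketched end to end. To stay within Python’s standard library, this example swaps bcrypt for scrypt, so the salt and cost live in a small record instead of inside the hash string. The names StoredPassword, hash_password, and verify_password, and the parameter values, are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class StoredPassword:
    salt: bytes
    n: int  # scrypt cost parameter, stored so it can be raised later
    digest: bytes


def _derive(password: str, salt: bytes, n: int) -> bytes:
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=n, r=8, p=1, dklen=32
    )


def hash_password(password: str, n: int = 2**14) -> StoredPassword:
    """Create the record to store when a user sets a password."""
    salt = secrets.token_bytes(16)
    return StoredPassword(salt, n, _derive(password, salt, n))


def verify_password(password: str, stored: StoredPassword) -> bool:
    """Re-hash the attempt with the stored salt and cost, then compare."""
    attempt = _derive(password, stored.salt, stored.n)
    # compare_digest avoids leaking where the first mismatch occurs.
    return hmac.compare_digest(attempt, stored.digest)


record = hash_password("pa$$w0rd!@#")
print(verify_password("pa$$w0rd!@#", record))
print(verify_password("wrong guess", record))
```

Storing the cost next to the salt means old records keep verifying after you raise the default for new ones.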
The crucial part is that the attacker, seeing \$2b\$12\$thisisareallylongsaltingredients.andhereistheactualhashstringpart in a data breach, cannot easily get "pa$$w0rd!@#" from it. They would have to guess a password, combine it with the extracted salt, and then spend hundreds of milliseconds (or more, depending on the cost factor) re-hashing it to see if it matches the stored hash. Doing this for millions of users is prohibitively expensive for attackers.
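A back-of-the-envelope calculation shows how that cost adds up. The figures below (0.25 seconds per guess, a million-entry guess list, 100 parallel workers) are illustrative assumptions, not measurements, and the function name seconds_to_try is hypothetical.

```python
def seconds_to_try(
    guesses: int, seconds_per_guess: float, parallel_workers: int = 1
) -> float:
    """Time to test `guesses` candidate passwords against ONE stored hash."""
    return guesses * seconds_per_guess / parallel_workers


SECONDS_PER_DAY = 86_400

# One worker, one user's hash, a million candidate passwords.
single = seconds_to_try(1_000_000, 0.25)
print(f"{single / SECONDS_PER_DAY:.1f} days on one worker")

# The same job spread over 100 workers.
parallel = seconds_to_try(1_000_000, 0.25, parallel_workers=100)
print(f"{parallel / 60:.1f} minutes on 100 workers")

# Unique salts mean the work repeats for every account attacked.
users = 1_000
print(f"{users * parallel / SECONDS_PER_DAY:.1f} days for {users} users")
```

Parallel hardware shortens the time for a single account, but because each salt is unique, none of that work carries over to the next account.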
The core idea is that password hashing is a race against computing power. By making the hashing process intentionally slow and unique per password, you ensure that even if your database is compromised, the attacker must pay a steep computational price for every guess against every account. Strong passwords stay out of practical reach, although weak or commonly used ones can still be guessed.
The next logical step is understanding how to manage user sessions securely after they’ve logged in.