The most surprising thing about storing passwords securely is that the "secure" part isn’t about keeping the password itself secret; it’s about making it computationally infeasible to recover the original password even if you have the stored representation.
Let’s watch this in action. Imagine a user, Alice, signs up for our service. She picks a password, "P@$$wOrd123!".
import bcrypt
password_plaintext = b"P@$$wOrd123!"
hashed_password = bcrypt.hashpw(password_plaintext, bcrypt.gensalt())
print(f"Plaintext password: {password_plaintext.decode()}")
print(f"Hashed password: {hashed_password.decode()}")
This outputs something like the following (the hash differs on every run because the salt is random):
Plaintext password: P@$$wOrd123!
Hashed password: $2b$12$aBcDeFgHiJkLmNoPqRsTu.vWxyzABCDEFGHijklmnopqrstuvw.
Alice’s password is now transformed into that long, seemingly random string. When Alice logs in later, she enters "P@$$wOrd123!" again. Our system takes her entered password, hashes it using the same method and salt, and compares the new hash to the one stored in our database.
import hmac
# Assume Alice logs in and enters her password again
login_attempt_plaintext = b"P@$$wOrd123!"
# The salt is part of the stored hash, so we extract it.
# For bcrypt, the first 29 bytes hold the version, cost, and 22-character salt.
salt_from_db = hashed_password[:29]
login_hash = bcrypt.hashpw(login_attempt_plaintext, salt_from_db)
# Compare in constant time so the check does not leak information through timing
if hmac.compare_digest(login_hash, hashed_password):
    print("Login successful!")
else:
    print("Login failed.")
This outputs:
Login successful!
The system works by using a one-way cryptographic function. This function takes an input (the password) and produces an output (the hash). Crucially, it’s computationally infeasible to reverse the process – to take the hash and get back the original password. This is the core of secure password storage.
The problem this solves is the catastrophic data breach. If your database is compromised and attackers get their hands on the stored password representations, they can’t directly log in as your users. They have what’s called a "hash," not the password itself. While they might try to "crack" these hashes using brute-force or dictionary attacks (especially if you used weak hashing methods), the goal is to make this process so computationally expensive and time-consuming that it’s not practical.
Internally, modern password hashing functions like bcrypt, scrypt, or Argon2 combine two key ingredients: a deliberately slow one-way function and a "salt." The salt is a unique, randomly generated value that is mixed in with the password when it is hashed. This is critical. If two users have the same password (e.g., "password123"), their stored hashes will be different because each will have a unique salt. This defeats rainbow tables and similar precomputed lookups, which map common passwords to their hashes ahead of time and only pay off when the same password always produces the same hash. The function itself is designed to be computationally intensive, with a "work factor" or "cost parameter" that can be increased over time as computing power grows, making brute-force attacks harder.
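To make the effect of the salt concrete, here is a small sketch using `hashlib.scrypt` from the Python standard library, where the salt is passed explicitly. The helper name `hash_with_salt` and the cost parameters (`n`, `r`, `p`) are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import os


def hash_with_salt(password: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte hash from password and salt using scrypt."""
    # n, r, p are illustrative values, not a tuned production setting
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)


password = b"password123"

# Two users pick the same password but each gets a fresh random salt
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)

hash_alice = hash_with_salt(password, salt_alice)
hash_bob = hash_with_salt(password, salt_bob)

# Same password, different salts: the stored hashes do not match
print(hash_alice.hex())
print(hash_bob.hex())

# Same password, same salt: the hash is reproducible, which is what login relies on
print(hash_with_salt(password, salt_alice) == hash_alice)
```

Because the two hex strings differ, a table precomputed for "password123" without the salt would match neither of them.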
The exact levers you control are the hashing algorithm and its parameters. For bcrypt, this is the "cost factor," the two-digit number that follows the version prefix ($2b$ or $2y$) in the stored hash. The cost is a base-2 exponent: bcrypt runs 2^cost iterations of its key setup, so each increase of one doubles the time per hash, for you and for an attacker alike. For example, $2b$12$ (cost 12) is a common default. Raising it to $2b$14$ makes each guess about 4 times as expensive, and $2b$16$ about 16 times. Argon2 offers even more knobs to turn, controlling memory usage, parallelism, and iterations.
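The arithmetic behind the cost factor can be sketched in a few lines. This assumes the usual reading of the bcrypt cost as a base-2 exponent; `relative_work` is a hypothetical helper name introduced for illustration.

```python
def relative_work(old_cost: int, new_cost: int) -> float:
    """How many times more work one hash takes at new_cost than at old_cost."""
    # Each +1 in cost doubles the number of key-setup iterations
    return 2.0 ** (new_cost - old_cost)


for cost in (12, 14, 16):
    iterations = 2 ** cost
    factor = relative_work(12, cost)
    print(f"cost {cost}: {iterations} iterations, {factor:g}x the work of cost 12")

# With the bcrypt package, the cost is chosen when generating the salt,
# e.g. bcrypt.gensalt(rounds=14)
```

The same multiplier applies to your own login latency, so the cost is usually picked by measuring how long one hash takes on your production hardware.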
The one thing most people don’t realize is that the salt isn’t a secret you need to protect separately. It’s actually stored alongside the hash, usually prepended to it. When you retrieve the hash from the database, you’re also retrieving the salt that was used to create it. This is why bcrypt.hashpw can take both the plaintext password and the stored hash (which contains the salt) as arguments for verification.
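The layout of a bcrypt hash string shows where the salt lives and why the earlier login code sliced off the first 29 bytes. Below is a rough sketch of splitting the string into its fields; `split_bcrypt_hash` is a hypothetical helper, and the input is a synthetic placeholder with the right shape, not a real hash.

```python
def split_bcrypt_hash(stored: str) -> dict:
    """Split a bcrypt hash string into version, cost, salt, and digest."""
    # Layout: $<version>$<cost>$<22-char salt><31-char digest>
    _, version, cost, salt_and_digest = stored.split("$")
    return {
        "version": version,
        "cost": int(cost),
        "salt": salt_and_digest[:22],
        "digest": salt_and_digest[22:],
    }


# Synthetic placeholder with the right shape (not a real bcrypt hash)
placeholder = "$2b$12$" + "S" * 22 + "D" * 31

parts = split_bcrypt_hash(placeholder)
print(parts["version"], parts["cost"])           # 2b 12
print(len(parts["salt"]), len(parts["digest"]))  # 22 31

# "$2b$12$" is 7 characters, plus the 22-character salt, gives the 29-character prefix
print(len("$2b$12$") + len(parts["salt"]))       # 29
```

Everything needed to re-run the hash (version, cost, salt) travels with the digest in a single database column, which is why no separate salt storage is needed.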
Once you’ve implemented secure hashing with salts and appropriate work factors, the next challenge you’ll face is managing password policies and user education around strong, unique passwords.