The most surprising truth about hashing passwords in Python is that you’re probably doing it wrong if you’re using the built-in hash() function.
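To make the problem with hash() concrete, here is a minimal sketch. It runs hash() on the same password in two fresh interpreters with different PYTHONHASHSEED values, which stand in for two restarts of an application. The helper name hash_in_fresh_interpreter is hypothetical. Python randomizes string hashing per process by default, so the two results differ, and a value stored in a database could never be checked later.

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(text, seed):
    # Hypothetical helper: run hash(text) in a new Python process with a fixed
    # PYTHONHASHSEED, simulating a separate run (restart, another worker) of an app
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

password = "supersecretpassword123"
first_run = hash_in_fresh_interpreter(password, seed=1)
second_run = hash_in_fresh_interpreter(password, seed=2)
print(first_run, second_run)
print("Same value across runs?", first_run == second_run)
```

Even if the seed were pinned, hash() would still be fast and unsalted per password, so pinning it only hides the symptom.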
Let’s see the correct approach in action. Imagine that a user, Alice, wants to log in. Our application needs to verify her password without ever storing it in plain text.
```python
import hashlib
import hmac
import os

def hash_password(password):
    # Generate a salt: 16 random bytes, unique to each password
    salt = os.urandom(16)
    # Derive the hash with PBKDF2-HMAC-SHA256, the salt, and 100,000 iterations
    hashed_password = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
    # Return the salt and the hashed password, both as hex strings for storage
    return salt.hex(), hashed_password.hex()

def verify_password(stored_password_hex, salt_hex, provided_password):
    # Convert hex strings back to bytes
    salt = bytes.fromhex(salt_hex)
    stored_password = bytes.fromhex(stored_password_hex)
    # Hash the provided password with the stored salt and the same parameters
    hashed_provided_password = hashlib.pbkdf2_hmac('sha256', provided_password.encode('utf-8'), salt, 100000)
    # Compare in constant time so response timing doesn't leak how many bytes matched
    return hmac.compare_digest(stored_password, hashed_provided_password)

# --- Alice's registration ---
alice_password = "supersecretpassword123"
alice_salt_hex, alice_hashed_password_hex = hash_password(alice_password)
print(f"Stored Salt (hex): {alice_salt_hex}")
print(f"Stored Hashed Password (hex): {alice_hashed_password_hex}")

# --- Alice's login attempt ---
# Scenario 1: Correct password
provided_password_correct = "supersecretpassword123"
is_correct = verify_password(alice_hashed_password_hex, alice_salt_hex, provided_password_correct)
print(f"Login with correct password: {is_correct}")  # Output: True

# Scenario 2: Incorrect password
provided_password_incorrect = "wrongpassword456"
is_correct = verify_password(alice_hashed_password_hex, alice_salt_hex, provided_password_incorrect)
print(f"Login with incorrect password: {is_correct}")  # Output: False
```
This code demonstrates the core mechanics: generate a unique salt for each password, then use a deliberately slow key derivation function (PBKDF2 with HMAC-SHA256) to produce the stored hash. When a user tries to log in, we retrieve their salt and hash, then re-hash the provided password with that same salt, algorithm, and iteration count. If the result matches the stored hash, the password is correct.
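Verification only works if you re-hash with exactly the parameters used at registration. One way to guarantee that is to store the algorithm and iteration count alongside the salt and hash. The sketch below assumes a hypothetical `algorithm$iterations$salt_hex$hash_hex` string format, with hypothetical helpers make_record and check_record. It is an illustration, not a standard.

```python
import hashlib
import hmac
import os

# Hypothetical storage format: "algorithm$iterations$salt_hex$hash_hex"

def make_record(password, iterations=100000):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"

def check_record(record, provided_password):
    algorithm, iterations, salt_hex, hash_hex = record.split('$')
    if algorithm != 'pbkdf2_sha256':
        raise ValueError(f"unsupported algorithm: {algorithm}")
    # Re-derive using exactly the salt and iteration count stored with the hash
    candidate = hashlib.pbkdf2_hmac('sha256', provided_password.encode('utf-8'),
                                    bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(candidate, bytes.fromhex(hash_hex))

record = make_record("supersecretpassword123")
print(record)
print(check_record(record, "supersecretpassword123"))  # True
print(check_record(record, "wrongpassword456"))        # False
```

Because each record carries its own iteration count, you can raise the count for new passwords later and still verify older records.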
The problem this solves is obvious: storing passwords in plain text is a catastrophic security risk. If your database is breached, every user’s account is immediately compromised. Hashing, when done correctly, means that even if an attacker gets your database, they can’t easily get the original passwords.
Internally, hashlib.pbkdf2_hmac is a key derivation function. It’s designed to be computationally expensive, so computing a single hash takes a noticeable amount of time and CPU. This is crucial. General-purpose hash functions like SHA-256 are built to be fast, so an attacker who steals a list of hashes can run brute-force or dictionary attacks at millions of guesses per second, and far more on GPUs. PBKDF2 slows these attacks down by applying HMAC-SHA256 over and over (100,000 times in our example), and the unique salt means that work has to be redone for every user. Each guess therefore costs the attacker 100,000 HMAC-SHA256 computations instead of a single hash. That won’t save a very common password, which will be among the first guesses tried, but it makes large-scale guessing against strong passwords impractical.
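A rough way to feel that cost difference is to time one plain SHA-256 hash against one PBKDF2 derivation on your own machine. This is a minimal sketch using the standard timeit module. The absolute numbers depend entirely on your hardware, so no expected output is given.

```python
import hashlib
import os
import timeit

password = b"supersecretpassword123"
salt = os.urandom(16)

# Average cost of one plain SHA-256 hash (fast, unsalted)
sha256_seconds = timeit.timeit(lambda: hashlib.sha256(password).digest(), number=10000) / 10000

# Average cost of one PBKDF2-HMAC-SHA256 derivation with 100,000 iterations
pbkdf2_seconds = timeit.timeit(
    lambda: hashlib.pbkdf2_hmac('sha256', password, salt, 100000), number=5
) / 5

print(f"SHA-256:       {sha256_seconds * 1e6:.2f} microseconds per hash")
print(f"PBKDF2 (100k): {pbkdf2_seconds * 1e3:.2f} milliseconds per hash")
print(f"Each guess is roughly {pbkdf2_seconds / sha256_seconds:,.0f}x more expensive")
```

An attacker pays the same multiplier on every guess, while a legitimate login pays it only once.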
The salt is equally important. Without it, two users with the same password would have identical stored hashes. That reveals that they share a password, and it lets an attacker look up common passwords in pre-computed "rainbow tables". A unique salt ensures that identical passwords produce completely different stored hashes, so pre-computed tables are useless and each hash has to be attacked individually.
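Here is a minimal sketch of that effect, with two hypothetical users who choose the same password. Plain unsalted SHA-256 gives them identical stored values, which is exactly what rainbow tables exploit. PBKDF2 with a fresh random salt per user gives them different ones. The helper name pbkdf2_hex is hypothetical.

```python
import hashlib
import os

def pbkdf2_hex(password, salt):
    # Hypothetical helper: PBKDF2-HMAC-SHA256 with 100,000 iterations, as hex
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000).hex()

shared_password = "supersecretpassword123"

# Unsalted: both users end up with the same stored hash
alice_unsalted = hashlib.sha256(shared_password.encode('utf-8')).hexdigest()
bob_unsalted = hashlib.sha256(shared_password.encode('utf-8')).hexdigest()
print("Unsalted hashes equal:", alice_unsalted == bob_unsalted)  # True

# Salted: each user gets a fresh 16-byte random salt, so the stored hashes differ
alice_salted = pbkdf2_hex(shared_password, os.urandom(16))
bob_salted = pbkdf2_hex(shared_password, os.urandom(16))
print("Salted hashes equal:", alice_salted == bob_salted)  # False
```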
The correct approach is a deliberately slow key derivation function with a unique salt for each password. Python’s hashlib module provides pbkdf2_hmac, an implementation of PBKDF2, which is a widely used, standardized choice. Never use the built-in hash() function for password storage. It is designed for dictionaries and sets, it is fast and not cryptographically secure, and it has no per-password salt. For strings it isn’t even stable between runs, because Python randomizes it per process.
The iteration count (the 100000 argument to pbkdf2_hmac) is a tunable parameter. Set it as high as you can tolerate without hurting login latency. On modern hardware that can be well above 100,000; OWASP’s Password Storage Cheat Sheet currently suggests 600,000 iterations for PBKDF2-HMAC-SHA256. The hash name ('sha256') also matters. SHA-256 is a good choice, and SHA-512, from the same SHA-2 family, is also viable.
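One way to apply "as high as you can tolerate" is to measure on the machine that will serve logins. The sketch below uses a hypothetical helper, calibrate_iterations. It times one sample derivation, then scales the iteration count linearly, since PBKDF2’s cost is proportional to its iteration count. The 0.25-second target and the 100,000 floor are illustrative assumptions, not recommendations.

```python
import hashlib
import os
import time

def calibrate_iterations(target_seconds=0.25, sample_iterations=100000,
                         hash_name='sha256', floor=100000):
    # Hypothetical helper: estimate an iteration count whose PBKDF2 run takes
    # roughly target_seconds on the current machine
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac(hash_name, b"calibration-password", salt, sample_iterations)
    elapsed = time.perf_counter() - start
    # PBKDF2 cost grows linearly with iterations, so scale the sample measurement
    estimate = int(sample_iterations * target_seconds / elapsed)
    # Never go below an assumed minimum, even on very slow hardware
    return max(estimate, floor)

iterations = calibrate_iterations(target_seconds=0.25)
print(f"Suggested iterations on this machine: {iterations:,}")
```

A server under real load will be slower than an idle benchmark, so treat the result as an upper estimate and re-check it when you change hardware. Passing hash_name='sha512' calibrates the SHA-512 variant instead.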
The next step after correctly hashing passwords is to consider how you’ll manage user sessions and prevent common web vulnerabilities like Cross-Site Scripting (XSS) and Cross-Site Request Forgery (CSRF).