Key derivation is how you turn a password into a strong encryption key, but the real magic is that it’s designed to be slow and expensive.

Let’s see what that looks like in practice. Imagine we have a password, "mysecretpassword123", and we want to derive a key from it. We can use Python’s hashlib for PBKDF2 and bcrypt for, well, bcrypt.

import hashlib
import os
import bcrypt

password = b"mysecretpassword123"
salt = os.urandom(16) # A unique salt for each password

# PBKDF2
iterations_pbkdf2 = 100000
key_length_pbkdf2 = 32 # 32 bytes for AES-256
dk_pbkdf2 = hashlib.pbkdf2_hmac('sha256', password, salt, iterations_pbkdf2, dklen=key_length_pbkdf2)
print(f"PBKDF2 Derived Key (hex): {dk_pbkdf2.hex()}")
print(f"PBKDF2 Salt (hex): {salt.hex()}")

# bcrypt
# bcrypt automatically handles salting and rounds.
# The 'rounds' parameter controls the computational cost.
# A common starting point is 12.
rounds_bcrypt = 12
hashed_bcrypt = bcrypt.hashpw(password, bcrypt.gensalt(rounds=rounds_bcrypt))
print(f"bcrypt Hash: {hashed_bcrypt.decode()}")
# To verify:
# bcrypt.checkpw(password, hashed_bcrypt)

This code shows us generating a derived key from PBKDF2 and a hash from bcrypt. Notice how bcrypt.hashpw takes a salt generated by bcrypt.gensalt and produces a single output string that includes the salt and the cost factor (rounds). PBKDF2 requires us to manage the salt and iteration count separately.

The core problem these algorithms solve is password hashing. If you store passwords directly, or even just hash them with a fast algorithm like SHA-256, an attacker who steals your database can quickly try millions of passwords against those hashes using specialized hardware. Key derivation functions (KDFs) and password hashing functions (PHFs) like PBKDF2, bcrypt, and Argon2 are designed to resist this by being intentionally slow. They achieve this slowness through computational work (hashing many times) and by using more memory or parallelism, making brute-force attacks prohibitively expensive.

Let’s break down how they work.

PBKDF2 (Password-Based Key Derivation Function 2) is a foundational standard. It takes a password, a salt, and an iteration count, and repeatedly applies a pseudorandom function (PRF), typically HMAC-SHA256. The output of one iteration becomes the input for the next. The iteration count is crucial: higher numbers mean more work.

  • Problem it solves: Turns a weak password into a strong, fixed-length cryptographic key suitable for encryption, while also making brute-forcing harder than a simple hash.
  • How it works internally: PBKDF2(Password, Salt, IterationCount, KeyLength) basically computes HMAC(PRF, Password, Salt || INT(1) || ...) for the first block, then HMAC(PRF, Password, Salt || INT(2) || ...) for the second block, and so on, concatenating the results until KeyLength is reached. The INT(i) is a big-endian representation of the block index. Each HMAC call is repeated IterationCount times.
  • Levers you control:
    • Password: The user’s input.
    • Salt: A random value unique to each password. Prevents rainbow table attacks.
    • IterationCount: The number of times the PRF is applied. This is the primary knob for adjusting performance and security. Higher is better, but slower.
    • PRF: The underlying pseudorandom function, commonly HMAC-SHA256.
    • KeyLength: The desired length of the derived key.

bcrypt was one of the first widely adopted algorithms designed specifically for password hashing, built around the Blowfish cipher. It’s known for its adaptive nature, meaning you can increase its computational cost over time as hardware gets faster.

  • Problem it solves: Securely hashes passwords with a cost factor that can be increased, making offline attacks expensive.
  • How it works internally: bcrypt uses a modified Blowfish cipher. It incorporates the salt and the "cost factor" (number of rounds) directly into the hash output. The hashing process involves repeatedly applying the Blowfish cipher in a specific, complex way, using the salt and cost factor to influence the operations. The salt is generated alongside the hash and stored with it.
  • Levers you control:
    • Password: The user’s input.
    • Cost factor (rounds): This is the exponent of 2, representing the number of expensive iterations. A cost of 12 means 2^12 = 4096 iterations. Increasing this doubles the computation time. The salt is automatically generated and embedded in the output.

Argon2 is the winner of the Password Hashing Competition (PHC) and is considered the current state-of-the-art. It offers more flexibility and resistance against various attacks, especially on modern hardware. It has three variants: Argon2d, Argon2i, and Argon2id.

  • Problem it solves: Provides strong password hashing with tunable parameters for memory cost, time cost, and parallelism, making it resistant to GPU and ASIC-based attacks.
  • How it works internally: Argon2 is highly configurable. It uses a core function that involves repeatedly processing data blocks using a Blake2b hash function and a variant of the Blowfish cipher (like SAFER+). It has three main parameters:
    1. Memory Cost (m): The amount of RAM required.
    2. Time Cost (t): The number of passes over the memory.
    3. Parallelism (p): The number of threads that can be used. Argon2d is optimized for resistance to GPU cracking by using data-dependent memory access. Argon2i is designed to resist side-channel attacks by using data-independent memory access. Argon2id is a hybrid that combines the benefits of both.
  • Levers you control:
    • Password: The user’s input.
    • Salt: A unique random value.
    • Argon2 Variant: argon2d, argon2i, or argon2id. argon2id is generally recommended.
    • Memory Cost (m): The amount of memory (in KiB) the algorithm uses.
    • Time Cost (t): The number of passes over the memory.
    • Parallelism (p): The number of threads.

The most surprising thing about these functions is how they leverage the cost of computation and memory to achieve security. Instead of making things fast, they make them deliberately slow and resource-intensive. This directly counters the advantage attackers gain from specialized hardware like GPUs or ASICs, which can perform simple hashing operations millions of times faster than a general-purpose CPU. By forcing an attacker to spend significant time and resources per guess, they make brute-force attacks economically infeasible.

To verify a password against a stored hash (e.g., from bcrypt or Argon2), you would re-run the hashing algorithm with the user’s entered password and the stored salt and parameters, then compare the generated hash with the stored one. For PBKDF2, you’d derive a key and compare it directly if it’s used for encryption, or hash it and compare if it’s used for authentication.

A common mistake is to choose iteration counts or memory costs that are too low, rendering the protection weak. For example, using only 10,000 iterations for PBKDF2 or a cost factor of 10 for bcrypt is insufficient on modern hardware. The recommended values change over time as hardware evolves, so it’s essential to stay updated on best practices.

The next hurdle you’ll likely encounter is managing these parameters effectively across a growing system and understanding the trade-offs between security, performance, and user experience.

Want structured learning?

Take the full Cryptography course →