Hash functions are fundamentally about irreversible data condensation; they turn any input, from a single character to an entire movie, into a fixed-size string of seemingly random bytes, and there’s no going back.

Let’s see SHA-256 in action. Imagine you have a secret message.

import hashlib

message = "This is my super secret message!"
sha256_hash = hashlib.sha256(message.encode()).hexdigest()
print(f"Original message: {message}")
print(f"SHA-256 hash: {sha256_hash}")

Output (the digest below is illustrative; running the code prints the actual 64-character hex value):

Original message: This is my super secret message!
SHA-256 hash: c23f722175765797f4917f719d8d18f9f0f5b9b2d4f4e9b5f7d5d9f7f2e0d3e4

Now, what if we change just one character?

import hashlib

message_changed = "This is my super secret message?"  # Replaced the final '!' with '?'
sha256_hash_changed = hashlib.sha256(message_changed.encode()).hexdigest()
print(f"Changed message: {message_changed}")
print(f"SHA-256 hash (changed): {sha256_hash_changed}")

Output (illustrative digest; run the code to see the real value):

Changed message: This is my super secret message?
SHA-256 hash (changed): 079568722d4e7a1c3b5f8d0a9e1f2d3c4b5a6e7f8d9c0b1a2e3f4d5c6b7a8e9f

Notice how drastically the hash changed. This is the avalanche effect: a tiny change to the input produces an output that looks completely unrelated, with roughly half of the output bits flipping. This property is crucial for integrity checks. Suppose you download a file and its SHA-256 hash matches the one published by the source. You can then be confident the file wasn't corrupted or altered in transit, provided the published hash itself came from a trusted channel.
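One way to see the avalanche effect concretely is to count how many of the 256 output bits differ between the two digests. The sketch below uses only Python's standard `hashlib`. The helper name `sha256_bits_changed` is hypothetical, introduced just for this illustration. For a well-behaved hash, expect a count somewhere near 128; the exact number depends on the inputs.

```python
import hashlib

def sha256_bits_changed(a: str, b: str) -> int:
    """Count how many of the 256 output bits differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a.encode()).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b.encode()).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree; count those 1 bits.
    return bin(da ^ db).count("1")

original = "This is my super secret message!"
changed = "This is my super secret message?"
diff = sha256_bits_changed(original, changed)
print(f"{diff} of 256 bits differ ({diff / 256:.0%})")
```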

Hash functions are built on a series of mathematical operations that repeatedly mix and transform the input data. SHA-256 first pads the message and splits it into 512-bit blocks. It then processes each block in 64 rounds of bitwise logic (AND, XOR, NOT), rotations, shifts, and 32-bit modular addition. The internal state left after one round feeds the next, and the state left after one block is the starting point for the next block. As a result, every bit of the output ends up depending on every bit of the input. The process is deterministic: the same input always produces the same hash.
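`hashlib` hides the rounds themselves, but it makes the observable consequences easy to check. The sketch below shows three of them. The output is always 32 bytes, however large the input. Hashing is deterministic. Feeding data in pieces with `update()` gives the same digest as hashing it all at once. The final value is the widely published SHA-256 digest of the empty string.

```python
import hashlib

# The digest length is fixed, no matter how much data goes in.
inputs = [b"", b"a", b"x" * 1_000_000]
for data in inputs:
    digest = hashlib.sha256(data).digest()
    print(f"{len(data):>9} bytes in -> {len(digest)} bytes out")

# Deterministic: hashing the same bytes twice gives the same result.
print(hashlib.sha256(b"abc").hexdigest() == hashlib.sha256(b"abc").hexdigest())  # True

# Streaming: update() feeds data in pieces; the result matches a one-shot hash.
h = hashlib.sha256()
h.update(b"This is my super ")
h.update(b"secret message!")
print(h.hexdigest() == hashlib.sha256(b"This is my super secret message!").hexdigest())  # True

# Well-known test value: the SHA-256 digest of the empty string.
print(hashlib.sha256(b"").hexdigest())
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

Streaming matters in practice: it lets you hash a multi-gigabyte file without loading it into memory.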

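The primary problem hash functions solve is data integrity: they let us verify that data hasn't been altered, either accidentally or maliciously. Combined with a secret key (as in HMAC) or a digital signature, they also provide authenticity, meaning proof of who produced the data. Beyond simple integrity checks, they are fundamental to digital signatures and to password storage. For passwords, a system stores a salted, deliberately slow hash instead of the password itself, so a database breach doesn't directly expose the originals. They also underpin blockchains, where each block records the hash of the previous one; altering any earlier block breaks every later link, which makes tampering with the ledger evident.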
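To make the blockchain point concrete, here is a toy hash chain. It is a minimal sketch, not the block format of any real blockchain. The block fields (`index`, `data`, `prev_hash`), the all-zeros placeholder for the first block's predecessor, and the helper names are hypothetical. Each block stores the SHA-256 hash of the block before it, so editing an earlier block breaks the link that follows it.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Serialize deterministically (sorted keys) so the same block always hashes the same way.
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def build_chain(records: list) -> list:
    chain = []
    prev = "0" * 64  # hypothetical placeholder "previous hash" for the first block
    for i, data in enumerate(records):
        block = {"index": i, "data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain

def chain_is_valid(chain: list) -> bool:
    # Every block must store the hash of the block immediately before it.
    for prev_block, block in zip(chain, chain[1:]):
        if block["prev_hash"] != block_hash(prev_block):
            return False
    return True

chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(chain_is_valid(chain))  # True
chain[0]["data"] = "alice pays bob 500"  # tamper with an early block
print(chain_is_valid(chain))  # False
```

This toy check only covers the links between blocks. A change to the newest block is caught only once its hash is recorded somewhere else, such as in the next block.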

SHA-256 is part of the SHA-2 family, designed by the NSA and standardized by NIST. It produces a 256-bit (32-byte) hash. It's widely used, considered secure for most applications today, and is the de facto standard in many security protocols. Because the digest is always 32 bytes regardless of input size, checking whether two large files are identical reduces to comparing two short digests.
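Here is a sketch of that comparison, which is also how the download check described earlier works. It hashes files in chunks so they never need to fit in memory. It compares against a published digest with `hmac.compare_digest`. The function names are hypothetical. In real use, the "published" digest would come from the file's source. Here it is computed locally only so the demo is self-contained.

```python
import hashlib
import hmac
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Check a file against a hex digest published by its source."""
    # compare_digest takes time independent of where the two strings first differ
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())

# Demo: two 200,000-byte temporary files that differ only in their last byte.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.bin")
    path_b = os.path.join(tmp, "b.bin")
    with open(path_a, "wb") as f:
        f.write(b"x" * 200_000)
    with open(path_b, "wb") as f:
        f.write(b"x" * 199_999 + b"y")
    identical = file_sha256(path_a) == file_sha256(path_b)
    verified = matches_published(path_a, hashlib.sha256(b"x" * 200_000).hexdigest())

print(identical)  # False
print(verified)   # True
```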

SHA-3, on the other hand, is a completely different algorithm family. It is the result of a public competition run by NIST (the National Institute of Standards and Technology), which selected the Keccak design as the winner. SHA-3 also produces fixed-size outputs (e.g., SHA3-256 for a 256-bit hash), but internally it uses a "sponge" construction. Input is absorbed into a large internal state, and the output is then squeezed out of that state. This makes it fundamentally different from the Merkle–Damgård construction used in SHA-1 and SHA-2. The primary motivation for SHA-3 was to provide an alternative in case a weakness is ever found in SHA-2, offering cryptographic diversity.
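Python's `hashlib` ships SHA3-256 alongside SHA-256, so you can compare the two side by side. Both produce 32-byte digests. Because the algorithms are unrelated internally, hashing the same message gives entirely different values.

```python
import hashlib

message = b"This is my super secret message!"
sha2 = hashlib.sha256(message)
sha3 = hashlib.sha3_256(message)

# Same output size...
print(sha2.name, sha2.digest_size)  # sha256 32
print(sha3.name, sha3.digest_size)  # sha3_256 32

# ...but different algorithms, so the digests have nothing in common.
print(sha2.hexdigest())
print(sha3.hexdigest())
print(sha2.hexdigest() == sha3.hexdigest())  # False
```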

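When to use each? For most new applications, SHA-256 is still the go-to. It's well understood, widely implemented, and has withstood extensive cryptanalysis. If you need a robust, general-purpose cryptographic hash function for integrity checks or digital signatures, SHA-256 is an excellent choice. Password storage is the exception. A plain SHA-256 hash is so fast that an attacker with a stolen database can test enormous numbers of guesses per second. Passwords should therefore go through a salted, deliberately slow password-hashing function such as Argon2, scrypt, bcrypt, or PBKDF2 (which can use SHA-256 internally).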
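A minimal sketch of salted, slow password hashing using PBKDF2-HMAC-SHA256 from the standard library (`hashlib.pbkdf2_hmac`). The iteration count of 600,000 is an assumed work factor; check current guidance, such as the OWASP Password Storage Cheat Sheet, and tune it for your hardware. The function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; raise it as hardware gets faster

def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) for storage; never store the password itself."""
    salt = os.urandom(16)  # fresh random salt per password defeats precomputed tables
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key from the candidate password and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

In a real system you would also store the iteration count next to the salt and key, so it can be raised later. A maintained Argon2 or bcrypt library is often a better fit than hand-rolled PBKDF2.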

SHA-3 becomes relevant when you need cryptographic diversity or are concerned about potential future vulnerabilities in SHA-2. Some high-security applications or protocols may mandate SHA-3 for this reason. It's also a good option if you're designing a system from scratch and want a newer, architecturally different algorithm. For example, its sponge design is not vulnerable to the length-extension attacks that affect plain SHA-256. Performance depends on the platform. In software, SHA-256 is often faster, especially on CPUs with dedicated SHA-256 instructions, while Keccak performs well in dedicated hardware.
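If you want to keep the option of cryptographic diversity open, one common approach is algorithm agility. You record which function produced each stored digest, so you can switch algorithms later without breaking old records. The sketch below uses `hashlib.new`, which accepts algorithm names such as `"sha256"` and `"sha3_256"`. The `algorithm:hexdigest` string format, the `HASH_ALGORITHM` setting, and the function names are hypothetical choices for illustration.

```python
import hashlib

# Hypothetical configuration value: switching algorithms becomes a one-line change.
HASH_ALGORITHM = "sha3_256"

def tagged_digest(data: bytes, algorithm: str = HASH_ALGORITHM) -> str:
    """Return 'algorithm:hexdigest' so each stored digest records which function produced it."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"

def verify_tagged(data: bytes, tagged: str) -> bool:
    """Recompute the digest with whatever algorithm the tag names, then compare."""
    algorithm, _, hex_digest = tagged.partition(":")
    return hashlib.new(algorithm, data).hexdigest() == hex_digest

old = tagged_digest(b"ledger entry", "sha256")  # a record created before the switch
new = tagged_digest(b"ledger entry")            # a record created after it
print(old.split(":")[0], new.split(":")[0])     # sha256 sha3_256
print(verify_tagged(b"ledger entry", old), verify_tagged(b"ledger entry", new))  # True True
```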

A common misconception is that hash functions are encryption. They are not. Encryption is a two-way process: if you have the key, you can decrypt an encrypted message back to its original form. Hashing is one-way: there is no key and no inverse operation. The only way to find an input that matches a given hash is to guess candidate inputs and hash each one. This irreversibility is the defining characteristic of hash functions and what makes them useful for integrity and authentication.

The next step in understanding hashing involves exploring the concept of collision resistance and its implications for hash function security.