Cryptography is the science of protecting information so that only the intended parties can read it, verify it, or trust where it came from. It powers nearly everything you do online, from logging into your bank to checking that a downloaded file has not been tampered with. This guide walks through the core ideas, with hashing and hash functions front and center, since that is the area where our own research left a mark: in 2017 we produced the first practical SHA-1 collision, proving a widely used hash function was broken in practice and not just in theory.
What Cryptography Actually Does
At its heart, cryptography solves a small set of related problems. It keeps data confidential so outsiders cannot read it. It protects integrity so you can tell when data has been altered. It provides authentication so you know who you are really talking to. And it supports non-repudiation, meaning someone cannot later deny they signed or sent something.
Two families of tools do most of this work: hash functions and encryption. People often confuse them, so it helps to pin down the difference early.
Hashing vs Encryption: The Key Distinction
Encryption is reversible. You scramble data with a key, and anyone holding the right key can unscramble it back to the original. The whole point is that the message can be recovered.
Hashing is one-way. A hash function takes an input of any size and produces a fixed-length fingerprint of it. There is no key, and there is no “unhashing” to get the original back. You use hashing when you want to verify something without storing or transmitting the thing itself.
A quick way to remember it: encryption keeps a secret you intend to reveal later; hashing creates a fingerprint you never plan to reverse.
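The contrast can be sketched in a few lines of Python. This is only an illustration: the "cipher" here is a toy one-time pad (XOR with a random key of the same length), chosen because it needs nothing beyond the standard library, and it stands in for a real cipher such as AES. The message and variable names are made up for the example.

```python
import hashlib
import secrets

message = b"meet me at noon"

# Encryption (toy one-time pad): XOR with a random key of the same length.
# Applying the same key again recovers the original, so it is reversible.
key = secrets.token_bytes(len(message))
ciphertext = bytes(m ^ k for m, k in zip(message, key))
recovered = bytes(c ^ k for c, k in zip(ciphertext, key))

# Hashing: no key, fixed-size output, and no operation that "decrypts" it.
digest = hashlib.sha256(message).hexdigest()

print(recovered == message)  # True
print(len(digest))           # 64 hex characters, whatever the message length
```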
What a Hash Function Does
A cryptographic hash function takes arbitrary data and returns a short, fixed-size string of bytes, usually shown as hexadecimal. Good ones share several properties.
Deterministic and Fixed-Length
The same input always yields the same output, every time, on every machine. And no matter whether you feed in one byte or a full movie file, the digest is always the same length. SHA-256, for example, always returns 256 bits (64 hex characters).
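A short check with Python's standard `hashlib` module shows both properties. The helper name `sha256_hex` is just a convenience for this sketch.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"a")              # one byte of input
long_digest = sha256_hex(b"a" * 1_000_000)   # one million bytes of input

print(len(short_digest), len(long_digest))   # 64 64
print(sha256_hex(b"a") == short_digest)      # True: same input, same output
```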
One-Way (Preimage Resistance)
Given a digest, it should be computationally infeasible to find an input that produces it. You can go forward easily but not backward. This is why password systems store hashes rather than the passwords themselves.
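One caveat is worth seeing in code: preimage resistance protects inputs that are hard to guess, not inputs drawn from a tiny set. The sketch below assumes a hypothetical leaked, unsalted SHA-256 hash of a 4-digit PIN and recovers it simply by hashing all 10,000 candidates. The PIN value and function names are invented for the example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical leaked value: the unsalted hash of a 4-digit PIN.
stored = sha256_hex(b"4821")

def crack_pin(target_hex: str):
    """Hash every 4-digit PIN and return the one matching target_hex, if any."""
    for i in range(10_000):
        guess = f"{i:04d}".encode()
        if sha256_hex(guess) == target_hex:
            return guess.decode()
    return None

print(crack_pin(stored))  # 4821
```

Nothing was reversed here. The function was only run forward over every plausible input, which is why low-entropy secrets need salts and deliberately slow hash functions.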
The Avalanche Effect
Change a single bit of the input and roughly half the output bits flip. The new digest looks completely unrelated to the old one. As a result, similar inputs do not produce similar digests, so a digest gives no hint about how close two inputs were.
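The effect is easy to measure. The sketch below flips one bit of a sample input and counts how many of the 256 SHA-256 output bits change; the input string and the helper name `bit_difference` are arbitrary choices for illustration.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

original = b"hello world"
# Flip the lowest bit of the first byte, leaving everything else unchanged.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

changed = bit_difference(original, flipped)
print(f"{changed} of 256 output bits differ")
```

For a well-behaved hash the count should land somewhere near 128, though the exact figure varies from input to input.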
Collision Resistance
A collision is two different inputs that produce the same digest. Because outputs are a fixed size and inputs are unlimited, collisions must exist mathematically. The security promise is only that nobody can find one within any practical amount of computing time. When that promise breaks, the function is considered broken, which is exactly what happened to SHA-1.
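To get a feel for why output size matters, the following sketch searches for a collision on a deliberately weakened hash: SHA-256 truncated to 24 bits. The truncation is an assumption made purely for the demonstration. By the birthday bound, a collision on an n-bit digest is expected after roughly 2^(n/2) attempts, so 24 bits typically falls after a few thousand tries, while the full 256 bits remains far out of reach.

```python
import hashlib
from itertools import count

def truncated_digest(data: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of SHA-256: a deliberately weak hash for demonstration."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Hash the messages b"0", b"1", ... until two share a truncated digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_digest(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg

first, second, attempts = find_collision()
print(first, second, truncated_digest(first).hex(), f"after {attempts} attempts")
```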
The SHA Family and MD5
Most hash functions you will meet in the wild belong to a handful of well-known designs.
MD5 produces a 128-bit digest and was once everywhere. It is now thoroughly broken: collisions can be generated in seconds on a laptop. It still appears as a non-security checksum, but it must never be used where an attacker could benefit from forging a match.
SHA-1 produces a 160-bit digest and was the workhorse of the web for years. Our SHAttered work demonstrated the first real-world collision, using two distinct PDF files that hashed to the same SHA-1 value. That proof pushed the industry to retire it for certificates, signatures, and version control trust anchors.
SHA-256 is part of the SHA-2 family and is the current default for most applications. With a 256-bit output and no known practical attacks, it underpins TLS certificates, Bitcoin, and software signing. Our dedicated SHA-256 explainer digs into how it is built and why it has held up.
SHA-3 is a newer standard based on a completely different internal design (the Keccak sponge construction) rather than the Merkle-Damgård structure used by MD5, SHA-1, and SHA-2. It was standardized as an alternative to SHA-2, not a replacement, so that a future weakness in SHA-2 would not leave everyone stranded.
| Hash function | Output size | Status |
|---|---|---|
| MD5 | 128-bit | Broken, avoid |
| SHA-1 | 160-bit | Broken (collision found), retired |
| SHA-256 (SHA-2) | 256-bit | Secure, recommended |
| SHA-3 | 224 to 512-bit | Secure, alternative design |
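The output sizes in the table can be confirmed with Python's `hashlib`. This sketch uses the 256-bit SHA-3 variant (`sha3_256`) as a representative of that family, and the sample input is arbitrary. Note that some hardened (FIPS-mode) Python builds may refuse to construct MD5, in which case that entry would raise an error.

```python
import hashlib

sample = b"The quick brown fox jumps over the lazy dog"

# Map each algorithm name to its digest size in bits.
digest_bits = {}
for name in ("md5", "sha1", "sha256", "sha3_256"):
    h = hashlib.new(name, sample)
    digest_bits[name] = h.digest_size * 8
    print(f"{name:9s} {digest_bits[name]:3d} bits  {h.hexdigest()}")
```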
Encryption: Symmetric vs Asymmetric
Encryption splits into two approaches that solve different parts of the puzzle, and real systems usually combine them.
Symmetric Encryption (AES)
With symmetric encryption, the same secret key both locks and unlocks the data. It is fast and well suited to bulk data, which is why it handles the actual payload in most secure connections. The standard here is AES (Advanced Encryption Standard), available with 128-, 192-, and 256-bit keys and trusted for everything from disk encryption to government secrets. The catch is key distribution: both sides must already share the secret, and getting it to them safely is its own problem. Our AES guide covers how the cipher works in detail.
Asymmetric Encryption (RSA)
Asymmetric, or public-key, cryptography solves the distribution problem with a pair of mathematically linked keys. A public key, which you can share freely, encrypts data that only the matching private key can decrypt. RSA is the classic example, with its security resting on the difficulty of factoring very large numbers. Public-key methods are slower, so in practice they are used to exchange a symmetric key or to sign data, after which fast symmetric encryption takes over. The public-key cryptography overview explains the key-exchange dance more fully.
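The arithmetic behind RSA can be sketched with plain integers. What follows is textbook RSA with toy parameters, not a usable implementation: the two primes are well-known Mersenne primes picked for convenience, there is no padding, and the 128-bit `session_key` is a stand-in for the symmetric key that a real protocol would exchange. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Toy key generation. Real RSA uses large random primes and a padding scheme
# such as OAEP; these fixed Mersenne primes are for illustration only.
p = 2**61 - 1
q = 2**89 - 1
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 65537                      # public exponent
assert math.gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent (modular inverse of e)

# A random 128-bit value standing in for a symmetric session key.
session_key = secrets.randbits(128)
assert session_key < n

ciphertext = pow(session_key, e, n)   # anyone can do this with (n, e)
recovered = pow(ciphertext, d, n)     # only the holder of d can undo it

print(recovered == session_key)  # True
```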
Digital Signatures
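Signatures combine hashing and public-key cryptography to prove authorship and integrity at once. To sign, you hash the message, then apply your private key to that hash to produce the signature. With RSA this step is often described as encrypting the hash with the private key, which is a useful mental model even though other schemes, such as ECDSA, work differently. Anyone with your public key can check the signature against their own hash of the message. If the check passes, the message was signed by the holder of your private key and has not been changed.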
This is precisely why a broken hash function is dangerous for signatures. If an attacker can craft two documents with the same hash, a signature on the harmless one is also a valid signature on the malicious one. That attack scenario is what made the SHA-1 collision more than an academic curiosity. The digital signatures explainer walks through the full mechanism and its failure modes.
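A toy hash-then-sign scheme makes the dependency on the hash visible. As a loud caveat, this is textbook RSA with small fixed primes and no padding, so it is a sketch of the idea rather than a real signature scheme; the function names and messages are invented for the example. Notice that `verify` only ever looks at the digest, which is why two messages with the same digest would share a signature.

```python
import hashlib

# Toy RSA key pair (fixed Mersenne primes, illustration only).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, Python 3.8+

def digest_as_int(message: bytes) -> int:
    # Reduced mod n only because this toy modulus is shorter than 256 bits;
    # real schemes use a large modulus and a padding scheme such as PSS.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Apply the private exponent to the message digest."""
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Undo the signature with the public exponent and compare digests."""
    return pow(signature, e, n) == digest_as_int(message)

contract = b"Pay Alice 10 dollars"
signature = sign(contract)

print(verify(contract, signature))                    # True
print(verify(b"Pay Alice 10000 dollars", signature))  # False
```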
Where This Shows Up in Real Life
These ideas are not abstract. They run quietly under the surface of ordinary computing.
- TLS / HTTPS: The padlock in your browser relies on asymmetric crypto to authenticate the server and agree on a symmetric key, then AES to encrypt the session, with hashes verifying integrity along the way.
- Passwords: Sensible services never store your password. They store a salted hash produced by a deliberately slow function, so a database breach does not hand attackers your actual credentials.
- Bitcoin and blockchains: SHA-256 chains blocks together and secures mining, while digital signatures authorize every transaction.
- Software integrity: Download pages publish hashes (and signatures) so you can confirm an installer was not swapped out or corrupted in transit.
Each of these gets its own deeper treatment across the cluster, but they all lean on the same building blocks described above.
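As one concrete case, the software integrity check can be sketched as follows. The function names are hypothetical, and the published hash is assumed to be a SHA-256 hex string copied from a download page. The file is read in chunks so that large installers do not need to fit in memory.

```python
import hashlib
import hmac

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the file's digest with the value published by the vendor."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A call such as `matches_published("installer.bin", "<hash from the download page>")` returns `True` only if every byte of the file is what the publisher hashed. A hash alone guards against corruption; protection against a tampered download page comes from the accompanying signature.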
Frequently Asked Questions
Is hashing a type of encryption?
No. Encryption is reversible with a key, while hashing is a one-way fingerprint with no way back. They are often used together, but they are different tools for different jobs.
Why is SHA-1 considered broken if it still produces a hash?
It still produces output, but researchers (including our team) found a way to generate two different inputs with the same SHA-1 digest. Once collisions are practical, the function can no longer be trusted for signatures or certificates, even though it technically still runs.
Should I use MD5 for anything?
Only as a basic, non-security checksum to catch accidental corruption. Never use it where an attacker could gain from forging a matching hash, since MD5 collisions are trivial to produce today.
What hash function should I use instead?
SHA-256 is the safe default for most needs. SHA-3 is a sound alternative built on a different design, and for password storage specifically you want a purpose-built, slow function such as bcrypt, scrypt, or Argon2 rather than a raw fast hash.
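For the password case, a minimal sketch using scrypt from Python's standard library might look like this. The cost parameters (`n=2**14, r=8, p=1`) are commonly used illustrative values rather than a tuned recommendation, the function names are hypothetical, and `hashlib.scrypt` is only available when Python is built against a sufficiently recent OpenSSL.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters; tune these for your own hardware.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32, maxmem=2**26)

def hash_password(password: str):
    """Return (salt, derived_key) for storage; the password is not stored."""
    salt = secrets.token_bytes(16)   # fresh random salt per password
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)

salt, stored_key = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored_key))  # True
print(check_password("wrong guess", salt, stored_key))                   # False
```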




