security: hash OTPs and encrypt TOTP secrets at rest with idempotent migration (#7, #8)#590
Merged
lakhansamani merged 4 commits intomainfrom Apr 7, 2026
Merged
Conversation
…migration (#7, #8) #7 — OTPs stored in plaintext schemas/otp.go.Otp held the raw 6-digit code. An offline DB dump revealed every active password-reset / signup-verification / MFA code, allowing account takeover for any user with an outstanding OTP request. OTPs are short-lived (minutes) so a reversible primitive is unnecessary — the verifier only needs to recompute the digest from the candidate. Add: - crypto.HashOTP(plain, key) — HMAC-SHA256 hex - crypto.VerifyOTPHash(plain, stored, key) — constant-time compare All write sites now hash before UpsertOTP. The plaintext stays in a local variable for the email/SMS body. Affected handlers: - internal/graphql/login.go (generateOTP closure: store hash, restore plaintext on the returned struct so callers can read otpData.Otp) - internal/graphql/signup.go (mobile SMS verification) - internal/graphql/forgot_password.go (mobile reset OTP) - internal/graphql/resend_otp.go (same closure pattern as login) Verify sites accept BOTH hash and legacy plaintext for the duration of a rolling upgrade — old in-flight rows expire within minutes anyway: - internal/graphql/verify_otp.go - internal/graphql/reset_password.go Migration: NONE required. All in-flight plaintext rows naturally expire during the upgrade window. #8 — TOTP secrets stored in plaintext schemas/authenticators.go.Secret held the base32 TOTP shared secret. TOTP secrets are long-lived (enrolled once, used forever) so a DB dump enables permanent MFA bypass. Add a versioned ciphertext format (so migration is idempotent and a future algorithm rotation is straightforward): - crypto.EncryptTOTPSecret(plain, key) → "enc:v1:" + AES-256-GCM ciphertext - crypto.DecryptTOTPSecret(stored, key) — transparent legacy passthrough - crypto.IsEncryptedTOTPSecret(stored) — prefix check used by lazy migration The TOTP authenticator now: - Encrypts the secret in Generate() before storing - Decrypts in Validate() before computing the expected code - Lazy-migrates legacy plaintext rows on the first successful Validate by re-encrypting and updating the row in place. The lazy approach avoids having to add a ListAuthenticators API to all 13 storage providers. DEPRECATED: the lazy migration code is intended to be removed two minor versions after this lands; by then all rolling upgrades will have completed. EncryptionKey is wired into totp.Dependencies from cfg.JWTSecret so the TOTP package does not need direct access to the broader Config struct. Properties: - Idempotent: encrypted rows are recognised by the enc:v1: prefix and never re-encrypted - Safe under partial migration / mid-rollout failure: read path handles both forms; write path always produces the new form - Mixed cluster during a rolling deploy: old nodes still read enc:v1: values successfully because they fail open in DecryptTOTPSecret only if the secret happens to be plaintext-shaped — actual mixed clusters will need both versions to ship together, but the read path is forwards-compatible
Adds dedicated unit and integration tests for the at-rest hardening so
each piece of the new behaviour is exercised explicitly:
internal/crypto/otp_test.go (new)
- HashOTP round-trip with the original plaintext
- VerifyOTPHash rejects: wrong plaintext, wrong server key, mismatched
digest length
- VerifyOTPHash MUST reject the stored digest itself when used as input
(otherwise the digest is a replayable credential)
- HashOTP is deterministic for a fixed (plaintext, key) pair
internal/crypto/totp_test.go (new)
- EncryptTOTPSecret produces enc:v1: ciphertext that decrypts to the
original plaintext
- AES-GCM nonce randomness — same input yields different ciphertext
- DecryptTOTPSecret transparently passes through legacy plaintext rows
- DecryptTOTPSecret detects tampered ciphertext (GCM authentication)
- DecryptTOTPSecret fails on the wrong key
- EncryptTOTPSecret on empty input is a no-op
- IsEncryptedTOTPSecret prefix detection
internal/integration_tests/totp_at_rest_test.go (new)
- Generate() writes the secret as enc:v1:<ciphertext>, never as the raw
base32 string. Decrypted value matches the plaintext returned to the
caller for QR rendering.
- Validate() accepts a code computed from the encrypted secret.
- Validate() lazy-migrates a legacy plaintext row: a row written without
the enc:v1: prefix still validates, AND the row is rewritten into the
enc:v1: form on the first successful Validate.
- Validate() rejects an obviously wrong code on an encrypted row.
- Validate() is idempotent on already-encrypted rows: the underlying
plaintext does not change across multiple validations.
internal/integration_tests/{verify_otp,resend_otp,reset_password}_test.go
- Reworked to seed a known plaintext / digest pair via UpsertOTP and
verify with the plaintext, instead of reading the stored digest back
out of the database and replaying it. The previous flow only passed
because of the legacy plaintext fallback (see below).
- Added new "should fail when stored hash is replayed as input"
subtest to lock the security guarantee in.
internal/integration_tests/test_helper.go
- Expose Config and AuthenticatorProvider on testSetup so tests can
access cfg.JWTSecret and the totp authenticator. Config was already
in the struct but never assigned by initTestSetup; this fixes that
oversight.
internal/graphql/verify_otp.go and reset_password.go
- Drop the legacy plaintext fallback in the OTP comparison. The
fallback was originally added to keep in-flight OTPs verifying during
a rolling upgrade, but it ALSO accepted the stored digest itself as
valid input — turning the digest into a replayable credential for
anyone with DB read access. OTPs expire in minutes so the rolling-
upgrade window is naturally bounded; users on either side simply
request a fresh code.
c18a8de to
18841a6
Compare
The lazy migration that rewrites legacy plaintext TOTP rows into the
enc:v1: ciphertext format was always-on. This is unsafe for a rolling
deploy from a pre-encryption release: replicas still on the old version
do not understand the enc:v1: prefix and call totp.Validate(code, "enc:v1:...")
which treats the prefix as part of a base32 secret and fails to decode,
locking out users who happen to be routed to those replicas after a new
replica has migrated their row.
Changes:
1) New CLI flag --enable-totp-migration (Config.EnableTOTPMigration),
default OFF. Plumbed through internal/authenticators.New() into
totp.Dependencies.EnableLazyMigration. The READ path is unaffected:
DecryptTOTPSecret transparently handles both legacy plaintext and
enc:v1: rows regardless of the flag, so a fresh deploy never breaks
in-flight users.
2) Recommended deploy sequence is documented inline on
totp.Dependencies.EnableLazyMigration:
a. Roll out this release to all replicas with the flag OFF.
b. Confirm every replica is on the new version.
c. Restart with --enable-totp-migration=true.
d. After enough time for active users to validate at least once,
the flag may be left on indefinitely (no-op for already-encrypted
rows).
3) totp.Validate() is now best-effort on the post-validate row write.
A failure to update the authenticator row no longer fails the login —
the user supplied a valid TOTP code, refusing it because of a transient
DB hiccup or a migration encrypt error would be a worse outcome than
a delayed VerifiedAt or a delayed migration. Failures are logged with
structured fields so operators can spot them.
4) Decrypt failure on read is logged loudly with a hint that the most
likely cause is a --jwt-secret rotation since enrollment.
5) Concurrency: two replicas observing the same legacy row may both
decrypt, re-encrypt, and write before either commits. The two writes
carry semantically identical plaintext under different AES-GCM nonces;
last writer wins, the row ends up in enc:v1: form, and the
IsEncryptedTOTPSecret short-circuit prevents any further re-encryption.
This is documented in the function comment.
Tests:
internal/integration_tests/totp_at_rest_test.go gains two subtests:
- "Validate without migration leaves legacy plaintext rows unchanged"
spins up a second test setup with EnableTOTPMigration=false and
asserts that a legacy plaintext row still validates BUT is not
rewritten.
- "Validate succeeds even if the migration write fails (best-effort)"
documents the contract that a successful Validate returns true
regardless of the row update outcome.
The "Validate lazy-migrates a legacy plaintext row" subtest now runs
against a setup with EnableTOTPMigration=true, so both flag states are
covered end-to-end.
…n Validate
Replaces the --enable-totp-migration flag with an in-Validate
branch that handles both forms in one place. The migration is now
always-on and the storage helper is no longer permitted to silently
pass legacy values through.
Why
---
The flag-gated lazy migration was over-engineered for the typical
deploy model. It existed to protect rolling deploys across multiple
replicas, but the protection was conditional: operators had to remember
to flip the flag at the right time, and the implicit "DecryptTOTPSecret
returns plaintext for legacy rows" behaviour was easy to miss when
reading the code.
The new shape is closer to the obvious algorithm: decrypt if you can,
fall back to the raw value if you can't, migrate if the fallback
matched.
Changes
-------
internal/crypto/totp.go
- DecryptTOTPSecret is now strict: it returns the new sentinel
ErrTOTPSecretNotEncrypted on a non-prefixed row instead of silently
echoing the input. The "decrypt" function now does exactly one thing,
which is much easier to reason about.
- Test updated to assert the sentinel error.
internal/authenticators/totp/totp.go::Validate
- Branches explicitly on the decrypt result. Three outcomes:
decErr == nil → enc:v1: row, use plaintext
errors.Is(decErr, ErrTOTPSecretNotEncrypted) → legacy row, use raw value, migrate
other decErr → fail closed, log loudly
- Migration is always attempted on a successful legacy validation; the
write is best-effort (failures logged, login still succeeds).
- The function comment now documents the rolling-deploy tradeoff
explicitly so operators with multi-replica clusters can plan around
it (atomic deploy preferred; brief maintenance window otherwise).
internal/authenticators/totp/provider.go
internal/authenticators/providers.go
internal/config/config.go
cmd/root.go
- Drop EnableLazyMigration / Config.EnableTOTPMigration / the
--enable-totp-migration CLI flag. There is no operator-facing knob
any more.
internal/integration_tests/totp_at_rest_test.go
- Drop the "without migration" subtest (the flag it tested no longer
exists).
- The "lazy migration" subtest is unchanged structurally but now
exercises the always-on path.
Concurrency, write semantics, and idempotency are unchanged from the
previous commit:
- Two replicas migrating the same row write semantically identical
plaintexts under different AES-GCM nonces; last writer wins; the
IsEncryptedTOTPSecret short-circuit prevents re-encryption.
- A failed migration write does NOT fail the login.
- A decrypt failure on a prefixed row (e.g. wrong --jwt-secret) is
fail-closed.
Known tradeoff
--------------
For a rolling deploy from a pre-encryption release: as soon as one
replica handles a TOTP login, that user's row is in enc:v1: form. Any
old replica that subsequently reads the row will treat the prefix as
part of a base32 secret and fail. Either deploy atomically, or accept
a short window during which migrated users may be locked out of old
replicas (TOTP codes regenerate every 30s so the user-visible impact
is bounded).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
#7 — OTPs stored in plaintext
schemas/otp.go.Otpheld the raw 6-digit code. An offline DB dump revealed every active password-reset / signup-verification / MFA code, allowing account takeover for any user with an outstanding OTP request.#8 — TOTP secrets stored in plaintext
schemas/authenticators.go.Secretheld the base32 TOTP shared secret. TOTP secrets are long-lived (enrolled once, used forever) so a DB dump enables permanent MFA bypass.Approach
OTPs only need verification, not recovery, so a one-way hash is enough. TOTP secrets must be decryptable on every Validate, so AES-GCM with the existing HKDF-derived key.
Changes
OTP (#7)
crypto.HashOTP(plain, key)(HMAC-SHA256 hex) andcrypto.VerifyOTPHash(plain, stored, key)(constant-time).UpsertOTP. The plaintext stays in a local variable for the email/SMS body. Affected:internal/graphql/login.go(generateOTP closure: store hash, restore plaintext on the returned struct so callers can readotpData.Otpunchanged)internal/graphql/signup.gointernal/graphql/forgot_password.gointernal/graphql/resend_otp.gointernal/graphql/verify_otp.gointernal/graphql/reset_password.goTOTP (#8)
enc:v1:<base64(nonce||ciphertext)>. The prefix lets the read path transparently handle both new ciphertext rows and legacy plaintext, and lets future migrations recognise already-encrypted rows so they never re-encrypt them.EncryptTOTPSecret,DecryptTOTPSecret,IsEncryptedTOTPSecret.Generate()encrypts before storingValidate()decrypts (transparent legacy passthrough)ListAuthenticatorsAPI to all 13 storage providers.EncryptionKeywired intototp.Dependenciesfromcfg.JWTSecret.Deprecation note
The lazy-migration code is intentionally DEPRECATED for removal two minor versions after this lands. By then all rolling upgrades will have completed. Operators upgrading from a pre-encryption release must deploy this version once and let TOTP users sign in once before upgrading further.
Test plan
go build ./...go test ./internal/crypto/...TEST_DBS=sqlite go test -run "TestSignup|TestLogin|TestVerifyOtp|TestResendOtp|TestForgotPassword|TestResetPassword|TestVerifyEmail" ./internal/integration_tests/enc:v1: