Potential fix for code scanning alert no. 6: Use of a broken or weak cryptographic hashing algorithm on sensitive data#291
Conversation
…cryptographic hashing algorithm on sensitive data Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files: None
Greptile Summary

This PR attempts to fix a code-scanning alert about weak cryptographic hashing of API keys by replacing the SHA-256 implementation of `hash_api_key` with Argon2id.
Confidence Score: 0/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Middleware as ApiAuthMiddleware
    participant Service as AuthService.hash_api_key
    participant DB as persistence.api_keys.get_by_hash
    Note over Service: SHA-256 (before PR): deterministic
    Note over Service: Argon2id (after PR): random salt each call
    Client->>Middleware: GET /protected (Bearer raw_key)
    Middleware->>Service: hash_api_key(raw_key)
    Service-->>Middleware: hash_B (new Argon2 hash, different salt)
    Middleware->>DB: get_by_hash(hash_B)
    Note over DB: Stored hash_A ≠ hash_B (different salts)
    DB-->>Middleware: None (not found)
    Middleware-->>Client: 401 Unauthorized ❌
    Note over Middleware,DB: With SHA-256, hash_A == hash_B always<br/>With Argon2, hash_A ≠ hash_B (non-deterministic)
```

Last reviewed commit: c84e8ab
```diff
             Argon2id hash string.
         """
-        return hashlib.sha256(raw_key.encode()).hexdigest()
+        return _hasher.hash(raw_key)
```
Argon2 is non-deterministic — breaks all API key lookups
Argon2 generates a cryptographically random salt on every call, so `_hasher.hash("same-key")` produces a **different** string each time. The authentication middleware (`middleware.py:195–197`) works by:

1. Computing `key_hash = AuthService.hash_api_key(token)` at request time.
2. Looking up `persistence.api_keys.get_by_hash(key_hash)` using an exact `WHERE key_hash = ?` SQL query.
Because the Argon2 output at authentication time will never equal the Argon2 string that was stored when the key was created, every API key authentication request will return None (i.e., 401 Unauthorized). This is a complete regression for API key auth.
The existing unit test test_hash_deterministic in tests/unit/api/auth/test_service.py:155–158 also directly asserts h1 == h2 for two calls with the same input — that test will fail immediately with this change.
The correct approach for a hash that must support direct DB lookups is a keyed MAC (HMAC-SHA256) using a server-side secret, which is deterministic but still safe against offline attacks on the stored hash:
```python
import hashlib, hmac

@staticmethod
def hash_api_key(raw_key: str, secret: str) -> str:
    return hmac.new(secret.encode(), raw_key.encode(), hashlib.sha256).hexdigest()
```

Argon2 (or bcrypt) is the right tool for *password* storage where verification is done with `verify(hash, plain)`. For API keys stored as lookup tokens in a database, HMAC is the standard solution (see e.g. GitHub's personal access token model).
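To make the determinism property concrete, here is a minimal, self-contained sketch of the HMAC approach described in this comment. The function name and literal secrets are illustrative, not the project's actual API:

```python
import hashlib
import hmac


def hash_api_key(raw_key: str, secret: str) -> str:
    """Deterministic keyed fingerprint of an API key (HMAC-SHA256)."""
    return hmac.new(secret.encode(), raw_key.encode(), hashlib.sha256).hexdigest()


h1 = hash_api_key("sk-example-123", "server-side-secret")
h2 = hash_api_key("sk-example-123", "server-side-secret")
assert h1 == h2          # deterministic: safe for exact WHERE key_hash = ? lookups
assert len(h1) == 64     # SHA-256 hex digest length is unchanged
assert h1 != hash_api_key("sk-example-123", "other-secret")  # keyed by the secret
```

Because the output is stable for a given key and secret, the existing equality-based DB lookup keeps working, while an attacker who steals only the stored hashes cannot brute-force keys offline without also knowing the server-side secret.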
Pull request overview
This PR aims to address code-scanning alert #6 by replacing fast SHA-256 hashing of API keys with an Argon2id-based hash, aligning API key handling with the project’s password hashing approach in the auth subsystem.
Changes:
- Update `AuthService.hash_api_key()` to return an Argon2id hash string instead of a SHA-256 hex digest.
- Adjust the method docstring to reflect the new hash format.
```diff
     def hash_api_key(raw_key: str) -> str:
-        """Compute SHA-256 hex digest of a raw API key.
+        """Hash an API key using Argon2id.

         Args:
             raw_key: The plaintext API key.

         Returns:
-            Lowercase hex digest.
+            Argon2id hash string.
         """
-        return hashlib.sha256(raw_key.encode()).hexdigest()
+        return _hasher.hash(raw_key)
```
Switching to _hasher.hash(raw_key) makes hash_api_key() non-deterministic (new random salt each call). The API key auth flow currently computes key_hash = AuthService.hash_api_key(token) and then does a DB lookup via api_keys.get_by_hash(key_hash), which will never match a previously stored hash and will break all API-key authentication. To use Argon2 here, the lookup needs to change (e.g., store a key id/prefix for lookup and then verify with _hasher.verify(stored_hash, raw_key)), or keep a deterministic keyed/KDF-based fingerprint specifically for lookup.
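A minimal sketch of the lookup-then-verify pattern this comment describes. Caveats: `hashlib.scrypt` stands in for Argon2id so the example stays stdlib-only, and the record layout, prefix length, and helper names are illustrative rather than taken from the project:

```python
import hashlib
import hmac
import os


def create_api_key() -> tuple[str, dict]:
    """Generate a key; store only a clear-text prefix plus a salted digest."""
    raw = "sk-" + os.urandom(24).hex()
    salt = os.urandom(16)
    # scrypt here is a stand-in for the Argon2id hash the project would store.
    digest = hashlib.scrypt(raw.encode(), salt=salt, n=2**14, r=8, p=1)
    record = {"prefix": raw[:11], "salt": salt, "digest": digest}
    return raw, record


def verify_api_key(raw: str, record: dict) -> bool:
    """After looking the record up by its prefix, verify the salted digest."""
    candidate = hashlib.scrypt(raw.encode(), salt=record["salt"], n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, record["digest"])


raw, record = create_api_key()
assert verify_api_key(raw, record)          # correct key verifies
assert not verify_api_key("sk-wrong", record)  # wrong key rejected
```

The prefix gives the middleware a deterministic, non-sensitive lookup column, while the salted digest keeps the non-deterministic KDF for verification only — which is exactly why `WHERE key_hash = ?` equality no longer appears anywhere in this flow.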
```diff
     @staticmethod
     def hash_api_key(raw_key: str) -> str:
-        """Compute SHA-256 hex digest of a raw API key.
+        """Hash an API key using Argon2id.

         Args:
             raw_key: The plaintext API key.

         Returns:
-            Lowercase hex digest.
+            Argon2id hash string.
         """
-        return hashlib.sha256(raw_key.encode()).hexdigest()
+        return _hasher.hash(raw_key)
```
hash_api_key() is used on the request path (called from async middleware). Argon2 hashing is CPU-expensive and this synchronous call will block the event loop / worker thread and can become a DoS vector under load. Consider providing an async/offloaded variant (similar to hash_password_async) or switching to a deterministic construction that’s safe for lookup and cheaper to compute per request (e.g., a keyed HMAC/KDF) while still addressing the scanning alert.
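One way to offload the expensive hash off the event loop, sketched with `asyncio.to_thread`. `hashlib.scrypt` again stands in for the Argon2 call, and the function names are illustrative — the project's `hash_password_async` may be structured differently:

```python
import asyncio
import hashlib
import os


def _expensive_hash(raw_key: str) -> str:
    # Stand-in for a CPU-heavy KDF call such as Argon2id.
    salt = os.urandom(16)
    digest = hashlib.scrypt(raw_key.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt.hex() + "$" + digest.hex()


async def hash_api_key_async(raw_key: str) -> str:
    # Run the blocking KDF in a worker thread so the event loop
    # keeps serving other requests while this one hashes.
    return await asyncio.to_thread(_expensive_hash, raw_key)


result = asyncio.run(hash_api_key_async("sk-example"))
```

This keeps per-request latency the same but stops one slow hash from stalling every other coroutine on the loop; a bounded semaphore around the call would additionally cap concurrent KDF work as DoS protection.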
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/api/auth/service.py`:
- Around line 176-186: The current hash_api_key(raw_key) uses Argon2 and is
non-deterministic; replace the flow by adding a verify_api_key(raw_key: str,
key_hash: str) -> bool that uses _hasher.verify(key_hash, raw_key) (mirroring
verify_password) and update any middleware that does key_hash =
AuthService.hash_api_key(token) then persistence.api_keys.get_by_hash(key_hash)
to instead fetch candidate API key records (e.g., by token identifier or iterate
stored keys) and call AuthService.verify_api_key(token, stored_hash); also keep
hash_api_key for creating new keys but document that existing SHA-256 hex-digest
keys will need special handling (add a fallback verify path that checks
sha256(raw_key) == stored_hash before trying Argon2.verify to preserve backward
compatibility).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c58c44ff-a062-46e1-bcdd-83fdec6980b7
📒 Files selected for processing (1)
src/ai_company/api/auth/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
`**/*.py`

📄 CodeRabbit inference engine (CLAUDE.md)

`**/*.py`: No `from __future__ import annotations` — Python 3.14 has PEP 649
Use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
Type hints: all public functions, mypy strict mode
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules)
Pydantic v2: use `@computed_field` for derived values instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`).
Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Line length: 88 characters (ruff)
Functions: < 50 lines, files < 800 lines
Files:
src/ai_company/api/auth/service.py
`src/ai_company/**/*.py`

📄 CodeRabbit inference engine (CLAUDE.md)

`src/ai_company/**/*.py`: Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging` / `logging.getLogger()` / `print()` in application code
Always use `logger` as the variable name (not `_logger`, not `log`)
Always use constants from domain-specific modules under `ai_company.observability.events` for event names (e.g. `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Always use structured logging kwargs: `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO
DEBUG logging for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Files:
src/ai_company/api/auth/service.py
`{src,tests}/**/*.py`

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small` as aliases.
Files:
src/ai_company/api/auth/service.py
```diff
     @staticmethod
     def hash_api_key(raw_key: str) -> str:
-        """Compute SHA-256 hex digest of a raw API key.
+        """Hash an API key using Argon2id.

         Args:
             raw_key: The plaintext API key.

         Returns:
-            Lowercase hex digest.
+            Argon2id hash string.
         """
-        return hashlib.sha256(raw_key.encode()).hexdigest()
+        return _hasher.hash(raw_key)
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Description: Search for code that verifies API keys by comparing hashes directly
# Find usages of hash_api_key that might be doing direct comparison
rg -n -C5 'hash_api_key' --type py
# Find potential API key verification patterns
rg -n -C3 'api.*key.*hash|key_hash|verify.*api' --type py -i
```

Repository: Aureliolo/synthorg
Length of output: 24031

🏁 Script executed:

```shell
# Get the actual implementation of hash_api_key and _hasher initialization
sed -n '1,30p' src/ai_company/api/auth/service.py
sed -n '176,190p' src/ai_company/api/auth/service.py
```

Repository: Aureliolo/synthorg
Length of output: 1092

🏁 Script executed:

```shell
# Check if there's any configuration of _hasher that affects determinism
rg -n "PasswordHasher" src/ai_company/api/auth/service.py -A 5 -B 2
```

Repository: Aureliolo/synthorg
Length of output: 242

🏁 Script executed:

```shell
# Check the full test implementation to see if it's actually testing what we think
sed -n '154,165p' tests/unit/api/auth/test_service.py
```

Repository: Aureliolo/synthorg
Length of output: 485

🏁 Script executed:

```shell
# Search for verify_api_key method
rg -n "verify_api_key" --type py
```

Repository: Aureliolo/synthorg
Length of output: 44

🏁 Script executed:

```shell
# Get full middleware verification logic
sed -n '180,220p' src/ai_company/api/auth/middleware.py
```

Repository: Aureliolo/synthorg
Length of output: 1290

🏁 Script executed:

```shell
# Check git history to see if this is a recent change
git log --oneline -20 src/ai_company/api/auth/service.py
```

Repository: Aureliolo/synthorg
Length of output: 188

🏁 Script executed:

```shell
# Look at the test more closely - maybe it's checking something else
cat -n tests/unit/api/auth/test_service.py | sed -n '150,170p'
```

Repository: Aureliolo/synthorg
Length of output: 860

🏁 Script executed:

```shell
# Check if there's any custom PasswordHasher or if it's the standard argon2 one
rg -n "class.*Hasher|def hash" src/ai_company/api/auth/service.py
```

Repository: Aureliolo/synthorg
Length of output: 228

🏁 Script executed:

```shell
# Check if tests have any skip markers or if they would fail
rg -n "pytest.mark|@skip|xfail" tests/unit/api/auth/test_service.py -B 2 | grep -A 2 "TestApiKeyHashing"
```

Repository: Aureliolo/synthorg
Length of output: 44

🏁 Script executed:

```shell
# The core question: does argon2.PasswordHasher.hash() produce deterministic output?
# Let's check the argon2 library behavior with a simple test
python3 << 'EOF'
import argon2
hasher = argon2.PasswordHasher(
    time_cost=3,
    memory_cost=65536,
    parallelism=4,
    hash_len=32,
    salt_len=16,
)
# Test if hash is deterministic
h1 = hasher.hash("test-key")
h2 = hasher.hash("test-key")
print(f"Hash 1: {h1}")
print(f"Hash 2: {h2}")
print(f"Are they equal? {h1 == h2}")
print("Can verify h1 with h2? ", end="")
try:
    hasher.verify(h1, "test-key")
    print("YES")
except:
    print("NO")
EOF
```

Repository: Aureliolo/synthorg
Length of output: 180

🏁 Script executed:

```shell
# Check what files were modified in the recent commit
git show --name-only c84e8ab
```

Repository: Aureliolo/synthorg
Length of output: 38812

🏁 Script executed:

```shell
# Check if the test file was modified in that commit
git show c84e8ab -- tests/unit/api/auth/test_service.py | head -100
```

Repository: Aureliolo/synthorg
Length of output: 3416

🏁 Script executed:

```shell
# Look at the actual diff for hash_api_key changes
git show c84e8ab -- src/ai_company/api/auth/service.py | head -150
```

Repository: Aureliolo/synthorg
Length of output: 4626
**Critical: Argon2 non-determinism breaks API key verification flow.**

Argon2 uses a random salt per call, so each `_hasher.hash(raw_key)` produces a different output for the same input. This breaks the middleware's lookup pattern: `key_hash = AuthService.hash_api_key(token)` followed by `persistence.api_keys.get_by_hash(key_hash)` will fail on repeat requests with the same token.

The newly added `test_hash_deterministic` expects deterministic hashing (`assert h1 == h2`), but the standard Argon2 `PasswordHasher` cannot provide this guarantee — the test will fail when run.

A `verify_api_key(raw_key: str, key_hash: str) -> bool` method is required to verify tokens against stored hashes using `_hasher.verify()`, mirroring the existing `verify_password` pattern. The middleware must also be updated to call this method instead of re-hashing.

Additionally, this is a breaking change for existing API keys stored as SHA-256 hex digests, which will fail Argon2 verification.
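The backward-compatible verification path this comment calls for can be sketched as follows. The `argon2_verify` callable is an injected stand-in for `_hasher.verify`, and the function signature is illustrative, not the project's actual API:

```python
import hashlib
import hmac


def verify_api_key(raw_key: str, stored_hash: str, argon2_verify) -> bool:
    """Check legacy SHA-256 digests first, then fall back to Argon2 verify."""
    # Legacy path: keys stored before the migration are plain SHA-256 hex.
    legacy = hashlib.sha256(raw_key.encode()).hexdigest()
    if hmac.compare_digest(legacy, stored_hash):
        return True
    try:
        # New path: argon2_verify stands in for _hasher.verify(stored_hash, raw_key).
        return argon2_verify(stored_hash, raw_key)
    except Exception:
        # Argon2 verifiers typically raise on mismatch/malformed hash.
        return False


# A legacy record still verifies, even with a verifier that rejects everything.
old_hash = hashlib.sha256(b"sk-legacy").hexdigest()
assert verify_api_key("sk-legacy", old_hash, lambda h, k: False)
assert not verify_api_key("sk-wrong", old_hash, lambda h, k: False)
```

A deployment could also re-hash legacy keys with the new scheme on first successful verification, so the SHA-256 fallback can eventually be removed.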
Closing: Copilot autofix is broken — Argon2 is non-deterministic (random salt per call), which breaks the `WHERE key_hash = ?` equality lookup used throughout the middleware. Will replace with a proper HMAC-SHA256 fix in a new cleanup/hardening PR.
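The non-determinism being described can be demonstrated with any salted KDF; this sketch uses `hashlib.scrypt` as a stdlib stand-in for Argon2id (the random-salt-per-call behaviour is the same in principle):

```python
import hashlib
import os


def salted_kdf(raw_key: str) -> str:
    # A fresh random salt per call, like argon2's PasswordHasher.hash().
    salt = os.urandom(16)
    digest = hashlib.scrypt(raw_key.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt.hex() + ":" + digest.hex()


h1 = salted_kdf("same-key")
h2 = salted_kdf("same-key")
assert h1 != h2  # an exact WHERE key_hash = ? lookup can never match
```

This is precisely why salted KDF output can only be used with a verify-style check against a record fetched some other way, never as the equality-lookup key itself.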
…gelog Replace plain SHA-256 API key hashing with HMAC-SHA256 using the server-side JWT secret, preventing offline brute-force of stored key hashes (resolves CodeQL alert #6). The broken Copilot autofix (PR #291, Argon2 — non-deterministic, broke all API key auth) was closed and replaced with the correct approach. Also removes the duplicate legacy 0.1.1 section from CHANGELOG.md left over from the ai-company → SynthOrg rename. Closes #291
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical code scanning alert by upgrading the API key hashing mechanism. It transitions from the less secure SHA-256 algorithm to the more robust and computationally expensive Argon2id, aligning API key security with existing password hashing practices and enhancing the overall cryptographic posture of sensitive data.

Highlights
Code Review
This pull request aims to improve API key security by replacing SHA-256 with Argon2id. However, the current implementation introduces a critical flaw: it uses a non-deterministic hashing method where the application expects deterministic hashes, which will break API key authentication and lead to a denial of service for API users. Additionally, this change will break existing unit tests, requires updating outdated documentation for `ApiKey.key_hash`, and necessitates ensuring the API key verification logic uses a proper verification function like `_hasher.verify()`.
```diff
             Argon2id hash string.
         """
-        return hashlib.sha256(raw_key.encode()).hexdigest()
+        return _hasher.hash(raw_key)
```
The change to use `_hasher.hash` is a security improvement, but it introduces a critical functional regression. `argon2.PasswordHasher().hash()` generates a unique random salt for every call, making the resulting hash non-deterministic. This will cause all API key authentication attempts to fail if the application relies on deterministic hashing for database lookups or direct comparison, leading to a denial of service (DoS) for API users. The existing unit test `test_hash_deterministic` explicitly confirms this functional regression.
To address this, the authentication logic must be updated to:

1. Look up the API key by a non-sensitive identifier (e.g., a key ID or prefix).
2. Verify the provided raw key against the stored Argon2 hash using `_hasher.verify(stored_hash, raw_key)`.
The test suite for API key hashing should also be updated. `test_hash_deterministic` should be replaced with tests that:

- Verify that hashing the same key multiple times produces different hashes.
- Verify that a generated hash can be correctly verified against the original key.
For example:

```python
# In tests/unit/api/auth/test_service.py
def test_different_hashes_for_same_key(self) -> None:
    h1 = AuthService.hash_api_key("same-key")
    h2 = AuthService.hash_api_key("same-key")
    assert h1 != h2

def test_hash_and_verify(self) -> None:
    svc = _make_service()
    raw_key = "my-api-key"
    hashed_key = AuthService.hash_api_key(raw_key)
    # Re-using verify_password as it uses the same hasher
    assert svc.verify_password(raw_key, hashed_key)
    assert not svc.verify_password("wrong-key", hashed_key)
```
Potential fix for https://github.com/Aureliolo/synthorg/security/code-scanning/6

In general terms, to fix this issue we should ensure that sensitive secrets like API keys are not stored using a single, fast hash such as SHA-256. Instead, they should be stored using a password-hashing/KDF algorithm designed to be computationally expensive and salted (e.g., Argon2, bcrypt, PBKDF2). Since this project already uses `argon2.PasswordHasher` for password hashing and it is configured at module level, we can reuse the same mechanism for API keys to avoid new dependencies and keep behavior consistent.

The best minimal-impact fix is:

- Update `AuthService.hash_api_key` to use the existing `_hasher` (Argon2id) instead of `hashlib.sha256`. This means `key_hash` becomes an Argon2 hash string (like passwords) rather than a simple hex digest.
- Tests generate `key_hash` via `AuthService.hash_api_key` and then depend on middleware behavior, not on the internal format of `key_hash`. Thus, switching `hash_api_key` to Argon2 should be transparent to the tests; no change to their assertions is needed.

Concretely:

- In `src/ai_company/api/auth/service.py`, lines 176–186, replace the SHA-256 implementation of `hash_api_key` with `_hasher.hash(raw_key)` and adjust the docstring accordingly (it no longer returns a SHA-256 hex digest).
- Tests in `tests/unit/api/auth/test_middleware.py` can remain structurally the same; they call `AuthService.hash_api_key(raw_key)` to generate `key_hash`, which will now be Argon2 instead of SHA-256. No imports or new helpers are needed; `argon2` is already imported and `_hasher` is already defined at the top of `service.py`.

Suggested fixes powered by Copilot Autofix. Review carefully before merging.