Use full hash-seed bytes when deriving SipHash seed#3654
Conversation
The hash-seed config is an sds string and may contain embedded NUL bytes (sdssplitargs preserves \xNN escapes inside double quotes). In the old code getHashSeedFromString() used strlen() on it, so two hash-seed values differing only after a \x00 byte collapsed to the same SipHash seed, can break it. hash-seed was added in valkey-io#2608. Signed-off-by: Binbin <binloveplay1314@qq.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3654 +/- ##
============================================
+ Coverage 76.66% 76.70% +0.04%
============================================
Files 162 162
Lines 80647 80662 +15
============================================
+ Hits 61825 61870 +45
+ Misses 18822 18792 -30
🚀 New features to boost your workflow:
|
sarthakaggarwal97
left a comment
There was a problem hiding this comment.
Good catch. So basically two nodes with different seeds could have ended with the same one.
Signed-off-by: Binbin <binloveplay1314@qq.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR changes getHashSeedFromString to accept an explicit byte-length, updates its header and caller in startup to pass sdslen(server.hash_seed), and adds an integration test that verifies embedded NULL bytes in hash-seed values are preserved and affect behavior. ChangesHash seed explicit length parameter
🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/util.c (1)
1055-1063: ⚡ Quick winClarify the binary-safe rationale in the function comment.
The code now correctly hashes
value_lenbytes, but the comment doesn’t explain that this is required to preserve embedded\x00bytes and avoidstrlen()truncation regressions.Proposed comment update
-/* Populate the provided seed array by hashing the provided string with SHA256 - * and copying the first outlen bytes of the digest into the seed buffer. */ +/* Populate the provided seed array by hashing arbitrary input bytes with SHA256. + * `value` may contain embedded NUL bytes, so callers must pass `value_len` + * explicitly (do not infer length with strlen()). + * Copy the first outlen bytes of the digest into the seed buffer. */As per coding guidelines, "Document why code exists, not just what it does; document all functions in C code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/util.c` around lines 1055 - 1063, Update the comment for getHashSeedFromString to explain why it hashes exactly value_len bytes (binary-safe) rather than relying on strlen(): explicitly state that the function must preserve embedded '\x00' bytes in the input and avoid truncation regressions, and that it computes SHA-256 over the provided byte length then copies the first outlen bytes of the digest into seed_array; reference the function name getHashSeedFromString and the use of SHA256_CTX/sha256_init/sha256_update/sha256_final to make the rationale and behavior clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/util.c`:
- Around line 1055-1063: Update the comment for getHashSeedFromString to explain
why it hashes exactly value_len bytes (binary-safe) rather than relying on
strlen(): explicitly state that the function must preserve embedded '\x00' bytes
in the input and avoid truncation regressions, and that it computes SHA-256 over
the provided byte length then copies the first outlen bytes of the digest into
seed_array; reference the function name getHashSeedFromString and the use of
SHA256_CTX/sha256_init/sha256_update/sha256_final to make the rationale and
behavior clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8e070788-d3c4-4396-9ad0-969aef48533a
📒 Files selected for processing (4)
src/server.csrc/util.csrc/util.htests/integration/scan-family-consistency.tcl
zuiderkwast
left a comment
There was a problem hiding this comment.
Interesting. Do we usually allow null bytes in config values?
Another option is to forbid it. For example, we can allow only printable ascii characters. We could add the check in isValidDbHashSeed.
Usually not, i just found it during a review. Yes, the other way is to forbid it, i am fine with it, i don't have the preference actually. |
|
Does CONFIG REWRITE work with null bytes inside config values? We can merge this as it is. Explicit length is good in these functions anyway. |
Signed-off-by: Binbin <binloveplay1314@qq.com>
Yes, it work, i also added a test to check it. |
The hash-seed config is an sds string and may contain embedded NUL bytes (sdssplitargs preserves \xNN escapes inside double quotes). In the old code getHashSeedFromString() used strlen() on it, so two hash-seed values differing only after a \x00 byte collapsed to the same SipHash seed, can break it. hash-seed was added in #2608. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>
The hash-seed config is an sds string and may contain embedded NUL bytes (sdssplitargs preserves \xNN escapes inside double quotes). In the old code getHashSeedFromString() used strlen() on it, so two hash-seed values differing only after a \x00 byte collapsed to the same SipHash seed, can break it. hash-seed was added in valkey-io#2608. --------- Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Alon Arenberg <alonare@amazon.com>
The hash-seed config is an sds string and may contain embedded NUL bytes (sdssplitargs preserves \xNN escapes inside double quotes). In the old code getHashSeedFromString() used strlen() on it, so two hash-seed values differing only after a \x00 byte collapsed to the same SipHash seed, can break it. hash-seed was added in #2608. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>
The hash-seed config is an sds string and may contain embedded NUL bytes (sdssplitargs preserves \xNN escapes inside double quotes). In the old code getHashSeedFromString() used strlen() on it, so two hash-seed values differing only after a \x00 byte collapsed to the same SipHash seed, can break it. hash-seed was added in #2608. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>
The hash-seed config is an sds string and may contain embedded NUL
bytes (sdssplitargs preserves \xNN escapes inside double quotes).
In the old code getHashSeedFromString() used strlen() on it, so two
hash-seed values differing only after a \x00 byte collapsed to the
same SipHash seed, can break it.
hash-seed was added in #2608.