Skip to content

Use full hash-seed bytes when deriving SipHash seed#3654

Merged
madolson merged 3 commits into
valkey-io:unstablefrom
enjoy-binbin:fix_seed
May 18, 2026
Merged

Use full hash-seed bytes when deriving SipHash seed#3654
madolson merged 3 commits into
valkey-io:unstablefrom
enjoy-binbin:fix_seed

Conversation

@enjoy-binbin

Copy link
Copy Markdown
Member

The hash-seed config is an sds string and may contain embedded NUL
bytes (sdssplitargs preserves \xNN escapes inside double quotes).

In the old code getHashSeedFromString() used strlen() on it, so two
hash-seed values differing only after a \x00 byte collapsed to the
same SipHash seed, can break it.

hash-seed was added in #2608.

The hash-seed config is an sds string and may contain embedded NUL
bytes (sdssplitargs preserves \xNN escapes inside double quotes).

In the old code getHashSeedFromString() used strlen() on it, so two
hash-seed values differing only after a \x00 byte collapsed to the
same SipHash seed, can break it.

hash-seed was added in valkey-io#2608.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@codecov

codecov Bot commented May 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.70%. Comparing base (418c98b) to head (7f6b44b).
⚠️ Report is 8 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3654      +/-   ##
============================================
+ Coverage     76.66%   76.70%   +0.04%     
============================================
  Files           162      162              
  Lines         80647    80662      +15     
============================================
+ Hits          61825    61870      +45     
+ Misses        18822    18792      -30     
Files with missing lines Coverage Δ
src/server.c 89.47% <ø> (+0.03%) ⬆️
src/util.c 69.68% <100.00%> (ø)

... and 21 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. So basically two nodes with different seeds could have ended with the same one.

Comment thread tests/integration/scan-family-consistency.tcl
Signed-off-by: Binbin <binloveplay1314@qq.com>
@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 434be04d-f5a6-4d02-bab5-26c983ca4045

📥 Commits

Reviewing files that changed from the base of the PR and between 6b68ffc and 7f6b44b.

📒 Files selected for processing (1)
  • tests/integration/scan-family-consistency.tcl
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/scan-family-consistency.tcl

📝 Walkthrough

Walkthrough

The PR changes getHashSeedFromString to accept an explicit byte-length, updates its header and caller in startup to pass sdslen(server.hash_seed), and adds an integration test that verifies embedded NULL bytes in hash-seed values are preserved and affect behavior.

Changes

Hash seed explicit length parameter

Layer / File(s) Summary
Hash seed function signature update
src/util.h
The getHashSeedFromString declaration is updated to add a size_t value_len parameter and the previous shorter declaration is removed.
Hash seed implementation with explicit length
src/util.c
Implementation now accepts value_len and uses it in sha256_update instead of calling strlen(value), preserving digest finalization and output truncation.
Startup call site update
src/server.c
Startup now passes sdslen(server.hash_seed) as the explicit value_len argument to getHashSeedFromString.
Integration test for embedded NULL byte handling
tests/integration/scan-family-consistency.tcl
New test launches two servers with NULL-containing hash-seed overrides, validates differing configured seeds, exercises primary/replica scanning, and verifies config rewrite persistence across restarts.

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: updating hash-seed derivation to use full byte sequences instead of stopping at null bytes.
Description check ✅ Passed The description explains the problem (embedded NUL bytes being ignored by strlen) and the solution (using full byte sequences), which aligns with the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/util.c (1)

1055-1063: ⚡ Quick win

Clarify the binary-safe rationale in the function comment.

The code now correctly hashes value_len bytes, but the comment doesn’t explain that this is required to preserve embedded \x00 bytes and avoid strlen() truncation regressions.

Proposed comment update
-/* Populate the provided seed array by hashing the provided string with SHA256
- * and copying the first outlen bytes of the digest into the seed buffer. */
+/* Populate the provided seed array by hashing arbitrary input bytes with SHA256.
+ * `value` may contain embedded NUL bytes, so callers must pass `value_len`
+ * explicitly (do not infer length with strlen()).
+ * Copy the first outlen bytes of the digest into the seed buffer. */

As per coding guidelines, "Document why code exists, not just what it does; document all functions in C code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/util.c` around lines 1055 - 1063, Update the comment for
getHashSeedFromString to explain why it hashes exactly value_len bytes
(binary-safe) rather than relying on strlen(): explicitly state that the
function must preserve embedded '\x00' bytes in the input and avoid truncation
regressions, and that it computes SHA-256 over the provided byte length then
copies the first outlen bytes of the digest into seed_array; reference the
function name getHashSeedFromString and the use of
SHA256_CTX/sha256_init/sha256_update/sha256_final to make the rationale and
behavior clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/util.c`:
- Around line 1055-1063: Update the comment for getHashSeedFromString to explain
why it hashes exactly value_len bytes (binary-safe) rather than relying on
strlen(): explicitly state that the function must preserve embedded '\x00' bytes
in the input and avoid truncation regressions, and that it computes SHA-256 over
the provided byte length then copies the first outlen bytes of the digest into
seed_array; reference the function name getHashSeedFromString and the use of
SHA256_CTX/sha256_init/sha256_update/sha256_final to make the rationale and
behavior clear.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e070788-d3c4-4396-9ad0-969aef48533a

📥 Commits

Reviewing files that changed from the base of the PR and between ca9dee3 and 6b68ffc.

📒 Files selected for processing (4)
  • src/server.c
  • src/util.c
  • src/util.h
  • tests/integration/scan-family-consistency.tcl

@enjoy-binbin enjoy-binbin requested a review from zuiderkwast May 13, 2026 03:04

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Do we usually allow null bytes in config values?

Another option is to forbid it. For example, we can allow only printable ascii characters. We could add the check in isValidDbHashSeed.

@zuiderkwast zuiderkwast moved this to Needs Review in Valkey 9.1 May 13, 2026
@enjoy-binbin

Copy link
Copy Markdown
Member Author

Do we usually allow null bytes in config values?
Another option is to forbid it. For example, we can allow only printable ascii characters. We could add the check in isValidDbHashSeed.

Usually not, i just found it during a review. Yes, the other way is to forbid it, i am fine with it, i don't have the preference actually.

@zuiderkwast

Copy link
Copy Markdown
Contributor

Does CONFIG REWRITE work with null bytes inside config values?

We can merge this as it is. Explicit length is good in these functions anyway.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin

Copy link
Copy Markdown
Member Author

Does CONFIG REWRITE work with null bytes inside config values?

Yes, it work, i also added a test to check it.

@madolson madolson merged commit 5a604b3 into valkey-io:unstable May 18, 2026
62 checks passed
@github-project-automation github-project-automation Bot moved this from Needs Review to To be backported in Valkey 9.1 May 18, 2026
@enjoy-binbin enjoy-binbin deleted the fix_seed branch May 18, 2026 05:12
@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label May 18, 2026
valkeyrie-ops Bot pushed a commit that referenced this pull request May 18, 2026
The hash-seed config is an sds string and may contain embedded NUL
bytes (sdssplitargs preserves \xNN escapes inside double quotes).

In the old code getHashSeedFromString() used strlen() on it, so two
hash-seed values differing only after a \x00 byte collapsed to the
same SipHash seed, can break it.

hash-seed was added in #2608.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
alon-arenberg pushed a commit to alon-arenberg/valkey that referenced this pull request May 18, 2026
The hash-seed config is an sds string and may contain embedded NUL
bytes (sdssplitargs preserves \xNN escapes inside double quotes).

In the old code getHashSeedFromString() used strlen() on it, so two
hash-seed values differing only after a \x00 byte collapsed to the
same SipHash seed, can break it.

hash-seed was added in valkey-io#2608.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Alon Arenberg <alonare@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 18, 2026
The hash-seed config is an sds string and may contain embedded NUL
bytes (sdssplitargs preserves \xNN escapes inside double quotes).

In the old code getHashSeedFromString() used strlen() on it, so two
hash-seed values differing only after a \x00 byte collapsed to the
same SipHash seed, can break it.

hash-seed was added in #2608.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
madolson pushed a commit that referenced this pull request May 19, 2026
The hash-seed config is an sds string and may contain embedded NUL
bytes (sdssplitargs preserves \xNN escapes inside double quotes).

In the old code getHashSeedFromString() used strlen() on it, so two
hash-seed values differing only after a \x00 byte collapsed to the
same SipHash seed, can break it.

hash-seed was added in #2608.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
@sarthakaggarwal97 sarthakaggarwal97 moved this from To be backported to Done in Valkey 9.1 May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants