Skip to content

Fix checkPrefixCollisionsOrReply returning non-zero on self-overlap#3583

Merged
madolson merged 6 commits into
valkey-io:unstablefrom
beebs-systap:issue_3582_prefix-collison
May 3, 2026
Merged

Fix checkPrefixCollisionsOrReply returning non-zero on self-overlap#3583
madolson merged 6 commits into
valkey-io:unstablefrom
beebs-systap:issue_3582_prefix-collison

Conversation

@beebs-systap

Copy link
Copy Markdown
Contributor

The self-overlap branch in checkPrefixCollisionsOrReply() returns the
outer loop index i instead of 0. When i > 0, the truthy return causes
the caller in networking.c to fall through to enableTracking() and
addReply(c, shared.ok), producing a double reply (protocol violation)
and registering overlapping prefixes.

Change return i to return 0 to match the function's documented boolean
contract and the existing-prefix collision check above it.

Fixes #3582

  The self-overlap branch in checkPrefixCollisionsOrReply() returns the
  outer loop index i instead of 0. When i > 0, the truthy return causes
  the caller in networking.c to fall through to enableTracking() and
  addReply(c, shared.ok), producing a double reply (protocol violation)
  and registering overlapping prefixes.

  Change return i to return 0 to match the function's documented boolean
  contract and the existing-prefix collision check above it.

  Fixes valkey-io#3582

Signed-off-by: Brad Bebee <beebs@amazon.com>
addReplyErrorFormat adds the client to server.clients_pending_write
via prepareClientToWrite. When freeTrackingTestClient frees the client
without unlinking it first, TearDown calls listRelease on the pending
write list which dereferences the freed client memory.

Initialize the embedded list node in createTrackingTestClient and
unlink the client from clients_pending_write in freeTrackingTestClient
before freeing, matching the pattern in networking.c freeClient.

Signed-off-by: Brad Bebee <beebs@amazon.com>
addReplyErrorFormat adds the client to server.clients_pending_write
via prepareClientToWrite. When freeTrackingTestClient frees the client
without unlinking it first, TearDown calls listRelease on the pending
write list which dereferences the freed client memory.

Initialize the embedded list node in createTrackingTestClient and
unlink the client from clients_pending_write in freeTrackingTestClient
before freeing, matching the pattern in networking.c freeClient.

Signed-off-by: Brad Bebee <beebs@amazon.com>
Set reply_off on the test client so prepareClientToWrite returns early
and the client is never added to server.clients_pending_write. This
avoids the heap-use-after-free where freeTrackingTestClient freed the
client struct containing an embedded listNode, and TearDown then called
listRelease which dereferenced the freed node.

Removes the clients_pending_write list from SetUp/TearDown and the
manual unlink logic from freeTrackingTestClient since they are no
longer needed.

Signed-off-by: Brad Bebee <beebs@amazon.com>

@murphyjacob4 murphyjacob4 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Comment thread src/unit/test_tracking.cpp Outdated
}

void TearDown() override {
raxFree(server.errors);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you need raxFreeWithCallback(server.errors, zfree); for ASAN

@codecov

codecov Bot commented Apr 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.70%. Comparing base (8091c6c) to head (8406041).
⚠️ Report is 17 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3583      +/-   ##
============================================
+ Coverage     76.42%   76.70%   +0.28%     
============================================
  Files           159      162       +3     
  Lines         80113    80612     +499     
============================================
+ Hits          61225    61835     +610     
+ Misses        18888    18777     -111     
Files with missing lines Coverage Δ
src/tracking.c 99.34% <100.00%> (ø)

... and 38 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Use raxFreeWithCallback(server.errors, zfree) instead of raxFree()
so that error stat structs allocated by afterErrorReply are properly
freed. Addresses review feedback from murphyjacob4.

Signed-off-by: Brad Bebee <beebs@amazon.com>

@rainsupreme rainsupreme left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice set of UTs too 😁

Comment thread src/unit/test_tracking.cpp Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are good, but now there is quite a bit of overlap with the TCL testing we are doing in unit/tracking. I think in this instance there is quite a bit of whitebox testing, accessing local variables, so I think extending the integration tests is probably the better long term maintainable here. We probably need to add better guidance on when to write unit vs integration.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I think we only need these three new end tests to cover the existing main permutations:

      test {BCAST self-collision at later index is rejected} {
          set r [valkey_client]
          catch {$r CLIENT TRACKING ON BCAST PREFIX aaa PREFIX bbb PREFIX bbbc} output
          assert_match {ERR*Prefix*overlaps*} $output
          $r close
      }
  
      test {BCAST identical prefix at later index is rejected} {
          set r [valkey_client]
          catch {$r CLIENT TRACKING ON BCAST PREFIX xxx PREFIX yyy PREFIX yyy} output
          assert_match {ERR*Prefix*overlaps*} $output
          $r close
      }
  
      test {BCAST empty prefix collides with any prefix} {
          set r [valkey_client]
          catch {$r CLIENT TRACKING ON BCAST PREFIX {} PREFIX foo} output
          assert_match {ERR*Prefix*overlaps*} $output
          $r close
      }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madolson Do you want me to update to remove the other tests? And/or make them tcl integration tests?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove them. We have good coverage of them on TCL already, and it's a lot code that's effectively redundant. I don't want to discourage unit tests, but in this case it seems unecessary.

Replace the C++ unit tests in src/unit/test_tracking.cpp with 3 TCL
integration tests in tests/unit/tracking.tcl that verify both the
error reply and that tracking remains disabled after a prefix
collision at index > 0.

Fixes valkey-io#3582

Signed-off-by: Brad Bebee <beebs@amazon.com>
@madolson madolson merged commit 8891441 into valkey-io:unstable May 3, 2026
75 of 78 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] CLIENT TRACKING ON BCAST enables tracking despite detecting duplicate prefix collision when overlap is not at index 0

4 participants