[MOD-13847] Swap C numeric range tree with Rust implementation#8306
[MOD-13847] Swap C numeric range tree with Rust implementation#8306LukeMathWalker merged 9 commits intomasterfrom
Conversation
5ab7173 to
a95aa68
Compare
69d255a to
2d66944
Compare
a95aa68 to
b514f01
Compare
|
❌ The last analysis has failed. |
b514f01 to
c629c5b
Compare
2d66944 to
1fa3b06
Compare
c629c5b to
5c61e20
Compare
4f38267 to
e873c7f
Compare
151e4d1 to
d9136d3
Compare
e873c7f to
0aa6d28
Compare
d9136d3 to
609478b
Compare
0aa6d28 to
5ec972a
Compare
609478b to
ba9c5b1
Compare
5ec972a to
5de5df1
Compare
ba9c5b1 to
d93c532
Compare
5de5df1 to
494fed2
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #8306 +/- ##
==========================================
- Coverage 82.89% 82.38% -0.52%
==========================================
Files 433 440 +7
Lines 60961 60839 -122
Branches 18662 18998 +336
==========================================
- Hits 50535 50122 -413
- Misses 10239 10528 +289
- Partials 187 189 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
494fed2 to
4c0ce4b
Compare
d93c532 to
a75af40
Compare
a75af40 to
7c2bdfb
Compare
fa9da32 to
91d9770
Compare
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
src/redisearch_rs/rqe_iterators/tests/integration/inverted_index/utils.rs
Outdated
Show resolved
Hide resolved
src/redisearch_rs/c_entrypoint/iterators_ffi/src/inverted_index.rs
Outdated
Show resolved
Hide resolved
src/redisearch_rs/rqe_iterators/tests/integration/inverted_index/mod.rs
Outdated
Show resolved
Hide resolved
src/redisearch_rs/rqe_iterators/tests/integration/inverted_index/numeric.rs
Show resolved
Hide resolved
meiravgri
left a comment
There was a problem hiding this comment.
Exciting! few comments :)
src/fork_gc.c
Outdated
| } | ||
| FGC_SEND_VAR(ctx->gc, info->curPtr); | ||
| if (info->type == RSFLDTYPE_TAG) { | ||
| FGC_SEND_VAR(ctx->gc, info->curPtr); | ||
| FGC_sendBuffer(ctx->gc, info->tagValue, info->tagLen); | ||
| } |
There was a problem hiding this comment.
can we change the if (info->type == RSFLDTYPE_TAG) to an assert now that it's used only for tag fields?
There was a problem hiding this comment.
Yes, I have also renamed it (alongside the other related functions) to reflect their current usage.
src/fork_gc.c
Outdated
| if (!sentHeader) { | ||
| // Send the field header once (field_name + unique_id). | ||
| const char *fieldName = HiddenString_GetUnsafe(numericFields[i]->fieldName, NULL); | ||
| FGC_sendBuffer(gc, fieldName, strlen(fieldName)); | ||
| uint64_t uniqueId = NumericRangeTree_GetUniqueId(rt); | ||
| FGC_sendFixed(gc, &uniqueId, sizeof uniqueId); | ||
| sentHeader = 1; | ||
| } | ||
| nctx.last_block = NULL; | ||
| hll_clear(&nctx.majority_card); | ||
| hll_clear(&nctx.last_block_card); | ||
|
|
There was a problem hiding this comment.
can we do that outside the loop?
There was a problem hiding this comment.
We don't know in advance whether there will be a delta—the scanner iterator may yield no entries.
I'll change it to send the header unconditionally and then follow up with the terminator if there are no entries.
src/fork_gc.c
Outdated
|
|
||
| II_GCWriter wr = { .ctx = gc, .write = pipe_write_cb }; | ||
| // Send: has_more=1 + node_position + node_generation + entry_len + entry_data. | ||
| uint8_t hasMore = 1; |
There was a problem hiding this comment.
why do we need has_more? seems like we can get this information from FGC_sendTerminator
There was a problem hiding this comment.
Yes, we can do without. I have restructured the protocol to rely on a single final terminator.
src/fork_gc.c
Outdated
| status = FGC_CHILD_ERROR; | ||
| goto loop_cleanup; | ||
| } | ||
| entryData = rm_malloc(entryLen); |
There was a problem hiding this comment.
Is entryLen pretty constant or can be bounded?
If so we can pre-allocate the buffer for it (and only realloc if needed)
There was a problem hiding this comment.
We don't know the size in advance, since it depends on the block size and the number of values that must be GCed, but we can definitely reuse the buffer across loop iterations.
| ['Type', 'NUMERIC', 'Term', '-49.9 - 32.6', 'Number of reading operations', 3270, 'Estimated number of matches', 8260], | ||
| ['Type', 'NUMERIC', 'Term', '32.7 - 45.4', 'Number of reading operations', 1280, 'Estimated number of matches', 1280], | ||
| ['Type', 'NUMERIC', 'Term', '45.5 - 49.1', 'Number of reading operations', 370, 'Estimated number of matches', 370], | ||
| ['Type', 'NUMERIC', 'Term', '49.2 - 50', 'Number of reading operations', 90, 'Estimated number of matches', 90]]]] | ||
| # [1] (Profile data) -> [1] (`Shards` value) -> [0] (single shard/standalone) -> [2:4] (Iterators profile - key+value) |
There was a problem hiding this comment.
We are using a different hash function (wyhash rather than fnv), so cardinality estimation has changed, which in turn leads to different splits.
tests/pytests/test_numbers.py
Outdated
| # Allow ±1 tolerance due to tree rebalancing behavior with WyHash | ||
| env.assertAlmostEqual(expected_root_max_depth, numeric_tree['RootMaxDepth'], delta=1, message = "expected RootMaxDepth") | ||
|
|
There was a problem hiding this comment.
why isnt the tree depth consistent across runs for the same values?
There was a problem hiding this comment.
It is, changed it back to an exact match.
| env.expect('JSON.SET', 'doc:2', '$', json.dumps(["1.2,1.3", "1.4,1.5", "2,2"])).ok() | ||
|
|
||
| env.expect(debug_cmd(), 'DUMP_NUMIDX' ,'idx:top', 'val').equal([[1, 2]]) | ||
| env.expect(debug_cmd(), 'DUMP_NUMIDX' ,'idx:top', 'val').equal([[1, 1, 1, 2, 2, 2]]) |
There was a problem hiding this comment.
If you recall, we have changed the DUMP_NUMIDX logic in the pure Rust PRs to include all values when is_multi=true. Previously, it deduplicated them. That's why we have a diff here.
tests/pytests/test_gc.py
Outdated
|
|
||
| # Verify initial state | ||
| # Verify initial state - tree should have multiple buckets | ||
| # The exact number of buckets depends on the HLL hash function used |
There was a problem hiding this comment.
isnt the hash function constant?
There was a problem hiding this comment.
Changed the test back to have tight assertions on tree structure 👍🏻
tests/pytests/test_gc.py
Outdated
| env.assertGreaterEqual(initial_bucket_count, 2, message="Expected at least 2 buckets initially") | ||
| # Verify all documents are indexed | ||
| total_docs = sum(len(bucket) for bucket in res) | ||
| env.assertEqual(total_docs, InitialDocs) |
|
It was removed from the merge queue due to conflicts. It'll need another ✅ |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
| /// Resolve a [`NodeIndex`] to a mutable reference to the node. | ||
| pub fn node_mut(&mut self, idx: NodeIndex) -> &mut NumericRangeNode { | ||
| &mut self.nodes[idx] | ||
| } |
There was a problem hiding this comment.
Public node_mut lacks feature gate unlike range_mut
Low Severity
The new node_mut method is unconditionally pub, but it's only used externally in test code (rqe_iterators_test_utils). The companion range_mut method on NumericRangeNode was deliberately feature-gated behind test-utils (as requested in PR discussion), with pub(crate) in production and pub for tests. node_mut exposes mutable access to arbitrary tree nodes without the same gating, making the internal tree structure unnecessarily accessible to downstream crates in production builds.
Additional Locations (1)
Update the C fork_gc module to use the new streaming numeric GC scanner API instead of buffering deltas.
|
If I have to fix conflicts one more time... 😭 |
|




Describe the changes in the pull request
Replace the existing C implementation with the Rust one.
Stacked on #8401.
Which additional issues this PR fixes
Main objects this PR modified
Mark if applicable
Release Notes
If a release note is required (bug fix / new feature / enhancement), describe the user impact of this PR in the title.
Note
High Risk
High risk because it swaps the core numeric/geo indexing data structure, iterator creation, and ForkGC apply/compaction paths, which can affect query correctness, memory accounting, and GC safety under concurrency.
Overview
Replaces the C
NumericRangeTreeimplementation with the Rust-backed FFI version across indexing, querying, debug commands, memory accounting, and garbage collection.Numeric query execution now asks Rust to both find matching ranges and build per-range iterators via new
CreateNumericRangeIterators, andForkGCnow streams per-node GC entries from the child and applies them in the parent withNumericRangeTree_ApplyGcEntry, followed byNumericRangeTree_CompactIfSparseinstead of the previous empty-leaf trimming logic.Updates tests and debug outputs to match the new tree structure/ordering and memory sizing (
NumericRangeTree_BaseSize,NumericRangeTree_GetInvertedIndexesSize), and removes the old C++ numeric range tree test suite (test_cpp_range.cpp).Written by Cursor Bugbot for commit 4bfc792. This will update automatically on new commits. Configure here.