Skip to content

[MOD-13847] Swap C numeric range tree with Rust implementation#8306

Merged
LukeMathWalker merged 9 commits intomasterfrom
numeric-range-tree-3-swap
Feb 27, 2026
Merged

[MOD-13847] Swap C numeric range tree with Rust implementation#8306
LukeMathWalker merged 9 commits intomasterfrom
numeric-range-tree-3-swap

Conversation

@LukeMathWalker
Copy link
Copy Markdown
Collaborator

@LukeMathWalker LukeMathWalker commented Feb 8, 2026

Describe the changes in the pull request

Replace the existing C implementation with the Rust one.
Stacked on #8401.

Which additional issues this PR fixes

  1. MOD-...
  2. #...

Main objects this PR modified

  1. ...

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Release Notes

  • This PR requires release notes
  • This PR does not require release notes

If a release note is required (bug fix / new feature / enhancement), describe the user impact of this PR in the title.


Note

High Risk
High risk because it swaps the core numeric/geo indexing data structure, iterator creation, and ForkGC apply/compaction paths, which can affect query correctness, memory accounting, and GC safety under concurrency.

Overview
Replaces the C NumericRangeTree implementation with the Rust-backed FFI version across indexing, querying, debug commands, memory accounting, and garbage collection.

Numeric query execution now asks Rust to both find matching ranges and build per-range iterators via new CreateNumericRangeIterators, and ForkGC now streams per-node GC entries from the child and applies them in the parent with NumericRangeTree_ApplyGcEntry, followed by NumericRangeTree_CompactIfSparse instead of the previous empty-leaf trimming logic.

Updates tests and debug outputs to match the new tree structure/ordering and memory sizing (NumericRangeTree_BaseSize, NumericRangeTree_GetInvertedIndexesSize), and removes the old C++ numeric range tree test suite (test_cpp_range.cpp).

Written by Cursor Bugbot for commit 4bfc792. This will update automatically on new commits. Configure here.

@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from 5ab7173 to a95aa68 Compare February 9, 2026 14:11
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch 2 times, most recently from 69d255a to 2d66944 Compare February 9, 2026 14:18
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from a95aa68 to b514f01 Compare February 9, 2026 14:18
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Feb 9, 2026

❌ The last analysis has failed.

See analysis details on SonarQube Cloud

@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from b514f01 to c629c5b Compare February 9, 2026 14:59
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 2d66944 to 1fa3b06 Compare February 9, 2026 14:59
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from c629c5b to 5c61e20 Compare February 9, 2026 15:03
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch 2 times, most recently from 4f38267 to e873c7f Compare February 10, 2026 09:24
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch 2 times, most recently from 151e4d1 to d9136d3 Compare February 10, 2026 10:57
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from e873c7f to 0aa6d28 Compare February 10, 2026 10:57
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from d9136d3 to 609478b Compare February 10, 2026 12:18
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 0aa6d28 to 5ec972a Compare February 10, 2026 12:18
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from 609478b to ba9c5b1 Compare February 10, 2026 12:19
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 5ec972a to 5de5df1 Compare February 10, 2026 12:19
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from ba9c5b1 to d93c532 Compare February 10, 2026 12:36
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 5de5df1 to 494fed2 Compare February 10, 2026 12:36
@LukeMathWalker LukeMathWalker marked this pull request as ready for review February 10, 2026 13:07
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 86.25000% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.38%. Comparing base (bf50d19) to head (4bfc792).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/fork_gc.c 76.25% 19 Missing ⚠️
src/debug_commands.c 83.33% 1 Missing ⚠️
..._rs/c_entrypoint/numeric_range_tree_ffi/src/lib.rs 0.00% 1 Missing ⚠️
...ch_rs/rqe_iterators_test_utils/src/test_context.rs 96.42% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8306      +/-   ##
==========================================
- Coverage   82.89%   82.38%   -0.52%     
==========================================
  Files         433      440       +7     
  Lines       60961    60839     -122     
  Branches    18662    18998     +336     
==========================================
- Hits        50535    50122     -413     
- Misses      10239    10528     +289     
- Partials      187      189       +2     
Flag Coverage Δ
flow 83.64% <68.57%> (-0.26%) ⬇️
unit 51.60% <80.00%> (-0.36%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 494fed2 to 4c0ce4b Compare February 11, 2026 11:19
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from d93c532 to a75af40 Compare February 11, 2026 11:19
@LukeMathWalker LukeMathWalker force-pushed the numeric-range-tree-2b-gc-ffi branch 3 times, most recently from fa9da32 to 91d9770 Compare February 23, 2026 11:05
@jit-ci
Copy link
Copy Markdown

jit-ci bot commented Feb 23, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

Copy link
Copy Markdown
Collaborator

@meiravgri meiravgri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting! few comments :)

src/fork_gc.c Outdated
Comment on lines 234 to 238
}
FGC_SEND_VAR(ctx->gc, info->curPtr);
if (info->type == RSFLDTYPE_TAG) {
FGC_SEND_VAR(ctx->gc, info->curPtr);
FGC_sendBuffer(ctx->gc, info->tagValue, info->tagLen);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change the if (info->type == RSFLDTYPE_TAG) to an assert now that it's used only for tag fields?

Copy link
Copy Markdown
Collaborator Author

@LukeMathWalker LukeMathWalker Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have also renamed it (alongside the other related functions) to reflect their current usage.

src/fork_gc.c Outdated
Comment on lines 277 to 285
if (!sentHeader) {
// Send the field header once (field_name + unique_id).
const char *fieldName = HiddenString_GetUnsafe(numericFields[i]->fieldName, NULL);
FGC_sendBuffer(gc, fieldName, strlen(fieldName));
uint64_t uniqueId = NumericRangeTree_GetUniqueId(rt);
FGC_sendFixed(gc, &uniqueId, sizeof uniqueId);
sentHeader = 1;
}
nctx.last_block = NULL;
hll_clear(&nctx.majority_card);
hll_clear(&nctx.last_block_card);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do that outside the loop?

Copy link
Copy Markdown
Collaborator Author

@LukeMathWalker LukeMathWalker Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know in advance whether there will be a delta—the scanner iterator may yield no entries.
I'll change it to send the header unconditionally and then follow up with the terminator if there are no entries.

src/fork_gc.c Outdated

II_GCWriter wr = { .ctx = gc, .write = pipe_write_cb };
// Send: has_more=1 + node_position + node_generation + entry_len + entry_data.
uint8_t hasMore = 1;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need has_more? seems like we can get this information from FGC_sendTerminator

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can do without. I have restructured the protocol to rely on a single final terminator.

src/fork_gc.c Outdated
status = FGC_CHILD_ERROR;
goto loop_cleanup;
}
entryData = rm_malloc(entryLen);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is entryLen pretty constant or can be bounded?
If so we can pre-allocate the buffer for it (and only realloc if needed)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know the size in advance, since it depends on the block size and the number of values that must be GCed, but we can definitely reuse the buffer across loop iterations.

Comment on lines +206 to 210
['Type', 'NUMERIC', 'Term', '-49.9 - 32.6', 'Number of reading operations', 3270, 'Estimated number of matches', 8260],
['Type', 'NUMERIC', 'Term', '32.7 - 45.4', 'Number of reading operations', 1280, 'Estimated number of matches', 1280],
['Type', 'NUMERIC', 'Term', '45.5 - 49.1', 'Number of reading operations', 370, 'Estimated number of matches', 370],
['Type', 'NUMERIC', 'Term', '49.2 - 50', 'Number of reading operations', 90, 'Estimated number of matches', 90]]]]
# [1] (Profile data) -> [1] (`Shards` value) -> [0] (single shard/standalone) -> [2:4] (Iterators profile - key+value)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the value changed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using a different hash function (wyhash rather than fnv), so cardinality estimation has changed, which in turn leads to different splits.

Comment on lines 49 to 51
# Allow ±1 tolerance due to tree rebalancing behavior with WyHash
env.assertAlmostEqual(expected_root_max_depth, numeric_tree['RootMaxDepth'], delta=1, message = "expected RootMaxDepth")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isnt the tree depth consistent across runs for the same values?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, changed it back to an exact match.

env.expect('JSON.SET', 'doc:2', '$', json.dumps(["1.2,1.3", "1.4,1.5", "2,2"])).ok()

env.expect(debug_cmd(), 'DUMP_NUMIDX' ,'idx:top', 'val').equal([[1, 2]])
env.expect(debug_cmd(), 'DUMP_NUMIDX' ,'idx:top', 'val').equal([[1, 1, 1, 2, 2, 2]])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

Copy link
Copy Markdown
Collaborator Author

@LukeMathWalker LukeMathWalker Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you recall, we have changed the DUMP_NUMIDX logic in the pure Rust PRs to include all values when is_multi=true. Previously, it deduplicated them. That's why we have a diff here.


# Verify initial state
# Verify initial state - tree should have multiple buckets
# The exact number of buckets depends on the HLL hash function used
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt the hash function constant?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the test back to have tight assertions on tree structure 👍🏻

env.assertGreaterEqual(initial_bucket_count, 2, message="Expected at least 2 buckets initially")
# Verify all documents are indexed
total_docs = sum(len(bucket) for bucket in res)
env.assertEqual(total_docs, InitialDocs)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it be exact match?

meiravgri
meiravgri previously approved these changes Feb 26, 2026
meiravgri
meiravgri previously approved these changes Feb 26, 2026
@LukeMathWalker
Copy link
Copy Markdown
Collaborator Author

It was removed from the merge queue due to conflicts. It'll need another ✅

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

/// Resolve a [`NodeIndex`] to a mutable reference to the node.
pub fn node_mut(&mut self, idx: NodeIndex) -> &mut NumericRangeNode {
&mut self.nodes[idx]
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public node_mut lacks feature gate unlike range_mut

Low Severity

The new node_mut method is unconditionally pub, but it's only used externally in test code (rqe_iterators_test_utils). The companion range_mut method on NumericRangeNode was deliberately feature-gated behind test-utils (as requested in PR discussion), with pub(crate) in production and pub for tests. node_mut exposes mutable access to arbitrary tree nodes without the same gating, making the internal tree structure unnecessarily accessible to downstream crates in production builds.

Additional Locations (1)

Fix in Cursor Fix in Web

gdesmott
gdesmott previously approved these changes Feb 27, 2026
@LukeMathWalker
Copy link
Copy Markdown
Collaborator Author

If I have to fix conflicts one more time... 😭

@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants