[MOD-13847] Swap C numeric range tree with Rust implementation by LukeMathWalker · Pull Request #8306 · RediSearch/RediSearch

LukeMathWalker · 2026-02-08T18:15:28Z

Describe the changes in the pull request

Replace the existing C implementation with the Rust one.
Stacked on #8401.

Which additional issues this PR fixes

MOD-...
#...

Main objects this PR modified

...

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

Release Notes

This PR requires release notes
This PR does not require release notes

If a release note is required (bug fix / new feature / enhancement), describe the user impact of this PR in the title.

Note

High Risk
High risk because it swaps the core numeric/geo indexing data structure, iterator creation, and ForkGC apply/compaction paths, which can affect query correctness, memory accounting, and GC safety under concurrency.

Overview
Replaces the C NumericRangeTree implementation with the Rust-backed FFI version across indexing, querying, debug commands, memory accounting, and garbage collection.

Numeric query execution now asks Rust to both find matching ranges and build per-range iterators via new CreateNumericRangeIterators, and ForkGC now streams per-node GC entries from the child and applies them in the parent with NumericRangeTree_ApplyGcEntry, followed by NumericRangeTree_CompactIfSparse instead of the previous empty-leaf trimming logic.

Updates tests and debug outputs to match the new tree structure/ordering and memory sizing (NumericRangeTree_BaseSize, NumericRangeTree_GetInvertedIndexesSize), and removes the old C++ numeric range tree test suite (test_cpp_range.cpp).

^{Written by Cursor Bugbot for commit 4bfc792. This will update automatically on new commits. Configure here.}

sonarqubecloud · 2026-02-09T14:22:48Z

❌ The last analysis has failed.

See analysis details on SonarQube Cloud

src/redisearch_rs/numeric_range_tree/src/tree/gc.rs

tests/cpptests/test_cpp_llapi.cpp

codecov · 2026-02-10T13:49:28Z

Codecov Report

❌ Patch coverage is 86.25000% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.38%. Comparing base (bf50d19) to head (4bfc792).
⚠️ Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
src/fork_gc.c	76.25%	19 Missing ⚠️
src/debug_commands.c	83.33%	1 Missing ⚠️
..._rs/c_entrypoint/numeric_range_tree_ffi/src/lib.rs	0.00%	1 Missing ⚠️
...ch_rs/rqe_iterators_test_utils/src/test_context.rs	96.42%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8306      +/-   ##
==========================================
- Coverage   82.89%   82.38%   -0.52%     
==========================================
  Files         433      440       +7     
  Lines       60961    60839     -122     
  Branches    18662    18998     +336     
==========================================
- Hits        50535    50122     -413     
- Misses      10239    10528     +289     
- Partials      187      189       +2

Flag	Coverage Δ
flow	`83.64% <68.57%> (-0.26%)`	⬇️
unit	`51.60% <80.00%> (-0.36%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

src/iterators/inverted_index_iterator.c

src/fork_gc.c

jit-ci · 2026-02-23T17:13:01Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

^{Security scan by Jit}

src/document.c

src/debug_commands.c

tests/cpptests/index_utils.cpp

src/redisearch_rs/rqe_iterators/tests/integration/inverted_index/utils.rs

src/iterators/inverted_index_iterator.c

src/redisearch_rs/c_entrypoint/iterators_ffi/src/inverted_index.rs

src/redisearch_rs/numeric_range_tree/src/node.rs

src/redisearch_rs/rqe_iterators/tests/integration/inverted_index/mod.rs

src/redisearch_rs/rqe_iterators/tests/integration/inverted_index/numeric.rs

src/numeric_index.c

src/fork_gc.c

meiravgri

Exciting! few comments :)

src/fork_gc.c

meiravgri · 2026-02-26T09:31:53Z

src/fork_gc.c

  }
-  FGC_SEND_VAR(ctx->gc, info->curPtr);
  if (info->type == RSFLDTYPE_TAG) {
+    FGC_SEND_VAR(ctx->gc, info->curPtr);
    FGC_sendBuffer(ctx->gc, info->tagValue, info->tagLen);
  }


can we change the if (info->type == RSFLDTYPE_TAG) to an assert now that it's used only for tag fields?

Yes, I have also renamed it (alongside the other related functions) to reflect their current usage.

meiravgri · 2026-02-26T09:56:33Z

src/fork_gc.c

+      if (!sentHeader) {
+        // Send the field header once (field_name + unique_id).
+        const char *fieldName = HiddenString_GetUnsafe(numericFields[i]->fieldName, NULL);
+        FGC_sendBuffer(gc, fieldName, strlen(fieldName));
+        uint64_t uniqueId = NumericRangeTree_GetUniqueId(rt);
+        FGC_sendFixed(gc, &uniqueId, sizeof uniqueId);
+        sentHeader = 1;
      }
-      nctx.last_block = NULL;
-      hll_clear(&nctx.majority_card);
-      hll_clear(&nctx.last_block_card);



can we do that outside the loop?

We don't know in advance whether there will be a delta—the scanner iterator may yield no entries.
I'll change it to send the header unconditionally and then follow up with the terminator if there are no entries.

meiravgri · 2026-02-26T11:33:41Z

src/fork_gc.c

-
-      II_GCWriter wr = { .ctx = gc, .write = pipe_write_cb };
+      // Send: has_more=1 + node_position + node_generation + entry_len + entry_data.
+      uint8_t hasMore = 1;


why do we need has_more? seems like we can get this information from FGC_sendTerminator

Yes, we can do without. I have restructured the protocol to rely on a single final terminator.

meiravgri · 2026-02-26T11:42:07Z

src/fork_gc.c

+      status = FGC_CHILD_ERROR;
+      goto loop_cleanup;
+    }
+    entryData = rm_malloc(entryLen);


Is entryLen pretty constant or can be bounded?
If so we can pre-allocate the buffer for it (and only realloc if needed)

We don't know the size in advance, since it depends on the block size and the number of values that must be GCed, but we can definitely reuse the buffer across loop iterations.

meiravgri · 2026-02-26T12:07:12Z

tests/pytests/test_profile.py

+                    ['Type', 'NUMERIC', 'Term', '-49.9 - 32.6', 'Number of reading operations', 3270, 'Estimated number of matches', 8260],
+                    ['Type', 'NUMERIC', 'Term', '32.7 - 45.4', 'Number of reading operations', 1280, 'Estimated number of matches', 1280],
+                    ['Type', 'NUMERIC', 'Term', '45.5 - 49.1', 'Number of reading operations', 370, 'Estimated number of matches', 370],
+                    ['Type', 'NUMERIC', 'Term', '49.2 - 50', 'Number of reading operations', 90, 'Estimated number of matches', 90]]]]
  # [1] (Profile data) -> [1] (`Shards` value) -> [0] (single shard/standalone) -> [2:4] (Iterators profile - key+value)


why the value changed?

We are using a different hash function (wyhash rather than fnv), so cardinality estimation has changed, which in turn leads to different splits.

meiravgri · 2026-02-26T12:09:28Z

tests/pytests/test_numbers.py

+        # Allow ±1 tolerance due to tree rebalancing behavior with WyHash
+        env.assertAlmostEqual(expected_root_max_depth, numeric_tree['RootMaxDepth'], delta=1, message = "expected RootMaxDepth")



why isnt the tree depth consistent across runs for the same values?

It is, changed it back to an exact match.

meiravgri · 2026-02-26T12:11:40Z

tests/pytests/test_json_multi_geo.py

    env.expect('JSON.SET', 'doc:2', '$', json.dumps(["1.2,1.3", "1.4,1.5", "2,2"])).ok()

-    env.expect(debug_cmd(), 'DUMP_NUMIDX' ,'idx:top', 'val').equal([[1, 2]])
+    env.expect(debug_cmd(), 'DUMP_NUMIDX' ,'idx:top', 'val').equal([[1, 1, 1, 2, 2, 2]])


If you recall, we have changed the DUMP_NUMIDX logic in the pure Rust PRs to include all values when is_multi=true. Previously, it deduplicated them. That's why we have a diff here.

meiravgri · 2026-02-26T12:12:41Z

tests/pytests/test_gc.py


-    # Verify initial state
+    # Verify initial state - tree should have multiple buckets
+    # The exact number of buckets depends on the HLL hash function used


isnt the hash function constant?

Changed the test back to have tight assertions on tree structure 👍🏻

meiravgri · 2026-02-26T13:15:20Z

tests/pytests/test_gc.py

+    env.assertGreaterEqual(initial_bucket_count, 2, message="Expected at least 2 buckets initially")
+    # Verify all documents are indexed
+    total_docs = sum(len(bucket) for bucket in res)
+    env.assertEqual(total_docs, InitialDocs)


can it be exact match?

LukeMathWalker · 2026-02-27T07:26:40Z

It was removed from the merge queue due to conflicts. It'll need another ✅

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

cursor · 2026-02-27T10:38:11Z

src/redisearch_rs/numeric_range_tree/src/tree/mod.rs

+    /// Resolve a [`NodeIndex`] to a mutable reference to the node.
+    pub fn node_mut(&mut self, idx: NodeIndex) -> &mut NumericRangeNode {
+        &mut self.nodes[idx]
+    }


Public node_mut lacks feature gate unlike range_mut

Low Severity

The new node_mut method is unconditionally pub, but it's only used externally in test code (rqe_iterators_test_utils). The companion range_mut method on NumericRangeNode was deliberately feature-gated behind test-utils (as requested in PR discussion), with pub(crate) in production and pub for tests. node_mut exposes mutable access to arbitrary tree nodes without the same gating, making the internal tree structure unnecessarily accessible to downstream crates in production builds.

Additional Locations (1)

src/redisearch_rs/numeric_range_tree/src/node.rs#L341-L361

Update the C fork_gc module to use the new streaming numeric GC scanner API instead of buffering deltas.

LukeMathWalker · 2026-02-27T12:14:31Z

If I have to fix conflicts one more time... 😭

sonarqubecloud · 2026-02-27T12:18:25Z

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

LukeMathWalker added the action:run-benchmark label Feb 8, 2026

github-actions bot added the size:XL label Feb 8, 2026

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from 5ab7173 to a95aa68 Compare February 9, 2026 14:11

LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch 2 times, most recently from 69d255a to 2d66944 Compare February 9, 2026 14:18

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from a95aa68 to b514f01 Compare February 9, 2026 14:18

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from b514f01 to c629c5b Compare February 9, 2026 14:59

LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 2d66944 to 1fa3b06 Compare February 9, 2026 14:59

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from c629c5b to 5c61e20 Compare February 9, 2026 15:03

LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch 2 times, most recently from 4f38267 to e873c7f Compare February 10, 2026 09:24

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch 2 times, most recently from 151e4d1 to d9136d3 Compare February 10, 2026 10:57

LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from e873c7f to 0aa6d28 Compare February 10, 2026 10:57

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from d9136d3 to 609478b Compare February 10, 2026 12:18

LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 0aa6d28 to 5ec972a Compare February 10, 2026 12:18

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from 609478b to ba9c5b1 Compare February 10, 2026 12:19

LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 5ec972a to 5de5df1 Compare February 10, 2026 12:19

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from ba9c5b1 to d93c532 Compare February 10, 2026 12:36

LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 5de5df1 to 494fed2 Compare February 10, 2026 12:36

LukeMathWalker marked this pull request as ready for review February 10, 2026 13:07

cursor bot reviewed Feb 10, 2026

View reviewed changes

src/redisearch_rs/numeric_range_tree/src/tree/gc.rs Outdated Show resolved Hide resolved

tests/cpptests/test_cpp_llapi.cpp Outdated Show resolved Hide resolved

LukeMathWalker force-pushed the numeric-range-tree-2-ffi branch from 494fed2 to 4c0ce4b Compare February 11, 2026 11:19

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from d93c532 to a75af40 Compare February 11, 2026 11:19

cursor bot reviewed Feb 11, 2026

View reviewed changes

src/iterators/inverted_index_iterator.c Outdated Show resolved Hide resolved

LukeMathWalker force-pushed the numeric-range-tree-3-swap branch from a75af40 to 7c2bdfb Compare February 11, 2026 13:34

LukeMathWalker mentioned this pull request Feb 11, 2026

[MOD-13913] Add debug implementation for NumericRangeTree #8292

Merged

4 tasks

cursor bot reviewed Feb 11, 2026

View reviewed changes

src/fork_gc.c Outdated Show resolved Hide resolved

LukeMathWalker force-pushed the numeric-range-tree-2b-gc-ffi branch 3 times, most recently from fa9da32 to 91d9770 Compare February 23, 2026 11:05

cursor bot reviewed Feb 23, 2026

View reviewed changes

src/document.c Show resolved Hide resolved