Skip to content

Implement slots tracker for ASM - [MOD-12167]#7200

Merged
chesedo merged 63 commits intomasterfrom
guyav-ASM_slots_tracker
Nov 10, 2025
Merged

Implement slots tracker for ASM - [MOD-12167]#7200
chesedo merged 63 commits intomasterfrom
guyav-ASM_slots_tracker

Conversation

@GuyAv46
Copy link
Copy Markdown
Collaborator

@GuyAv46 GuyAv46 commented Oct 30, 2025

Implement slots state machine as part of the ASM effort

Implements a C API for a state machine, as described in the design document.

The implementation has 3 levels:

  1. SlotSet - Implement set operations sor a set of slots, represented as an array of ranges, represented by the start and end slot of each range. This is to be memory efficient (under unfragmanted topology), and to seamlessly work with RedisModuleSlotRangeArray, which is expected to be supplied by the notifications we are going to use it in.
  2. SlotsTracker - Implements the state machine as described in the design document, with functionalities according to the changes we want to track.
    Internally, it uses 3 SlotSets to track the different states of slots in the shard while the topology changes.
  3. the C API - A wrapper around a static SlotsTracker instance. It gives an unsafe access to the state machine, which is safe under the assumption only the main thread calls these functions, except slots_tracker_get_version to atomically fetch the current version of the SlotsTracker, which can be called from any thread. The implementation is safe with thread_local! instanciation (so no thread can access the state machine managed by the main-thread), and the API remember the thread-ID that first called the API and panics if any other thread calls it, so we should never have more than a single state machine

Main objects this PR modified

  1. SlotSet - track slots (values [0-16383]) using vector of continuous ranges, represented by the start and end slot of the range, and implements Set operations
  2. SlotsTracker - Implements the state machine as described in the design document
  3. The C API - wrapper around a single instance of SlotsTracker with minimal synchronization overhead

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Introduces a Rust slots tracker and C FFI to manage cluster slot ranges with versioned state, plus an atomic version counter in result processing for future slot-based filtering.

  • Core (Rust):
    • New crate slots_tracker: implements SlotSet (range-based set ops) and SlotsTracker (local/fully/partially-available sets, wraparound Version, availability checks).
    • Extensive unit tests for set operations, versioning, and state transitions.
  • FFI:
    • New crate c_entrypoint/slots_tracker_ffi: thread-local singleton tracker with owner-thread enforcement; exports C functions: slots_tracker_set_local_slots_internal, slots_tracker_mark_partially_available_slots_internal, slots_tracker_promote_to_local_slots, slots_tracker_mark_fully_available_slots, slots_tracker_remove_deleted_slots, slots_tracker_has_fully_available_overlap, slots_tracker_check_availability.
    • Generated header headers/slots_tracker.h via cbindgen with inline wrappers managing an atomic version and OptionSlotTrackerVersion.
  • Integration:
    • Adds key_space_version (atomic_uint) declaration/definition in result_processor.c.
    • Updates workspace Cargo.toml and dependencies to include slots_tracker and FFI crate; build script for header generation.

Written by Cursor Bugbot for commit 88a4827. This will update automatically on new commits. Configure here.

@GuyAv46 GuyAv46 added the enforce:coverage Run coverage flow even on draft pull request label Oct 30, 2025
@codecov
Copy link
Copy Markdown

codecov bot commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 90.18519% with 106 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.15%. Comparing base (fd48342) to head (88a4827).
⚠️ Report is 47 commits behind head on master.

Files with missing lines Patch % Lines
...earch_rs/c_entrypoint/slots_tracker_ffi/src/lib.rs 0.00% 99 Missing ⚠️
...c/redisearch_rs/slots_tracker/src/slots_tracker.rs 98.52% 6 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7200      +/-   ##
==========================================
- Coverage   85.49%   85.15%   -0.35%     
==========================================
  Files         334      343       +9     
  Lines       51550    52674    +1124     
  Branches    11507    13601    +2094     
==========================================
+ Hits        44071    44852     +781     
- Misses       7300     7629     +329     
- Partials      179      193      +14     
Flag Coverage Δ
flow 84.38% <ø> (+0.23%) ⬆️
unit 52.34% <90.18%> (+0.38%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@chesedo chesedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor question

Comment on lines +37 to +43
static void slots_tracker_set_local_slots(const RedisModuleSlotRangeArray *ranges) {
uint32_t version_before = atomic_load_explicit(&key_space_version, memory_order_relaxed);
uint32_t version_after = slots_tracker_set_local_slots_internal(ranges);
if (version_after != version_before) {
atomic_store_explicit(&key_space_version, version_after, memory_order_relaxed);
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this means you can't define this function in the Rust FFI layer?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to avoid FFI overhead in the query execution path, which only requires reading the atomic variable.
With this trick, we can leave the atomic variable on the C side, giving very low overhead (everything is done with relaxed memory order), while still guarantee its maintenance by the FFI library (using static definition in the H file).
Any attempt to maintain the atomic variable from the FFI will give some overhead. This should be simplified once the pipeline execution is moved to Rust.
Alternativly, we could pass a pointer to the atomic variable to the FFI, and only have overhead when updating it (which is fine, as it's not the hot path). But it's an additional unsafety and will also require the caller to pass the right atomic variable, while with this solution the caller soen't have to explicitly know what's the right variable

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, how many places in the code is expected to call this? If it is more than one then we'll have to rethink this (what will happen when one of the call sides are ported to Rust while the others are not)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Collaborator Author

@GuyAv46 GuyAv46 Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each function will be called from a single component (the first from the coordinator, the second from a notification).
In any case, the atomic variable should be linked to the query result processor (where the access performance is critical). I believe we can handle any in-between state while we migrate

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good then

@GuyAv46 GuyAv46 requested a review from chesedo November 10, 2025 12:09
@GuyAv46 GuyAv46 requested a review from JoanFM November 10, 2025 13:07
@GuyAv46 GuyAv46 added this pull request to the merge queue Nov 10, 2025
@GuyAv46 GuyAv46 removed this pull request from the merge queue due to a manual request Nov 10, 2025
@chesedo chesedo added this pull request to the merge queue Nov 10, 2025
Merged via the queue into master with commit 742f192 Nov 10, 2025
19 checks passed
@chesedo chesedo deleted the guyav-ASM_slots_tracker branch November 10, 2025 23:37
redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 10, 2025
* initial component layout

* layout `slots_tracker_set_local_slots` plan

* layout `slots_tracker_set_partially_available_slots` plan

* layout `slots_tracker_set_fully_available_slots` plan

* layout `slots_tracker_remove_deleted_slots` and `slots_tracker_has_fully_available_overlap` plan

* layout `slots_tracker_check_availability` plan

* minimal refactor to the API

* implement helpers for the version management

* initial implementation of SlotsSet

* another iteration

* refine implementation

* add some tests

* formatting

* group state parts and other cleanup

* reuse functions

* rename

* set unit test module in slot_set.rs

* formatting

* move helpers

* implement separately slots_tracker for safer API and easier testing

* refactor API to use SlotsTracker

* add some tests

* tidy ups and fix lifetime

* format

* better struct exposure

* better memory management in union_relation

* add test

* rewrite some comments

* switch values anf six comments

* allow empty SlotRangeArray input (not null)

* Check complete state in tests with a custom PartialEq implementation

* update comments

* add promote_to_local_slots functionality

* add tests

* formatting

* tests cleanup

* add coverage_relation algorithm

* refactor union_relation to use new coverage_relation

* formatting

* linting

* an attempt to create an FFI correctly

* address some review comments

* another review comment fix

* update ffi and header

* fix cbindgen build

* more idiomatic

* add debug assert to promote_to_local_slots

* remove tests for unexpected situations

* refactor has_overlap to use "any"

* minor improvement

* reorder match cases for better readability

* use thread_local to avoid unsafe sync

* replace boolean assert with thread id validation

* implement version enum for slots_tracker

* address review comments

* improve comments

* rename value to version

* optimize atomic access

* simplify helpers

* use assert

* move atomic manipulations to the C side

* improve tests

* improve API comments

(cherry picked from commit 742f192)
@redisearch-backport-pull-request
Copy link
Copy Markdown
Contributor

Successfully created backport PR for 8.4:

github-merge-queue bot pushed a commit that referenced this pull request Nov 11, 2025
Implement slots tracker for ASM - [MOD-12167] (#7200)

* initial component layout

* layout `slots_tracker_set_local_slots` plan

* layout `slots_tracker_set_partially_available_slots` plan

* layout `slots_tracker_set_fully_available_slots` plan

* layout `slots_tracker_remove_deleted_slots` and `slots_tracker_has_fully_available_overlap` plan

* layout `slots_tracker_check_availability` plan

* minimal refactor to the API

* implement helpers for the version management

* initial implementation of SlotsSet

* another iteration

* refine implementation

* add some tests

* formatting

* group state parts and other cleanup

* reuse functions

* rename

* set unit test module in slot_set.rs

* formatting

* move helpers

* implement separately slots_tracker for safer API and easier testing

* refactor API to use SlotsTracker

* add some tests

* tidy ups and fix lifetime

* format

* better struct exposure

* better memory management in union_relation

* add test

* rewrite some comments

* switch values anf six comments

* allow empty SlotRangeArray input (not null)

* Check complete state in tests with a custom PartialEq implementation

* update comments

* add promote_to_local_slots functionality

* add tests

* formatting

* tests cleanup

* add coverage_relation algorithm

* refactor union_relation to use new coverage_relation

* formatting

* linting

* an attempt to create an FFI correctly

* address some review comments

* another review comment fix

* update ffi and header

* fix cbindgen build

* more idiomatic

* add debug assert to promote_to_local_slots

* remove tests for unexpected situations

* refactor has_overlap to use "any"

* minor improvement

* reorder match cases for better readability

* use thread_local to avoid unsafe sync

* replace boolean assert with thread id validation

* implement version enum for slots_tracker

* address review comments

* improve comments

* rename value to version

* optimize atomic access

* simplify helpers

* use assert

* move atomic manipulations to the C side

* improve tests

* improve API comments

(cherry picked from commit 742f192)

Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants