Skip to content

Reduce search spikes, don't lock Gridstore bitmask for duration of flush#8169

Merged
timvisee merged 1 commit intodevfrom
gridstore-short-bitmask-locks
Feb 18, 2026
Merged

Reduce search spikes, don't lock Gridstore bitmask for duration of flush#8169
timvisee merged 1 commit intodevfrom
gridstore-short-bitmask-locks

Conversation

@timvisee
Copy link
Member

@timvisee timvisee commented Feb 18, 2026

Supersedes #8167

In the Gridstore flusher, a lock on the bitmask was held across the full duration of the flusher including IO. On big flushes, it means we could hold the lock for a long time.

It blocks writes on Gridstore and can in turn lock reads as well. We assume this has been causing spikes in searches.

It turns out we don't have to hold the lock. In stead we can lock in two separate stages and make the critical sections short.

As far as I can tell this change is sound. I've not been able to come up with a problematic scenario.

Thanks @IvanPleshkov for finding this one.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

return Err(GridstoreError::FlushCancelled);
};

let mut bitmask_guard = bitmask.upgradable_read();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding the read guard for the whole operation time guarantees that bitmask and tracker contents are consistent. It doesn't seems to be true anymore. Can it be a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be a problem.

The bitmask must be set before putting the mapping in the tracker. And mappings from the tracker must be freed before being freed from the bitmask. That is the case. These operations don't have to be atomic.

@timvisee timvisee marked this pull request as ready for review February 18, 2026 12:47
Copy link
Contributor

@coszio coszio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK to me, but was any testing performed to validate this is the source of latency spikes?

@timvisee timvisee mentioned this pull request Feb 18, 2026
9 tasks
@timvisee
Copy link
Member Author

This looks OK to me, but was any testing performed to validate this is the source of latency spikes?

We're doing that separately. @JojiiOfficial and @IvanPleshkov are looking into it.

@qdrant qdrant deleted a comment from coderabbitai bot Feb 18, 2026
Copy link
Member

@agourlay agourlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Did a pass with crasher 👍

@timvisee timvisee changed the title Fix search spikes, don't lock Gridstore bitmask for duration of flush Reduce search spikes, don't lock Gridstore bitmask for duration of flush Feb 18, 2026
@timvisee timvisee merged commit e160308 into dev Feb 18, 2026
17 of 19 checks passed
@timvisee timvisee deleted the gridstore-short-bitmask-locks branch February 18, 2026 15:29
@timvisee timvisee mentioned this pull request Feb 18, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants