Explicit graceful stop for LocalShard #7640

Merged
generall merged 20 commits into dev from sync-local-shard-drop
Dec 1, 2025

@generall
Member

@generall generall commented Nov 29, 2025

Motivation:

Having a long-running stop_gracefully function inside Drop is bad on its own, but it is even worse if this long drop needs to call async runtime operations.
In our case it created a situation where Drop cannot finish because it wants to spawn another async task on the runtime that is already being used for the drop itself. This results in a deadlock when the runtime has only one available thread.

This PR refactors Drop so that it only performs very small, synchronous operations, while relying on an external explicit shutdown, which should be executed on a real async runtime.

  • sync function to ask to stop workers
  • graceful shard stop on ShardReplicaSet changes
  • make stop consuming
  • implement graceful stop for replica set
  • graceful shard stop in start_resharding_unchecked
  • stop_gracefully fn on shard_holder
  • collection graceful stop on delete
  • explain when it is ok to shutdown
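
The pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `Shard` type, its fields, and `ask_workers_to_stop` are hypothetical names, and a plain `std::thread` worker stands in for the async runtime so the example stays self-contained. The point it shows is that `Drop` only flips a flag and returns immediately, while the consuming `stop_gracefully` is the only place that waits for the worker.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a shard that owns one background worker.
struct Shard {
    stop_flag: Arc<AtomicBool>,
    worker: Option<JoinHandle<u64>>,
}

impl Shard {
    fn start() -> Self {
        let stop_flag = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop_flag);
        // The worker runs at least one iteration, then exits once the flag is set.
        let worker = thread::spawn(move || {
            let mut iterations = 0u64;
            loop {
                iterations += 1;
                if flag.load(Ordering::Acquire) {
                    break;
                }
                thread::sleep(Duration::from_millis(1));
            }
            iterations
        });
        Shard { stop_flag, worker: Some(worker) }
    }

    /// Small, synchronous, non-blocking: only signals the worker to stop.
    fn ask_workers_to_stop(&self) {
        self.stop_flag.store(true, Ordering::Release);
    }

    /// Explicit shutdown. Consumes `self`, so the shard cannot be used afterwards.
    /// This is the only place that waits for the worker to finish.
    fn stop_gracefully(mut self) -> u64 {
        self.ask_workers_to_stop();
        match self.worker.take() {
            Some(handle) => handle.join().expect("worker panicked"),
            None => 0,
        }
    }
}

impl Drop for Shard {
    fn drop(&mut self) {
        // Never wait here: just signal and return.
        self.ask_workers_to_stop();
    }
}
```

A caller would normally run `let shard = Shard::start();` and later `shard.stop_gracefully();` from a context where blocking is acceptable. If the shard is simply dropped instead, the worker is still told to stop, but nothing waits for it.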

Recommended to review commit by commit.

@generall generall requested a review from timvisee November 29, 2025 01:21
@generall generall force-pushed the sync-local-shard-drop branch from 93869e0 to ff006d0 Compare November 30, 2025 12:54
@KShivendu KShivendu self-requested a review December 1, 2025 11:40
@generall generall merged commit 824c5ad into dev Dec 1, 2025
15 checks passed
@generall generall deleted the sync-local-shard-drop branch December 1, 2025 17:46
@timvisee timvisee mentioned this pull request Dec 2, 2025
timvisee added a commit that referenced this pull request Dec 3, 2025
* sync function to ask to stop workers

fmt

update blocking_ask_workers_to_stop

* graceful shard stop on ShardReplicaSet changes

* make stop consuming

* implement graceful stop for replica set

* graceful shard stop in start_resharding_unchecked

* stop_gracefully fn on shard_holder

* collection graceful stop on delete

* explain when it is ok to shutdown

* fix tests and ensure blocking calls are not called from async runtime

* fmt

* Annotate cancel safety for do_recover_from_snapshot function

* Minor comment nit

* make drop not wait on stop thread

* graceful stop in more tests

* graceful stop in more tests

* avoid background thread if shard was stopped gracefully

* annotate cancel safety

* fmt

* More cancel safety annotations

* Add two spawns to ensure cancel safety

---------

Co-authored-by: timvisee <tim@visee.me>
@timvisee timvisee mentioned this pull request Dec 3, 2025
1 task