
check if we try to deactivate last initializing replica #6755

Merged
generall merged 3 commits into dev from do-not-deactivate-last-initializing
Jun 25, 2025

Conversation

@generall
Member

There is a bug that I can't reproduce locally, but it has been observed in practice multiple times:

If a pod was somehow killed during collection creation, or an error occurred while creating a collection (due to file descriptors or something like that), some shards of the collection can end up in an inconsistent state between initializing and dead.

The local shard thinks it is dead, while other machines in the cluster consider it initializing.

Since the local shard status is dead, it needs to be recovered from somewhere, but it is also the only replica of that shard in the cluster. So the cluster is stuck in this inconsistent state with no way to recover (except by deleting the collection).

This PR extends our check for is_last_active_replica and handles the case of no active replicas in more detail.
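
To illustrate the idea behind the extended check, here is a minimal sketch. It is not the code from this PR: the `ReplicaState` enum, the `PeerId` alias, the `peers_in_state` helper, and the function name `is_last_replica_to_keep` are all hypothetical and chosen only for this example. It assumes that when a shard has no active replicas at all, a sole initializing replica should be treated like the last active one and must not be deactivated.

```rust
use std::collections::HashMap;

// Hypothetical types for illustration only; not the actual definitions in the codebase.
type PeerId = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReplicaState {
    Active,
    Initializing,
    Dead,
}

/// Collect the ids of all peers whose replica is in the given state.
fn peers_in_state(peers: &HashMap<PeerId, ReplicaState>, state: ReplicaState) -> Vec<PeerId> {
    peers
        .iter()
        .filter(|(_, s)| **s == state)
        .map(|(id, _)| *id)
        .collect()
}

/// Returns `true` if deactivating the replica on `peer_id` would leave the
/// shard with nothing to recover from (assumed semantics for this sketch).
fn is_last_replica_to_keep(peers: &HashMap<PeerId, ReplicaState>, peer_id: PeerId) -> bool {
    let active = peers_in_state(peers, ReplicaState::Active);
    match active.as_slice() {
        // Exactly one active replica: protect it if it is the peer in question.
        [only] => *only == peer_id,
        // No active replicas: fall back to initializing replicas and protect
        // the peer if it holds the only one.
        [] => {
            let initializing = peers_in_state(peers, ReplicaState::Initializing);
            matches!(initializing.as_slice(), [only] if *only == peer_id)
        }
        // Several active replicas: any single one can be deactivated safely.
        _ => false,
    }
}
```

With this sketch, a shard whose only replica is initializing is protected from deactivation, while a dead replica next to it is not considered the last one.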

@generall generall requested review from ffuugoo and timvisee June 24, 2025 22:26
@coderabbitai

This comment was marked as resolved.

Member

@timvisee timvisee left a comment

I did not reproduce this either, but I've seen this problem as well. The implementation looks sound 👍

Co-authored-by: Tim Visée <tim+github@visee.me>
@generall generall merged commit c29ab98 into dev Jun 25, 2025
18 checks passed
@generall generall deleted the do-not-deactivate-last-initializing branch June 25, 2025 12:04
generall added a commit that referenced this pull request Jul 17, 2025
* check if we try to deactivate last initializing replica

* consider more cases

* Update lib/collection/src/shards/replica_set/mod.rs

Co-authored-by: Tim Visée <tim+github@visee.me>

---------

Co-authored-by: Tim Visée <tim+github@visee.me>
3 participants