Skip to content

Fix hash ring initialization for custom sharding, prevent resharding panic#7517

Merged
timvisee merged 10 commits intodevfrom
hashring-rebuild-no-default
Nov 12, 2025
Merged

Fix hash ring initialization for custom sharding, prevent resharding panic#7517
timvisee merged 10 commits intodevfrom
hashring-rebuild-no-default

Conversation

@timvisee
Copy link
Member

@timvisee timvisee commented Nov 11, 2025

We currently don't initialize hash rings correctly when custom sharding is used.

Specifically, when custom sharding is used, we also initialize a hash ring for the None key. I'd argue this is not correct.

This PR changes the shard holder loading behavior to not create the None hash ring anymore. We now strictly define what kind of sharding is used, and derive the default rings from there.

We had a production cluster panicking on resharding because no shard key was provided. Loading the hash rings as described in this PR would have prevented the panic with a graceful error. Applying this fix on a cluster that is already in a crash loop due to this problem will resolve the issue.

This PR adds some tests to ensure we check the shard count precondition before resharding. It also ensures we don't crash like we did before.

Tried with:

PUT collections/test
{
  "vectors": {
    "distance": "Cosine",
    "size": 1
  },
  "sharding_method": "custom"
}

POST collections/test/cluster
{
  "create_sharding_key": {
    "shard_key": "test",
    "shards_number": 2
  }
}

POST collections/test/cluster
{
  "start_resharding": {
    "direction": "down",
    "shard_key": null
  }
}

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@timvisee timvisee requested a review from generall November 11, 2025 17:24
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Nov 11, 2025
When using auto sharding, require no shard key. When using custom
sharding, require a shard key.
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Nov 12, 2025
@qdrant qdrant deleted a comment from coderabbitai bot Nov 12, 2025
@agourlay
Copy link
Member

Do we have time for adding tests right now?

@timvisee
Copy link
Member Author

Do we have time for adding tests right now?

Yes!

@timvisee timvisee changed the title Hotfix: correctly initialize hash rings for custom sharding Correctly initialize hash rings for custom sharding Nov 12, 2025
@qdrant qdrant deleted a comment from coderabbitai bot Nov 12, 2025
@timvisee timvisee requested a review from agourlay November 12, 2025 09:58
@timvisee
Copy link
Member Author

timvisee commented Nov 12, 2025

@agourlay I've added some end to end tests to ensure this cannot happen again.

Most critically it asserts that:

  1. that we don't crash on the customers edge case, like we did before
  2. that we check preconditions for shard count before resharding

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Nov 12, 2025
@timvisee timvisee changed the title Correctly initialize hash rings for custom sharding Fix hash ring initialization for custom sharding, prevent resharding panic Nov 12, 2025
Co-authored-by: Roman Titov <ffuugoo@users.noreply.github.com>
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Nov 12, 2025
Copy link
Member

@agourlay agourlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for adding those great tests 👏

@timvisee timvisee merged commit c3d2d5d into dev Nov 12, 2025
15 checks passed
@timvisee timvisee deleted the hashring-rebuild-no-default branch November 12, 2025 10:25
timvisee added a commit that referenced this pull request Nov 14, 2025
…panic (#7517)

* Correctly rebuild hash rings, don't use default with custom sharding

* Correctly initialize hash ring map with custom sharding

Don't load the non key ring by default

* Validate shard key for resharding more aggressively

When using auto sharding, require no shard key. When using custom
sharding, require a shard key.

* Only pass sharding method into shard holder construction

* Remove now obsolete debug assertion, we use same source config

* Correctly reconstruct hash rings, also add shard IDs in auto mode

* Add test, do not break consensus on invalid resharding down operation

* Add tests for resharding down shard count precondition

* Minor test tweaks

* Improve debug assertion

Co-authored-by: Roman Titov <ffuugoo@users.noreply.github.com>

---------

Co-authored-by: Roman Titov <ffuugoo@users.noreply.github.com>
@timvisee timvisee mentioned this pull request Nov 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants