
Fix too_many_internal_resets error#8128

Merged
ffuugoo merged 1 commit into dev from too-many-internal-resets-go-away
Feb 13, 2026

Conversation

@ffuugoo
Contributor

@ffuugoo ffuugoo commented Feb 13, 2026

Patch tonic and hyper crates to expose max_local_error_reset_streams, and disable it when creating internal gRPC connections.
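For context, the knob itself lives in the `h2` crate (`max_local_error_reset_streams`, the cap on locally-reset streams that produces the `too_many_internal_resets` error); stock `hyper` and `tonic` don't pass it through, hence the patches. A hypothetical sketch of what disabling it on an internal channel could look like once exposed — the `http2_max_local_error_reset_streams` method name below is illustrative, not the actual patched API:

```rust
// Hypothetical sketch: assumes the patched `tonic` re-exports h2's
// `max_local_error_reset_streams` setting on its channel builder.
use tonic::transport::{Channel, Endpoint};

fn internal_channel(uri: &'static str) -> Channel {
    Endpoint::from_static(uri)
        // `None` lifts the locally-reset-stream cap entirely. This is safe
        // for internal cluster links because we control both connection ends.
        .http2_max_local_error_reset_streams(None)
        .connect_lazy()
}
```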

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using the cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using the cargo clippy --workspace --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

Patch `tonic` and `hyper` crates to expose `max_local_error_reset_streams`,
and *disable* it when creating internal gRPC connections
@timvisee
Member

timvisee commented Feb 13, 2026

Some more context for the future 🦾 🤖 🦿 if we hit this again:

We hit a security feature here that helps clean up borked connections. Since this is for internal cluster communication and we manage both sides of the connection, we can disable this security feature, which is what this PR takes care of.

We mainly hit this because we drop connections before waiting on the result. An error is counted whenever we drop a connection before we receive its response, and eventually we hit the limit. This happens a lot when fanning out reads, which race against the local replica. Forcefully dropping these connections once we get a result is much easier and fits our implementation better than waiting for all responses and dropping connections gracefully.

@agourlay
Member

How can this fix be validated? :)

@ffuugoo
Contributor Author

ffuugoo commented Feb 13, 2026

How can this fix be validated? :)

bfb \
    --uri $QDRANT_HOST \
    -n 10M \
    -d 512 \
    --skip-setup \
    --search \
    --keywords 5000 \
    --rps 300

@generall used this collection setup (not sure if it's critical, though; the search load is what reproduces the bug):

bfb \
    --uri $QDRANT_HOST \
    -n 10M \
    -d 512 \
    --shards 9 \
    --replication-factor 2 \
    --on-disk-vectors true \
    --keywords 5000 \
    --hnsw-m 0 \
    --hnsw-payload-m 16 \
    --tenants true \
    -b 10 \
    --timeout 60 \
    --rps 100

@generall
Member

An important detail is the fan-out factor.

@ffuugoo
Contributor Author

ffuugoo commented Feb 13, 2026

An important detail is the fan-out factor.

It reproduces with the default setup on my machine, without any explicit fan-out factor setting.

Behavior with and without the fix is different:

  • without the fix, the error is too_many_internal_resets
    • note "internal"
    • that's what we observed recently and what we expected from this test
  • with the fix there's still an error, but it's too_many_resets
    • no "internal"
    • it's an abort-reset-streams error, which is different
  • I'd consider this an indication that the fix works, because the "internal" error goes away as expected

@generall
Member

It seems to fix the problem on my repro setup.

@ffuugoo ffuugoo merged commit 5bb06d9 into dev Feb 13, 2026
17 checks passed
@ffuugoo ffuugoo deleted the too-many-internal-resets-go-away branch February 13, 2026 17:21
timvisee pushed a commit that referenced this pull request Feb 16, 2026
Patch `tonic` and `hyper` crates to expose `max_local_error_reset_streams`,
and *disable* it when creating internal gRPC connections
@timvisee timvisee mentioned this pull request Feb 17, 2026