Fix deadlock on Actix RT in Collection Info by agourlay · Pull Request #7277 · qdrant/qdrant

agourlay · 2025-09-22T08:46:11Z

Follow up on #7267

Same rational and same fix to remove blocking computation from Actix runtime.

coderabbitai · 2025-09-22T08:50:56Z

📝 Walkthrough

Walkthrough

In lib/collection/src/shards/local_shard/mod.rs, per-segment statistics for shard info are moved into a spawned blocking task that iterates segments and accumulates payload schema, indexed_vectors_count, points_count, and segments_count. The async caller awaits the task, logs on failure, and falls back to defaults. A cloned segments handle is used within the blocking task. Status and optimizer_status retrieval remain unchanged.
In tests/consensus_tests/test_shard_snapshot_deadlock.py, a background polling helper loop loop_collection_info(peer_url) is added and launched during the test. ThreadPoolExecutor max_workers is increased from 5 to 6.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

timvisee
ffuugoo
KShivendu

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "Fix deadlock on Actix RT in Collection Info" succinctly and accurately summarizes the primary change in this PR: moving blocking computation out of the Actix runtime to prevent deadlocks in the collection info path. It is concise, focused on the main issue and subsystem, and contains no extraneous noise. This makes the intent clear to reviewers and to teammates scanning the project history.
Description Check	✅ Passed	The description states this is a follow-up to PR #7267 and that it applies the same rationale and fix to remove blocking computation from the Actix runtime, which directly matches the changes in the collection info code and accompanying test adjustments. Although brief, the description is on-topic and conveys the intent of the changes. It is therefore acceptable for this lenient check.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch local-shard-info-blocking-thread

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

tests/consensus_tests/test_shard_snapshot_deadlock.py (1)
27-30: Make loop_collection_info resilient and stoppable

As written, any transient RequestException will terminate the loop and reduce the intended pressure. Also consider an optional stop flag to allow graceful shutdown in finally.

Apply:
-def loop_collection_info(peer_url):
-    while True:
-        requests.get(f"{peer_url}/collections/test_collection", timeout = 0.5)
+def loop_collection_info(peer_url, stop=None, session=None):
+    sess = session or requests.Session()
+    try:
+        while stop is None or not stop.is_set():
+            try:
+                sess.get(f"{peer_url}/collections/test_collection", timeout=0.5)
+            except requests.RequestException:
+                # ignore and continue to keep pressure on the endpoint
+                pass
+    finally:
+        if session is None:
+            sess.close()
Optionally mirror this pattern in the other loop_* helpers and pass a shared threading.Event you set in the finally block before executor shutdown. Happy to provide a full diff across the file if you want that change.
lib/collection/src/shards/local_shard/mod.rs (2)
1103-1129: Good: moves per‑segment aggregation off Actix via spawn_blocking

This removes blocking work from the RT thread while keeping segment access under a blocking read. LGTM. One minor improvement: consider taking a snapshot of segment handles (if feasible) under the short read lock and then iterating them after dropping the holder read lock to minimize contention.

1130-1136: Streamline error handling and keep context

Current pattern logs Err then unwrap_or_default(). A compact match avoids double handling and lets you add context once.

Apply:
-        let segment_info = tokio::task::spawn_blocking(move || {
+        let segment_info = tokio::task::spawn_blocking(move || {
             // ... unchanged ...
-        })
-        .await;
-
-        if let Err(err) = &segment_info {
-            log::error!("Failed to get local shard info: {err}");
-        }
-
-        let (schema, indexed_vectors_count, points_count, segments_count) =
-            segment_info.unwrap_or_default();
+        }).await;
+
+        let (schema, indexed_vectors_count, points_count, segments_count) = match segment_info {
+            Ok(res) => res,
+            Err(err) => {
+                log::error!("Failed to get local shard info: {err}");
+                Default::default()
+            }
+        };
Also, please confirm that merging PayloadIndexInfo by summing only points (keeping other fields from the first seen segment) matches intended semantics across segments.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e86d60e and 2de4e2d.

📒 Files selected for processing (2)

lib/collection/src/shards/local_shard/mod.rs (2 hunks)
tests/consensus_tests/test_shard_snapshot_deadlock.py (3 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/collection/src/shards/local_shard/mod.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/collection/src/shards/local_shard/mod.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consensus-compose
GitHub Check: integration-tests
GitHub Check: integration-tests-consensus
GitHub Check: lint
GitHub Check: storage-compat-test
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)

🔇 Additional comments (2)

tests/consensus_tests/test_shard_snapshot_deadlock.py (2)

59-59: Bumping ThreadPoolExecutor to 6 workers is appropriate

Matches the additional background task. No concerns.

76-77: Adding collection info pressure is spot on

This exercises the newly offloaded Collection Info path during the deadlock scenario. LGTM.

Fix deadlock on Actix RT in Collection Info

2de4e2d

coderabbitai bot reviewed Sep 22, 2025

View reviewed changes

agourlay requested a review from timvisee September 22, 2025 09:10

timvisee approved these changes Sep 22, 2025

View reviewed changes

agourlay merged commit 04d8d05 into dev Sep 22, 2025
21 of 22 checks passed

agourlay deleted the local-shard-info-blocking-thread branch September 22, 2025 09:14

timvisee pushed a commit that referenced this pull request Sep 29, 2025

Fix deadlock on Actix RT in Collection Info (#7277)

e0b1538

timvisee mentioned this pull request Sep 29, 2025

Bump version to 1.15.5 #7323

Merged

This was referenced Feb 6, 2026

Don't lock SegmentHolder for the entire duration of read operations #8056

Merged

Add test for consensus' freeze during streaming snapshot #8075

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix deadlock on Actix RT in Collection Info#7277

Fix deadlock on Actix RT in Collection Info#7277
agourlay merged 1 commit intodevfrom
local-shard-info-blocking-thread

agourlay commented Sep 22, 2025

Uh oh!

coderabbitai bot commented Sep 22, 2025

Walkthrough

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

agourlay commented Sep 22, 2025

Uh oh!

coderabbitai bot commented Sep 22, 2025

Walkthrough

Estimated code review effort

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants