
async scroll #7928

Merged: generall merged 12 commits into dev from async-scroll on Jan 19, 2026

Conversation

@generall (Member) commented Jan 16, 2026

This is an experimental PR aimed at testing the hypothesis that slow reads during shard transfers are caused by sequential reads.

This PR changes how we read vectors for the scroll operation: instead of reading vectors one by one sequentially, it reads them in batches. Each batch is processed with io_uring if enabled.

Concurrent reads of vectors significantly decrease the time needed to transfer shards, especially when the receiving side is not empty (see charts below)
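As a rough sketch of the batched approach (function and type names here are illustrative, not Qdrant's actual API), the scroll path collects the IDs of a page and resolves them in one batched call instead of one storage access per point; the real implementation dispatches each batch to io_uring when enabled:

```rust
use std::collections::HashMap;

// Illustrative stand-in for vector storage; the real storage is mmap/disk-backed.
type VectorStore = HashMap<u64, Vec<f32>>;

// Batched read: one call per scroll page instead of one read per point.
// A backend such as io_uring can then service the batch with concurrent I/O.
fn read_vectors_batched(storage: &VectorStore, ids: &[u64]) -> Vec<(u64, Vec<f32>)> {
    ids.iter()
        .filter_map(|id| storage.get(id).map(|v| (*id, v.clone())))
        .collect()
}

fn main() {
    let storage: VectorStore = [(1, vec![0.1, 0.2]), (2, vec![0.3, 0.4])].into();
    // Point 3 does not exist and is simply skipped.
    let page = read_vectors_batched(&storage, &[1, 2, 3]);
    assert_eq!(page.len(), 2);
    assert_eq!(page[0], (1, vec![0.1, 0.2]));
}
```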

@generall (Member, Author):

left to right: async_io transfer, regular transfer, transfer with madvise=sequential


@generall (Member, Author):

Transfer under higher memory pressure, with UUID point IDs instead of sequential ones.


Left: sequential reads
Right: async IO

@generall generall marked this pull request as ready for review January 17, 2026 21:49
@generall generall requested review from agourlay and timvisee January 17, 2026 21:49
@coderabbitai (bot) commented Jan 17, 2026

📝 Walkthrough

Adds a new SegmentRecord type and exports it. Introduces SegmentEntry::retrieve and implements it across segment and proxy layers. Reworks vector access into batched APIs (VectorStorage::read_vectors, storage-specific read_vectors, Segment::read_vectors, vectors_by_offsets) and updates search, segment ops, segment_holder, shard retrieve/update, and RecordInternal conversions to use batched retrieve/read flows and propagate an is_stopped stop flag.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • agourlay
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
  • Title check: ❓ Inconclusive. The title 'async scroll' is vague and does not clearly convey the main change in the changeset. Resolution: consider a more descriptive title such as 'Implement batched async vector reads for scroll operations' or 'Change vector retrieval to use batch I/O for scroll operations'.

✅ Passed checks (2 passed)
  • Description check: ✅ Passed. The description explains the motivation and approach for batched vector reads during shard transfers, which aligns with the technical changes in the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@coderabbitai (bot) left a comment:

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@lib/shard/src/retrieve/retrieve_blocking.rs`:
- Around line 31-63: The code updates point_version for every id in
newer_version_points before calling segment.retrieve(), which wrongly marks IDs
not actually returned as applied; change this by collecting the IDs returned by
segment.retrieve() (e.g., into a HashSet or Vec) and only update point_version
and increment applied for those returned IDs; specifically, in the closure
passed to segments_guard.read_points, keep the initial loop that builds
newer_version_points but do NOT call *version_entry.or_default() = version
there, call segment.retrieve(...) and collect each returned record.id into a set
while inserting records into point_records (using RecordInternal::from(record)),
then iterate over that returned-id set to update point_version entries and
increment applied for them (preserving the existing early "already latest" logic
that increments applied when Entry::Occupied and >= version).

In `@lib/shard/src/update.rs`:
- Around line 468-484: The missing-record detection builds missing_record_ids
from stored_records (so it's immediately emptied) — change it to start from the
set of requested IDs (the keys in id_to_point or the original requested IDs) and
then remove each stored_record.id during the stored_records loop; keep the
equality check (point.is_equal_to) and push differing points into
points_to_update as before, and after the loop iterate the remaining
missing_record_ids to push points for IDs not returned by the store; update the
variable initialization in sync_points (referencing missing_record_ids and
id_to_point) so missing IDs are computed from id_to_point.keys() rather than
stored_records.iter().
🧹 Nitpick comments (3)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)

191-205: Unused AccessPattern generic parameter.

The generic parameter P: AccessPattern is declared but never used in this implementation. The method always delegates to read_vectors_async regardless of the access pattern. If this is intentional for the experimental async scroll feature, consider adding a comment explaining this behavior. Otherwise, the sequential/random access hints could potentially be used to optimize the async read strategy.

Additionally, the .unwrap() on line 204 will panic if read_vectors_async fails. Based on learnings from this codebase, the io_uring feature is experimental and designed to panic rather than silently fall back. If this is the intended behavior here, a brief comment would clarify the design decision.

lib/shard/src/proxy_segment/segment_entry.rs (1)

350-369: Avoid allocation when there are no deleted points.
You can skip building filtered_point_ids if deleted_points is empty to avoid an extra allocation and copy on the hot path.

♻️ Suggested tweak
     fn retrieve(
         &self,
         point_ids: &[PointIdType],
         with_payload: &WithPayload,
         with_vector: &WithVector,
         hw_counter: &HardwareCounterCell,
         is_stopped: &AtomicBool,
     ) -> OperationResult<Vec<SegmentRecord>> {
+        if self.deleted_points.is_empty() {
+            return self.wrapped_segment.get().read().retrieve(
+                point_ids,
+                with_payload,
+                with_vector,
+                hw_counter,
+                is_stopped,
+            );
+        }
         let filtered_point_ids: Vec<PointIdType> = point_ids
             .iter()
             .copied()
             .filter(|id| !self.deleted_points.contains_key(id))
             .collect();
         self.wrapped_segment.get().read().retrieve(
lib/segment/src/segment/vectors.rs (1)

20-44: Preserve first lookup error and avoid partial callbacks.

Right now a lookup error still allows callbacks to run and overwrites earlier errors. If an error is returned, it’s safer to avoid partial side effects and to keep the first failure for determinism.

♻️ Suggested refactor
-        let mut error = None;
-        let internal_ids = point_ids
-            .iter()
-            .copied()
-            .stop_if(is_stopped)
-            .filter_map(|point_id| match self.lookup_internal_id(point_id) {
-                Ok(point_offset) => Some(point_offset),
-                Err(err) => {
-                    error = Some(err);
-                    None
-                }
-            });
-        self.vectors_by_offsets(
-            vector_names,
-            internal_ids,
-            hw_counter,
-            |point_offset, vector_internal| {
-                if let Some(point_id) = self.id_tracker.borrow().external_id(point_offset) {
-                    callback(point_id, vector_internal);
-                }
-            },
-        )?;
-        if let Some(err) = error {
-            return Err(err);
-        }
+        let mut error = None;
+        let mut internal_ids = Vec::with_capacity(point_ids.len());
+        for point_id in point_ids.iter().copied().stop_if(is_stopped) {
+            match self.lookup_internal_id(point_id) {
+                Ok(point_offset) => internal_ids.push(point_offset),
+                Err(err) => {
+                    error.get_or_insert(err);
+                    break; // avoid partial callbacks when returning Err
+                }
+            }
+        }
+        if let Some(err) = error {
+            return Err(err);
+        }
+        self.vectors_by_offsets(
+            vector_names,
+            internal_ids,
+            hw_counter,
+            |point_offset, vector_internal| {
+                if let Some(point_id) = self.id_tracker.borrow().external_id(point_offset) {
+                    callback(point_id, vector_internal);
+                }
+            },
+        )?;

Comment on lines +468 to +484

            let mut missing_record_ids: AHashSet<PointIdType> =
                stored_records.iter().map(|record| record.id).collect();

            for stored_record in stored_records {
                missing_record_ids.remove(&stored_record.id);
                let point = id_to_point.get(&stored_record.id).unwrap();
                if !point.is_equal_to(&stored_record) {
                    points_to_update.push(*point);
                    updated += 1;
                }
            }
        })?;

        for missing_id in missing_record_ids {
            let point = id_to_point.get(&missing_id).unwrap();
            points_to_update.push(*point);
            updated += 1;
        }
@coderabbitai (bot):

⚠️ Potential issue | 🟠 Major

Fix missing-record detection in sync_points.

missing_record_ids is built from stored_records, then immediately emptied by the loop, so the “missing IDs” branch never runs. This skips updates for IDs that were requested but not returned.

🐛 Proposed fix
-            let mut missing_record_ids: AHashSet<PointIdType> =
-                stored_records.iter().map(|record| record.id).collect();
+            let mut missing_record_ids: AHashSet<PointIdType> =
+                ids.iter().copied().collect();
📝 Committable suggestion


Suggested change

            let mut missing_record_ids: AHashSet<PointIdType> =
                ids.iter().copied().collect();

            for stored_record in stored_records {
                missing_record_ids.remove(&stored_record.id);
                let point = id_to_point.get(&stored_record.id).unwrap();
                if !point.is_equal_to(&stored_record) {
                    points_to_update.push(*point);
                    updated += 1;
                }
            }

            for missing_id in missing_record_ids {
                let point = id_to_point.get(&missing_id).unwrap();
                points_to_update.push(*point);
                updated += 1;
            }

Comment on lines +6 to +10

pub struct SegmentRecord {
    pub id: PointIdType,
    pub vectors: Option<NamedVectorsOwned>,
    pub payload: Option<Payload>,
}
@generall (Member, Author):

Instead of exposing separate vectors and payload functions, the segment now exposes a retrieve function, which returns this.

The reason is that we can't use `impl ..` arguments with `dyn SegmentEntry`, so it is hard to expose a callback-based function in this trait
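A minimal sketch of the object-safety constraint mentioned above (trait and type names here are illustrative stand-ins, not the actual SegmentEntry API): a method taking a generic `impl FnMut` callback makes a trait unusable behind `dyn`, while returning owned records keeps it dyn-compatible.

```rust
// Illustrative only: these traits are stand-ins, not Qdrant's SegmentEntry.

trait WithCallback {
    // A generic method like the one below would make this trait
    // non-dyn-compatible (it could not be used as `dyn WithCallback`):
    // fn for_each_vector(&self, callback: impl FnMut(u32));
}

trait ObjectSafe {
    // Returning owned records instead keeps the trait dyn-compatible.
    fn retrieve(&self, ids: &[u32]) -> Vec<(u32, String)>;
}

struct Dummy;

impl ObjectSafe for Dummy {
    fn retrieve(&self, ids: &[u32]) -> Vec<(u32, String)> {
        ids.iter().map(|&id| (id, format!("record-{id}"))).collect()
    }
}

fn main() {
    // Usable behind `dyn`, which is what a `dyn SegmentEntry`-style design needs.
    let segment: Box<dyn ObjectSafe> = Box::new(Dummy);
    let records = segment.retrieve(&[1, 2]);
    assert_eq!(records, vec![(1, "record-1".to_string()), (2, "record-2".to_string())]);
}
```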

Comment on lines +148 to +155
fn retrieve(
&self,
point_ids: &[PointIdType],
with_payload: &WithPayload,
with_vector: &WithVector,
hw_counter: &HardwareCounterCell,
is_stopped: &AtomicBool,
) -> OperationResult<Vec<SegmentRecord>>;
@generall (Member, Author):

This function is actually the only thing we need; vectors, all_vectors, and payloads are now used in only a few not-so-important places, which can be refactored out later

Comment on lines +37 to +44

let point_id = id_tracker.external_id(point_offset);
// This can happen if point was modified between retrieving and post-processing
// But this function locks the segment, so it can't be modified during its execution
debug_assert!(
    point_id.is_some(),
    "Point with internal ID {point_offset} not found in id tracker"
);
point_id.map(|id| (id, scored_point_offset))
@generall (Member, Author):

Handling of unexpected situations changed a bit in this function, mostly because it is harder to return an error for a batch of points than for a single one.

But I think it is fine, as we don't want to crash production if some edge-case point is broken.
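A small sketch of the lenient batch post-processing described above (types are simplified stand-ins for the real id tracker): a missing external ID trips a `debug_assert!` in debug builds, but in release builds the point is skipped instead of failing the whole batch.

```rust
use std::collections::HashMap;

// Stand-in for the id tracker: maps internal offsets to external point IDs.
// A missing entry is asserted on in debug builds and skipped in release builds,
// rather than turning the whole batch into an error.
fn resolve_external_ids(
    offsets: &[u32],
    id_tracker: &HashMap<u32, u64>,
) -> Vec<(u64, u32)> {
    offsets
        .iter()
        .filter_map(|&offset| {
            let external = id_tracker.get(&offset).copied();
            debug_assert!(
                external.is_some(),
                "Point with internal ID {offset} not found in id tracker"
            );
            // Skip broken points instead of propagating an error for the batch.
            external.map(|id| (id, offset))
        })
        .collect()
}

fn main() {
    let tracker: HashMap<u32, u64> = [(0, 100), (1, 101)].into();
    let resolved = resolve_external_ids(&[0, 1], &tracker);
    assert_eq!(resolved, vec![(100, 0), (101, 1)]);
}
```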

Comment on lines +387 to +397

let mut result = None;
self.vectors_by_offsets(
    vector_name,
    std::iter::once(point_offset),
    hw_counter,
    |_, vector_internal| {
        result = Some(vector_internal);
    },
)?;
Ok(result)
}
@generall (Member, Author):

This is changed to increase test coverage of the underlying vectors_by_offsets.

use crate::segment::Segment;
use crate::types::{PointIdType, VectorName};

impl Segment {
@generall (Member, Author):

More vector-related functions can be moved into this file later.

named_vectors
}

pub fn is_equal_to(&self, segment_record: &SegmentRecord) -> bool {
@generall (Member, Author):

This function is used for deduplication during sync calls.

Self {
    id,
    payload,
    vector: vectors.map(VectorStructInternal::from),
@generall (Member, Author):

This is used to handle the API expectation that points with no vectors are displayed as vector: {}.

@timvisee (Member):

Nice. The change looks significant, assuming I parse the graphs correctly.

I did start an effort a few weeks back on concurrent reading+transferring. I think this approach is better, though I might implement the same concurrency on top of this in a separate PR. It'd eliminate waiting on round-trip time.

@agourlay (Member) left a comment:

Struggled a bit to review the change across all files.

I did validate the integration tests locally with QDRANT__STORAGE__PERFORMANCE__ASYNC_SCORER=true

Also tried some basic benchmarking of scroll with_vectors but could not see a difference locally, probably because of my fast SSD.

@generall generall merged commit 21deeec into dev Jan 19, 2026
15 checks passed
@generall generall deleted the async-scroll branch January 19, 2026 17:43
generall added a commit that referenced this pull request Feb 9, 2026
* implementation of async batch vectors reading

* EXPERIMENT: async-io for reading on scroll

* disable on non-linux

* make retrieve sequential for test

* wip: implement vector reading via callback

* simplify operations, remove duplicates

* use batch retrieve also for post-processing search results

* clippy

* fix tests

* review fixes

* Replace big match statement with simple option filter and equal check

* Inline format arguments

---------

Co-authored-by: timvisee <tim@visee.me>
@timvisee timvisee mentioned this pull request Feb 17, 2026
5 tasks
3 participants