Indexing progress by xzfc · Pull Request #7625 · qdrant/qdrant

xzfc · 2025-11-27T05:21:31Z

This PR adds /collections/{name}/optimizations endpoint to track ongoing optimization progress.

It is implemented via new module progress_tracker.rs. The optimization process is represented as a tree of operations, e.g.,

optimization
├── copy_data
├── populate_vector_storages
├── wait_cpu_permit
├── quantization
│   └── ""
├── payload_index
│   ├── uuid:field1
│   └── bool:field2
├── vector_index
│   └── ""
│       ├── migrate
│       ├── main_graph
│       └── additional_links
│           ├── uuid:field1
│           └── bool:field2
└── sparse_vector_index

We get start/end times for each step/substep. Later we might draw trees or flamecharts or out of this data.

Bonus: vibe-coded HTML demo page to render flamecharts: https://gist.github.com/xzfc/f5122e1efcd0e67cdcf77c25133616c7

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (4)

docs/redoc/master/openapi.json (2)
16163-16235: Progress schemas look good; add examples and clarify ordering (optional).

Document that completed is ordered by finished_at desc (or intended order).

Add minimal examples to IndexingProgress and ProgressTree to improve docs.

Example snippet (illustrative, not a full diff):
"IndexingProgress": {
  "example": {
    "ongoing": [
      { "name": "build_hnsw", "started_at": "2025-12-01T10:00:00Z", "done": 1200, "total": 5000, "children": [] }
    ],
    "completed": [
      { "name": "optimize_segments", "started_at": "2025-11-30T09:00:00Z", "finished_at": "2025-11-30T09:05:10Z", "duration_sec": 310.2, "children": [] }
    ]
  }
}
16208-16213: Nit: align number format for duration_sec (optional).

Most numeric fields use "format": "float". Consider switching duration_sec from "double" to "float" for consistency (or standardize on one). No behavior change.
-          "duration_sec": {
-            "description": "For finished operations, how long they took, in seconds.",
-            "type": "number",
-            "format": "double",
+          "duration_sec": {
+            "description": "For finished operations, how long they took, in seconds.",
+            "type": "number",
+            "format": "float",
src/actix/api/collections_api.rs (2)
239-257: Consider using async fn for consistency with other endpoints.

Most endpoints in this file use async fn directly. While returning impl Future works, using async fn would be more consistent with the rest of the codebase:
-#[get("/collections/{name}/indexing_progress")]
-fn get_indexing_progress(
-    dispatcher: web::Data<Dispatcher>,
-    collection: Path<CollectionPath>,
-    ActixAccess(access): ActixAccess,
-    params: Query<IndexingProgressParam>,
-) -> impl Future<Output = HttpResponse> {
-    helpers::time(async move {
+#[get("/collections/{name}/indexing_progress")]
+async fn get_indexing_progress(
+    dispatcher: web::Data<Dispatcher>,
+    collection: Path<CollectionPath>,
+    ActixAccess(access): ActixAccess,
+    params: Query<IndexingProgressParam>,
+) -> HttpResponse {
+    helpers::time(async {
         let pass = new_unchecked_verification_pass();
         let collection_pass =
             access.check_collection_access(&collection.name, AccessRequirements::new())?;
         Ok(dispatcher
             .toc(&access, &pass)
             .get_collection(&collection_pass)
             .await?
             .indexing_progress(params.completed_limit.unwrap_or(0) as usize)
             .await?)
     })
+    .await
 }
254-254: Potential truncation when casting u64 to usize on 32-bit platforms.

While unlikely to be an issue in practice (completed_limit values are typically small), the as usize cast could silently truncate values on 32-bit systems. Consider using saturating conversion:
-            .indexing_progress(params.completed_limit.unwrap_or(0) as usize)
+            .indexing_progress(params.completed_limit.unwrap_or(0).min(usize::MAX as u64) as usize)
Alternatively, if large values are truly not expected, you could add validation in IndexingProgressParam to ensure reasonable bounds.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8f8e3f and a658332.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (46)

docs/redoc/master/openapi.json (2 hunks)
lib/collection/src/collection/collection_ops.rs (3 hunks)
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs (7 hunks)
lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs (7 hunks)
lib/collection/src/collection_manager/optimizers/merge_optimizer.rs (2 hunks)
lib/collection/src/collection_manager/optimizers/mod.rs (5 hunks)
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (9 hunks)
lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs (4 hunks)
lib/collection/src/operations/types.rs (2 hunks)
lib/collection/src/shards/forward_proxy_shard.rs (3 hunks)
lib/collection/src/shards/local_shard/mod.rs (1 hunks)
lib/collection/src/shards/proxy_shard.rs (3 hunks)
lib/collection/src/shards/queue_proxy_shard.rs (2 hunks)
lib/collection/src/shards/replica_set/mod.rs (3 hunks)
lib/collection/src/shards/shard.rs (2 hunks)
lib/collection/src/update_handler.rs (2 hunks)
lib/common/common/Cargo.toml (1 hunks)
lib/common/common/src/lib.rs (1 hunks)
lib/common/common/src/progress_tracker.rs (1 hunks)
lib/segment/benches/hnsw_incremental_build.rs (2 hunks)
lib/segment/benches/multi_vector_search.rs (2 hunks)
lib/segment/src/index/hnsw_index/hnsw.rs (11 hunks)
lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs (2 hunks)
lib/segment/src/segment_constructor/segment_builder.rs (13 hunks)
lib/segment/src/segment_constructor/segment_constructor_base.rs (2 hunks)
lib/segment/tests/integration/batch_search_test.rs (2 hunks)
lib/segment/tests/integration/byte_storage_hnsw_test.rs (2 hunks)
lib/segment/tests/integration/byte_storage_quantization_test.rs (2 hunks)
lib/segment/tests/integration/disbalanced_vectors_test.rs (2 hunks)
lib/segment/tests/integration/exact_search_test.rs (2 hunks)
lib/segment/tests/integration/filtrable_hnsw_test.rs (3 hunks)
lib/segment/tests/integration/gpu_hnsw_test.rs (2 hunks)
lib/segment/tests/integration/hnsw_discover_test.rs (3 hunks)
lib/segment/tests/integration/hnsw_incremental_build.rs (2 hunks)
lib/segment/tests/integration/hnsw_quantized_search_test.rs (3 hunks)
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs (2 hunks)
lib/segment/tests/integration/multivector_hnsw_test.rs (2 hunks)
lib/segment/tests/integration/multivector_quantization_test.rs (2 hunks)
lib/segment/tests/integration/payload_index_test.rs (2 hunks)
lib/segment/tests/integration/segment_builder_test.rs (7 hunks)
lib/segment/tests/integration/segment_on_disk_snapshot.rs (2 hunks)
openapi/openapi-collections.ytt.yaml (1 hunks)
src/actix/api/collections_api.rs (3 hunks)
src/schema_generator.rs (2 hunks)
tests/openapi/test_indexing_progress.py (1 hunks)
tests/openapi_consistency_check.sh (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

lib/segment/tests/integration/segment_on_disk_snapshot.rs
lib/segment/tests/integration/hnsw_incremental_build.rs
src/actix/api/collections_api.rs
lib/collection/src/shards/replica_set/mod.rs
lib/segment/tests/integration/batch_search_test.rs
lib/segment/benches/hnsw_incremental_build.rs
lib/segment/tests/integration/exact_search_test.rs
lib/collection/src/collection_manager/optimizers/merge_optimizer.rs
lib/common/common/src/lib.rs
lib/segment/tests/integration/disbalanced_vectors_test.rs
lib/segment/tests/integration/gpu_hnsw_test.rs
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/collection/src/shards/shard.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs
lib/collection/src/shards/local_shard/mod.rs
lib/collection/src/update_handler.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/payload_index_test.rs
lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
lib/collection/src/shards/queue_proxy_shard.rs
lib/segment/tests/integration/multivector_hnsw_test.rs
lib/collection/src/operations/types.rs
lib/segment/benches/multi_vector_search.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/tests/integration/hnsw_discover_test.rs
lib/common/common/src/progress_tracker.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs
lib/collection/src/shards/proxy_shard.rs
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs
lib/segment/tests/integration/segment_builder_test.rs
lib/segment/tests/integration/filtrable_hnsw_test.rs
src/schema_generator.rs
lib/collection/src/collection_manager/optimizers/mod.rs
lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs
lib/collection/src/collection/collection_ops.rs

🧠 Learnings (18)

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/segment/tests/integration/segment_on_disk_snapshot.rs
lib/segment/tests/integration/hnsw_incremental_build.rs
lib/segment/tests/integration/batch_search_test.rs
lib/segment/tests/integration/exact_search_test.rs
lib/collection/src/collection_manager/optimizers/merge_optimizer.rs
lib/segment/tests/integration/disbalanced_vectors_test.rs
lib/segment/tests/integration/gpu_hnsw_test.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs
lib/collection/src/shards/shard.rs
lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/payload_index_test.rs
lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
lib/segment/tests/integration/multivector_hnsw_test.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/segment/tests/integration/hnsw_discover_test.rs
lib/common/common/src/progress_tracker.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs
lib/segment/tests/integration/segment_builder_test.rs
lib/segment/tests/integration/filtrable_hnsw_test.rs
lib/collection/src/collection_manager/optimizers/mod.rs
lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs

📚 Learning: 2025-10-13T22:58:03.121Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.

Applied to files:

lib/segment/tests/integration/segment_on_disk_snapshot.rs
lib/segment/tests/integration/hnsw_incremental_build.rs
lib/segment/tests/integration/exact_search_test.rs
lib/segment/tests/integration/disbalanced_vectors_test.rs
lib/segment/tests/integration/gpu_hnsw_test.rs
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/collection/src/shards/shard.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/payload_index_test.rs
lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
lib/segment/tests/integration/multivector_hnsw_test.rs
lib/segment/benches/multi_vector_search.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/tests/integration/hnsw_discover_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs
lib/segment/tests/integration/segment_builder_test.rs
lib/segment/tests/integration/filtrable_hnsw_test.rs
lib/collection/src/collection_manager/optimizers/mod.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/collection/src/shards/replica_set/mod.rs
lib/segment/tests/integration/disbalanced_vectors_test.rs
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs
lib/collection/src/shards/proxy_shard.rs

📚 Learning: 2025-09-01T11:19:26.371Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

lib/segment/tests/integration/disbalanced_vectors_test.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-04-20T19:08:45.034Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6403
File: lib/collection/src/collection_manager/holders/proxy_segment.rs:1178-1197
Timestamp: 2025-04-20T19:08:45.034Z
Learning: In the ProxySegment implementation, overwriting previous index changes when inserting a DeleteIfIncompatible operation is intentional. When an incompatible index is detected, the DeleteIfIncompatible operation should take precedence over any previously scheduled operations on that index.

Applied to files:

lib/collection/src/collection_manager/optimizers/segment_optimizer.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/src/segment_constructor/segment_builder.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/src/segment_constructor/segment_builder.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/src/segment_constructor/segment_builder.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-09-01T11:42:06.964Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.

Applied to files:

lib/collection/src/shards/shard.rs

📚 Learning: 2025-04-20T18:32:43.471Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6403
File: lib/segment/src/index/struct_payload_index.rs:485-497
Timestamp: 2025-04-20T18:32:43.471Z
Learning: In StructPayloadIndex, checking if a field exists in `self.config.indexed_fields` is sufficient to determine if an index should exist, as the initialization process in `load_all_fields()` ensures that `field_indexes` is synchronized with the configuration, automatically rebuilding any missing indexes.

Applied to files:

lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-08-10T18:30:02.986Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

lib/collection/src/operations/types.rs
lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-09-16T21:39:09.353Z

Learnt from: xzfc
Repo: qdrant/qdrant PR: 7232
File: lib/shard/src/segment_holder/mod.rs:1088-1094
Timestamp: 2025-09-16T21:39:09.353Z
Learning: The hnsw_format_force and hnsw_with_vectors flags in HnswGlobalConfig primarily affect existing segment conversion during load/startup, not the initial format selection for newly created empty segments. For new segments, format decisions are made by GraphLinksFormatParam::recommended() based on runtime conditions. Therefore, using HnswGlobalConfig::default() for temporary/proxy segments in SegmentHolder::build_tmp_segment is acceptable.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-05-30T15:26:14.488Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
Repo: qdrant/qdrant PR: 6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-08-14T11:31:21.777Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-05-30T15:26:54.048Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:156-159
Timestamp: 2025-05-30T15:26:54.048Z
Learning: In the subgraph_connectivity method in lib/segment/src/index/hnsw_index/graph_layers_builder.rs, the parameter `q` represents the edge removal/failure probability, not the edge retention probability. The logic `if coin_flip < q { continue; }` correctly simulates edge failures for percolation-based connectivity estimation.

Applied to files:

lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs

📚 Learning: 2025-09-08T08:47:43.795Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7188
File: lib/collection/src/update_handler.rs:810-813
Timestamp: 2025-09-08T08:47:43.795Z
Learning: In the Qdrant codebase, CollectionId implements traits that allow &CollectionId to convert Into<String>, so passing &collection_name to functions requiring Into<String> works correctly without needing .clone() or .to_string().

Applied to files:

lib/collection/src/collection/collection_ops.rs

🧬 Code graph analysis (9)

src/actix/api/collections_api.rs (3)

src/tonic/api/collections_api.rs (1)

get (64-81)

src/actix/helpers.rs (1)

time (141-147)

lib/collection/src/operations/verification/mod.rs (1)

new_unchecked_verification_pass (26-28)

lib/collection/src/shards/replica_set/mod.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (961-963)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (172-174)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/collection/src/shards/local_shard/mod.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (172-174)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/collection/src/update_handler.rs (1)

lib/collection/src/collection_manager/optimizers/mod.rs (1)

start (110-119)

tests/openapi/test_indexing_progress.py (2)

tests/openapi/helpers/helpers.py (1)

request_with_validation (41-116)

tests/openapi/conftest.py (1)

on_disk_vectors (11-12)

lib/collection/src/shards/forward_proxy_shard.rs (5)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (961-963)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (172-174)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/segment/src/index/hnsw_index/hnsw.rs (2)

lib/segment/src/json_path/parse.rs (1)

json_path (28-40)

lib/segment/src/index/plain_payload_index.rs (1)

indexed_fields (75-77)

lib/collection/src/shards/proxy_shard.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (961-963)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/collection/src/collection_manager/optimizers/mod.rs (1)

lib/common/common/src/progress_tracker.rs (1)

new_progress_tracker (123-144)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: e2e-tests
GitHub Check: integration-tests-consensus
GitHub Check: rust-tests (windows-latest)
GitHub Check: lint

🔇 Additional comments (72)

lib/common/common/src/progress_tracker.rs (7)

1-31: LGTM!

Clear module documentation with a visual example of the hierarchical structure. The error handling philosophy (fail gracefully in release, panic in debug) is well-documented and appropriate for non-critical telemetry features.

32-80: LGTM!

Clean separation between read-only ProgressView and the serializable ProgressTree snapshot. The skip_serializing_if annotations keep the API response clean by omitting null/empty fields.

82-117: LGTM!

The design appropriately separates concerns: ProgressTracker for write operations with path-based navigation, NodeProgress with Arc<AtomicU64> enabling lock-free progress updates from worker threads, and ProgressState capturing the full lifecycle. All ProgressState matches throughout the file use explicit exhaustive patterns as per coding guidelines.

146-242: LGTM!

The implementation correctly handles edge cases with graceful fallbacks in release mode while catching bugs via debug_assert! in development. The lock-free progress counter via Arc<AtomicU64> is well-suited for high-frequency updates from worker threads without contention on the main mutex.

244-248: LGTM!

RAII-based finalization ensures nodes are properly marked as Finished or Skipped when the tracker is dropped, preventing orphaned in-progress states.

250-317: LGTM!

The state machine transitions are correct: Pending becomes Skipped if never started, while InProgress transitions to Finished with accurate duration computed from Instant. The explicit match arms on ProgressState align with coding guidelines for exhaustive matching.

319-413: LGTM!

Comprehensive test coverage validating nested subtask creation, state transitions, and progress tracking. The custom test_render helper provides readable state assertions.

lib/common/common/Cargo.toml (1)

21-21: LGTM!

The chrono dependency is correctly added using the workspace pattern and is necessary for DateTime<Utc> in the progress tracking module.

lib/segment/tests/integration/batch_search_test.rs (1)

7-7: LGTM!

Correct integration of ProgressTracker into the test setup using the test helper new_for_test(). This aligns with the broader PR pattern of threading progress tracking through HNSW index build paths.

Also applies to: 171-171

lib/segment/tests/integration/byte_storage_hnsw_test.rs (1)

7-7: LGTM!

Consistent test wiring for progress tracking, matching the pattern established across other integration tests in this PR.

Also applies to: 209-209

lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs (1)

7-7: LGTM!

Consistent test wiring for progress tracking in multivector HNSW tests.

Also applies to: 154-154

lib/collection/src/operations/types.rs (1)

286-291: LGTM! Well-structured API response type.

The IndexingProgress struct is appropriately designed for the new indexing progress endpoint, with clear field names and proper derives for JSON serialization and schema generation.

lib/segment/tests/integration/segment_on_disk_snapshot.rs (1)

149-149: LGTM! Test scaffolding appropriately updated.

The test correctly passes ProgressTracker::new_for_test() to match the updated SegmentBuilder::build signature, consistent with progress tracking integration across the test suite.

lib/collection/src/shards/proxy_shard.rs (1)

172-174: LGTM! Consistent delegation pattern.

The optimizers_log method correctly follows the established delegation pattern used throughout ProxyShard and matches the implementation in other shard types (LocalShard, ForwardProxyShard, QueueProxyShard).

lib/collection/src/collection_manager/optimizers/merge_optimizer.rs (1)

260-260: LGTM! Test updated to match optimizer signature.

The test correctly passes ProgressTracker::new_for_test() to enable progress telemetry during test execution, consistent with the broader progress tracking integration across optimizer tests.

lib/common/common/src/lib.rs (1)

22-22: LGTM! Module declaration exposes new progress tracking subsystem.

The public progress_tracker module is appropriately exposed to support the comprehensive progress tracking integration throughout the codebase (optimizers, segment builders, HNSW indexing, etc.).

lib/segment/tests/integration/byte_storage_quantization_test.rs (1)

362-362: LGTM! Test scaffolding correctly updated.

The test appropriately passes ProgressTracker::new_for_test() in VectorIndexBuildArgs, enabling progress tracking during HNSW index construction in the test path, consistent with similar test updates throughout the segment test suite.

tests/openapi_consistency_check.sh (1)

40-40: LGTM! API count correctly updated for new endpoint.

The expected API count is appropriately incremented from 70 to 71 to account for the new GET /collections/{collection_name}/indexing_progress endpoint, ensuring the consistency check passes.

lib/segment/tests/integration/hnsw_discover_test.rs (1)

123-123: LGTM! Both tests correctly updated for progress tracking.

Both test functions appropriately pass ProgressTracker::new_for_test() in VectorIndexBuildArgs to match the updated HNSW index build signature, consistent with the progress tracking integration pattern used throughout the test suite.

Also applies to: 251-251

lib/collection/src/shards/queue_proxy_shard.rs (1)

224-227: LGTM!

The optimizers_log accessor follows the established delegation pattern used by other methods like update_tracker() and correctly forwards to the wrapped shard's implementation.

lib/segment/benches/multi_vector_search.rs (1)

135-135: LGTM!

Correctly wires ProgressTracker::new_for_test() into the benchmark's HNSW index build path, consistent with the broader progress-tracking integration.

lib/segment/tests/integration/payload_index_test.rs (1)

296-300: LGTM!

The test correctly instantiates ProgressTracker::new_for_test() and passes it to SegmentBuilder::build(), aligning with the progress-tracking integration across test and build paths.

lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs (1)

366-367: LGTM!

Test correctly passes ProgressTracker::new_for_test() to the optimizer, consistent with the updated optimize() API signature.

lib/collection/src/shards/shard.rs (1)

239-249: LGTM!

The new optimizers_log() method correctly delegates to each shard variant with explicit match arms (per coding guidelines) and returns None for the Dummy variant, which appropriately has no optimizer log.

lib/segment/benches/hnsw_incremental_build.rs (1)

403-403: LGTM!

The benchmark correctly passes ProgressTracker::new_for_test() to VectorIndexBuildArgs, consistent with the progress-tracking integration across build paths.

lib/segment/tests/integration/filtrable_hnsw_test.rs (2)

171-172: LGTM!

Test correctly passes ProgressTracker::new_for_test() to VectorIndexBuildArgs, aligning with the progress-tracking integration in HNSW build paths.

332-333: LGTM!

Consistent usage of ProgressTracker::new_for_test() in the second test case.

lib/segment/tests/integration/disbalanced_vectors_test.rs (1)

8-8: LGTM! Progress tracking correctly integrated into test.

The test now imports ProgressTracker and wires it through to the SegmentBuilder::build call using ProgressTracker::new_for_test(), which is the appropriate pattern for test code.

Also applies to: 109-112

lib/segment/tests/integration/multivector_quantization_test.rs (1)

9-9: LGTM! Test correctly wires progress tracking.

The import and usage of ProgressTracker::new_for_test() in the VectorIndexBuildArgs is consistent with the broader test suite integration patterns for progress tracking.

Also applies to: 345-345

lib/segment/src/segment_constructor/segment_constructor_base.rs (1)

10-10: LGTM! Public API surface correctly extended for progress tracking.

The addition of pub progress: ProgressTracker to VectorIndexBuildArgs cleanly establishes the public API for threading progress tracking through vector index build operations.

Also applies to: 291-291

lib/collection/src/shards/local_shard/mod.rs (1)

961-963: LGTM! Optimizer log accessor correctly implemented.

The new optimizers_log() method provides access to the optimizer tracking log by cloning the internal Arc, which is the standard pattern. This aligns with similar accessors in other shard types (ForwardProxyShard, ProxyShard, QueueProxyShard).

lib/segment/tests/integration/exact_search_test.rs (1)

8-8: LGTM! Test correctly integrated with progress tracking.

The test appropriately imports ProgressTracker and uses ProgressTracker::new_for_test() in the VectorIndexBuildArgs, consistent with other test files in the PR.

Also applies to: 152-152

lib/collection/src/shards/replica_set/mod.rs (1)

21-21: LGTM! Replica set accessor correctly delegates to local shard.

The new optimizers_log() method properly delegates to the local shard's optimizer log accessor, returning Option to handle cases where no local shard exists. The imports and implementation are consistent with the delegation patterns used elsewhere in the replica set.

Also applies to: 37-37, 1259-1262

src/schema_generator.rs (1)

20-20: LGTM! Schema generator correctly extended for IndexingProgress.

The addition of IndexingProgress to the imports and as a field in AllDefinitions properly extends the schema generator to include the new indexing progress type.

Also applies to: 100-100

lib/collection/src/update_handler.rs (1)

318-318: LGTM! Progress tracker correctly wired through optimizer execution.

The changes properly destructure the (tracker, progress) tuple returned by Tracker::start() and thread the progress tracker through to the optimizer.optimize() call. The #[allow(clippy::too_many_arguments)] attribute is appropriate given the additional parameter handling complexity.

Also applies to: 408-421

lib/segment/tests/integration/segment_builder_test.rs (7)

9-9: LGTM!

The import for ProgressTracker is correctly added to support the new progress tracking functionality in tests.

84-88: LGTM!

The ProgressTracker::new_for_test() is correctly instantiated and passed to SegmentBuilder::build. This pattern is consistent with the PR's goal of threading progress tracking through the build pipeline.

171-175: LGTM!

Consistent usage of ProgressTracker::new_for_test() for the defragmented segment build test.

305-309: LGTM!

Progress tracker correctly added to the sparse segment build test.

382-384: LGTM!

Progress tracker correctly integrated into the estimate_build_time helper function.

450-454: LGTM!

Progress tracker correctly added to the bug regression test (5614).

594-598: LGTM!

Progress tracker correctly added to the mmap payload segment build test.

lib/segment/tests/integration/hnsw_incremental_build.rs (2)

9-9: LGTM!

Import correctly added for ProgressTracker.

152-163: LGTM!

The progress field is correctly added to VectorIndexBuildArgs using ProgressTracker::new_for_test(). This properly integrates progress tracking into the incremental HNSW build test path.

tests/openapi/test_indexing_progress.py (1)

1-23: LGTM!

The test correctly validates the new /indexing_progress endpoint. The fixture properly sets up and tears down the collection, and the test verifies the expected empty response structure for a freshly indexed collection.

Consider adding a test that verifies behavior with the completed_limit query parameter to ensure full coverage of the API surface.

lib/segment/tests/integration/gpu_hnsw_test.rs (2)

7-7: LGTM!

Import correctly added for ProgressTracker.

143-152: LGTM!

The progress field is correctly added to VectorIndexBuildArgs with ProgressTracker::new_for_test(). This properly integrates progress tracking into the GPU HNSW build test.

lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs (1)

224-224: LGTM! Test scaffolding for progress tracking.

The import and usage of ProgressTracker::new_for_test() in test calls correctly threads progress tracking through the optimizer test infrastructure without affecting test logic.

Also applies to: 358-358, 511-511, 628-628

lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs (1)

294-294: LGTM! Consistent progress tracking integration.

The progress tracker is consistently threaded through all test optimize calls using ProgressTracker::new_for_test(), maintaining test coverage while enabling progress instrumentation.

Also applies to: 402-402, 552-552, 569-569, 699-699, 820-820, 990-990

lib/segment/tests/integration/multivector_hnsw_test.rs (1)

8-8: LGTM! Progress tracking wired into HNSW build test.

The progress field initialization with ProgressTracker::new_for_test() correctly extends the test to support progress tracking during index construction.

Also applies to: 138-138

lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs (1)

7-7: LGTM! Progress tracking added to graph connectivity test.

The progress field correctly integrates progress tracking into the HNSW build path for connectivity testing.

Also applies to: 88-88

lib/collection/src/shards/forward_proxy_shard.rs (1)

385-387: LGTM!

The optimizers_log method correctly delegates to the wrapped shard, following the same pattern used in ProxyShard and QueueProxyShard. This ensures consistent access to optimizer logs across all shard types.

lib/collection/src/collection/collection_ops.rs (1)

418-440: LGTM!

The indexing_progress method correctly aggregates progress views from all replica sets. The implementation:

Properly handles missing optimizer logs with continue

Sorts both lists by most recent first using Reverse(started_at())

Truncates completed list to respect the limit

Creates snapshots with a consistent root label

lib/segment/tests/integration/hnsw_quantized_search_test.rs (2)

145-148: LGTM!

The test correctly uses ProgressTracker::new_for_test() to satisfy the updated VectorIndexBuildArgs signature. This is the appropriate pattern for test code that doesn't need actual progress tracking.

445-447: LGTM!

The SegmentBuilder::build call is updated correctly with the new progress parameter using ProgressTracker::new_for_test().

lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (6)

429-431: LGTM!

The subtask creation follows a clear pattern for tracking the major phases of segment building. Each subtask is created upfront and then started/dropped at the appropriate points in the code flow.

461-468: LGTM!

The copy_data subtask is correctly started before the update operation and dropped after completion to mark the end of this phase.

491-493: LGTM!

The populate_vector_storages subtask correctly wraps the populate_vector_storages() call with start and drop for progress tracking.

533-544: LGTM!

The wait_cpu_permit subtask properly tracks the time spent waiting for the resource budget, and the progress tracker is correctly passed through to segment_builder.build() for continued progress tracking during the build phase.

605-613: LGTM!

The optimize method signature is correctly updated to accept a ProgressTracker parameter, which is then threaded through to optimize_segment_propagate_changes.

848-875: LGTM!

The optimize_segment_propagate_changes method correctly accepts the ProgressTracker parameter and passes it to build_new_segment, ensuring progress tracking continues through the entire optimization pipeline.

lib/segment/src/index/hnsw_index/hnsw.rs (3)

268-293: Progress subtask setup looks correct.

The progress tracking is well-integrated with conditional subtask creation based on build_main_graph and payload_m.m > 0. The .unwrap() calls on lines 414-415 are safe since they're guarded by the same build_main_graph condition.

455-477: Progress counter implementation is correct.

The use of Ordering::Relaxed for the progress counter is appropriate since progress tracking doesn't require strict synchronization guarantees. The counter correctly tracks insertions across both single-threaded and parallel phases.

495-496: Progress tracking for additional links is correctly implemented.

Per-field progress trackers are properly started and will be dropped at the end of each loop iteration. The outer progress_additional_links tracker is started and will be dropped when it goes out of scope.

Also applies to: 564-566

lib/segment/src/segment_constructor/segment_builder.rs (4)

485-496: Well-structured progress subtask hierarchy.

The progress tracking is organized with clear subtasks for each phase (quantization, payload_index, vector_index, sparse_vector_index). The per-field progress naming includes the schema type which aids in debugging and monitoring.

576-580: Per-field progress tracking is correctly implemented.

Each field's progress is started before indexing. The progress tracker is dropped automatically when it goes out of scope at the end of each loop iteration, which correctly signals completion.

601-642: Vector index progress tracking is well-integrated.

Using running_subtask(vector_name) creates and starts a per-vector progress tracker in one call, which is appropriate for this context. The parent progress_vector_index is correctly started before the loop and dropped after.

751-773: Quantization progress tracking is correctly implemented.

Per-vector progress is created using running_subtask and explicitly dropped after quantization completes. This pattern is consistent with the vector index progress tracking.

lib/collection/src/collection_manager/optimizers/mod.rs (4)

30-34: New IndexingProgressViews struct is well-designed.

The struct correctly separates ongoing and completed progress views. Using #[derive(Default)] is appropriate since it initializes both vectors as empty.

78-92: progress_views() implementation follows best practices.

The explicit match on all TrackerStatus variants adheres to the coding guideline for exhaustive matches. Reverse iteration is consistent with to_telemetry(). As per coding guidelines, this explicit matching ensures new enum variants won't be silently ignored.

127-136: Unified start time source is a good design choice.

Using progress_view.started_at() as the single source of truth for start time eliminates potential inconsistencies between telemetry and progress tracking timestamps.

110-119: Clean API design for progress tracking integration.

The start() method now returns both the Tracker (for registration/telemetry) and the ProgressTracker (for actual progress updates). This separation ensures the progress view is owned by the tracker for consistent telemetry while giving callers the ability to report progress. All callers have been properly updated to destructure the tuple return.

docs/redoc/master/openapi.json

lib/common/common/src/progress_tracker.rs

lib/segment/src/index/hnsw_index/hnsw.rs

openapi/openapi-collections.ytt.yaml

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

lib/collection/src/operations/types.rs (1)
286-291: Consider adding documentation for the public API type.

The new IndexingProgress struct is part of the public API surface (exposed via a REST endpoint) but lacks documentation comments. Consider adding doc comments to explain:

The purpose of this struct

What "ongoing" and "completed" progress entries represent

How this relates to collection indexing operations

Example:
+/// Progress information for indexing operations on a collection.
+///
+/// Tracks both in-progress and completed indexing tasks.
 #[derive(Debug, Serialize, JsonSchema)]
 pub struct IndexingProgress {
+    /// Currently running indexing operations
     pub ongoing: Vec<ProgressTree>,
+    /// Recently completed indexing operations
     pub completed: Vec<ProgressTree>,
 }
docs/redoc/master/openapi.json (3)
1837-1845: completed_limit: add format/example (and optional upper bound) for better client generation

Looks good switching to integer. To improve codegen and docs, add a numeric format and example; if the server enforces a cap, document with a maximum.

Apply:
           "schema": {
-              "type": "integer",
-              "minimum": 0,
-              "default": 0
+              "type": "integer",
+              "format": "uint64",
+              "minimum": 0,
+              "default": 0,
+              "example": 10
+              /* If applicable:
+              ,"maximum": 1000
+              */
           }
Note: since this file is generated, make the change in the Rust/OpenAPI schema source (e.g., schemars attrs/doc comments), not here. Based on learnings, docs/redoc/master/openapi.json is auto-generated.

16165-16186: IndexingProgress: clarify ordering and defaults for arrays

Consider stating ordering for completed/ongoing (e.g., newest-first).

You may set default: [] for both arrays to signal empty lists explicitly.

Example patch:
       "properties": {
         "ongoing": {
           "type": "array",
+          "default": [],
           "items": { "$ref": "#/components/schemas/ProgressTree" }
         },
         "completed": {
           "type": "array",
+          "default": [],
           "items": { "$ref": "#/components/schemas/ProgressTree" }
         }
       }
If editing, please update the Rust schema source so the generator emits these hints. Based on learnings, this file is generated.

16187-16237: ProgressTree: minor schema polish (optional defaults/examples)

children is required; adding default: [] avoids null ambiguity in some clients.

Add a short example object to aid docs/renderers.

Example:
       "children": {
         "description": "Child operations.",
         "type": "array",
+        "default": [],
         "items": { "$ref": "#/components/schemas/ProgressTree" }
       }
+    },
+    "example": {
+      "name": "Build HNSW index",
+      "started_at": "2025-12-01T10:00:00Z",
+      "finished_at": "2025-12-01T10:05:12Z",
+      "duration_sec": 312.4,
+      "done": 500000,
+      "total": 500000,
+      "children": [
+        { "name": "Merge segments", "started_at": "2025-12-01T10:00:00Z", "finished_at": "2025-12-01T10:02:10Z", "duration_sec": 130.1, "children": [] }
+      ]
+    }
As this JSON is generated, make changes in the Rust schema definitions. Based on learnings, this file is generated.
src/actix/api/collections_api.rs (1)
239-257: Make the function signature consistent with other endpoints.

This endpoint uses fn ... -> impl Future<Output = HttpResponse> without async fn and returns the future from helpers::time directly. While functionally correct, this is inconsistent with other endpoints in this file (e.g., get_collections, get_collection_aliases, get_cluster_info) which use async fn and .await on helpers::time.

Apply this diff to make it consistent:
-fn get_indexing_progress(
+async fn get_indexing_progress(
     dispatcher: web::Data<Dispatcher>,
     collection: Path<CollectionPath>,
     ActixAccess(access): ActixAccess,
     params: Query<IndexingProgressParam>,
-) -> impl Future<Output = HttpResponse> {
-    helpers::time(async move {
+) -> HttpResponse {
+    helpers::time(async move {
         let pass = new_unchecked_verification_pass();
         let collection_pass =
             access.check_collection_access(&collection.name, AccessRequirements::new())?;
         Ok(dispatcher
             .toc(&access, &pass)
             .get_collection(&collection_pass)
             .await?
             .indexing_progress(params.completed_limit.unwrap_or(0) as usize)
             .await?)
-    })
+    }).await
 }
lib/segment/src/index/hnsw_index/hnsw.rs (1)

236-237: Progress tracking wiring for HNSW build looks correct; only minor optional polish

The new ProgressTracker integration for migrate, main_graph, and per-field additional_links is consistent: subtasks are created once, started where work actually happens, and finished via RAII drops after the relevant phases complete.

The track_progress counter is shared across rayon threads and incremented with fetch_add(1, Ordering::Relaxed), which is appropriate for a pure monotonic progress counter that isn’t used for synchronization.

The build_main_graph flag correctly guards both main-graph construction and the unwrapping of progress_migrate / progress_main_graph, so there is no panic path even when GPU construction succeeds and flips build_main_graph to false.

Two small, non-blocking tweaks you might consider:

Only create the migrate subtask when old_index.is_some(), so the progress tree doesn’t show a never-started “migrate” node when there is nothing to migrate.

If you expect future refactors to touch the build_main_graph gating, you could replace the unwrap()s with an if let Some(progress_main_graph) / Some(progress_migrate) pattern inside the if build_main_graph block to encode the invariant more explicitly.

Also applies to: 268-293, 413-416, 422-422, 451-451, 455-459, 476-476, 489-489, 495-497, 564-566

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a658332 and 0f2fc28.

📒 Files selected for processing (18)

docs/redoc/master/openapi.json (2 hunks)
lib/collection/src/collection/collection_ops.rs (3 hunks)
lib/collection/src/collection_manager/optimizers/mod.rs (5 hunks)
lib/collection/src/operations/types.rs (2 hunks)
lib/collection/src/shards/forward_proxy_shard.rs (3 hunks)
lib/collection/src/shards/local_shard/mod.rs (1 hunks)
lib/collection/src/shards/proxy_shard.rs (3 hunks)
lib/collection/src/shards/queue_proxy_shard.rs (2 hunks)
lib/collection/src/shards/replica_set/mod.rs (3 hunks)
lib/collection/src/shards/shard.rs (2 hunks)
lib/common/common/src/progress_tracker.rs (1 hunks)
lib/segment/src/index/hnsw_index/hnsw.rs (10 hunks)
openapi/openapi-collections.ytt.yaml (1 hunks)
src/actix/api/collections_api.rs (3 hunks)
src/schema_generator.rs (2 hunks)
tests/consensus_tests/auth_tests/test_jwt_access.py (2 hunks)
tests/openapi/test_indexing_progress.py (1 hunks)
tests/openapi_consistency_check.sh (1 hunks)

✅ Files skipped from review due to trivial changes (1)

tests/openapi_consistency_check.sh

🚧 Files skipped from review as they are similar to previous changes (7)

src/schema_generator.rs
lib/collection/src/shards/queue_proxy_shard.rs
lib/collection/src/shards/proxy_shard.rs
lib/collection/src/shards/local_shard/mod.rs
openapi/openapi-collections.ytt.yaml
tests/openapi/test_indexing_progress.py
lib/common/common/src/progress_tracker.rs

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

lib/collection/src/collection/collection_ops.rs
lib/collection/src/shards/replica_set/mod.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/collection/src/operations/types.rs
src/actix/api/collections_api.rs
lib/collection/src/shards/shard.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/collection/src/collection_manager/optimizers/mod.rs

🧠 Learnings (11)

📚 Learning: 2025-09-08T08:47:43.795Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7188
File: lib/collection/src/update_handler.rs:810-813
Timestamp: 2025-09-08T08:47:43.795Z
Learning: In the Qdrant codebase, CollectionId implements traits that allow &CollectionId to convert Into<String>, so passing &collection_name to functions requiring Into<String> works correctly without needing .clone() or .to_string().

Applied to files:

lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-08-10T18:30:02.986Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-08-10T18:26:12.443Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.443Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."

Applied to files:

docs/redoc/master/openapi.json

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/collection/src/shards/replica_set/mod.rs
lib/collection/src/shards/forward_proxy_shard.rs

📚 Learning: 2025-09-16T21:39:09.353Z

Learnt from: xzfc
Repo: qdrant/qdrant PR: 7232
File: lib/shard/src/segment_holder/mod.rs:1088-1094
Timestamp: 2025-09-16T21:39:09.353Z
Learning: The hnsw_format_force and hnsw_with_vectors flags in HnswGlobalConfig primarily affect existing segment conversion during load/startup, not the initial format selection for newly created empty segments. For new segments, format decisions are made by GraphLinksFormatParam::recommended() based on runtime conditions. Therefore, using HnswGlobalConfig::default() for temporary/proxy segments in SegmentHolder::build_tmp_segment is acceptable.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-05-30T15:26:14.488Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
Repo: qdrant/qdrant PR: 6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-04-20T18:32:43.471Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6403
File: lib/segment/src/index/struct_payload_index.rs:485-497
Timestamp: 2025-04-20T18:32:43.471Z
Learning: In StructPayloadIndex, checking if a field exists in `self.config.indexed_fields` is sufficient to determine if an index should exist, as the initialization process in `load_all_fields()` ensures that `field_indexes` is synchronized with the configuration, automatically rebuilding any missing indexes.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-10-13T22:58:03.121Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.

Applied to files:

lib/collection/src/shards/shard.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/collection/src/collection_manager/optimizers/mod.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/collection/src/shards/shard.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/collection/src/collection_manager/optimizers/mod.rs

📚 Learning: 2025-09-01T11:42:06.964Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.

Applied to files:

lib/collection/src/shards/shard.rs

🧬 Code graph analysis (8)

lib/collection/src/collection/collection_ops.rs (2)

lib/segment/src/data_types/query_context.rs (1)

default (125-127)

lib/collection/src/collection/mod.rs (1)

shards_holder (853-855)

lib/collection/src/shards/replica_set/mod.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (961-963)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (172-174)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/segment/src/index/hnsw_index/hnsw.rs (4)

lib/segment/src/json_path/parse.rs (1)

json_path (28-40)

lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)

new (27-56)

lib/segment/src/index/hnsw_index/mod.rs (1)

new (30-32)

lib/common/common/src/progress_tracker.rs (1)

drop (245-247)

tests/consensus_tests/auth_tests/test_jwt_access.py (1)

lib/storage/src/rbac/ops_checks.rs (13)

check_access (82-82)

check_access (113-115)

check_access (127-130)

check_access (142-144)

check_access (156-158)

check_access (170-172)

check_access (184-192)

check_access (204-207)

check_access (219-221)

check_access (233-241)

check_access (267-269)

check_access (281-283)

check_access (304-306)

src/actix/api/collections_api.rs (6)

src/tonic/api/collections_api.rs (1)

get (64-81)

lib/common/common/src/counter/counter_cell.rs (5)

get (26-28)

get (86-88)

new (13-15)

new (80-82)

new (148-150)

src/actix/helpers.rs (1)

time (141-147)

lib/collection/src/operations/verification/mod.rs (1)

new_unchecked_verification_pass (26-28)

lib/collection/src/shards/forward_proxy_shard.rs (1)

new (63-102)

lib/collection/src/shards/local_shard/mod.rs (2)

new (189-274)

new (1055-1060)

lib/collection/src/shards/shard.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (961-963)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (172-174)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/forward_proxy_shard.rs (5)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (961-963)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (172-174)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/collection/src/collection_manager/optimizers/mod.rs (1)

lib/common/common/src/progress_tracker.rs (2)

new_progress_tracker (123-144)

start (226-241)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: integration-tests
GitHub Check: integration-tests-consensus
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consistency
GitHub Check: test-consensus-compose
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: Build Qdrant Edge Python bindings
GitHub Check: lint

🔇 Additional comments (11)

lib/collection/src/collection_manager/optimizers/mod.rs (2)

110-118: LGTM! Signature change is appropriately structured.

The change to return a tuple (Self, ProgressTracker) from Tracker::start properly integrates progress tracking. The implementation correctly:

Initializes progress tracking via new_progress_tracker()

Stores the ProgressView in the tracker

Returns the ProgressTracker for use by callers

78-92: LGTM! Progress aggregation logic is correct.

The progress_views method properly separates ongoing from completed operations based on TrackerStatus, iterating in reverse order to show latest items first.

lib/collection/src/shards/shard.rs (1)

239-249: LGTM! Properly delegates to all shard variants.

The optimizers_log method correctly:

Delegates to each concrete shard variant's implementation

Returns None for the Dummy variant (which has no optimizer log)

Follows the established pattern used by other accessor methods like update_tracker()

lib/collection/src/shards/replica_set/mod.rs (1)

1259-1262: LGTM! Async delegation is properly implemented.

The optimizers_log method correctly:

Uses async to acquire the local shard read lock

Chains and_then to delegate to the local shard's optimizers_log when present

Returns None when no local shard exists

Follows the pattern established by other replica set accessors

tests/consensus_tests/auth_tests/test_jwt_access.py (2)

987-991: LGTM! Test follows the established pattern.

The test properly:

Follows the naming convention (test_{action_name})

Uses check_access with the correct action name

Provides the required collection_name path parameter

Omits grpc_request appropriately since no gRPC endpoint is defined

141-146: The get_indexing_progress endpoint is intentionally REST-only. No gRPC service method exists in the proto definitions, and there are no TODO comments indicating future gRPC support. This aligns with the endpoint's design as a REST API feature.

Likely an incorrect or invalid review comment.

src/actix/api/collections_api.rs (1)

234-237: LGTM!

The parameter struct is well-defined with appropriate validation traits.

lib/collection/src/collection/collection_ops.rs (2)

1-1: LGTM!

The new imports are appropriate for the sorting logic and progress aggregation in the indexing_progress method.

Also applies to: 11-11

418-440: LGTM!

The indexing_progress method correctly:

Aggregates progress views from all replica sets

Sorts ongoing and completed items by started_at in reverse chronological order

Applies the completed_limit via truncation

Returns a well-structured IndexingProgress rooted at "Segment Optimizing"

The implementation is clear and follows the expected aggregation pattern for cross-shard monitoring.

lib/collection/src/shards/forward_proxy_shard.rs (2)

10-10: LGTM!

The imports are appropriate for exposing the optimizers log functionality.

Also applies to: 25-25

385-387: LGTM!

The optimizers_log method correctly delegates to the wrapped shard, maintaining consistency with the delegation pattern used in ProxyShard and QueueProxyShard. This provides a unified access point for optimizer tracking across all shard types.

lib/segment/src/index/hnsw_index/hnsw.rs

lib/collection/src/operations/types.rs

coderabbitai · 2025-12-08T23:17:22Z

📝 Walkthrough

Walkthrough

Adds a hierarchical, thread-safe progress-tracking subsystem (common::progress_tracker) exposing ProgressTracker, ProgressView, and ProgressTree. Threads ProgressTracker through optimizer and segment build paths (SegmentOptimizer trait methods, SegmentBuilder::build, HNSW build), updates Tracker::start to return a progress handle, and surfaces aggregated views via TrackerLog.progress_views and Collection::optimizations. Exposes a REST endpoint GET /collections/{collection_name}/optimizations with OpenAPI/schema additions and tests. Multiple shard proxy types gain optimizers_log accessors to expose per-shard TrackerLog.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas needing extra attention:

lib/common/common/src/progress_tracker.rs — concurrency, Drop semantics, snapshot correctness, and public API invariants.
Widespread trait/signature changes: SegmentOptimizer trait, SegmentBuilder::build, segment_constructor types, HNSW build flow — verify all implementations and call sites updated.
Tracker API changes and aggregation: optimizers/mod.rs (Tracker::start -> (Tracker, ProgressTracker)), TrackerLog.progress_views(), and collection_ops::optimizations aggregation logic.
Shard surface changes: new optimizers_log() across Shard, ProxyShard, ForwardProxyShard, QueueProxyShard, ShardReplicaSet — ensure None/Some handling and thread-safety (Arc<Mutex>) consistency.
Integration points: update of update_workers/optimization_worker.rs to pass progress into Optimizer::optimize and adjust result handling.
OpenAPI/schema/tests: docs/redoc/master/openapi.json, openapi generator, tests/openapi/test_optimizations.py, and openapi consistency count bump.

Possibly related PRs

Incremental HNSW index building: append-only case #6325 — touches HNSW build path and VectorIndexBuildArgs; likely overlaps where build arguments and progress propagation were modified.
Measure Payload Index IO Writes #6137 — modifies collection optimizer codepaths and optimizer method signatures; may conflict with the threaded progress parameter and optimizer implementations.

Suggested reviewers

generall
ffuugoo
timvisee

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Indexing progress' accurately reflects the main change: adding optimization progress tracking via a new progress_tracker module and /collections/{name}/optimizations endpoint.
Description check	✅ Passed	The description clearly explains the purpose, implementation details, and expected output structure of the optimization progress tracking feature.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch index-progress

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (9)

src/actix/api/collections_api.rs (2)
234-238: The Validate derive is unused.

OptimizationsParam derives Validate but has no validation attributes. Consider removing the derive or adding validation (e.g., #[validate(range(max = 1000))] on completed_limit) if you want to enforce limits.

249-253: Potential truncation on 32-bit platforms.

The cast as usize from u64 could silently truncate on 32-bit platforms. While unlikely to be an issue in practice given the small default value, consider using try_into() with a fallback or capping the value explicitly.
-    let completed_limit = params.completed.unwrap_or(false).then(|| {
-        params
-            .completed_limit
-            .unwrap_or(DEFAULT_OPTIMIZATIONS_COMPLETED_LIMIT) as usize
-    });
+    let completed_limit = params.completed.unwrap_or(false).then(|| {
+        params
+            .completed_limit
+            .unwrap_or(DEFAULT_OPTIMIZATIONS_COMPLETED_LIMIT)
+            .min(u64::from(u32::MAX)) as usize
+    });
docs/redoc/master/openapi.json (3)

1818-1917: New endpoint looks solid; minor polish for client ergonomics.

Consider adding a numeric format for completed_limit (e.g., "uint" or "int64") to match existing integer formats elsewhere and aid generators.

Optional: rename operationId to get_collection_optimizations for uniqueness/readability across clients.

Based on learnings, docs are auto-generated from Rust REST schemas; please apply in lib/api/src/rest/schema.rs and re-generate.

16175-16198: Clarify nullable semantics for completed.

If completed=false, return completed: null explicitly (and/or set "default": null in the schema) to make the contract obvious to SDKs. Otherwise, some generators may treat the field as optional vs. intentionally null.

Based on learnings, update in schema source and re-generate.

16199-16249: ProgressTree schema LGTM; tiny consistency nit.

done/total use uint64; elsewhere many counters use "format": "uint". Not critical, but aligning formats could reduce client type variance.
lib/collection/src/collection/collection_ops.rs (1)
418-449: LGTM!

The optimizations() method correctly aggregates progress views across all shards. The sorting by Reverse(started_at()) ensures newest operations appear first, and the conditional truncation logic for completed_limit is sound.

The invariant that all_completed.is_some() == completed_limit.is_some() holds, making the unwrap() on line 442 safe. For slightly more idiomatic Rust, you could use zip to avoid the unwrap:
-        if let Some(all_completed) = all_completed.as_mut() {
-            all_completed.sort_by_key(|v| Reverse(v.started_at()));
-            // Unwrap is ok because `all_completed` and `completed_limit`
-            // either are both `Some` or both `None`.
-            all_completed.truncate(completed_limit.unwrap());
-        }
+        if let Some((all_completed, limit)) = all_completed.as_mut().zip(completed_limit) {
+            all_completed.sort_by_key(|v| Reverse(v.started_at()));
+            all_completed.truncate(limit);
+        }
lib/segment/src/index/hnsw_index/hnsw.rs (2)
414-416: Unwrap on progress trackers could panic if logic changes.

Lines 414-415 use .unwrap() on progress_main_graph and progress_migrate. These are safe now because they're only reached when build_main_graph is true (same condition used to create them on lines 280-281), but the logic relies on matching conditions in two separate places.

Consider restructuring to avoid the unwrap:
-        if build_main_graph {
-            let progress_main_graph = progress_main_graph.unwrap();
-            let progress_migrate = progress_migrate.unwrap();
+        if let (Some(progress_main_graph), Some(progress_migrate)) = (progress_main_graph, progress_migrate) {
This makes the invariant explicit at the usage site.

582-583: Consider passing None explicitly for total count.

The call to track_progress(None) indicates the total is unknown. Since the number of payload blocks is not known ahead of time, this is appropriate. However, if a size hint is available from the iterator, it could be used to provide a more informative progress display.
lib/segment/src/segment_constructor/segment_builder.rs (1)
644-670: Consider adding per-vector progress tracking for sparse indices.

The sparse vector index loop starts the progress tracker at line 644 but doesn't create per-vector subtasks like the dense vector index does at line 624. For consistency and better progress visibility, consider adding subtasks for each sparse vector:
 progress_sparse_vector_index.start();
 for (vector_name, sparse_vector_config) in &segment_config.sparse_vector_data {
+    let _progress_vector = progress_sparse_vector_index.running_subtask(vector_name);
     let vector_index_path = get_vector_index_path(temp_dir.path(), vector_name);

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1962d89 and da59034.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (47)

docs/redoc/master/openapi.json (2 hunks)
lib/collection/src/collection/collection_ops.rs (3 hunks)
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs (7 hunks)
lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs (7 hunks)
lib/collection/src/collection_manager/optimizers/merge_optimizer.rs (2 hunks)
lib/collection/src/collection_manager/optimizers/mod.rs (5 hunks)
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (9 hunks)
lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs (4 hunks)
lib/collection/src/operations/types.rs (2 hunks)
lib/collection/src/shards/forward_proxy_shard.rs (3 hunks)
lib/collection/src/shards/local_shard/mod.rs (1 hunks)
lib/collection/src/shards/proxy_shard.rs (3 hunks)
lib/collection/src/shards/queue_proxy_shard.rs (2 hunks)
lib/collection/src/shards/replica_set/mod.rs (3 hunks)
lib/collection/src/shards/shard.rs (2 hunks)
lib/collection/src/update_workers/optimization_worker.rs (1 hunks)
lib/common/common/Cargo.toml (1 hunks)
lib/common/common/src/lib.rs (1 hunks)
lib/common/common/src/progress_tracker.rs (1 hunks)
lib/segment/benches/hnsw_incremental_build.rs (2 hunks)
lib/segment/benches/multi_vector_search.rs (2 hunks)
lib/segment/src/index/hnsw_index/hnsw.rs (14 hunks)
lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs (2 hunks)
lib/segment/src/segment_constructor/segment_builder.rs (13 hunks)
lib/segment/src/segment_constructor/segment_constructor_base.rs (2 hunks)
lib/segment/tests/integration/batch_search_test.rs (2 hunks)
lib/segment/tests/integration/byte_storage_hnsw_test.rs (2 hunks)
lib/segment/tests/integration/byte_storage_quantization_test.rs (2 hunks)
lib/segment/tests/integration/disbalanced_vectors_test.rs (2 hunks)
lib/segment/tests/integration/exact_search_test.rs (2 hunks)
lib/segment/tests/integration/filtrable_hnsw_test.rs (3 hunks)
lib/segment/tests/integration/gpu_hnsw_test.rs (2 hunks)
lib/segment/tests/integration/hnsw_discover_test.rs (3 hunks)
lib/segment/tests/integration/hnsw_incremental_build.rs (2 hunks)
lib/segment/tests/integration/hnsw_quantized_search_test.rs (3 hunks)
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs (2 hunks)
lib/segment/tests/integration/multivector_hnsw_test.rs (2 hunks)
lib/segment/tests/integration/multivector_quantization_test.rs (2 hunks)
lib/segment/tests/integration/payload_index_test.rs (2 hunks)
lib/segment/tests/integration/segment_builder_test.rs (7 hunks)
lib/segment/tests/integration/segment_on_disk_snapshot.rs (2 hunks)
openapi/openapi-collections.ytt.yaml (1 hunks)
src/actix/api/collections_api.rs (3 hunks)
src/schema_generator.rs (2 hunks)
tests/consensus_tests/auth_tests/test_jwt_access.py (2 hunks)
tests/openapi/test_optimizations.py (1 hunks)
tests/openapi_consistency_check.sh (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

lib/common/common/src/lib.rs
src/schema_generator.rs
lib/segment/tests/integration/hnsw_discover_test.rs
lib/segment/tests/integration/gpu_hnsw_test.rs
lib/collection/src/operations/types.rs
lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs
lib/collection/src/shards/shard.rs
lib/segment/tests/integration/batch_search_test.rs
lib/segment/benches/multi_vector_search.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/tests/integration/filtrable_hnsw_test.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/collection/src/shards/proxy_shard.rs
src/actix/api/collections_api.rs
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs
lib/collection/src/collection_manager/optimizers/merge_optimizer.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/segment/tests/integration/hnsw_incremental_build.rs
lib/segment/tests/integration/segment_builder_test.rs
lib/collection/src/collection/collection_ops.rs
lib/segment/tests/integration/payload_index_test.rs
lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs
lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
lib/collection/src/shards/queue_proxy_shard.rs
lib/segment/tests/integration/segment_on_disk_snapshot.rs
lib/segment/benches/hnsw_incremental_build.rs
lib/segment/tests/integration/multivector_hnsw_test.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/collection/src/shards/replica_set/mod.rs
lib/collection/src/collection_manager/optimizers/mod.rs
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/exact_search_test.rs
lib/collection/src/shards/local_shard/mod.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs
lib/collection/src/update_workers/optimization_worker.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs
lib/common/common/src/progress_tracker.rs
lib/segment/tests/integration/disbalanced_vectors_test.rs

🧠 Learnings (22)

📚 Learning: 2025-10-13T22:58:03.121Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.

Applied to files:

lib/common/common/src/lib.rs
lib/collection/src/shards/shard.rs
lib/segment/benches/multi_vector_search.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/tests/integration/filtrable_hnsw_test.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs
lib/collection/src/collection_manager/optimizers/merge_optimizer.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/segment/tests/integration/hnsw_incremental_build.rs
lib/segment/tests/integration/segment_builder_test.rs
lib/segment/tests/integration/payload_index_test.rs
lib/segment/tests/integration/segment_on_disk_snapshot.rs
lib/segment/benches/hnsw_incremental_build.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/collection/src/collection_manager/optimizers/mod.rs
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/exact_search_test.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs
lib/collection/src/update_workers/optimization_worker.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs
lib/segment/tests/integration/disbalanced_vectors_test.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/segment/tests/integration/hnsw_discover_test.rs
lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs
lib/collection/src/shards/shard.rs
lib/segment/tests/integration/batch_search_test.rs
lib/segment/tests/integration/filtrable_hnsw_test.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs
lib/collection/src/collection_manager/optimizers/merge_optimizer.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/segment/tests/integration/hnsw_incremental_build.rs
lib/segment/tests/integration/segment_builder_test.rs
lib/segment/tests/integration/payload_index_test.rs
lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs
lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
lib/segment/tests/integration/segment_on_disk_snapshot.rs
lib/segment/tests/integration/multivector_hnsw_test.rs
lib/collection/src/collection_manager/optimizers/mod.rs
lib/segment/tests/integration/exact_search_test.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs
lib/common/common/src/progress_tracker.rs
lib/segment/tests/integration/disbalanced_vectors_test.rs

📚 Learning: 2025-05-30T15:26:14.488Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs
lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-05-30T15:26:54.048Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:156-159
Timestamp: 2025-05-30T15:26:54.048Z
Learning: In the subgraph_connectivity method in lib/segment/src/index/hnsw_index/graph_layers_builder.rs, the parameter `q` represents the edge removal/failure probability, not the edge retention probability. The logic `if coin_flip < q { continue; }` correctly simulates edge failures for percolation-based connectivity estimation.

Applied to files:

lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs

📚 Learning: 2025-09-01T11:42:06.964Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.

Applied to files:

lib/collection/src/shards/shard.rs

📚 Learning: 2025-09-01T11:19:26.371Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

lib/segment/benches/multi_vector_search.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/tests/integration/multivector_hnsw_test.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/benches/multi_vector_search.rs
lib/segment/tests/integration/multivector_quantization_test.rs
lib/collection/src/shards/proxy_shard.rs
lib/collection/src/shards/forward_proxy_shard.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/collection/src/shards/replica_set/mod.rs
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs
lib/segment/tests/integration/hnsw_quantized_search_test.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/src/segment_constructor/segment_builder.rs
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/segment/tests/integration/multivector_quantization_test.rs
lib/segment/src/segment_constructor/segment_builder.rs

📚 Learning: 2025-09-08T08:47:43.795Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7188
File: lib/collection/src/update_handler.rs:810-813
Timestamp: 2025-09-08T08:47:43.795Z
Learning: In the Qdrant codebase, CollectionId implements traits that allow &CollectionId to convert Into<String>, so passing &collection_name to functions requiring Into<String> works correctly without needing .clone() or .to_string().

Applied to files:

lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-08-10T18:30:02.986Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-08-10T18:26:12.443Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.443Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."

Applied to files:

docs/redoc/master/openapi.json

📚 Learning: 2025-04-20T18:32:43.471Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6403
File: lib/segment/src/index/struct_payload_index.rs:485-497
Timestamp: 2025-04-20T18:32:43.471Z
Learning: In StructPayloadIndex, checking if a field exists in `self.config.indexed_fields` is sufficient to determine if an index should exist, as the initialization process in `load_all_fields()` ensures that `field_indexes` is synchronized with the configuration, automatically rebuilding any missing indexes.

Applied to files:

lib/segment/src/segment_constructor/segment_builder.rs
lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-03-12T09:25:16.340Z

Learnt from: JojiiOfficial
Repo: qdrant/qdrant PR: 5951
File: lib/api/src/grpc/qdrant.rs:6493-6496
Timestamp: 2025-03-12T09:25:16.340Z
Learning: The `payload_index_io_read` field in `HardwareUsage` was introduced in PR #5951 and is safe to modify as it was not used in production before this PR.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-09-16T21:39:09.353Z

Learnt from: xzfc
Repo: qdrant/qdrant PR: 7232
File: lib/shard/src/segment_holder/mod.rs:1088-1094
Timestamp: 2025-09-16T21:39:09.353Z
Learning: The hnsw_format_force and hnsw_with_vectors flags in HnswGlobalConfig primarily affect existing segment conversion during load/startup, not the initial format selection for newly created empty segments. For new segments, format decisions are made by GraphLinksFormatParam::recommended() based on runtime conditions. Therefore, using HnswGlobalConfig::default() for temporary/proxy segments in SegmentHolder::build_tmp_segment is acceptable.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
Repo: qdrant/qdrant PR: 6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-04-20T19:08:45.034Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6403
File: lib/collection/src/collection_manager/holders/proxy_segment.rs:1178-1197
Timestamp: 2025-04-20T19:08:45.034Z
Learning: In the ProxySegment implementation, overwriting previous index changes when inserting a DeleteIfIncompatible operation is intentional. When an incompatible index is detected, the DeleteIfIncompatible operation should take precedence over any previously scheduled operations on that index.

Applied to files:

lib/collection/src/collection_manager/optimizers/segment_optimizer.rs

📚 Learning: 2025-08-14T11:31:21.777Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

lib/segment/tests/integration/byte_storage_hnsw_test.rs
lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/tests/integration/byte_storage_quantization_test.rs

📚 Learning: 2025-08-11T12:30:47.220Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7009
File: lib/quantization/src/encoded_vectors_binary.rs:614-618
Timestamp: 2025-08-11T12:30:47.220Z
Learning: In `lib/quantization/src/encoded_vectors_binary.rs`, the `encode_two_bits_value` function has two paths: when `vector_stats` is available, it uses mean/stddev for accurate z-score based quantization boundaries; when stats are None (for appendable BQ), it falls back to a simple sign-based heuristic (positive -> (true, true), non-positive -> (false, false)). Having stats available provides higher accuracy for storage encoding.

Applied to files:

lib/segment/tests/integration/byte_storage_quantization_test.rs

🧬 Code graph analysis (9)

lib/segment/tests/integration/filtrable_hnsw_test.rs (1)

lib/segment/src/index/hnsw_index/point_scorer.rs (2)

new_for_test (183-197)

new_for_test (318-347)

lib/collection/src/shards/proxy_shard.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (962-964)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/collection/src/shards/forward_proxy_shard.rs (5)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (962-964)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (171-173)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/collection/src/shards/queue_proxy_shard.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (962-964)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (171-173)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

tests/openapi/test_optimizations.py (1)

tests/openapi/helpers/helpers.py (1)

request_with_validation (41-116)

lib/collection/src/shards/replica_set/mod.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (962-964)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (171-173)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/collection/src/collection_manager/optimizers/mod.rs (1)

lib/common/common/src/progress_tracker.rs (1)

new_progress_tracker (124-145)

lib/collection/src/shards/local_shard/mod.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/proxy_shard.rs (1)

optimizers_log (171-173)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

lib/collection/src/update_workers/optimization_worker.rs (1)

lib/collection/src/collection_manager/optimizers/mod.rs (1)

start (112-124)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: test-consensus-compose
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: e2e-tests
GitHub Check: test-consistency
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: integration-tests-consensus
GitHub Check: integration-tests
GitHub Check: lint
GitHub Check: Build Qdrant Edge Python bindings

🔇 Additional comments (64)

lib/collection/src/update_workers/optimization_worker.rs (1)

341-357: Progress handle wiring into Tracker::start and optimize looks correct

The closure now correctly adapts to the new (Tracker, ProgressTracker) API: the tracker is registered in optimizers_log and the progress handle is passed into optimizer.optimize(...), while result handling and TrackerStatus updates remain consistent with prior behavior. I don’t see regressions or ownership/concurrency issues in this integration.

lib/common/common/Cargo.toml (1)

21-21: Chrono workspace dependency looks appropriate

Adding chrono as a workspace-scoped dependency is a straightforward, low-risk addition and matches its expected use for timestamping progress.

lib/segment/src/index/hnsw_index/tests/test_graph_connectivity.rs (1)

7-7: ProgressTracker integration in HNSW connectivity test is consistent

Importing ProgressTracker and passing ProgressTracker::new_for_test() into VectorIndexBuildArgs aligns with the updated constructor and keeps the test behavior functionally unchanged.

Also applies to: 80-89

lib/common/common/src/lib.rs (1)

22-22: Exporting progress_tracker module is correct

Making progress_tracker a public module from common is necessary for the new progress-tracking API and fits the existing module layout.

lib/collection/src/shards/replica_set/mod.rs (1)

21-21: ShardReplicaSet::optimizers_log correctly exposes local shard optimizer logs

The new ParkingMutex alias, TrackerLog import, and optimizers_log accessor cleanly delegate to the local Shard and preserve the existing Option<Arc<ParkingMutex<TrackerLog>>> semantics (including None for missing/dummy shards).

Also applies to: 37-37, 1259-1262

lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs (1)

261-261: Tests correctly updated to pass ProgressTracker into optimizers

The added ProgressTracker import and use of ProgressTracker::new_for_test() in all test optimize calls match the extended optimizer API without altering test semantics.

Also applies to: 359-367, 392-400, 521-531, 563-571, 699-707, 743-751

lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs (1)

225-225: Vacuum optimizer tests now conform to the new optimize signature

Importing ProgressTracker and supplying ProgressTracker::new_for_test() at all vacuum optimizer optimize call sites keeps the tests compatible with the new API while leaving their logic intact.

Also applies to: 351-359, 503-512, 621-629

lib/collection/src/shards/local_shard/mod.rs (1)

100-113: LocalShard optimizer log exposure is well-wired

Storing optimizers_log on LocalShard, sharing it with UpdateHandler::new, and exposing it via optimizers_log(&self) -> Arc<ParkingMutex<TrackerLog>> provides a clean, thread-safe way to observe optimizer progress that matches the shard-level accessors.

Also applies to: 189-209, 223-240, 255-274, 962-964

tests/openapi_consistency_check.sh (1)

40-40: OpenAPI API-count expectation correctly updated

Bumping EXPECTED_NUMBER_OF_APIS to 71 keeps the consistency check aligned with the new endpoint count; the surrounding messaging about updating whitelists remains valid.

lib/segment/tests/integration/multivector_quantization_test.rs (1)

9-9: LGTM! Clean integration of progress tracking.

The import and usage of ProgressTracker::new_for_test() correctly wire progress tracking into the HNSW index build path for testing. This aligns with the broader PR objective of introducing hierarchical progress tracking across optimization workflows.

Also applies to: 345-345

lib/segment/src/segment_constructor/segment_constructor_base.rs (1)

10-10: LGTM! Core API extension for progress tracking.

The addition of the progress: ProgressTracker field to VectorIndexBuildArgs enables progress tracking to be threaded through the entire HNSW index build pipeline. The field is appropriately public and positioned at the end of the struct definition, which is good practice for struct evolution.

Also applies to: 291-291

lib/segment/tests/integration/batch_search_test.rs (1)

7-7: LGTM! Consistent test integration.

Progress tracking properly integrated into the batch search test setup using the appropriate test constructor.

Also applies to: 171-171

lib/segment/tests/integration/gpu_hnsw_test.rs (1)

7-7: LGTM! Progress tracking enabled for GPU HNSW tests.

The integration follows the established pattern and enables progress monitoring during GPU-accelerated index building.

Also applies to: 151-151

lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs (1)

7-7: LGTM! Multivector test updated correctly.

Progress tracking properly threaded through the multivector filterable HNSW test setup.

Also applies to: 154-154

lib/segment/tests/integration/byte_storage_quantization_test.rs (1)

8-8: LGTM! Quantization test properly updated.

Progress tracking integrated into the byte storage quantization test following the established pattern.

Also applies to: 362-362

lib/segment/tests/integration/filtrable_hnsw_test.rs (1)

8-8: LGTM! Both test cases properly updated.

Progress tracking correctly integrated into both filterable HNSW test cases. The consistent use of new_for_test() across both instances is appropriate for the test context.

Also applies to: 171-171, 332-332

lib/segment/benches/hnsw_incremental_build.rs (1)

12-12: LGTM! Benchmark code properly integrated.

Progress tracking correctly added to the HNSW incremental build benchmark. Using new_for_test() is appropriate here as benchmarks, like tests, don't require production-level progress tracking infrastructure.

Also applies to: 403-403

lib/collection/src/shards/shard.rs (1)

4-4: LGTM! Clean delegation pattern.

The new optimizers_log() method follows the established pattern from update_tracker() (lines 227-237) and properly delegates to all shard variants with exhaustive matching.

Also applies to: 10-10, 17-17, 239-249

lib/segment/benches/multi_vector_search.rs (1)

8-8: LGTM! Appropriate test setup.

The addition of ProgressTracker::new_for_test() to the benchmark's build arguments is consistent with the broader PR changes threading progress tracking through the build pipeline.

Also applies to: 135-135

lib/collection/src/shards/queue_proxy_shard.rs (1)

28-28: LGTM! Consistent with proxy pattern.

The optimizers_log() accessor correctly delegates to the wrapped shard, mirroring the existing update_tracker() pattern (line 220-222) and matching implementations in other proxy shards.

Also applies to: 224-226

tests/openapi/test_optimizations.py (1)

1-35: LGTM! Solid validation of the new endpoint.

The test appropriately validates:

The default behavior (completed parameter omitted)

The explicit completed=true query parameter

Schema compliance via request_with_validation

Expected empty lists for a fresh collection

lib/segment/tests/integration/disbalanced_vectors_test.rs (1)

8-8: LGTM! Appropriate test adaptation.

The test correctly adapts to the new ProgressTracker parameter requirement in SegmentBuilder::build(), using the test-specific factory method.

Also applies to: 109-112

src/schema_generator.rs (1)

20-22: LGTM! Schema generation updated correctly.

The OptimizationsResponse type is properly added to the schema generator following the established pattern for exposing new API response types.

Also applies to: 100-100

lib/segment/tests/integration/multivector_hnsw_test.rs (1)

8-8: LGTM! Test correctly updated.

The test properly supplies the new progress field to VectorIndexBuildArgs using the test-specific factory, maintaining test functionality while adapting to the API change.

Also applies to: 138-138

lib/segment/tests/integration/hnsw_incremental_build.rs (1)

9-9: LGTM! Consistent test adaptation.

The incremental build test correctly adopts the new ProgressTracker parameter in the build arguments, using the appropriate test factory method.

Also applies to: 162-162

lib/collection/src/shards/forward_proxy_shard.rs (1)

10-10: LGTM!

The new optimizers_log accessor correctly delegates to the wrapped shard, consistent with the pattern used in ProxyShard and QueueProxyShard. The imports are appropriately added.

Also applies to: 25-25, 385-387

src/actix/api/collections_api.rs (1)

242-266: LGTM!

The endpoint implementation correctly validates collection access, retrieves the collection, and calls the new optimizations method. The use of fn returning impl Future is an acceptable pattern in Actix when you need ownership of captured variables in the async block.

lib/segment/tests/integration/byte_storage_hnsw_test.rs (1)

7-7: LGTM!

The test correctly uses ProgressTracker::new_for_test() to supply the new required progress field in VectorIndexBuildArgs, consistent with the broader PR changes.

Also applies to: 209-209

lib/segment/tests/integration/hnsw_quantized_search_test.rs (2)

10-10: LGTM!

Progress tracking correctly integrated into the HNSW index build test using ProgressTracker::new_for_test().

Also applies to: 147-147

445-447: LGTM!

The test_build_hnsw_using_quantization test properly creates and passes a ProgressTracker instance to SegmentBuilder::build, matching the updated API signature.

lib/segment/tests/integration/segment_builder_test.rs (2)

9-9: LGTM!

Import and first usage correctly added. The ProgressTracker::new_for_test() is appropriately used for test contexts.

Also applies to: 84-87

171-174: LGTM!

All remaining test functions (test_building_new_defragmented_segment, test_building_new_sparse_segment, estimate_build_time, test_building_new_segment_bug_5614, test_building_new_segment_with_mmap_payload) correctly updated to pass ProgressTracker::new_for_test() to SegmentBuilder::build.

Also applies to: 305-308, 382-384, 450-453, 594-597

lib/segment/tests/integration/hnsw_discover_test.rs (1)

7-7: LGTM!

The ProgressTracker integration is correctly wired into both test cases. The use of ProgressTracker::new_for_test() is appropriate for test contexts and aligns with the pattern used across other test files in this PR.

Also applies to: 123-123, 251-251

lib/segment/tests/integration/segment_on_disk_snapshot.rs (1)

5-5: LGTM!

The ProgressTracker is correctly integrated into the segment builder test. The placement as the last argument to build() is consistent with the updated API signature.

Also applies to: 149-149

lib/segment/tests/integration/exact_search_test.rs (1)

8-8: LGTM!

The ProgressTracker integration follows the same pattern as other test files in this PR.

Also applies to: 152-152

lib/collection/src/shards/proxy_shard.rs (1)

11-11: LGTM!

The optimizers_log() method correctly follows the delegation pattern used by other methods in ProxyShard and is consistent with the implementations in ForwardProxyShard, QueueProxyShard, and LocalShard as shown in the relevant code snippets.

Also applies to: 26-26, 171-173

lib/collection/src/collection_manager/optimizers/merge_optimizer.rs (1)

164-164: LGTM!

The ProgressTracker is correctly integrated into the merge optimizer test, following the updated optimize() signature.

Also applies to: 260-260

openapi/openapi-collections.ytt.yaml (1)

259-292: LGTM!

The new /collections/{collection_name}/optimizations endpoint is well-defined with appropriate parameter types and defaults. The completed_limit is correctly typed as integer with minimum: 0 and defaults to 16, which matches the DEFAULT_OPTIMIZATIONS_COMPLETED_LIMIT constant in src/actix/api/collections_api.rs:240.

lib/collection/src/operations/types.rs (1)

15-16: OptimizationsResponse shape and ProgressTree import look correct

The new OptimizationsResponse type cleanly exposes ongoing vs completed optimizations, and using Option<Vec<ProgressTree>> with skip_serializing_if = "Option::is_none" plus the clarifying comment gives the API the right ability to omit completed data when not requested. No issues from a types/serialization perspective.

Also applies to: 286-296

tests/consensus_tests/auth_tests/test_jwt_access.py (1)

141-146: Auth matrix and coverage for get_optimizations are consistent

ACTION_ACCESS["get_optimizations"] grants read/manage in line with other collection read endpoints, and test_get_optimizations plus the generic coverage tests ensure the new REST path is included. No further changes needed.

Also applies to: 987-991

lib/segment/tests/integration/payload_index_test.rs (1)

10-11: ProgressTracker wiring in mmap segment test matches new builder API

Importing ProgressTracker and passing ProgressTracker::new_for_test() into SegmentBuilder::build is the correct minimal adaptation for this test; no behavioral or correctness issues detected.

Also applies to: 296-300

lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs (1)

295-295: Tests correctly updated for the new optimize(progress) signature

The test module imports ProgressTracker and consistently passes ProgressTracker::new_for_test() into all index_optimizer.optimize(...) calls, matching the extended trait API without altering test semantics. This looks good.

Also applies to: 395-403, 545-553, 562-571, 692-701, 813-821, 984-991

lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (1)

10-11: ProgressTracker integration in SegmentOptimizer is sound and well-scoped

Threading ProgressTracker through optimize, optimize_segment_propagate_changes, and build_new_segment, with subtasks for "copy_data", "populate_vector_storages", and "wait_cpu_permit", cleanly instruments the heavy phases without changing existing optimization, proxy, or cancellation semantics. Passing the tracker down into SegmentBuilder::build is consistent with the new progress API, and the added code respects existing locking and error-handling patterns.

Also applies to: 415-545, 605-613, 849-875

lib/segment/src/index/hnsw_index/hnsw.rs (4)

268-293: Well-structured progress tracking setup.

The progress subtasks are organized hierarchically: migrate, main_graph, and additional_links with per-field tracking. The use of Option for conditionally created subtasks (lines 280-281) correctly handles the case when build_main_graph is false.

455-458: Progress counter correctly uses Relaxed ordering.

The AtomicU64 counter with Relaxed ordering is appropriate here since the counter is only used for progress reporting and doesn't need to synchronize with other memory operations. Dereferencing the Arc before the loop (line 458) is good practice for avoiding repeated atomic reference counting.

564-566: Good: field progress tracker is started before processing.

The field-specific progress tracker is properly started before processing begins, and the shared counter is created for progress updates during block processing.

755-756: Signature change correctly propagates the counter.

The build_filtered_graph function now accepts a counter: &AtomicU64 parameter for progress tracking. This is correctly used to increment progress on line 802.

lib/segment/src/segment_constructor/segment_builder.rs (4)

485-496: Clean hierarchical progress tracking setup.

The subtasks are well-organized with clear naming. The payload index subtasks include both the schema name and field name (line 491) which provides good debugging context.

576-580: Progress tracking for indexed fields is correctly structured.

Each indexed field gets its own progress tracker that is started before set_indexed is called. The progress tracker is implicitly dropped when the loop iteration ends, correctly marking the field as finished.

724-726: Quantization progress tracking is correctly wired.

The update_quantization function receives a ProgressTracker and starts it at line 726. Per-vector subtasks are created at line 751 using running_subtask, which is appropriate since quantization work starts immediately.

772-773: Explicit drop for progress_vector ensures timing accuracy.

The explicit drop(progress_vector) ensures the subtask is marked as finished immediately after quantization completes, rather than waiting for the loop iteration to end. This is good for accurate duration tracking.

lib/collection/src/collection_manager/optimizers/mod.rs (4)

30-34: New IndexingProgressViews struct provides clear separation of states.

The struct separates ongoing and completed optimization progress views, which is useful for API consumers to distinguish between active work and historical data.

78-92: The progress_views method correctly categorizes tracker states.

The method iterates through trackers in reverse order (most recent first) and correctly categorizes them:

Optimizing → ongoing

Done, Cancelled, Error → completed

The lock on tracker.state is held briefly, minimizing contention.

109-124: Clean API design returning (Tracker, ProgressTracker) tuple.

The start function now returns both the read-only Tracker (with its embedded ProgressView) and the write-only ProgressTracker. This separation of concerns follows the pattern established in new_progress_tracker() and ensures the caller has appropriate access to both reading and writing progress.

138-138: Telemetry correctly uses progress_view.started_at().

The change from a direct start_at field to progress_view.started_at() maintains the same telemetry output while centralizing the timestamp in the progress tracking system.

lib/common/common/src/progress_tracker.rs (9)

1-22: Excellent documentation explaining design philosophy.

The module-level documentation clearly explains:

The hierarchical structure with a visual diagram

The non-critical nature of progress tracking

The panic vs. fallback behavior difference between debug/release builds

This is important context for maintainers.

36-40: Redundant started_at field is a good performance optimization.

The comment correctly notes that started_at is redundant (also stored in the root node's state), but keeping it outside the lock enables faster access for the common started_at() call without acquiring the mutex.

164-200: Path-based navigation handles errors gracefully in release builds.

The subtask_impl method correctly:

Acquires the lock once

Navigates to the parent using the path

Creates a new child with the appropriate initial state

Returns a new tracker with an extended path

The fallback path [usize::MAX, usize::MAX] at lines 196-198 ensures that subsequent operations on an invalid tracker will fail gracefully (the path won't match any valid node).

207-226: Debug assertion prevents double track_progress calls.

Line 211-214 includes a debug assertion that catches the usage error of calling track_progress multiple times on the same node. This is good defensive programming for development.

247-251: Drop implementation ensures proper cleanup.

The Drop implementation calls ProgressNode::finish() to transition the node state appropriately:

Pending → Skipped (task was never started)

InProgress → Finished (normal completion)

Finished/Skipped → no change (idempotent)

This ensures accurate duration tracking even if the caller forgets to explicitly finish the task.

253-260: Path navigation is simple and efficient.

The get_mut method iterates through the path indices to navigate to nested children. The use of Option return type and early return via ? operator handles missing nodes gracefully.

274-286: Duration calculation uses monotonic Instant for accuracy.

The rendering correctly uses the stored Instant for duration calculation (line 284: duration.as_secs_f64()), which provides accurate elapsed time even if the system clock changes. The DateTime<Utc> timestamps are used for human-readable display.

301-324: Finish logic correctly handles all state transitions.

The finish method handles all four states appropriately:

Pending → Skipped: Task was created but never started

InProgress → Finished: Normal completion with duration calculation

Finished / Skipped → no-op: Already in terminal state

The duration is calculated from started_instant to finished_instant (lines 310, 314), ensuring accurate timing even with clock adjustments.

327-428: Tests exercise key scenarios including drop behavior.

The test at line 334-397 covers:

Nested subtask creation

Progress tracking with known/unknown totals

State transitions (pending, in-progress, finished, skipped)

Drop behavior for completing and skipping tasks

The #[expect(unused_variables)] attribute correctly documents that the test relies on drop behavior.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (6)

lib/segment/src/index/hnsw_index/hnsw.rs (1)
585-586: Consider documenting that per-field counters don't have a known total.

The track_progress(None) call indicates the total is unknown for per-field indexing. This is fine since payload blocks are iterated lazily, but a brief comment explaining this design choice would help future maintainers.
-                let counter = field_progress.track_progress(None);
+                // Total is unknown as we iterate payload blocks lazily
+                let counter = field_progress.track_progress(None);
Also applies to: 641-641
lib/collection/src/collection/collection_ops.rs (1)
444-444: Consider making the root label a constant.

The hardcoded string "Segment Optimizing" is used as the root label for the progress tree snapshot. Consider extracting it as a constant for maintainability and potential reuse.
+const OPTIMIZATIONS_PROGRESS_ROOT: &str = "Segment Optimizing";
+
 impl Collection {
     // ... existing methods ...
 
     pub async fn optimizations(
         &self,
         completed_limit: Option<usize>,
     ) -> CollectionResult<OptimizationsResponse> {
         // ... existing code ...
-        let root = "Segment Optimizing";
+        let root = OPTIMIZATIONS_PROGRESS_ROOT;
         Ok(OptimizationsResponse {
             ongoing: all_ongoing.into_iter().map(|v| v.snapshot(root)).collect(),
             completed: all_completed.map(|c| c.into_iter().map(|v| v.snapshot(root)).collect()),
         })
     }
 }
src/actix/api/collections_api.rs (1)
234-241: Align completed_limit type with downstream usage to avoid truncation.

OptimizationsParam.completed_limit is Option<u64>, but you immediately cast it to usize for optimizations(completed_limit). On 32‑bit builds a large u64 value would silently truncate when cast to usize. Unless you explicitly want a 64‑bit public type here, it’s simpler and safer to keep everything as usize and drop the cast.

You could change the definition like this:
-#[derive(Deserialize, Copy, Clone, Validate)]
-struct OptimizationsParam {
-    completed: Option<bool>,
-    completed_limit: Option<u64>,
-}
-
-const DEFAULT_OPTIMIZATIONS_COMPLETED_LIMIT: u64 = 16;
+#[derive(Deserialize, Copy, Clone, Validate)]
+struct OptimizationsParam {
+    completed: Option<bool>,
+    completed_limit: Option<usize>,
+}
+
+const DEFAULT_OPTIMIZATIONS_COMPLETED_LIMIT: usize = 16;
Then the as usize in get_optimizations can go away.
docs/redoc/master/openapi.json (2)
16175-16249: Schema shape is solid; align number formats and add examples for better DX.

Consider using "format": "float" for duration_sec to match the predominant float usage elsewhere in this spec, or consistently switch related durations to double.

Add a brief example to OptimizationsResponse and ProgressTree to help client authors visualize the tree.

Optional: add a note that children may be empty (current required + array type already allows this).

Example snippet to add under "OptimizationsResponse":
{
  "ongoing": [
    {
      "name": "optimize_segment",
      "started_at": "2025-12-07T12:34:56Z",
      "finished_at": null,
      "duration_sec": null,
      "done": 1200,
      "total": 5000,
      "children": [
        { "name": "build_hnsw", "started_at": "2025-12-07T12:35:00Z", "finished_at": null, "done": 400, "total": 5000, "children": [] }
      ]
    }
  ],
  "completed": null
}
Based on learnings, examples/descriptions should be added in the Rust schema so they propagate here.

1818-1917: New endpoint looks good; tighten param schema and document zero semantics.

Nice addition and envelope wiring. Two small spec nits:

completed_limit: add a numeric format to match server type (likely uint64) and an example; also clarify what 0 means (return none vs unlimited).

Optionally cap with a sensible maximum if the server enforces one.

Proposed diff for the parameter schema:
           "schema": {
-            "type": "integer",
+            "type": "integer",
+            "format": "uint64",
             "minimum": 0,
-            "default": 16
+            "default": 16,
+            "example": 10
           }
Since this spec is generated, please update the source (e.g., lib/api/src/rest/schema.rs or openapi-collections.ytt.yaml) rather than editing this file directly. Run:
#!/bin/bash
# Verify server-side param type/defaults and any max bound
rg -n --hidden -C3 -S 'completed_limit' \
  lib/api src openapi | sed -n '1,200p'
Based on learnings, this JSON is auto-generated from REST schemas; doc/format tweaks should be applied in the Rust schema or templates.
tests/openapi/test_optimizations.py (1)

14-24: Clear assertion of default optimizations response contract

The first call validates that, for a freshly set up collection and no completed query flag, the endpoint returns ongoing as an empty list and omits completed. This tightly encodes the intended API contract (field absence vs. empty list) and will catch accidental schema/drift changes; the structure and checks look solid.

You might later add a second test that seeds an optimization so this path also covers the non-empty ongoing case, but that’s not required for this PR.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da59034 and 1239985.

📒 Files selected for processing (17)

docs/redoc/master/openapi.json (2 hunks)
lib/collection/src/collection/collection_ops.rs (3 hunks)
lib/collection/src/collection_manager/optimizers/mod.rs (5 hunks)
lib/collection/src/operations/types.rs (2 hunks)
lib/collection/src/shards/forward_proxy_shard.rs (3 hunks)
lib/collection/src/shards/local_shard/mod.rs (1 hunks)
lib/collection/src/shards/proxy_shard.rs (3 hunks)
lib/collection/src/shards/queue_proxy_shard.rs (2 hunks)
lib/collection/src/shards/replica_set/mod.rs (3 hunks)
lib/collection/src/shards/shard.rs (2 hunks)
lib/segment/src/index/hnsw_index/hnsw.rs (14 hunks)
openapi/openapi-collections.ytt.yaml (1 hunks)
src/actix/api/collections_api.rs (3 hunks)
src/schema_generator.rs (2 hunks)
tests/consensus_tests/auth_tests/test_jwt_access.py (2 hunks)
tests/openapi/test_optimizations.py (1 hunks)
tests/openapi_consistency_check.sh (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (9)

tests/openapi_consistency_check.sh
lib/collection/src/shards/local_shard/mod.rs
lib/collection/src/shards/replica_set/mod.rs
lib/collection/src/shards/proxy_shard.rs
openapi/openapi-collections.ytt.yaml
lib/collection/src/shards/queue_proxy_shard.rs
lib/collection/src/shards/forward_proxy_shard.rs
tests/consensus_tests/auth_tests/test_jwt_access.py
lib/collection/src/shards/shard.rs

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

src/schema_generator.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/collection/src/operations/types.rs
src/actix/api/collections_api.rs
lib/collection/src/collection/collection_ops.rs
lib/collection/src/collection_manager/optimizers/mod.rs

🧠 Learnings (9)

📚 Learning: 2025-08-10T18:26:12.443Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.443Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."

Applied to files:

docs/redoc/master/openapi.json

📚 Learning: 2025-10-13T22:58:03.121Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs
lib/collection/src/collection_manager/optimizers/mod.rs

📚 Learning: 2025-08-10T18:30:02.986Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs
lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-03-12T09:25:16.340Z

Learnt from: JojiiOfficial
Repo: qdrant/qdrant PR: 5951
File: lib/api/src/grpc/qdrant.rs:6493-6496
Timestamp: 2025-03-12T09:25:16.340Z
Learning: The `payload_index_io_read` field in `HardwareUsage` was introduced in PR #5951 and is safe to modify as it was not used in production before this PR.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-05-30T15:26:14.488Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
Repo: qdrant/qdrant PR: 6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-04-20T18:32:43.471Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6403
File: lib/segment/src/index/struct_payload_index.rs:485-497
Timestamp: 2025-04-20T18:32:43.471Z
Learning: In StructPayloadIndex, checking if a field exists in `self.config.indexed_fields` is sufficient to determine if an index should exist, as the initialization process in `load_all_fields()` ensures that `field_indexes` is synchronized with the configuration, automatically rebuilding any missing indexes.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-09-08T08:47:43.795Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7188
File: lib/collection/src/update_handler.rs:810-813
Timestamp: 2025-09-08T08:47:43.795Z
Learning: In the Qdrant codebase, CollectionId implements traits that allow &CollectionId to convert Into<String>, so passing &collection_name to functions requiring Into<String> works correctly without needing .clone() or .to_string().

Applied to files:

lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/collection/src/collection_manager/optimizers/mod.rs

🧬 Code graph analysis (4)

lib/segment/src/index/hnsw_index/hnsw.rs (2)

lib/segment/src/json_path/parse.rs (1)

json_path (28-40)

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (1)

new (318-333)

tests/openapi/test_optimizations.py (1)

tests/openapi/helpers/helpers.py (1)

request_with_validation (41-116)

src/actix/api/collections_api.rs (2)

src/actix/helpers.rs (1)

time (141-147)

lib/collection/src/operations/verification/mod.rs (1)

new_unchecked_verification_pass (26-28)

lib/collection/src/collection_manager/optimizers/mod.rs (1)

lib/common/common/src/progress_tracker.rs (1)

new_progress_tracker (124-145)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consistency
GitHub Check: test-consensus-compose
GitHub Check: integration-tests-consensus
GitHub Check: integration-tests
GitHub Check: lint

🔇 Additional comments (14)

lib/collection/src/collection_manager/optimizers/mod.rs (3)

30-34: LGTM!

The new IndexingProgressViews struct cleanly separates ongoing from completed progress views. The Default derive provides a sensible empty state.

77-92: LGTM - good use of explicit match.

The match expression explicitly handles all TrackerStatus variants without a catch-all _ arm, which aligns with coding guidelines and ensures new variants are handled when added.

109-124: LGTM - API redesign for progress tracking.

The start method now returns a tuple (Tracker, ProgressTracker) separating the read-only view stored in Tracker from the write handle returned to callers. This is a clean separation of concerns.

lib/segment/src/index/hnsw_index/hnsw.rs (4)

268-296: LGTM - Well-structured progress subtask hierarchy.

The conditional creation of progress subtasks based on build_main_graph and payload_m settings is appropriate. The subtask naming with field type and name ("{field_type}:{field}") provides useful context for observability.

416-419: Unwraps are safe due to guard condition.

The .unwrap() calls are safe because they're within the if build_main_graph block, and progress_main_graph/progress_migrate are created with build_main_graph.then(...) at lines 280-281, guaranteeing Some when build_main_graph is true.

458-461: LGTM - Atomic progress counter with appropriate ordering.

Using Ordering::Relaxed for the progress counter is appropriate here since:

The counter is purely for progress reporting (no synchronization dependencies)

Eventual consistency is acceptable for UI/observability purposes

Also applies to: 479-480

746-759: LGTM - Counter parameter added to build_filtered_graph.

The counter: &AtomicU64 parameter is cleanly integrated into the function signature and used consistently at line 805.

src/schema_generator.rs (1)

20-22: LGTM!

The new OptimizationsResponse type is correctly added to both the imports and the AllDefinitions struct for OpenAPI schema generation.

Also applies to: 100-100

lib/collection/src/collection/collection_ops.rs (1)

418-449: LGTM - Clean aggregation of shard progress.

The implementation correctly:

Conditionally collects completed views only when completed_limit is Some

Sorts both lists by started_at in descending order (newest first)

Safely truncates using the guarded unwrap() at line 442

The comment at lines 440-442 correctly explains the invariant.

lib/collection/src/operations/types.rs (1)

286-296: LGTM - Well-designed API response type.

The OptimizationsResponse struct correctly uses Option<Vec<ProgressTree>> for completed with skip_serializing_if = "Option::is_none", allowing the API to:

Omit completed entirely when ?completed=false (None)

Return an empty or populated array when ?completed=true (Some)

The doc comments clearly explain this behavior.

src/actix/api/collections_api.rs (2)

14-14: RBAC AccessRequirements import is correctly wired.

The new use storage::rbac::AccessRequirements; is used in get_optimizations for check_collection_access and aligns with existing access patterns.

282-282: Service registration for get_optimizations fits existing routing layout.

Registering .service(get_optimizations) after get_cluster_info keeps the /collections/{name}/optimizations route grouped with other collection sub‑routes and shouldn’t interfere with existing patterns.

tests/openapi/test_optimizations.py (2)

7-11: Autouse collection fixture is well-structured and safe

Using an autouse fixture with basic_collection_setup before the test and drop_collection after yield keeps tests isolated and avoids cross-test collection state leaks. This matches the existing helper patterns and looks correct.

26-35: Good coverage of completed=true behavior

The second call with query_params={"completed": "true"} correctly exercises the alternate response shape, asserting both ongoing and completed keys are present and empty. This pairs well with the previous checks and ensures the boolean query param is wired and serialized as expected.

src/actix/api/collections_api.rs

+#[get("/collections/{name}/optimizations")]
+fn get_optimizations(
+    dispatcher: web::Data<Dispatcher>,
+    collection: Path<CollectionPath>,
+    ActixAccess(access): ActixAccess,
+    params: Query<OptimizationsParam>,
+) -> impl Future<Output = HttpResponse> {
+    let completed_limit = params.completed.unwrap_or(false).then(|| {
+        params
+            .completed_limit
+            .unwrap_or(DEFAULT_OPTIMIZATIONS_COMPLETED_LIMIT) as usize
+    });
+
+    helpers::time(async move {
+        let pass = new_unchecked_verification_pass();
+        let collection_pass =
+            access.check_collection_access(&collection.name, AccessRequirements::new())?;
+        Ok(dispatcher
+            .toc(&access, &pass)
+            .get_collection(&collection_pass)
+            .await?
+            .optimizations(completed_limit)
+            .await?)
+    })
+}


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

docs/redoc/master/openapi.json (4)
1837-1856: Tighten completed/completed_limit parameter schemas.

Set an explicit unsigned format, add an example, and (optionally) cap the upper bound to protect the endpoint from huge responses.

Apply:
       {
         "name": "completed_limit",
         "in": "query",
         "description": "Maximum number of completed optimizations to return. Ignored if completed is false.",
         "required": false,
         "schema": {
-          "type": "integer",
+          "type": "integer",
+          "format": "uint64",
           "minimum": 0,
-          "default": 16
+          "default": 16,
+          "example": 10
+          // Optionally:
+          // "maximum": 1000
         }
       }
If you accept this, please update the Rust OpenAPI schema source (not this generated JSON). Based on learnings, docs/redoc/master/openapi.json is generated from lib/api/src/rest/schema.rs.

16175-16198: Prefer stable array shape for OptimizationsResponse.completed.

Returning null forces clients to branch; an empty list communicates “no data” without shape changes. Consider making it non-nullable with a default empty array and clarify ordering basis.

Apply:
   "OptimizationsResponse": {
     "description": "Optimizations progress for the collection",
     "type": "object",
     "required": ["ongoing"],
     "properties": {
       "ongoing": {
         "description": "Ongoing optimizations from newest to oldest.",
         "type": "array",
         "items": { "$ref": "#/components/schemas/ProgressTree" }
       },
       "completed": {
-        "description": "Completed optimizations from newest to oldest.",
-        "type": "array",
-        "items": { "$ref": "#/components/schemas/ProgressTree" },
-        "nullable": true
+        "description": "Completed optimizations from newest to oldest (by finished_at).",
+        "type": "array",
+        "items": { "$ref": "#/components/schemas/ProgressTree" },
+        "nullable": false,
+        "default": []
       }
     }
   }
Note: adjust the description to state whether “newest” means by finished_at (completed) and by started_at (ongoing). Please change this in the Rust schema source so the generator picks it up. Based on learnings, ...

16199-16249: ProgressTree: make shapes predictable (children default []), and consider requiring started_at.

children is required—ensure leaf nodes are always [] and advertise it with default + minItems.

If every operation has a start time, make started_at required and non-nullable; keep finished_at nullable for in‑flight nodes.

Apply:
 "ProgressTree": {
   "type": "object",
   "required": [
-    "children",
-    "name"
+    "children",
+    "name"
+    // Optionally also require started_at if always present:
+    // "started_at"
   ],
   "properties": {
     "name": {
       "description": "Name of the operation.",
       "type": "string"
     },
     "started_at": {
-      "description": "When the operation started.",
-      "type": "string",
-      "format": "date-time",
-      "nullable": true
+      "description": "When the operation started.",
+      "type": "string",
+      "format": "date-time"
     },
     "finished_at": {
       "description": "When the operation finished.",
       "type": "string",
       "format": "date-time",
       "nullable": true
     },
     "duration_sec": {
       "description": "For finished operations, how long they took, in seconds.",
       "type": "number",
       "format": "double",
       "nullable": true
     },
     "done": {
       "description": "Number of completed units of work, if applicable.",
       "type": "integer",
       "format": "uint64",
       "minimum": 0,
       "nullable": true
     },
     "total": {
       "description": "Total number of units of work, if applicable and known.",
       "type": "integer",
       "format": "uint64",
       "minimum": 0,
       "nullable": true
     },
     "children": {
       "description": "Child operations.",
       "type": "array",
+      "minItems": 0,
+      "default": [],
       "items": { "$ref": "#/components/schemas/ProgressTree" }
     }
   }
 }
Please reflect this in the Rust schema annotations so generation stays consistent. Based on learnings, ...

1818-1826: Minor: endpoint doc polish.

Summary/description are good; consider stating ordering criteria (“newest to oldest by started_at/finished_at”) in the endpoint-level description and that completed=true will populate result.completed (empty list if none).

If you agree, tweak the Rust doc comments driving OpenAPI generation accordingly. Based on learnings, ...

Also applies to: 1880-1914
lib/segment/src/index/hnsw_index/hnsw.rs (2)
458-462: Avoid shadowing and immediately dropping the track_progress guard; rely on auto‑deref for the counter

In the main‑graph section you currently do:
let counter = progress_main_graph
    .track_progress(Some(first_few_ids.len() as u64 + ids.len() as u64));
let counter = counter.deref();
This shadows the value returned by track_progress(), dropping it right away and keeping only the dereferenced counter. If track_progress() returns a guard object with RAII semantics (very plausible for a progress API), this pattern is at least confusing and could prematurely signal completion before any increments happen.

You don’t need the explicit .deref() here; you can keep the original value and use Rust’s auto‑deref both for method calls and for passing a reference, just like you already do with the field‑level counter in build_filtered_graph.

A simpler, more idiomatic version would be:
-            progress_main_graph.start();
-            let counter = progress_main_graph
-                .track_progress(Some(first_few_ids.len() as u64 + ids.len() as u64));
-            let counter = counter.deref();
+            progress_main_graph.start();
+            let counter = progress_main_graph
+                .track_progress(Some(first_few_ids.len() as u64 + ids.len() as u64));
and keep counter.fetch_add(1, Ordering::Relaxed); as‑is; the call site will auto‑deref through the wrapper into the underlying AtomicU64.

Similarly, in build_filtered_graph the new counter: &AtomicU64 parameter and counter.fetch_add(1, Ordering::Relaxed); look good; they’ll continue to work if the caller passes a reference to whatever wrapper type track_progress() returns, via deref coercion.

Please confirm that ProgressTracker::track_progress doesn’t rely on the guard’s Drop semantics in a way that would be broken by this shadowing pattern.

Also applies to: 479-481, 746-759, 784-806

268-297: Progress subtasks on GPU paths and additional‑links could be made more faithful (optional)

A couple of subtle UX points for the new progress wiring:

progress_migrate and progress_main_graph subtasks are created upfront when config.m > 0, but if the main graph is successfully built on GPU (build_main_graph flipped to false), the if build_main_graph { … } block (where you call .start() and set up track_progress) is skipped entirely. This leaves “migrate” and “main_graph” subtasks present in the tree but never started/completed for GPU‑only builds.

In build_filtered_graph, the CPU path correctly increments the counter per inserted point, but the GPU variant build_filtered_graph_on_gpu returns early with no use of the per‑field counter. For additional‑links builds that go through GPU, the field’s progress will remain at zero even though work was done.

These aren’t correctness issues, but they can make the exposed progress tree confusing for GPU builds. If you want more accurate reporting, you could:

Lazily create progress_migrate / progress_main_graph only inside the CPU build_main_graph branch, or only call subtask(...) when you know you’ll actually use them.

In the GPU paths, either:

Call track_progress(Some(total)) and advance the counter before/after GPU calls (e.g., increment by points_to_index.len()), or

Expose a small progress hook in the GPU builders so they can update the same counter.

If you’re okay with CPU‑only progress in the first iteration, this can be deferred.

Also applies to: 389-414, 416-420, 425-426, 454-455, 498-500, 567-585, 629-642, 769-782

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1239985 and 6cba1e1.

📒 Files selected for processing (17)

docs/redoc/master/openapi.json (2 hunks)
lib/collection/src/collection/collection_ops.rs (3 hunks)
lib/collection/src/collection_manager/optimizers/mod.rs (5 hunks)
lib/collection/src/operations/types.rs (2 hunks)
lib/collection/src/shards/forward_proxy_shard.rs (3 hunks)
lib/collection/src/shards/local_shard/mod.rs (1 hunks)
lib/collection/src/shards/proxy_shard.rs (3 hunks)
lib/collection/src/shards/queue_proxy_shard.rs (2 hunks)
lib/collection/src/shards/replica_set/mod.rs (3 hunks)
lib/collection/src/shards/shard.rs (2 hunks)
lib/segment/src/index/hnsw_index/hnsw.rs (14 hunks)
openapi/openapi-collections.ytt.yaml (1 hunks)
src/actix/api/collections_api.rs (3 hunks)
src/schema_generator.rs (2 hunks)
tests/consensus_tests/auth_tests/test_jwt_access.py (2 hunks)
tests/openapi/test_optimizations.py (1 hunks)
tests/openapi_consistency_check.sh (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (10)

lib/collection/src/operations/types.rs
lib/collection/src/shards/replica_set/mod.rs
lib/collection/src/shards/local_shard/mod.rs
lib/collection/src/shards/shard.rs
tests/openapi_consistency_check.sh
tests/consensus_tests/auth_tests/test_jwt_access.py
openapi/openapi-collections.ytt.yaml
lib/collection/src/shards/queue_proxy_shard.rs
lib/collection/src/shards/forward_proxy_shard.rs
src/actix/api/collections_api.rs

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

lib/collection/src/shards/proxy_shard.rs
lib/collection/src/collection/collection_ops.rs
lib/collection/src/collection_manager/optimizers/mod.rs
lib/segment/src/index/hnsw_index/hnsw.rs
src/schema_generator.rs

🧠 Learnings (10)

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/collection/src/shards/proxy_shard.rs

📚 Learning: 2025-09-08T08:47:43.795Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7188
File: lib/collection/src/update_handler.rs:810-813
Timestamp: 2025-09-08T08:47:43.795Z
Learning: In the Qdrant codebase, CollectionId implements traits that allow &CollectionId to convert Into<String>, so passing &collection_name to functions requiring Into<String> works correctly without needing .clone() or .to_string().

Applied to files:

lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-08-10T18:30:02.986Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

lib/collection/src/collection/collection_ops.rs

📚 Learning: 2025-10-13T22:58:03.121Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.

Applied to files:

lib/collection/src/collection_manager/optimizers/mod.rs
lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/collection/src/collection_manager/optimizers/mod.rs

📚 Learning: 2025-03-12T09:25:16.340Z

Learnt from: JojiiOfficial
Repo: qdrant/qdrant PR: 5951
File: lib/api/src/grpc/qdrant.rs:6493-6496
Timestamp: 2025-03-12T09:25:16.340Z
Learning: The `payload_index_io_read` field in `HardwareUsage` was introduced in PR #5951 and is safe to modify as it was not used in production before this PR.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-05-30T15:26:14.488Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
Repo: qdrant/qdrant PR: 6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-04-20T18:32:43.471Z

Learnt from: generall
Repo: qdrant/qdrant PR: 6403
File: lib/segment/src/index/struct_payload_index.rs:485-497
Timestamp: 2025-04-20T18:32:43.471Z
Learning: In StructPayloadIndex, checking if a field exists in `self.config.indexed_fields` is sufficient to determine if an index should exist, as the initialization process in `load_all_fields()` ensures that `field_indexes` is synchronized with the configuration, automatically rebuilding any missing indexes.

Applied to files:

lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-08-10T18:26:12.443Z

Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.443Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."

Applied to files:

docs/redoc/master/openapi.json

🧬 Code graph analysis (3)

lib/collection/src/shards/proxy_shard.rs (5)

lib/collection/src/shards/forward_proxy_shard.rs (1)

optimizers_log (385-387)

lib/collection/src/shards/local_shard/mod.rs (1)

optimizers_log (962-964)

lib/collection/src/shards/queue_proxy_shard.rs (1)

optimizers_log (224-226)

lib/collection/src/shards/replica_set/mod.rs (1)

optimizers_log (1259-1262)

lib/collection/src/shards/shard.rs (1)

optimizers_log (239-249)

tests/openapi/test_optimizations.py (1)

tests/openapi/helpers/helpers.py (1)

request_with_validation (41-116)

lib/collection/src/collection_manager/optimizers/mod.rs (1)

lib/common/common/src/progress_tracker.rs (1)

new_progress_tracker (124-145)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: Build Qdrant Edge Python bindings
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: lint
GitHub Check: integration-tests
GitHub Check: test-consistency
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: e2e-tests
GitHub Check: test-consensus-compose
GitHub Check: integration-tests-consensus

🔇 Additional comments (10)

tests/openapi/test_optimizations.py (2)

7-11: LGTM on fixture setup.

The autouse=True fixture correctly handles collection setup and teardown for all tests in this module.

14-35: Tests cover the basic API contract; consider adding a test with actual optimization activity.

The test correctly validates both scenarios:

Default response (no completed param) returns ongoing: [] without a completed key.

With completed=true, both ongoing and completed are present as empty lists.

For more comprehensive coverage, consider adding a test that triggers an optimization (e.g., via data insertion + index creation) to verify the response structure when optimizations are actually in progress or completed.

src/schema_generator.rs (2)

20-22: LGTM on import addition.

OptimizationsResponse is correctly imported alongside other types from the same module.

100-100: LGTM on schema registration.

The new field bp: OptimizationsResponse follows the established naming convention and ensures the type is included in the generated OpenAPI schema.

lib/collection/src/shards/proxy_shard.rs (2)

11-11: LGTM on new imports.

The imports for ParkingMutex and TrackerLog are correctly added to support the new accessor method.

Also applies to: 26-26

171-173: LGTM on accessor implementation.

The optimizers_log accessor correctly delegates to the wrapped shard, consistent with the proxy pattern used throughout this file and matching the signatures in ForwardProxyShard, QueueProxyShard, and LocalShard.

lib/collection/src/collection/collection_ops.rs (2)

1-1: LGTM on new imports.

The std::cmp::{self, Reverse} and IndexingProgressViews imports are correctly added to support the new optimizations method.

Also applies to: 11-11

418-449: LGTM on the optimizations method implementation.

The implementation correctly:

Aggregates progress views across all shard replicas with local optimizer logs.

Conditionally collects completed entries only when completed_limit is specified.

Sorts both ongoing and completed entries by started_at in descending order (newest first).

Applies truncation to completed entries respecting the limit.

The .unwrap() on line 442 is safe due to the invariant that all_completed and completed_limit are both Some or both None, as documented in the comment.

lib/collection/src/collection_manager/optimizers/mod.rs (2)

30-34: IndexingProgressViews and progress_views() grouping look solid

The new IndexingProgressViews type and TrackerLog::progress_views() implementation cleanly separate ongoing vs. completed optimizations, with explicit status matching and latest-first ordering. The logic is straightforward and bounded by KEEP_LAST_TRACKERS, so no functional or performance concerns from this change.

Also applies to: 78-92

104-106: ProgressView integration into Tracker is consistent and telemetry-safe

Storing a ProgressView on Tracker, constructing it via new_progress_tracker() in Tracker::start, and sourcing start_at from progress_view.started_at() keeps optimizer timing in sync with the progress tree. Returning (Tracker, ProgressTracker) from start makes the read/write split explicit without changing external telemetry shape.

Also applies to: 108-124, 132-139

* Add progress_tracker.rs * Pass progress tracker around * Populate progress tracker with actual data * Expose progress on `/collections/{name}/optimizations` endpoint

xzfc force-pushed the index-progress branch 3 times, most recently from ed3eec6 to a658332 Compare December 5, 2025 02:07

xzfc marked this pull request as ready for review December 5, 2025 02:20

coderabbitai bot reviewed Dec 5, 2025

View reviewed changes

docs/redoc/master/openapi.json Show resolved Hide resolved

lib/common/common/src/progress_tracker.rs Outdated Show resolved Hide resolved

lib/segment/src/index/hnsw_index/hnsw.rs Outdated Show resolved Hide resolved

openapi/openapi-collections.ytt.yaml Outdated Show resolved Hide resolved

xzfc force-pushed the index-progress branch from a658332 to 0f2fc28 Compare December 5, 2025 02:52

coderabbitai bot reviewed Dec 5, 2025

View reviewed changes

qdrant deleted a comment from coderabbitai bot Dec 8, 2025

generall reviewed Dec 8, 2025

View reviewed changes

lib/segment/src/index/hnsw_index/hnsw.rs Outdated Show resolved Hide resolved

generall reviewed Dec 8, 2025

View reviewed changes

lib/collection/src/operations/types.rs Outdated Show resolved Hide resolved

xzfc added 3 commits December 8, 2025 23:05

Add progress_tracker.rs

4c79bf9

Pass progress tracker around

691a13e

Populate progress tracker with actual data

7c18125

xzfc force-pushed the index-progress branch from 0f2fc28 to da59034 Compare December 8, 2025 23:09

coderabbitai bot reviewed Dec 8, 2025

View reviewed changes

xzfc force-pushed the index-progress branch from da59034 to 1239985 Compare December 8, 2025 23:17

coderabbitai bot reviewed Dec 8, 2025

View reviewed changes

Expose progress on /collections/{name}/optimizations endpoint

6cba1e1

xzfc force-pushed the index-progress branch from 1239985 to 6cba1e1 Compare December 8, 2025 23:32

coderabbitai bot reviewed Dec 8, 2025

View reviewed changes

generall approved these changes Dec 9, 2025

View reviewed changes

xzfc merged commit 77c7171 into dev Dec 9, 2025
15 checks passed

xzfc deleted the index-progress branch December 9, 2025 00:15

coderabbitai bot mentioned this pull request Dec 9, 2025

Expose segment UUIDs in telemetry response #7735

Merged

xzfc mentioned this pull request Dec 15, 2025

Register trackers only after successfull start #7766

Merged

coderabbitai bot mentioned this pull request Dec 15, 2025

OptimizationPlanner #7752

Merged

timvisee pushed a commit that referenced this pull request Dec 18, 2025

Indexing progress (#7625)

6c0d10a

* Add progress_tracker.rs * Pass progress tracker around * Populate progress tracker with actual data * Expose progress on `/collections/{name}/optimizations` endpoint

coderabbitai bot mentioned this pull request Jan 4, 2026

Prevent unoptimized updates #7643

Merged

3 tasks

coderabbitai bot mentioned this pull request Feb 10, 2026

refactor/opt #8065

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Indexing progress#7625

Indexing progress#7625
xzfc merged 4 commits intodevfrom
index-progress

xzfc commented Nov 27, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Dec 8, 2025 •

edited

Loading

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

This comment was marked as off-topic.

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

xzfc commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as off-topic.

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xzfc commented Nov 27, 2025 •

edited

Loading

coderabbitai bot commented Dec 8, 2025 •

edited

Loading