Skip to content

Log snapshot download duration#7715

Merged
KShivendu merged 5 commits intodevfrom
snapshot-download-duration
Dec 8, 2025
Merged

Log snapshot download duration#7715
KShivendu merged 5 commits intodevfrom
snapshot-download-duration

Conversation

@KShivendu
Copy link
Member

@KShivendu KShivendu commented Dec 8, 2025

Logging file size, duration, and download rate makes the system more transparent. Doing this for full, partial, and s3 snapshot downloads

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Dec 8, 2025
}

let mut stream = response.bytes_stream();
let mut total_bytes_downloaded = 0u64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we just do file.len() at the end?

Copy link
Member Author

@KShivendu KShivendu Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have to do file.metadata().await?.len() which can be unreliable due to flushing and will be a sys call. It's fine like this imo. It will measure some response bytes too, which is okay.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 8, 2025

📝 Walkthrough

Walkthrough

The changes introduce comprehensive download instrumentation across multiple snapshot-related modules. A new public constant BYTES_IN_MB is added to the common module, then utilized throughout snapshot download operations in the collection, storage, and API layers to record and log download timing metrics including duration and speed in Mbps alongside byte tracking. The instrumentation maintains existing error handling and verification logic while adding measurement capabilities.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Consistent instrumentation pattern: The same timing and byte-tracking logic is applied homogeneously across multiple files, reducing the cognitive load per file
  • Straightforward calculations: Duration computation, byte-to-megabyte conversion, and Mbps calculation are basic arithmetic operations with minimal complexity
  • Limited control flow changes: Most modifications involve adding statements rather than altering existing logic paths

Areas that may require extra attention during review:

  • Verify the BYTES_IN_MB constant value (1_048_576) is correct for all intended calculations
  • Confirm timing precision (start/end capture) is consistent across all four modified files
  • Check that the directory creation added to snapshot_storage_ops.rs doesn't introduce race conditions or unnecessary filesystem operations
  • Ensure logging statements at debug level won't cause unexpected performance overhead in high-throughput scenarios
  • Verify that downstream variable bindings correctly handle the new bytes_downloaded value in snapshot_api.rs recovery flows

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Log snapshot download duration' directly matches the main change: adding duration logging to snapshot downloads across multiple files.
Description check ✅ Passed The description accurately describes the PR's purpose: logging file size, duration, and download rate for full, partial, and S3 snapshot downloads.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch snapshot-download-duration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
lib/segment/src/common/mod.rs (1)

247-248: BYTES_IN_MB constant is correct and consistent

BYTES_IN_MB matches 1024 * 1024 and fits existing usage; no behavioral issues. If you want to avoid duplicating the literal, you could optionally define it in terms of BYTES_IN_KB (e.g., BYTES_IN_KB * BYTES_IN_KB), but it’s fine as-is.

lib/collection/src/operations/snapshot_storage_ops.rs (1)

9-9: Download metrics and directory handling look good; consider small sequencing/type tweaks

The added timing/throughput logging and parent directory creation are solid and improve observability and robustness; the main behavior looks correct.

Two minor follow-ups to consider:

  1. Log after integrity check instead of before it
    Right now the "Object storage snapshot download completed" debug log is emitted before the file-size consistency check. If the size mismatch branch is hit, logs will still say “completed”, which can be slightly misleading when debugging corrupted/partial downloads. Reordering so that the debug log comes after the file_meta.len() != total_size as u64 check would keep logs aligned with actual success.

  2. Optional: align counter type with on-disk length
    total_size is inferred as usize and later compared to file_meta.len() via as u64. If you ever expect very large snapshots, you may want to make total_size a u64 to match file_meta.len() and avoid any theoretical usize overflow on 32‑bit targets. If you do that, you can also drop the as u64 cast in the comparison. Based on learnings, this mirrors the overflow‑resilience approach used elsewhere in the repo.

If you’d like to apply (1), an example diff for the tail of the function would look like:

-    // ensure flush
-    file.flush()
-        .await
-        .map_err(|e| CollectionError::service_error(format!("Failed to flush file: {e}")))?;
-
-    let download_duration = download_start_time.elapsed();
-    let total_size_mb = total_size as f64 / BYTES_IN_MB as f64;
-    let download_speed_mbps = total_size_mb / download_duration.as_secs_f64();
-    log::debug!(
-        "Object storage snapshot download completed: path={}, size={:.2} MB, duration={:.2}s, speed={:.2} MB/s",
-        target_path.display(),
-        total_size_mb,
-        download_duration.as_secs_f64(),
-        download_speed_mbps
-    );
-
-    // check len to file len
-    let file_meta = tokio_fs::metadata(target_path).await?;
+    // ensure flush
+    file.flush()
+        .await
+        .map_err(|e| CollectionError::service_error(format!("Failed to flush file: {e}")))?;
+
+    // check len to file len before logging success
+    let file_meta = tokio_fs::metadata(target_path).await?;
     if file_meta.len() != total_size as u64 {
         return Err(CollectionError::service_error(format!(
             "Downloaded file size does not match the expected size: {} != {}",
             file_meta.len(),
             total_size
         )));
     }
-    Ok(())
+
+    let download_duration = download_start_time.elapsed();
+    let total_size_mb = total_size as f64 / BYTES_IN_MB as f64;
+    let download_speed_mbps = if download_duration.as_secs_f64() > 0.0 {
+        total_size_mb / download_duration.as_secs_f64()
+    } else {
+        0.0
+    };
+    log::debug!(
+        "Object storage snapshot download completed: path={}, size={:.2} MB, duration={:.2}s, speed={:.2} MB/s",
+        target_path.display(),
+        total_size_mb,
+        download_duration.as_secs_f64(),
+        download_speed_mbps
+    );
+
+    Ok(())

Also applies to: 200-258

lib/storage/src/content_manager/snapshots/download.rs (1)

7-7: Remote snapshot download metrics look correct; minor naming/logging nit

The added timing and byte‑counting logic is straightforward and correctly computes MB and MB/s using BYTES_IN_MB; the overall behavior of download_file is preserved.

One small nit: the variable is called download_speed_mbps, but the log message says "speed={:.2} MB/s". To avoid confusion between megabits (Mbps) and megabytes (MB/s), consider renaming the variable (e.g., download_speed_mb_s) or adjusting the log string to match whatever unit you intend to standardize on across the codebase.

Also applies to: 33-33, 56-61, 66-75

src/actix/api/snapshot_api.rs (1)

21-21: Partial snapshot download‑from‑peer metrics and flow look good

The added timing/byte tracking inside recover_partial_snapshot_from is wired correctly:

  • download_start_time scopes the measurement to the remote partial snapshot fetch.
  • total_bytes_downloaded is accumulated per chunk and returned via StorageResult::Ok((collection, partial_snapshot_temp_path, total_bytes_downloaded)).
  • The match on create_partial_snapshot_result cleanly preserves the EmptyPartialSnapshot fast‑path and error propagation, and metrics are only logged for real downloads.
  • The debug log includes path, size in MB, duration, speed, and shard_id, which is useful for tracing.

Given that very similar MB/s logging now exists in several places, you might eventually consider a small shared helper (e.g., a utility that takes start_time, bytes, and a label/context) to avoid copy‑pasting the same conversion and format string, but that’s purely a future cleanup and not required for this PR.

Also applies to: 722-813

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fe46d0 and be855d3.

📒 Files selected for processing (4)
  • lib/collection/src/operations/snapshot_storage_ops.rs (3 hunks)
  • lib/segment/src/common/mod.rs (1 hunks)
  • lib/storage/src/content_manager/snapshots/download.rs (3 hunks)
  • src/actix/api/snapshot_api.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

  • lib/segment/src/common/mod.rs
  • lib/collection/src/operations/snapshot_storage_ops.rs
  • src/actix/api/snapshot_api.rs
  • lib/storage/src/content_manager/snapshots/download.rs
🧠 Learnings (1)
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

  • lib/segment/src/common/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: Build Qdrant Edge Python bindings
  • GitHub Check: e2e-tests
  • GitHub Check: test-consistency
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: integration-tests
  • GitHub Check: lint

@KShivendu KShivendu merged commit 1962d89 into dev Dec 8, 2025
15 checks passed
@KShivendu KShivendu deleted the snapshot-download-duration branch December 8, 2025 19:43
timvisee pushed a commit that referenced this pull request Dec 18, 2025
* Log snapshot download duration

* only 2 digits in filesize

* Log for normal snapshots too

* fmt

* Log download stats for s3 snapshot API and cleanup
@timvisee timvisee mentioned this pull request Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants