Conversation
📝 WalkthroughWalkthroughThe changes introduce a new function 📜 Recent review detailsConfiguration used: CodeRabbit UI ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (4)
🧰 Additional context used🧬 Code Definitions (2)lib/segment/benches/hnsw_search_graph.rs (2)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
⏰ Context from checks skipped due to timeout of 90000ms (13)
🔇 Additional comments (12)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
| { | ||
| use indicatif::{ParallelProgressIterator as _, ProgressStyle}; | ||
|
|
||
| let path = Path::new(env!("CARGO_TARGET_TMPDIR")) |
There was a problem hiding this comment.
TIL about this variable 📝
Regarding the docs:
Cargo initially creates this directory but doesn’t manage its content in any way, this is the responsibility of the test code.
Would it make sense to add a note here that updating the generation code should also edit the filename of the cached layers?
There was a problem hiding this comment.
Proper cache invalidation logic is overkill for benchmarks, so instead I rely on a person running the benchmarks to clear cache manually when needed. Unlike tests, benchmarks are not running unattended, and you'll be reading their console output anyway.
Tho, I've added a bunch of comments and logs to so it's harder to miss now.
$ cargo bench -p segment --profile perf --bench hnsw_build_asymptotic
Finished `perf` profile [optimized] target(s) in 0.15s
Running benches/hnsw_build_asymptotic.rs (target/perf/deps/hnsw_build_asymptotic-d9f4dcc37843d3a7)
Benchmarking hnsw-index-build-asymptotic/build-n-search-hnsw-5k: Warming up for 3.0000 s
Loading cached links (built 2days 7h 21m ago) from "/home/user/qdrant/target/tmp/segment/hnsw_build_asymptotic/5000-16-16-64-true-Cosine/graph.bin".
Delete the directory above if code related to HNSW graph building is changed
hnsw-index-build-asymptotic/build-n-search-hnsw-5k
time: [26.508 µs 26.817 µs 27.120 µs]
005ece8 to
6424d3b
Compare
6424d3b to
9193317
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/segment/benches/fixture.rs (3)
49-54: Consider validating the integrity of the cached graph before loading.If the disk contents are partial or corrupted, loading might cause runtime errors or incorrect benchmark data. Adding basic checks or versioning at load time (e.g., verifying a hash) could improve robustness.
45-47: Provide a way to parameterize the random seed for reproducible experiments.Currently, the seed is hardcoded to
42. Offering a configurability option in benchmarks (e.g., via environment variables or function arguments) could help users run multiple controlled experiments or unify seeds across different benches.
88-92: Enhance error messaging inupdated_ago().Returning
"???"when an error occurs may hinder debugging efforts. Consider propagating the actual error or including more descriptive information to easily diagnose issues related to file metadata retrieval.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
Cargo.toml(1 hunks)lib/segment/Cargo.toml(1 hunks)lib/segment/benches/fixture.rs(1 hunks)lib/segment/benches/hnsw_build_asymptotic.rs(6 hunks)lib/segment/benches/hnsw_search_graph.rs(2 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/segment/src/index/hnsw_index/hnsw.rs
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/benches/hnsw_search_graph.rs (1)
lib/segment/benches/fixture.rs (1)
make_cached_graph(22-86)
🔇 Additional comments (12)
lib/segment/Cargo.toml (1)
32-32: Adding humantime dependency looks appropriate.This new dependency will help with human-readable time durations and timestamps, which is useful for the benchmark caching mechanism being implemented.
Cargo.toml (1)
180-180: Adding humantime to workspace dependencies is consistent.Adding the humantime dependency to the workspace ensures consistent versioning across the project. The version 2.1.0 is the latest stable release.
lib/segment/benches/hnsw_search_graph.rs (2)
10-11: Good refactoring to use the new fixture module.Moving the graph creation and caching logic to a shared fixture module improves code organization and reusability.
Also applies to: 22-22
31-32: Clean improvement to the benchmark implementation.Replacing the complex graph creation and caching logic with a single call to
fixture::make_cached_graphgreatly simplifies the code while maintaining the same functionality. This makes the benchmark more maintainable.lib/segment/benches/hnsw_build_asymptotic.rs (7)
4-5: Good use of LazyCell and fixture module.Using LazyCell for expensive setup operations is an excellent optimization. It ensures the setup code only runs when needed, which can significantly improve benchmark performance when using filters.
Also applies to: 21-21
28-32: Effective caching strategy for 5K vector benchmark.The LazyCell approach ensures that the graph is only created once and properly cached. The explicit drop after using the setup helps with memory management.
Also applies to: 44-44
46-51: Good implementation of the 1M vector benchmark setup.Similar to the 5K setup, this effectively caches the large 1M vector graph. The NUM_VECTORS constant is now used with a value of 1,000,000, which matches the benchmark name.
Also applies to: 78-78
52-62: Benchmark name now accurately reflects vector count.Changing from generic "build-n-search-hnsw" to specific "build-n-search-hnsw-1M" makes the benchmark name more descriptive and matches the actual implementation.
63-76: Updated point scoring benchmark with appropriate vector range.The modification to use NUM_VECTORS as the upper bound for the random range is more consistent with the actual vector count being used in the benchmark.
145-153: Optimized vector generation with LazyCell.Using LazyCell for vector generation in the basic_scoring_vectors benchmark is a good optimization. It ensures vectors are only generated once and explicitly dropped after use.
Also applies to: 163-163
166-174: Consistent LazyCell pattern for the 10x benchmark.The same LazyCell pattern is applied consistently to the 10x benchmark, maintaining code consistency and optimization benefits.
Also applies to: 184-184
lib/segment/benches/fixture.rs (1)
34-40: Add a note to maintain the cached layers' filename when generation code changes.This echoes an earlier comment suggesting that changing the generation logic should also involve updating the cached file naming scheme. Otherwise, stale or incompatible cached files might lead to confusing or incorrect results.
* Update hnsw_build_asymptotic benchmark * Comments and logs
This PR updates
hnsw_build_asymptoticto avoid waiting too much on setup.fixture.rs.LazyCellto avoid calling it for filtered out benchmarks.