Conversation

Force-pushed 9332d25 to b77c3c9
Force-pushed 7ac2a33 to 6f62e59
Force-pushed 00e5d19 to 5c78bcc
Local experiments with bfb:
The slowdown seems a bit excessive in that case.
it seems we can only update in batches of 10k between optimizations
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/collection/src/update_workers/optimization_worker.rs (1)
223-255: Verify signaling behavior on all optimization outcomes. The `optimization_finished_sender.send(())` is placed inside the callback closure that runs after optimization completes. However, looking at `launch_optimization`, this callback is invoked:
- On success (line 373)
- When no IO permit is available (line 316-317, only if handles is empty)
The callback is not invoked on cancellation, error, or panic (lines 376-408). This means the update worker might wait indefinitely if an optimization fails without triggering another optimization cycle.
Verification script
#!/bin/bash
# Verify where the callback is invoked in launch_optimization
rg -n "callback\(\)" lib/collection/src/update_workers/optimization_worker.rs -A2 -B2
🤖 Fix all issues with AI agents
In @docs/redoc/master/openapi.json:
- Around line 10221-10227: The OptimizersConfigDiff schema should not include a
default for diff fields and the description should use “If set to true …”
wording; update the generator/source schema that produces OptimizersConfigDiff
so the prevent_unoptimized property omits "default": null (remove any default
value) and change its description to start with "If set to true, service
will..." to avoid implying a default or prompting clients to send null. Ensure
changes are made in the REST schema generator/source (not by editing
docs/redoc/master/openapi.json directly) so future JSON is regenerated
correctly.
- Around line 7588-7594: Update the doc comment for the prevent_unoptimized
Option<bool> field to explicitly state the boolean semantics: replace ambiguous
"If this option is set..." with "If set to true, the service will try to prevent
creation of large unoptimized segments; if false or unset, this behavior is
disabled." Apply this change to the doc comment for prevent_unoptimized in the
OptimizersBuilder definition (lib/collection/src/optimizers_builder.rs) and in
the ConfigDiff/operations definition
(lib/collection/src/operations/config_diff.rs), and ensure the comment mentions
that the default/unset state is disabled (Option::None -> JSON null).
In @lib/api/src/grpc/qdrant.rs:
- Around line 858-865: The docstring for the proto field prevent_unoptimized (an
optional bool) is ambiguous; update the comment in
lib/api/src/grpc/proto/collections.proto for the corresponding field to read "If
set to `true`, service will try to prevent creation of large unoptimized
segments..." (and adjust subsequent sentences if needed to reflect true/false
semantics and "Default is disabled"), then re-run prost-build to regenerate
lib/api/src/grpc/qdrant.rs so the GENERATED comment matches the clarified
boolean semantics for prevent_unoptimized.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
docs/redoc/master/openapi.json
lib/api/src/grpc/proto/collections.proto
lib/api/src/grpc/qdrant.rs
lib/api/src/rest/schema.rs
lib/collection/benches/batch_query_bench.rs
lib/collection/benches/batch_search_bench.rs
lib/collection/src/operations/config_diff.rs
lib/collection/src/operations/conversions.rs
lib/collection/src/optimizers_builder.rs
lib/collection/src/shards/local_shard/indexed_only.rs
lib/collection/src/shards/local_shard/mod.rs
lib/collection/src/shards/local_shard/telemetry.rs
lib/collection/src/shards/replica_set/update.rs
lib/collection/src/tests/fixtures.rs
lib/collection/src/update_handler.rs
lib/collection/src/update_workers/optimization_worker.rs
lib/collection/src/update_workers/update_worker.rs
lib/collection/tests/integration/common/mod.rs
lib/shard/src/segment_holder/mod.rs
lib/storage/tests/integration/alias_tests.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit `SomeType::from(x)` over implicit `x.into()` in Rust
In new code, don't use `transmute_from_u8`, `transmute_to_u8`, `transmute_from_u8_to_slice`, `transmute_from_u8_to_mut_slice`, `transmute_to_u8_slice` - use `bytemuck` or `zerocopy` crates instead
Prefer explicit exhaustive matches over a catch-all `_` arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using `: _` over using `..` in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)
Files:
lib/collection/benches/batch_query_bench.rs
lib/storage/tests/integration/alias_tests.rs
lib/collection/src/update_workers/update_worker.rs
lib/collection/src/shards/replica_set/update.rs
lib/api/src/grpc/qdrant.rs
lib/collection/src/update_workers/optimization_worker.rs
lib/collection/src/shards/local_shard/indexed_only.rs
lib/collection/src/operations/config_diff.rs
lib/shard/src/segment_holder/mod.rs
lib/collection/tests/integration/common/mod.rs
lib/api/src/rest/schema.rs
lib/collection/src/shards/local_shard/mod.rs
lib/collection/src/optimizers_builder.rs
lib/collection/src/update_handler.rs
lib/collection/src/operations/conversions.rs
lib/collection/src/tests/fixtures.rs
lib/collection/benches/batch_search_bench.rs
lib/collection/src/shards/local_shard/telemetry.rs
🧠 Learnings (12)
📚 Learning: 2025-11-25T11:56:33.176Z
Learnt from: CR
Repo: qdrant/qdrant PR: 0
File: .github/review-rules.md:0-0
Timestamp: 2025-11-25T11:56:33.176Z
Learning: Applies to **/*.rs : Prefer explicit field ignoring using `: _` over using `..` in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)
Applied to files:
lib/storage/tests/integration/alias_tests.rs
lib/collection/src/tests/fixtures.rs
📚 Learning: 2025-09-16T19:14:17.614Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7183
File: lib/api/src/grpc/qdrant.rs:4263-4273
Timestamp: 2025-09-16T19:14:17.614Z
Learning: In qdrant, lib/api/src/grpc/qdrant.rs is auto-generated by prost-build; do not edit it directly. Make changes in lib/api/src/grpc/proto/points.proto (e.g., add [deprecated=true], doc comments, or encoding options), then regenerate the Rust code.
Applied to files:
lib/api/src/grpc/qdrant.rs
📚 Learning: 2025-08-10T18:26:12.443Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.443Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."
Applied to files:
lib/api/src/grpc/qdrant.rs
docs/redoc/master/openapi.json
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/collection/src/update_workers/optimization_worker.rs
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
Repo: qdrant/qdrant PR: 7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
Applied to files:
lib/collection/src/update_workers/optimization_worker.rs
📚 Learning: 2025-09-01T11:19:26.371Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.
Applied to files:
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-08-21T13:45:05.899Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.
Applied to files:
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-08-20T15:03:18.522Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7100
File: lib/segment/src/index/field_index/full_text_index/inverted_index/mutable_inverted_index.rs:82-100
Timestamp: 2025-08-20T15:03:18.522Z
Learning: The merge_postings_iterator function in lib/segment/src/index/field_index/full_text_index/inverted_index/postings_iterator.rs handles deduplication internally, so there's no need to add .dedup() when using it for OR-merge operations in inverted indexes.
Applied to files:
lib/shard/src/segment_holder/mod.rs
📚 Learning: 2025-06-14T20:35:10.603Z
Learnt from: generall
Repo: qdrant/qdrant PR: 6635
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:784-832
Timestamp: 2025-06-14T20:35:10.603Z
Learning: Functions gated with `#[cfg(feature = "testing")]` within `#[cfg(test)]` modules are safe from compilation issues since both the function and its call sites are restricted to test builds. The additional feature gate is often used for optional heavy test utilities.
Applied to files:
lib/collection/src/shards/local_shard/mod.rs
📚 Learning: 2025-11-18T18:20:20.021Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7520
File: lib/quantization/src/p_square.rs:41-49
Timestamp: 2025-11-18T18:20:20.021Z
Learning: In lib/quantization/src/p_square.rs, IvanPleshkov prefers using defensive double-checks (e.g., `let Ok(x) = NotNan::new(x) else { return; }`) over `.unwrap()` even when a prior check guarantees success, to avoid panic-causing code and maintain defensive programming practices.
Applied to files:
lib/collection/src/operations/conversions.rs
🧬 Code graph analysis (2)
lib/collection/src/update_workers/update_worker.rs (2)
lib/collection/src/shards/local_shard/indexed_only.rs (1)
get_largest_unindexed_segment_vector_size (80-109)
lib/collection/src/shards/local_shard/mod.rs (1)
segments (307-309)
lib/collection/src/shards/local_shard/telemetry.rs (1)
lib/collection/src/shards/local_shard/indexed_only.rs (1)
get_index_only_excluded_vectors (14-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Test Python bindings
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests
- GitHub Check: lint
🔇 Additional comments (36)
lib/shard/src/segment_holder/mod.rs (1)
106-110: LGTM! Clean addition of `iter_appendable()` that follows the existing `iter()` pattern. The method provides a focused iterator over appendable segments only, which is useful for the optimization coordination logic.
lib/api/src/rest/schema.rs (1)
1462-1474: LGTM! The `Eq` derivation is appropriate for these types, and the new `Threads(usize)` variant correctly extends `MaxOptimizationThreads` to accept explicit thread counts. The `From` implementation at lines 1482-1488 properly handles both variants.
lib/api/src/grpc/proto/collections.proto (1)
309-315: LGTM! The `prevent_unoptimized` field is correctly added with:
- Sequential field number (10)
- Appropriate `optional bool` type for a feature defaulting to disabled
- Clear documentation explaining the throttling behavior
This aligns well with the PR's goal of preventing large unoptimized segments by delaying updates.
lib/collection/src/shards/local_shard/indexed_only.rs (2)
1-75: LGTM! The `get_index_only_excluded_vectors` function correctly:
- Computes the threshold from optimizer and HNSW configs
- Iterates segments to identify unindexed vectors exceeding the threshold
- Handles errors gracefully by logging and excluding from counts
- Returns a complete map with zero counts for vectors that have no exclusions
The logic properly distinguishes between indexed and unindexed segments, and the threshold calculation correctly converts KB to bytes.
77-109: LGTM! The `get_largest_unindexed_segment_vector_size` function correctly scans all segments to find the maximum size of unindexed vectors. The nested `max()` calls properly aggregate across vectors within segments and then across segments. Error handling is appropriate with logging and a safe `None` fallback.
lib/collection/tests/integration/common/mod.rs (1)
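The nested `max()` aggregation described above can be sketched in plain Rust. This is a simplified illustration, not the real qdrant code: the `Segment` struct and `largest_unindexed_size` function are hypothetical stand-ins for the actual segment types.

```rust
// Hypothetical, simplified stand-in for a segment's per-vector unindexed sizes.
struct Segment {
    unindexed_vector_sizes: Vec<usize>, // bytes per named vector
}

/// Largest unindexed vector size across all segments, or None if there are none.
fn largest_unindexed_size(segments: &[Segment]) -> Option<usize> {
    segments
        .iter()
        // Inner max(): largest unindexed vector within one segment.
        .filter_map(|s| s.unindexed_vector_sizes.iter().copied().max())
        // Outer max(): largest across all segments.
        .max()
}

fn main() {
    let segments = vec![
        Segment { unindexed_vector_sizes: vec![100, 300] },
        Segment { unindexed_vector_sizes: vec![250] },
        Segment { unindexed_vector_sizes: vec![] }, // fully indexed segment
    ];
    assert_eq!(largest_unindexed_size(&segments), Some(300));
    // No segments at all: falls back to None instead of panicking.
    assert_eq!(largest_unindexed_size(&[]), None);
}
```

Segments with no unindexed vectors are simply skipped by `filter_map`, which is what makes the empty-input `None` fallback safe.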
35-35: LGTM! Correctly initializes `prevent_unoptimized: None` in the test optimizer config, which disables the feature by default in tests. This is consistent with the field's semantics (disabled by default) and other test fixtures in the PR.
lib/collection/src/shards/local_shard/mod.rs (2)
18-18: LGTM! The new `indexed_only` module declaration follows the existing pattern for public modules in this file.
246-265: Threshold computation logic is correct. The implementation properly:
- Only computes the threshold when `prevent_unoptimized` is explicitly `Some(true)`
- Uses the existing `get_indexing_threshold_kb()` method to obtain the threshold value
- Passes `None` when the feature is disabled (default behavior)
This aligns with the PR's goal of throttling updates when unoptimized segments exceed the indexing threshold.
lib/storage/tests/integration/alias_tests.rs (1)
45-45: LGTM! Test configuration correctly initializes the new `prevent_unoptimized` field to `None`, maintaining the default disabled behavior for this integration test.
lib/collection/src/tests/fixtures.rs (1)
26-26: LGTM! The test fixture constant correctly includes the new field with `None` as the default, ensuring existing tests continue to operate with the feature disabled.
lib/collection/src/shards/replica_set/update.rs (1)
857-868: LGTM! Test configuration correctly adds the new `prevent_unoptimized: None` field, maintaining consistency with other test fixtures in the codebase.
lib/collection/benches/batch_query_bench.rs (1)
66-66: LGTM! The new `prevent_unoptimized: None` field correctly extends the `OptimizersConfig` struct initialization. Setting it to `None` maintains the existing benchmark behavior (feature disabled by default).
lib/collection/benches/batch_search_bench.rs (1)
84-84: LGTM! Consistent with the changes in `batch_query_bench.rs`, this correctly adds the new `prevent_unoptimized` field initialized to `None`.
lib/collection/src/optimizers_builder.rs (3)
85-92: LGTM! The new `prevent_unoptimized` field is well-documented, explaining:
- The throttling behavior when enabled
- The trade-off (increased delay between update submission and application)
- The default (disabled)
Using `Option<bool>` with `#[serde(default)]` correctly handles backward compatibility for existing configurations.
126-132: Good refactoring to extract threshold logic. Extracting `get_indexing_threshold_kb()` as a public method encapsulates the threshold computation logic and enables its reuse by other components (e.g., the new `indexed_only` module and the update handler for the `prevent_unoptimized` feature).
134-135: LGTM! The refactored `optimizer_thresholds` now delegates to `get_indexing_threshold_kb()`, reducing duplication and ensuring consistent threshold computation across the codebase.
lib/collection/src/shards/local_shard/telemetry.rs (2)
7-7: LGTM! Import cleanup correctly reflects the relocation of `get_index_only_excluded_vectors` to the `indexed_only` module. The unused imports (`BYTES_IN_KB`, `VectorNameBuf`) are now handled within that module.
Also applies to: 13-13
50-53: Good modularization. The delegation to `indexed_only::get_index_only_excluded_vectors` consolidates indexing-threshold related logic into a dedicated module. This improves code organization and enables the new `prevent_unoptimized` feature to reuse the same threshold computations.
lib/collection/src/operations/conversions.rs (3)
327-361: LGTM! The `prevent_unoptimized` field is properly destructured from the gRPC type and propagated to the internal `OptimizersConfigDiff`. The pattern follows the existing field handling in this conversion.
405-416: LGTM! The `prevent_unoptimized` field is correctly destructured from `OptimizersConfig` and propagated to the gRPC `OptimizersConfigDiff` in the `CollectionInfo` conversion. The handling is consistent with other optional fields in this struct.
Also applies to: 509-520
587-628: LGTM! The `TryFrom<OptimizersConfigDiff> for OptimizersConfig` properly destructures and assigns `prevent_unoptimized`. The field propagation is consistent with the existing conversion pattern.
lib/collection/src/operations/config_diff.rs (6)
105-106: LGTM! Deriving `PartialEq` instead of a manual implementation is cleaner and reduces maintenance burden. The `Eq` impl at line 198 properly accompanies this.
161-169: LGTM! The new `prevent_unoptimized` field is well-documented, explaining the behavior and trade-offs clearly. The `#[serde(default)]` attribute correctly handles missing fields during deserialization.
171-196: LGTM! The `Hash` implementation correctly includes `prevent_unoptimized` in the hashed state, maintaining consistency with the struct's fields. The explicit field destructuring follows the coding guidelines.
248-276: LGTM! The `DiffConfig` implementation correctly uses `.or()` semantics for `prevent_unoptimized`, which means the diff value takes precedence when `Some`, otherwise the original config value is preserved. This is consistent with how other optional fields are handled.
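The `.or()` merge semantics can be illustrated standalone. This is a sketch of the pattern only, not the actual `DiffConfig` trait; `merge_field` is a hypothetical helper.

```rust
/// Diff value wins when present; otherwise the original config value is kept.
fn merge_field<T>(diff: Option<T>, original: Option<T>) -> Option<T> {
    diff.or(original)
}

fn main() {
    // Diff explicitly sets the flag: it overrides the original.
    assert_eq!(merge_field(Some(true), Some(false)), Some(true));
    // Diff is absent: the original value is preserved.
    assert_eq!(merge_field(None, Some(false)), Some(false));
    // Note: an absent diff field cannot reset an existing value back to None,
    // which is exactly why "absent means no change" patch semantics hold.
    assert_eq!(merge_field(None::<bool>, None), None);
}
```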
437-465: LGTM! The `From<OptimizersConfig>` implementation correctly propagates `prevent_unoptimized` directly since both types use `Option<bool>`.
540-582: LGTM! Tests are properly updated to initialize `prevent_unoptimized: None` in the base configurations, ensuring the new field doesn't affect existing test behavior.
lib/collection/src/update_handler.rs (4)
13-13: LGTM! Adding the `watch` import is appropriate for the new coordination mechanism between optimization and update workers.
109-113: LGTM! The `prevent_unoptimized_threshold` field is well-documented, clearly explaining its purpose and behavior.
134-179: LGTM! The constructor properly accepts and stores `prevent_unoptimized_threshold`. The initialization follows the existing pattern for other configuration fields.
181-224: LGTM! The watch channel coordination is well-designed:
- `optimization_finished_sender` goes to the optimization worker to signal completion
- `optimization_finished_receiver` goes to the update worker to wait for signals
- `optimization_handles` is shared to allow the update worker to check if optimizations are still running
This enables the update worker to pause when unoptimized segments exceed the threshold and resume when optimization completes.
lib/collection/src/update_workers/optimization_worker.rs (2)
19-19: LGTM! The `watch` import and the new `optimization_finished_sender` parameter are properly added to enable coordination with the update worker.
Also applies to: 58-58
175-186: LGTM! The sender is cloned and passed to `process_optimization` on each iteration. This is correct since `watch::Sender` implements `Clone` and multiple senders can coexist.
lib/collection/src/update_workers/update_worker.rs (3)
9-21: LGTM! The new imports are appropriate for the coordination mechanism:
- `TokioMutex` for async-safe access to optimization handles
- `watch` for receiving optimization completion signals
- `StoppableTaskHandle` for checking optimization task status
- `get_largest_unindexed_segment_vector_size` for threshold checking
24-36: LGTM! The function signature is properly extended with the new coordination parameters. The naming is clear and follows existing conventions.
52-68: LGTM! The integration of `wait_for_optimization` before processing updates is correct. If waiting fails, the error is properly reported back to the caller via the feedback channel, and the operation is skipped with `continue`.
},
"prevent_unoptimized": {
  "description": "If this option is set, service will try to prevent creation of large unoptimized segments. When enabled, updates may be delayed if there are unoptimized segments larger than indexing threshold. Updates will be resumed when optimization is completed and segments are optimized below the threshold. Using this option may lead to increased delay between submitting an update and its application. Default is disabled.",
  "default": null,
  "type": "boolean",
  "nullable": true
}
Avoid default: null in OptimizersConfigDiff (diff objects shouldn’t imply defaults).
For “diff” schemas, defaults are typically omitted (the field being absent already means “no change”). Keeping default: null may lead some clients to always send null / mis-handle patch semantics. Also apply the same “If set to true …” wording fix here. (Based on learnings, update the REST schema source rather than editing this JSON.)
Proposed schema tweak (apply in generator source, not here)
"prevent_unoptimized": {
- "description": "If this option is set, service will try to prevent creation of large unoptimized segments. When enabled, updates may be delayed if there are unoptimized segments larger than indexing threshold. Updates will be resumed when optimization is completed and segments are optimized below the threshold. Using this option may lead to increased delay between submitting an update and its application. Default is disabled.",
- "default": null,
- "type": "boolean",
- "nullable": true
+ "description": "If set to true, the service will try to prevent creation of large unoptimized segments. Updates may be delayed if there are unoptimized segments larger than the indexing threshold; updates resume once optimization completes and segments are below the threshold.",
+ "type": "boolean",
+ "nullable": true
}
🤖 Prompt for AI Agents
In @docs/redoc/master/openapi.json around lines 10221 - 10227, The
OptimizersConfigDiff schema should not include a default for diff fields and the
description should use “If set to true …” wording; update the generator/source
schema that produces OptimizersConfigDiff so the prevent_unoptimized property
omits "default": null (remove any default value) and change its description to
start with "If set to true, service will..." to avoid implying a default or
prompting clients to send null. Ensure changes are made in the REST schema
generator/source (not by editing docs/redoc/master/openapi.json directly) so
future JSON is regenerated correctly.
// True, if we can proceed with updates
largest_unoptimized_segment_size <= optimization_threshold
Isn't there an edge case where our current update will make it go over the threshold - in which case we have linking again?
Case:
- segment is 9kb
- update adds 2kb
- segment is 11kb and disappears in `indexed_only=true` search
yes, the same problem with mass updates, we can't really estimate volume before execution
Naive suggestion: how about a boolean marker to still include it in indexed_only=true search, even if slightly over the threshold?
E.g. set the flag for updates like this, and unset it for regular updates. Then multiply the full scan threshold by two if the flag is set for this segment. Or just include it unconditionally.
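The "naive suggestion" above can be sketched as follows. Everything here is hypothetical: `SegmentState`, `grace_marker`, and `include_in_indexed_only` are illustrative names for the proposed boolean marker, not actual qdrant APIs.

```rust
// Hypothetical sketch of the reviewer's suggestion.
struct SegmentState {
    unindexed_bytes: usize,
    // Set for updates admitted while the segment was near the threshold.
    grace_marker: bool,
}

/// Whether to include this segment in `indexed_only=true` search.
fn include_in_indexed_only(seg: &SegmentState, threshold: usize) -> bool {
    // Marked segments get double the threshold, so an admitted update that
    // pushes them slightly over does not make them disappear from search.
    let effective = if seg.grace_marker {
        threshold.saturating_mul(2)
    } else {
        threshold
    };
    seg.unindexed_bytes <= effective
}

fn main() {
    let threshold = 10 * 1024; // 10kb
    // 11kb segment with the marker set: still visible.
    let marked = SegmentState { unindexed_bytes: 11 * 1024, grace_marker: true };
    assert!(include_in_indexed_only(&marked, threshold));
    // Same size without the marker: would disappear.
    let unmarked = SegmentState { unindexed_bytes: 11 * 1024, grace_marker: false };
    assert!(!include_in_indexed_only(&unmarked, threshold));
}
```

The alternative mentioned ("include it unconditionally") would simply return `true` whenever the marker is set, trading a bounded full-scan cost for guaranteed visibility.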
Force-pushed ea85777 to a6de38c
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/api/src/rest/schema.rs (1)
1469-1474: Validate/define semantics for `Threads(usize)` (especially `0`) + document the accepted shapes. With `#[serde(untagged)]`, this becomes "either `"auto"` or a number". That's good, but please ensure `Threads(0)` is either rejected or clearly defined, and consider documenting any upper bound/capping behavior (to prevent accidental misconfig that could exhaust resources).
Proposed doc-only tweak (keeps wire format, improves OpenAPI)
 #[derive(Debug, Deserialize, Serialize, Clone, Copy, PartialEq, Eq, Hash, JsonSchema)]
 #[serde(untagged)]
 pub enum MaxOptimizationThreads {
+    /// Use a predefined setting (e.g. `"auto"`).
     Setting(MaxOptimizationThreadsSetting),
+    /// Use an explicit number of optimization threads (must be >= 1).
     Threads(usize),
 }
lib/collection/src/operations/config_diff.rs (1)
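A minimal validation sketch for the `Threads(0)` concern raised above. This is hypothetical: the real type lives in `lib/api/src/rest/schema.rs` and wraps a `Setting` variant, while here the enum and `validate` function are simplified stand-ins.

```rust
#[derive(Debug, PartialEq)]
enum MaxOptimizationThreads {
    Auto,
    Threads(usize),
}

/// Reject a zero thread count explicitly instead of leaving it undefined.
fn validate(value: &MaxOptimizationThreads) -> Result<(), String> {
    match value {
        MaxOptimizationThreads::Auto => Ok(()),
        MaxOptimizationThreads::Threads(0) => {
            Err("max_optimization_threads must be >= 1; use \"auto\" for automatic".to_string())
        }
        MaxOptimizationThreads::Threads(_) => Ok(()),
    }
}

fn main() {
    assert!(validate(&MaxOptimizationThreads::Auto).is_ok());
    assert!(validate(&MaxOptimizationThreads::Threads(4)).is_ok());
    // The ambiguous zero case is rejected rather than silently accepted.
    assert!(validate(&MaxOptimizationThreads::Threads(0)).is_err());
}
```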
105-196: Critical: `OptimizersConfigDiff` now violates the `Eq`/`Hash` contract due to `Option<f64>` + derived `PartialEq`. `Hash` uses `f64::to_le_bytes`, which distinguishes `0.0` and `-0.0`, but derived `PartialEq` considers them equal — this can break `HashMap`/`HashSet` behavior. (Additionally, `Eq` + derived `PartialEq` is generally risky with `f64` because of NaN reflexivity.)
Suggestion: restore a manual `PartialEq` consistent with the existing `Hash` (or change `Hash` to match standard `f64` hashing semantics).
Proposed fix (make `PartialEq` consistent with current `Hash`)
-#[derive(Debug, Deserialize, Serialize, JsonSchema, Validate, Clone, PartialEq)]
+#[derive(Debug, Deserialize, Serialize, JsonSchema, Validate, Clone)]
 pub struct OptimizersConfigDiff {
 @@
 }
+impl PartialEq for OptimizersConfigDiff {
+    fn eq(&self, other: &Self) -> bool {
+        let Self {
+            deleted_threshold,
+            vacuum_min_vector_number,
+            default_segment_number,
+            max_segment_size,
+            #[expect(deprecated)]
+            memmap_threshold,
+            indexing_threshold,
+            flush_interval_sec,
+            max_optimization_threads,
+            prevent_unoptimized,
+        } = self;
+
+        deleted_threshold.map(f64::to_le_bytes) == other.deleted_threshold.map(f64::to_le_bytes)
+            && vacuum_min_vector_number == &other.vacuum_min_vector_number
+            && default_segment_number == &other.default_segment_number
+            && max_segment_size == &other.max_segment_size
+            && memmap_threshold == &other.memmap_threshold
+            && indexing_threshold == &other.indexing_threshold
+            && flush_interval_sec == &other.flush_interval_sec
+            && max_optimization_threads == &other.max_optimization_threads
+            && prevent_unoptimized == &other.prevent_unoptimized
+    }
+}
+
 impl std::hash::Hash for OptimizersConfigDiff {
     fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
         let Self {
             deleted_threshold,
 @@
         deleted_threshold.map(f64::to_le_bytes).hash(state);
 @@
     }
 }
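The contract violation described above can be demonstrated in a few lines of standalone Rust, independent of the qdrant types: derived `PartialEq` on `f64` treats `0.0` and `-0.0` as equal, while a byte-wise hash distinguishes them.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash an optional f64 the same way the struct's Hash impl does: via its bytes.
fn byte_hash(x: Option<f64>) -> u64 {
    let mut h = DefaultHasher::new();
    x.map(f64::to_le_bytes).hash(&mut h);
    h.finish()
}

fn main() {
    // Derived PartialEq on f64 says these are equal...
    assert_eq!(0.0_f64, -0.0_f64);
    // ...but their byte representations differ (sign bit),
    assert_ne!(0.0_f64.to_le_bytes(), (-0.0_f64).to_le_bytes());
    // so the byte-wise hash distinguishes them, violating the
    // `a == b  =>  hash(a) == hash(b)` contract required by HashMap/HashSet.
    assert_ne!(byte_hash(Some(0.0)), byte_hash(Some(-0.0)));
    // NaN additionally breaks Eq's reflexivity requirement:
    assert_ne!(f64::NAN, f64::NAN);
}
```

A key that compares equal but hashes differently can be inserted twice into a `HashMap`, or fail to be found after insertion, which is why the manual `PartialEq` in the proposed fix compares the same `to_le_bytes` representation the `Hash` impl uses.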
🤖 Fix all issues with AI agents
In `@lib/collection/src/update_workers/update_worker.rs`:
- Around line 52-68: The code currently ignores Err from
Self::wait_for_optimization when sender is None; change the handling so errors
are never silently dropped: after awaiting wait_for_optimization (the call using
prevent_unoptimized_threshold, &segments_clone, optimization_handles.clone(),
&mut optimization_finished_receiver), match on operation_result and for Err(err)
always handle it — if let Some(feedback) = sender then call
feedback.send(Err(err)).unwrap_or_else(|_| log::debug!(...)) as before, else log
the error with log::warn! or log::error! including op_num and err; in both
cases, continue the loop to skip the operation. Ensure you reference
operation_result, sender/feedback, and op_num in the log message.
- Around line 129-183: The comparison in wait_for_optimization uses
optimization_threshold (KB) against get_largest_unindexed_segment_vector_size()
(bytes); convert the threshold to bytes before comparison. Update
wait_for_optimization to compute a byte_threshold from the Option<usize>
optimization_threshold (e.g., using
`optimization_threshold.saturating_mul(BYTES_IN_KB)` or similar) and compare
largest_unoptimized_segment_size <= byte_threshold; keep the Option handling and
all other logic unchanged. Ensure references to optimization_threshold in that
function use the converted byte value and use the constant BYTES_IN_KB (or
define it) to prevent overflow with saturating_mul.
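The unit-mismatch fix described in the prompt above can be sketched as follows. This is a simplified illustration with assumed names; the real `wait_for_optimization` takes more parameters and also waits on the watch channel.

```rust
const BYTES_IN_KB: usize = 1024;

/// Convert an optional KB threshold to bytes before comparing against
/// segment sizes measured in bytes; saturating_mul avoids overflow.
fn can_proceed(threshold_kb: Option<usize>, largest_unoptimized_bytes: usize) -> bool {
    match threshold_kb {
        // Feature disabled: never block updates.
        None => true,
        Some(kb) => largest_unoptimized_bytes <= kb.saturating_mul(BYTES_IN_KB),
    }
}

fn main() {
    // 10 KB threshold vs an 11,000-byte segment: 11_000 > 10_240, so block.
    assert!(!can_proceed(Some(10), 11_000));
    // Same segment passes once the threshold is 11 KB (11_264 bytes).
    assert!(can_proceed(Some(11), 11_000));
    // Disabled feature always proceeds, even for huge segments.
    assert!(can_proceed(None, usize::MAX));
    // Without the KB->bytes conversion, the first case would wrongly block
    // only when the segment exceeded 10 *bytes*, stalling nearly all updates.
}
```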
♻️ Duplicate comments (2)
docs/redoc/master/openapi.json (2)
7588-7594: Clarify `Option<bool>` semantics ("set" vs "set to true") in the `prevent_unoptimized` description. The current text ("If this option is set…") is ambiguous for a nullable boolean. Prefer explicitly documenting the `true` behavior and that `false`/unset disables it. Also, since this file is generated, apply the change in the REST schema source, not in this JSON. (Based on learnings, update the generator source rather than editing this JSON.)
Proposed wording (apply in generator source)
- "description": "If this option is set, service will try to prevent creation of large unoptimized segments. When enabled, updates may be delayed if there are unoptimized segments larger than indexing threshold. Updates will be resumed when optimization is completed and segments are optimized below the threshold. Using this option may lead to increased delay between submitting an update and its application. Default is disabled.",
+ "description": "If set to true, the service will try to prevent creation of large unoptimized segments. Updates may be delayed if there are unoptimized segments larger than the indexing threshold; updates resume once optimization completes and segments are optimized below the threshold. If false or unset, this behavior is disabled.",
10221-10227: Avoid `default: null` for diff/patch fields in `OptimizersConfigDiff`. For "diff" schemas, omitting the field already means "no change"; a `default: null` can encourage clients to send `null` and muddle patch semantics. As above, fix in the schema generator source, not directly in this JSON. (Based on learnings, update the generator source rather than editing this JSON.)
Proposed schema tweak (apply in generator source)
 "prevent_unoptimized": {
-  "description": "If this option is set, service will try to prevent creation of large unoptimized segments. When enabled, updates may be delayed if there are unoptimized segments larger than indexing threshold. Updates will be resumed when optimization is completed and segments are optimized below the threshold. Using this option may lead to increased delay between submitting an update and its application. Default is disabled.",
-  "default": null,
+  "description": "If set to true, the service will try to prevent creation of large unoptimized segments. Updates may be delayed if there are unoptimized segments larger than the indexing threshold; updates resume once optimization completes and segments are optimized below the threshold. If false or unset, this behavior is disabled.",
   "type": "boolean",
   "nullable": true
 }
🧹 Nitpick comments (4)
lib/api/src/rest/schema.rs (2)
1462-1467: Add doc comments for schema clarity (OpenAPI generation).

Deriving `Eq` here is fine; but since this type is part of the REST schema, please add a short doc comment explaining what `auto` does (it’ll show up in generated OpenAPI). Based on learnings, OpenAPI is generated from this file’s docs.

1476-1489: Conversion impls look correct; consider adding `From<usize>` for ergonomics.

`Default` and `From<MaxOptimizationThreads> for Option<usize>` are clear and exhaustive. If you see repetitive `MaxOptimizationThreads::Threads(n)` construction at call sites, adding `impl From<usize> for MaxOptimizationThreads` can tidy usage.

Optional ergonomic addition:

```diff
 impl From<MaxOptimizationThreads> for Option<usize> {
     fn from(value: MaxOptimizationThreads) -> Self {
         match value {
             MaxOptimizationThreads::Setting(MaxOptimizationThreadsSetting::Auto) => None,
             MaxOptimizationThreads::Threads(threads) => Some(threads),
         }
     }
 }
+
+impl From<usize> for MaxOptimizationThreads {
+    fn from(value: usize) -> Self {
+        MaxOptimizationThreads::Threads(value)
+    }
+}
```

lib/collection/src/optimizers_builder.rs (1)
86-93: Clarify semantics of `prevent_unoptimized: Option<bool>` (tri-state) vs “enabled/disabled” wording.

Current docs read like a boolean flag, but the type allows `None | Some(false) | Some(true)`. Consider either:

- changing the field to `bool` (default `false`) in `OptimizersConfig`, keeping `Option<bool>` only in the diff type, or
- tightening wording to “when set to `true`…” (and document that `None`/`false` means disabled).
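A minimal sketch of the tri-state semantics this comment argues for; the helper function is hypothetical, not part of the Qdrant API:

```rust
/// Diff-style flag: `None` = not set, `Some(false)` = explicitly disabled,
/// `Some(true)` = explicitly enabled. Only an explicit `true` enables the
/// behavior, which is why the docs should read "when set to true".
fn prevent_unoptimized_enabled(flag: Option<bool>) -> bool {
    flag == Some(true)
}

fn main() {
    assert!(!prevent_unoptimized_enabled(None));        // unset: disabled
    assert!(!prevent_unoptimized_enabled(Some(false))); // explicitly off
    assert!(prevent_unoptimized_enabled(Some(true)));   // explicitly on
}
```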
44-60: `optimization_finished_sender` is effectively a “wake-up” signal, not strictly “optimization finished”.

The callback you attach is invoked in multiple situations (not only actual optimization completion), so the “optimization is finished” comment is misleading. Consider renaming to something like `optimizer_wakeup_sender` / `optimization_progress_sender`, or at least adjust the comment to reflect “notify waiters to re-check” semantics.

Also applies to: 176-187, 224-256
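The “notify waiters to re-check” semantics can be illustrated with a standalone condition-variable sketch. The real worker uses async channels; the names and sizes below are illustrative only:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Block until the tracked size drops to `threshold` or below, re-checking the
/// predicate after every wake-up, and return the final observed size.
fn wait_until_optimized(state: &(Mutex<usize>, Condvar), threshold: usize) -> usize {
    let (size, wakeup) = state;
    let mut guard = size.lock().unwrap();
    while *guard > threshold {
        // A wake-up only means "something changed", not "the condition holds",
        // so the predicate is re-checked on every iteration.
        guard = wakeup.wait(guard).unwrap();
    }
    *guard
}

fn main() {
    let state = Arc::new((Mutex::new(100_usize), Condvar::new()));
    let optimizer = {
        let state = Arc::clone(&state);
        thread::spawn(move || {
            *state.0.lock().unwrap() = 5; // simulate optimization shrinking the segment
            state.1.notify_all(); // wake any waiter so it can re-check
        })
    };
    assert!(wait_until_optimized(&state, 10) <= 10);
    optimizer.join().unwrap();
}
```

Because the signal carries no guarantee on its own, renaming the sender to reflect “wake up and re-check” matches what waiters must actually do.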
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- docs/redoc/master/openapi.json
- lib/api/src/grpc/proto/collections.proto
- lib/api/src/grpc/qdrant.rs
- lib/api/src/rest/schema.rs
- lib/collection/benches/batch_query_bench.rs
- lib/collection/benches/batch_search_bench.rs
- lib/collection/src/operations/config_diff.rs
- lib/collection/src/operations/conversions.rs
- lib/collection/src/optimizers_builder.rs
- lib/collection/src/shards/local_shard/indexed_only.rs
- lib/collection/src/shards/local_shard/mod.rs
- lib/collection/src/shards/local_shard/telemetry.rs
- lib/collection/src/shards/replica_set/update.rs
- lib/collection/src/tests/fixtures.rs
- lib/collection/src/update_handler.rs
- lib/collection/src/update_workers/optimization_worker.rs
- lib/collection/src/update_workers/update_worker.rs
- lib/collection/tests/integration/common/mod.rs
- lib/shard/src/segment_holder/mod.rs
- lib/storage/tests/integration/alias_tests.rs
🚧 Files skipped from review as they are similar to previous changes (9)
- lib/shard/src/segment_holder/mod.rs
- lib/storage/tests/integration/alias_tests.rs
- lib/collection/src/shards/local_shard/telemetry.rs
- lib/collection/benches/batch_search_bench.rs
- lib/collection/src/update_handler.rs
- lib/api/src/grpc/proto/collections.proto
- lib/api/src/grpc/qdrant.rs
- lib/collection/tests/integration/common/mod.rs
- lib/collection/src/operations/conversions.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
`**/*.rs`:

- Prefer explicit `SomeType::from(x)` over implicit `x.into()` in Rust
- In new code, don't use `transmute_from_u8`, `transmute_to_u8`, `transmute_from_u8_to_slice`, `transmute_from_u8_to_mut_slice`, `transmute_to_u8_slice` - use `bytemuck` or `zerocopy` crates instead
- Prefer explicit exhaustive matches over a catch-all `_` arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
- Prefer explicit field ignoring using `: _` over using `..` in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

- lib/collection/src/update_workers/update_worker.rs
- lib/collection/benches/batch_query_bench.rs
- lib/collection/src/update_workers/optimization_worker.rs
- lib/collection/src/shards/local_shard/indexed_only.rs
- lib/collection/src/optimizers_builder.rs
- lib/collection/src/shards/local_shard/mod.rs
- lib/api/src/rest/schema.rs
- lib/collection/src/tests/fixtures.rs
- lib/collection/src/shards/replica_set/update.rs
- lib/collection/src/operations/config_diff.rs
🧠 Learnings (13)
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
Repo: qdrant/qdrant PR: 7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
Applied to files:
- lib/collection/src/update_workers/update_worker.rs
- lib/collection/src/update_workers/optimization_worker.rs
📚 Learning: 2025-05-07T09:13:47.781Z
Learnt from: KShivendu
Repo: qdrant/qdrant PR: 6352
File: lib/segment/src/id_tracker/immutable_id_tracker.rs:379-393
Timestamp: 2025-05-07T09:13:47.781Z
Learning: In the Qdrant codebase, prefer fail-fast error handling by returning explicit errors rather than silently continuing with potentially corrupted state, especially in components like ID trackers that are fundamental to data integrity.
Applied to files:
lib/collection/src/update_workers/update_worker.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR `#7048`, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
- lib/collection/src/update_workers/optimization_worker.rs
- lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-09-01T11:19:26.371Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.
Applied to files:
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-08-21T13:45:05.899Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.
Applied to files:
lib/collection/src/shards/local_shard/indexed_only.rs
📚 Learning: 2025-06-14T20:35:10.603Z
Learnt from: generall
Repo: qdrant/qdrant PR: 6635
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:784-832
Timestamp: 2025-06-14T20:35:10.603Z
Learning: Functions gated with `#[cfg(feature = "testing")]` within `#[cfg(test)]` modules are safe from compilation issues since both the function and its call sites are restricted to test builds. The additional feature gate is often used for optional heavy test utilities.
Applied to files:
lib/collection/src/shards/local_shard/mod.rs
📚 Learning: 2025-08-10T18:26:12.443Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.443Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."
Applied to files:
- lib/api/src/rest/schema.rs
- docs/redoc/master/openapi.json
📚 Learning: 2025-08-10T18:25:16.206Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:10626-10634
Timestamp: 2025-08-10T18:25:16.206Z
Learning: In Qdrant, docs/redoc/master/openapi.json is autogenerated from the Rust REST schema definitions (e.g., lib/api/src/rest/schema.rs). To change OpenAPI descriptions, update the Rust struct field doc comments/attributes rather than editing the JSON directly.
Applied to files:
- lib/api/src/rest/schema.rs
- docs/redoc/master/openapi.json
📚 Learning: 2025-11-25T11:56:33.176Z
Learnt from: CR
Repo: qdrant/qdrant PR: 0
File: .github/review-rules.md:0-0
Timestamp: 2025-11-25T11:56:33.176Z
Learning: Applies to **/*.rs : Prefer explicit field ignoring using `: _` over using `..` in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)
Applied to files:
lib/collection/src/tests/fixtures.rs
📚 Learning: 2025-08-10T18:26:33.017Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:10393-10401
Timestamp: 2025-08-10T18:26:33.017Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated. Any description changes must be made in the source (e.g., lib/api/src/rest/schema.rs and/or lib/api/src/grpc/proto/points.proto), not edited directly in the JSON.
Applied to files:
docs/redoc/master/openapi.json
📚 Learning: 2025-12-22T16:30:33.760Z
Learnt from: coszio
Repo: qdrant/qdrant PR: 7631
File: docs/redoc/master/openapi.json:11774-11781
Timestamp: 2025-12-22T16:30:33.760Z
Learning: Qdrant’s OpenAPI (docs/redoc/master/openapi.json) is generated via schemars, and the project accepts the pattern anyOf: [ { "$ref": ... }, { "nullable": true } ] for nullable references because their autogenerated clients handle it. Prefer treating suggestions to switch to allOf + nullable as optional/non-blocking.
Applied to files:
docs/redoc/master/openapi.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: lint
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: Test Python bindings
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
🔇 Additional comments (8)
lib/collection/src/shards/local_shard/indexed_only.rs (2)
80-109: LGTM!

This function correctly consumes the inner iterator with `.max()` before the `segment_guard` goes out of scope, avoiding any lifetime issues.
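The pattern under review, fully consuming a lazy iterator chain while a lock guard is in scope, can be illustrated standalone. The data and function name here are hypothetical, not the actual segment holder:

```rust
use std::sync::RwLock;

/// Find the largest value across all "segments" while holding a read guard.
/// `flat_map` yields each inner iterator's items before moving to the next
/// outer element, and `max()` consumes the whole chain eagerly, so nothing
/// borrowed from the guard escapes its scope.
fn largest_size(segments: &RwLock<Vec<Vec<usize>>>) -> Option<usize> {
    let guard = segments.read().unwrap();
    guard.iter().flat_map(|segment| segment.iter().copied()).max()
} // guard dropped here, after the iterator chain was fully consumed

fn main() {
    let segments = RwLock::new(vec![vec![10, 20], vec![5, 40], vec![30]]);
    assert_eq!(largest_size(&segments), Some(40));
    assert_eq!(largest_size(&RwLock::new(Vec::new())), None);
}
```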
30-59: This code is safe and does not have a lifetime issue.

The `segment_guard` is properly held for the entire duration of the inner iterator's consumption. `flat_map` forces immediate consumption of each returned iterator before moving to the next outer element, and the `.for_each()` call ensures the complete iterator chain is consumed while `segment_guard` remains in scope. This pattern is correct and idiomatic Rust.

lib/collection/src/shards/local_shard/mod.rs (2)
18-18: LGTM!

New module export for `indexed_only` helpers is appropriately placed.

246-265: LGTM!

The threshold computation is correct and idiomatic. When `prevent_unoptimized` is `Some(true)`, the indexing threshold is passed to the `UpdateHandler`; otherwise `None` is passed, disabling the feature.

lib/collection/src/tests/fixtures.rs (1)
16-27: LGTM!

Test fixture correctly initializes the new `prevent_unoptimized` field to `None`, keeping the feature disabled by default in tests.

lib/collection/benches/batch_query_bench.rs (1)

54-74: LGTM!

Benchmark correctly initializes `prevent_unoptimized: None` to avoid throttling during performance measurements.

lib/collection/src/shards/replica_set/update.rs (1)

857-868: LGTM!

Test configuration correctly adds the new `prevent_unoptimized` field with a `None` default.

lib/collection/src/optimizers_builder.rs (1)
126-148: Nice refactor: centralizing indexing-threshold normalization reduces drift.

`get_indexing_threshold_kb()` makes the “None → default / 0 → disabled” logic reusable and avoids repeating the match in multiple places.
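The normalization praised here can be sketched as follows; the default value and exact signature are assumptions for illustration, not the real implementation:

```rust
/// Default indexing threshold in KB (the concrete value is an assumption here).
const DEFAULT_INDEXING_THRESHOLD_KB: usize = 20_000;

/// Normalize the configured indexing threshold:
/// `None` falls back to the default, `Some(0)` means explicitly disabled,
/// and any other value is used as-is.
fn get_indexing_threshold_kb(configured: Option<usize>) -> Option<usize> {
    match configured {
        None => Some(DEFAULT_INDEXING_THRESHOLD_KB),
        Some(0) => None,
        Some(kb) => Some(kb),
    }
}

fn main() {
    assert_eq!(get_indexing_threshold_kb(None), Some(20_000)); // unset: default
    assert_eq!(get_indexing_threshold_kb(Some(0)), None);      // 0: disabled
    assert_eq!(get_indexing_threshold_kb(Some(42)), Some(42)); // explicit value
}
```

Keeping this match in one helper means callers cannot disagree about what `None` and `0` mean.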
```rust
let operation_result = Self::wait_for_optimization(
    prevent_unoptimized_threshold,
    &segments_clone,
    optimization_handles.clone(),
    &mut optimization_finished_receiver,
)
.await;

if let Err(err) = operation_result
    && let Some(feedback) = sender
{
    feedback.send(Err(err)).unwrap_or_else(|_| {
        log::debug!("Can't report operation {op_num} result. Assume already not required");
    });
    continue;
};
```
Don’t silently ignore `wait_for_optimization()` errors when there’s no feedback channel.

Right now, failures from `wait_for_optimization()` are only acted on when `sender.is_some()`. If `sender` is `None`, we proceed without even logging the failure. Consider at least logging at warn/error, or unconditionally skipping the operation (depending on intended fail-open vs fail-closed behavior).
🤖 Prompt for AI Agents
In `@lib/collection/src/update_workers/update_worker.rs` around lines 52 - 68, The
code currently ignores Err from Self::wait_for_optimization when sender is None;
change the handling so errors are never silently dropped: after awaiting
wait_for_optimization (the call using prevent_unoptimized_threshold,
&segments_clone, optimization_handles.clone(), &mut
optimization_finished_receiver), match on operation_result and for Err(err)
always handle it — if let Some(feedback) = sender then call
feedback.send(Err(err)).unwrap_or_else(|_| log::debug!(...)) as before, else log
the error with log::warn! or log::error! including op_num and err; in both
cases, continue the loop to skip the operation. Ensure you reference
operation_result, sender/feedback, and op_num in the log message.
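The handling this prompt asks for can be sketched standalone. The function name, the `String` error type, and the std channel standing in for the real feedback sender are all illustrative assumptions:

```rust
use std::sync::mpsc;

/// Decide whether to proceed with an operation after waiting for optimization:
/// report a failure through the feedback channel when one exists, otherwise
/// log it; the error is never dropped silently.
fn handle_wait_result(
    op_num: u64,
    result: Result<(), String>,
    feedback: Option<mpsc::Sender<Result<(), String>>>,
) -> bool {
    match result {
        Ok(()) => true, // proceed with the operation
        Err(err) => {
            match feedback {
                // The receiver may already be gone; ignoring the send error is fine.
                Some(sender) => drop(sender.send(Err(err))),
                None => eprintln!("WARN: operation {op_num} failed to wait: {err}"),
            }
            false // skip the operation either way
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    assert!(!handle_wait_result(1, Err(String::from("timeout")), Some(tx)));
    assert!(rx.recv().unwrap().is_err()); // error reported to the caller
    assert!(!handle_wait_result(2, Err(String::from("timeout")), None)); // logged instead
    assert!(handle_wait_result(3, Ok(()), None));
}
```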
After fixing the missing conversion on the threshold, we get better throughput. Previous: #7643 (comment)
Co-authored-by: Tim Visée <tim+github@visee.me>
timvisee
left a comment
Looks good.
We are aware the blinking problem still exists in this PR, but agreed to work on that in a separate PR to not stall this one further.
* wip: wait for optimization before applying update
* implement api parameter
* nits
* fix deadloack when no optimizers are running
* release handle mutex
* Derive PartialEq
* Remove unused function
* fix missing kb conversion in threshold
* Update lib/api/src/grpc/proto/collections.proto

Co-authored-by: Tim Visée <tim+github@visee.me>

* sync comment change

---------

Co-authored-by: Arnaud Gourlay <arnaud.gourlay@gmail.com>
Co-authored-by: timvisee <tim@visee.me>
Co-authored-by: Tim Visée <tim+github@visee.me>
Motivation
Large amount of updates, especially in bulk/spiky load pattern can temporary create a large amount of unindexed vectors.
Those unindexed vectors may create higher latencies of the search requests, even though the application might not need just-in-time updates.
In order to prevent those latencies, qdrant already have
indexed_onlyparameter in the query. So it is covered from the search point of view, there is, however, a side-effect that updated but not yet indexed points might disappear from the search results. This PR addresses this problem.To prevent disappearence of the updated points from the search results, we can throttle the update operation to match the indexation speed. In other words we can delay the updates, if index is not able to keep up. In this configuration we will always have all points available in search, but the update operations migth take more time to be applied.
Implementation
~~Introduce a new parameter inside `strict_mode`(?) configuration, which would introduce an additional threshold:~~ Introduce a new parameter inside the optimizers config, which would enable or disable unoptimized updates.
The threshold is the same as the indexing threshold (it seems there is no benefit in starting optimization earlier, as we would need to merge segments anyway).
ToDo:
- ~~strict-mode~~ optimizers parameter
- ~~Validate that indexing threshold is larger than strict mode parameter (otherwise we might deadlock updates)~~

To consider