Actionable comments posted: 2
🧹 Nitpick comments (31)
lib/collection/src/shards/local_shard/shard_ops.rs (1)
165-168: Minor style consistency: pick a convention for `From` vs `Into` at call sites. This file now uses `CollectionInfo::from(...)`, while other code (e.g., in operations/types.rs, line 258+) uses `.into()` for field conversions. Consider standardizing: commonly, use `.into()` when the target type is evident from context (return type or field type), and `T::from(x)` when making the target explicit improves readability. Not a blocker.

lib/collection/tests/integration/snapshot_recovery_test.rs (1)
40-49: Exercise metadata persistence through snapshot/restore (set non-empty metadata and assert). Right now the test initializes `metadata: None`. Since this PR introduces collection-level metadata, this test is a perfect place to cover persistence across snapshot and recovery. Recommend setting a small, non-empty metadata object and (if an accessor is available) asserting it round-trips correctly in the recovered collection. Apply this minimal change to cover the write path; I can follow up with an accessor-based assertion once you point me to the public way to read collection metadata in tests:

```diff
     uuid: None,
-    metadata: None,
+    metadata: Some(segment::types::Payload(
+        serde_json::Map::from_iter([
+            ("any_values".to_string(), serde_json::json!("go-like-this")),
+            ("from".to_string(), serde_json::json!("snapshot_recovery_test")),
+        ])
+    )),
```

Optionally, add an assertion after loading `recovered_collection` to confirm metadata equality via the appropriate API (e.g., collection info/config). I can wire this up if you prefer.

lib/collection/src/tests/points_dedup.rs (1)
56-65: LGTM; consider setting non-empty metadata to increase coverage. Setting `metadata: None` is correct for compiling against the new field. If you want to cover that metadata does not interfere with dedup flows, you could set a tiny metadata object here as well (no behavior change expected):

```diff
     uuid: None,
-    metadata: None,
+    metadata: Some(segment::types::Payload(
+        serde_json::Map::from_iter([("suite".to_string(), serde_json::json!("points_dedup"))])
+    )),
```

src/consensus.rs (1)
1536-1544: Cover consensus path with non-empty metadata in the CreateCollection test. Great that the test was updated for the new field. To ensure metadata is actually propagated through the consensus log and applied, set a non-empty value here. This adds end-to-end coverage with nearly zero overhead.

```diff
     strict_mode_config: None,
     uuid: None,
-    metadata: None,
+    metadata: Some(segment::types::Payload(
+        serde_json::Map::from_iter([
+            ("test_case".to_string(), serde_json::json!("consensus")),
+            ("rev".to_string(), serde_json::json!(1)),
+        ])
+    )),
```

Optional: after the op is committed, fetch the collection config (via whatever accessor is canonical in tests) and assert `metadata` matches what you set. I can add that snippet once you point me to the preferred accessor.

lib/segment/src/types.rs (1)
1672-1675: Deriving Hash for Payload: verify order-insensitive, version-stable hashing; add a regression test. Deriving `Hash` on `Payload` (`Map<String, Value>`) is convenient, but two concerns to sanity-check:

- Compatibility: ensure the repo's `serde_json` version implements `Hash` for `Value`/`Map` (this has changed over time). If not guaranteed across all supported build targets, compilation may fail or hash semantics may differ.
- Semantics: JSON object equality is order-insensitive. Hash must be consistent with `Eq`. Please verify that the derived `Hash` for the underlying `serde_json::Map` is order-insensitive in the version you use.

Suggested guardrail (additive test) to ensure invariants hold across upgrades:

```rust
// place under the existing tests module in this file
#[test]
fn test_payload_hash_order_invariance() {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    fn h<T: Hash>(t: &T) -> u64 {
        let mut s = DefaultHasher::new();
        t.hash(&mut s);
        s.finish()
    }

    let p1 = payload_json! { "a": 1, "b": 2 };
    let p2 = payload_json! { "b": 2, "a": 1 };
    assert_eq!(p1, p2, "Eq should ignore object key order");
    assert_eq!(h(&p1), h(&p2), "Hash must be consistent with Eq");
}
```

If the above fails on any CI target, consider an explicit `impl Hash for Payload` that canonicalizes object keys before hashing (or rely on your existing StableHash machinery for config hashing).

lib/collection/tests/integration/common/mod.rs (1)
58-58: LGTM: new metadata field initialized to None in fixture. This aligns with the new Option on CollectionConfigInternal and keeps old configs backward-compatible due to serde defaults.
If you want an integration assertion for round-trip persistence, I can add a small test that writes/loads the config and verifies metadata remains unset/omitted.
lib/collection/src/operations/verification/mod.rs (1)
514-514: LGTM: `metadata: None` added to strict-mode test fixture. Keeps the fixture compiling against the extended config without affecting the strict-mode behavior under test.
If useful, we can also add a variant of this fixture with non-empty metadata to ensure strict-mode checks remain orthogonal to metadata presence.
lib/storage/tests/integration/alias_tests.rs (1)
124-124: Consider exercising metadata end-to-end by setting a non-empty value. Setting metadata to None compiles, but adding a small non-empty map here would give us an end-to-end signal that alias ops and collection creation paths accept and persist metadata correctly. For example:

```diff
-    metadata: None,
+    metadata: Some(serde_json::json!({"env": "test", "suite": "alias"})),
```

And add this import at the top of the file (outside the shown hunk):

```rust
use serde_json::json;
```

Optionally, fetch the collection config after creation and assert the metadata is present.
lib/collection/src/tests/fixtures.rs (1)
53-53: Expose a metadata-aware fixture to reduce repetition across testsRight now all fixtures pin metadata: None. Consider adding a helper that accepts an Option so tests that need to validate metadata behavior don’t have to re-implement config construction.
Example (new function outside this hunk):
```rust
pub fn create_collection_config_with_dim_and_metadata(
    dim: usize,
    metadata: Option<segment::types::Payload>,
) -> CollectionConfigInternal {
    let mut cfg = create_collection_config_with_dim(dim);
    cfg.metadata = metadata;
    cfg
}
```

This keeps current call sites unchanged while enabling targeted tests to pass metadata when needed.
lib/collection/src/config.rs (1)
235-240: Metadata Integration Verified – Consider Size Validation & Anonymization Checks

- State management now tracks `metadata` changes via `is_metadata_updated` in lib/collection/src/collection/state_management.rs, so config-change detection properly accounts for the new field.
- Storage-layer conversions in lib/storage/src/content_manager/conversions.rs and lib/storage/src/content_manager/collection_meta_ops.rs pass through the metadata payload as intended.
- API surface in lib/api/src/grpc/proto/collections.proto documents and exposes merge semantics for `metadata` (e.g., "will be merged with already stored metadata").
- Telemetry code in lib/collection/src/telemetry.rs includes `metadata` with an `#[anonymize(value = None)]` annotation; please confirm that the anonymization logic covers nested JSON values and prevents any PII leakage.

Optional improvements:

- Enforce a soft size limit on the `metadata` payload during create/update (e.g., via request validation) to prevent large blobs from bloating configs.
- If desired, I can provide a reusable validation helper and integrate it into the relevant handlers.
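To make the size-limit suggestion concrete, here is a minimal, standalone Python sketch of such a validator; the limits and the function name (`validate_metadata`, `MAX_METADATA_BYTES`, `MAX_METADATA_DEPTH`) are hypothetical illustrations, not taken from the codebase:

```python
import json

# Hypothetical limits; actual values would be a product decision.
MAX_METADATA_BYTES = 4096
MAX_METADATA_DEPTH = 5

def validate_metadata(metadata: dict) -> None:
    """Reject metadata payloads that are too large or too deeply nested."""
    encoded = json.dumps(metadata).encode("utf-8")
    if len(encoded) > MAX_METADATA_BYTES:
        raise ValueError(f"metadata exceeds {MAX_METADATA_BYTES} bytes")

    def depth(value, level=1):
        # Compute the nesting depth of a JSON-like value.
        if isinstance(value, dict):
            return max((depth(v, level + 1) for v in value.values()), default=level)
        if isinstance(value, list):
            return max((depth(v, level + 1) for v in value), default=level)
        return level

    if depth(metadata) > MAX_METADATA_DEPTH:
        raise ValueError(f"metadata nesting exceeds depth {MAX_METADATA_DEPTH}")

validate_metadata({"owner": "ml-team"})  # small, flat payload passes
```

The same walk could run against the gRPC `Value` tree instead of parsed JSON; the point is to fail fast with an invalid-argument error before the payload reaches the config.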
lib/collection/src/tests/snapshot_test.rs (1)
50-59: Test config updated with `metadata: None`; consistent with the new field. This preserves existing behavior while keeping tests compiling. Consider adding a follow-up test that sets non-empty metadata, creates a snapshot, restores, and asserts metadata round-trips. This would harden persistence guarantees.
Would you like me to draft such a test?
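To illustrate the shape such a round-trip check could take, here is a standalone Python sketch; the `config.json` layout is a simplified stand-in for the real collection config format, not the actual schema:

```python
import json
import shutil
import tempfile
from pathlib import Path

def save_config(collection_dir: Path, config: dict) -> None:
    # Persist the config (including metadata) next to the collection data.
    (collection_dir / "config.json").write_text(json.dumps(config))

def load_config(collection_dir: Path) -> dict:
    return json.loads((collection_dir / "config.json").read_text())

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "collection"
    src.mkdir()
    save_config(src, {"params": {}, "metadata": {"suite": "snapshot_test"}})

    # "Snapshot" the collection directory, then "restore" it elsewhere.
    snapshot = Path(tmp) / "restored"
    shutil.copytree(src, snapshot)
    restored = load_config(snapshot)

    assert restored["metadata"] == {"suite": "snapshot_test"}
```

The real test would go through the collection's snapshot/recover APIs rather than a plain directory copy; the assertion at the end is the part worth keeping.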
lib/collection/src/collection/collection_ops.rs (1)
166-174: Prefer concise guard binding; the explicit type annotation is noisy. Use the same style as other methods in this impl for consistency.

```diff
-        let mut collection_config_guard: tokio::sync::RwLockWriteGuard<
-            '_,
-            crate::config::CollectionConfigInternal,
-        > = self.collection_config.write().await;
+        let mut config = self.collection_config.write().await;
```

tests/openapi/test_collection_metadata.py (1)
16-49: Great coverage of merge/override; consider a clearer name and a nested-merge case.
- Rename to reflect intent, e.g., `test_collection_metadata_patch_merge`.
- Add a case with nested metadata to pin down deep-merge behavior (and prevent regressions).

Here's an additional test you can drop into this module:

```python
def test_collection_metadata_nested_merge(collection_name):
    # set nested dict
    assert request_with_validation(
        api="/collections/{collection_name}",
        method="PATCH",
        path_params={"collection_name": collection_name},
        body={"metadata": {"nested": {"a": 1}}}
    ).ok

    # merge another key into the same nested object
    assert request_with_validation(
        api="/collections/{collection_name}",
        method="PATCH",
        path_params={"collection_name": collection_name},
        body={"metadata": {"nested": {"b": 2}}}
    ).ok

    res = request_with_validation(
        api="/collections/{collection_name}",
        method="GET",
        path_params={"collection_name": collection_name},
    )
    assert res.ok
    meta = res.json()["result"]["config"]["metadata"]
    assert meta.get("nested") in ({"a": 1, "b": 2}, {"b": 2, "a": 1})
```

Also consider a persistence check (patch → restart node → GET) in the E2E suite once `update_metadata` persists to disk.

lib/storage/src/content_manager/toc/collection_meta_ops.rs (1)
178-180: Clarify and document metadata merge semantics
The call to `collection.update_metadata(metadata)` invokes `Payload::merge`, but the behavior around key removal and empty-map handling isn't specified. Please verify and explicitly document:

- Removal semantics: does sending a key with value `null` delete it, or merely set its value to JSON `null`? Is there support for an explicit "remove" list?
- Empty map `{}` handling: does it act as a no-op, clear all existing metadata, or result in an error?
- Omitted field behavior: what happens if the client omits the `metadata` field entirely?

Consider adding optional guardrails at the API boundary:
- Enforce a maximum metadata size or nested depth to prevent oversized blobs from impacting snapshot size and replication latency.
- Introduce a “strict mode” toggle that validates metadata payloads before applying.
Location for clarification and potential refactor:
- lib/storage/src/content_manager/toc/collection_meta_ops.rs, lines 178–180

Let me know if you'd like me to open a follow-up PR to implement explicit delete semantics and size validation.
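To ground the questions above, here is a standalone Python sketch of one possible contract (null deletes a key, any other value replaces it, nested objects are replaced wholesale, and `{}` is a no-op); this illustrates the semantics being asked about, not the actual `Payload::merge` implementation:

```python
def merge_metadata(stored: dict, update: dict) -> dict:
    """Shallow merge: null deletes a key, any other value replaces it."""
    result = dict(stored)
    for key, value in update.items():
        if value is None:
            result.pop(key, None)  # null acts as an explicit delete
        else:
            result[key] = value    # replace, including nested objects
    return result

stored = {"env": "prod", "nested": {"a": 1}}
update = {"env": None, "nested": {"b": 2}, "owner": "ml-team"}
merged = merge_metadata(stored, update)
# env deleted, nested replaced wholesale, owner added
```

Whatever contract is chosen, writing it down this explicitly (and testing each branch) is what removes the ambiguity for clients.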
lib/collection/src/operations/types.rs (1)
198-209: Internal→public conversion forwards metadata. The conversion remains explicit and compiler-checked. This ensures `GET /collections/{name}` reflects metadata. Consider adding an integration test that:

- creates with metadata,
- patches (merge),
- verifies `CollectionInfo.config.metadata` matches expectations.

lib/collection/src/collection/state_management.rs (1)
138-155: Snapshot apply: treat metadata change as config change without optimizer recreation — correct tradeoff
- Adding `metadata` to the destructure and comparing it (`is_metadata_updated`) is correct.
- Including it in `is_config_updated` triggers a persisted config save, while keeping `recreate_optimizers` tied to "core" config is the right balance.

Suggestion (optional): add a small comment explaining that metadata-only updates intentionally avoid optimizer recreation, to preempt future "why didn't optimizers restart?" questions.
lib/api/src/grpc/proto/collections.proto (3)
413-414: CreateCollection.metadata: clarify merge and deletion semantics in comments. To avoid client confusion, tighten the field docs to specify merge and delete semantics. Example patch:

```diff
-  map<string, Value> metadata = 18; // Arbitrary JSON metadata for the collection
+  // Arbitrary JSON metadata for the collection.
+  // Semantics:
+  // - On create: stored as-is.
+  // - On update (see UpdateCollection.metadata): server MERGES keys into existing metadata.
+  //   To delete a key, set its value to null (if supported by Value) or use an explicit remove API if available.
+  map<string, Value> metadata = 18;
```
426-427: UpdateCollection.metadata: specify exact merge behavior and key removal approach. The comment mentions "merged," but clients need exact rules. Suggest expanding:

```diff
-  map<string, Value> metadata = 10; // Arbitrary JSON-like metadata for the collection, will be merged with already stored metadata
+  // Arbitrary JSON-like metadata for the collection. Merge behavior:
+  // - Keys present in the request overwrite existing keys.
+  // - Keys absent in the request remain unchanged.
+  // - To remove a key, set its value to null (if Value supports null); otherwise provide a dedicated remove-list in the future.
+  map<string, Value> metadata = 10;
```

Also consider noting any practical size limits (e.g., max total bytes), if enforced.

If null-based deletion is intended, please confirm that `Value` supports null. If not, we may need an explicit removal mechanism or a special sentinel.
466-467: CollectionConfig.metadata: expose but keep docs consistent with other surfaces. Make the docs consistent with CreateCollection/UpdateCollection to avoid ambiguity:

```diff
-  map<string, Value> metadata = 7; // Arbitrary JSON metadata for the collection
+  // Arbitrary JSON metadata for the collection. Returned as stored after merges from updates.
+  map<string, Value> metadata = 7;
```

lib/collection/src/operations/conversions.rs (1)
1948-2047: Minor: consider preserving the "empty vs none" distinction (optional). For `TryFrom<api::grpc::qdrant::CollectionConfig>`, empty metadata is converted to `None`. If elsewhere you intend to differentiate "unset" from "explicitly empty", consider mapping empty to `Some(empty)` for symmetry. Not critical for info/get paths.

lib/storage/src/content_manager/collection_meta_ops.rs (1)
263-267: UpdateCollection comment promises "empty object clears"; keep code aligned. This is the desired API contract. After adopting the conversion fix in conversions.rs, confirm the update application logic treats `Some(empty)` as "clear". Consider adding a unit/integration test.

src/migrations/single_to_cluster.rs (1)
118-121: Avoid potential panic on missing shard_id in shards map.
`unwrap()` will panic if `shard_ids` and `shards` ever diverge. Safer to use `expect` with context or skip with a warning to keep migration resilient. Apply this diff:

```diff
-    for shard_id in shard_ids {
-        let shard_info = shards.get(shard_id).unwrap();
+    for shard_id in shard_ids {
+        let shard_info = shards.get(shard_id).expect(
+            "inconsistent state: shard_id from key mapping not found in shards",
+        );
```

Alternatively, log-and-continue if you prefer best-effort migration.
docs/redoc/master/openapi.json (6)
6725-6735: CollectionConfig.metadata: addition looks good; tighten wording and add an example. Great addition. Minor docs nits and a small example will improve clarity. Apply this diff:

```diff
 "metadata": {
-  "description": "Arbitrary JSON metadata for the collection This can be used to store application-specific information such as creation time, migration data, inference model info, etc.",
+  "description": "Arbitrary JSON metadata for the collection. This can be used to store application-specific information such as creation time, migration data, inference model info, etc.",
+  "example": {
+    "owner": "ml-team",
+    "purpose": "experiments-2025-08",
+    "any_values": "go-like-this"
+  },
   "anyOf": [
     { "$ref": "#/components/schemas/Payload" },
     { "nullable": true }
   ]
 }
```
7491-7497: New Payload schema: add a brief description for discoverability. The schema is fine; adding a one-liner helps readers understand it when jumping to the definition. Apply this diff:

```diff
 "Payload": {
-  "type": "object",
+  "type": "object",
+  "description": "Free-form JSON object used for point payloads and collection-level metadata.",
   "additionalProperties": true,
   "example": {
     "city": "London",
     "color": "green"
   }
 },
```
9735-9745: CreateCollection.metadata: mirror wording and provide an example. Match the description style and include an example consistent with the PATCH use-case from the PR description. Apply this diff:

```diff
 "metadata": {
-  "description": "Arbitrary JSON metadata for the collection This can be used to store application-specific information such as creation time, migration data, inference model info, etc.",
+  "description": "Arbitrary JSON metadata for the collection. This can be used to store application-specific information such as creation time, migration data, inference model info, etc.",
+  "example": {
+    "any_values": "go-like-this"
+  },
   "anyOf": [
     { "$ref": "#/components/schemas/Payload" },
     { "nullable": true }
   ]
 }
```

Additionally, consider adding a request example to the Create Collection endpoint's requestBody to showcase metadata usage. I can draft it if you'd like.
10116-10126: UpdateCollection.metadata: clarify merge semantics and unchanged/clear behavior; add an example. The description mentions merge semantics but doesn't specify whether the merge is deep vs shallow, and how to keep keys unchanged vs clear them. Also, it says "set it to an empty object" to remove metadata; given the schema allows null, clarify whether null clears or should be ignored.

Please confirm:

- Is the merge deep (recursive) or shallow (one level)?
- Does `"metadata": null` clear all metadata, or should clients send `{}` to clear?
- How do clients delete a single key (e.g., remove only `"foo"`)? Is sending `"foo": null` supported, or must they send the full object without `"foo"`?

Apply this wording tweak and add a concrete example:

```diff
 "metadata": {
-  "description": "Metadata to update for the collection. If provided, this will merge with existing metadata. To remove metadata, set it to an empty object.",
+  "description": "Metadata to update for the collection. If provided, this will merge with existing metadata. To clear all metadata, set this field to an empty object ({}). Omit this field to leave metadata unchanged.",
+  "example": {
+    "release": "v1.2.3",
+    "notes": "backfill completed"
+  },
   "anyOf": [
     { "$ref": "#/components/schemas/Payload" },
     { "nullable": true }
   ]
 }
```

Optionally, add an endpoint-level example to PATCH /collections/{collection_name} requestBody showing:

```json
{ "metadata": { "any_values": "go-like-this" } }
```
11975-11985: CollectionConfigTelemetry.metadata: align wording. Minor punctuation to match the other occurrences. Apply this diff:

```diff
 "metadata": {
-  "description": "Arbitrary JSON metadata for the collection",
+  "description": "Arbitrary JSON metadata for the collection.",
   "anyOf": [
     { "$ref": "#/components/schemas/Payload" },
     { "nullable": true }
   ]
 }
```
10116-10126: Add endpoint-level examples to PATCH /collections/{collection_name}. The PR description promises a PATCH example; adding an OpenAPI example improves SDK generation and docs. Add this block under the PATCH requestBody.content.application/json (outside this exact hunk):

```json
"examples": {
  "set_metadata": {
    "summary": "Set collection-level metadata",
    "value": { "metadata": { "any_values": "go-like-this" } }
  },
  "clear_metadata": {
    "summary": "Clear all collection metadata",
    "value": { "metadata": {} }
  }
}
```

I can prepare a precise diff against the PATCH operation if preferred.
lib/api/src/grpc/qdrant.rs (3)
811-814: CreateCollection.metadata: add guardrails and document constraints. The field/tag placement looks correct (new tag 18, no collisions). Before this ships broadly:
Add server-side limits for metadata (max total bytes, max keys, max nesting) and fail fast with Status::invalid_argument if exceeded. This prevents unbounded memory usage on create.
Clarify allowed types in docs (IntegerValue vs DoubleValue semantics, disallow NaN/inf, max int range). If NaN/inf are possible in DoubleValue, explicitly reject them.
Consider reserving a namespace (e.g., qdrant.*) for potential future internal keys to avoid user collisions.
If you want, I can sketch a validator hook in the create path that walks Value with a byte/depth budget.
858-861: UpdateCollection.metadata: merge semantics under-specified; consider an explicit patch model. The comment promises a merge, but proto3 map presence makes it impossible to distinguish "field absent" from "present but empty", and there's no clear way to remove keys. NullValue could mean "set to null" or "delete key".
Suggest introducing an explicit patch model to avoid ambiguity:
- Add `oneof metadata_op { map<string, Value> upsert = X; RepeatedString remove_keys = Y; bool clear = Z; }`
- Alternatively, a `MetadataPatch { map<string, Value> set; repeated string unset; bool clear; }` nested under UpdateCollection.
This gives unambiguous upsert/unset/clear semantics, and “absent” means “no change”.
If you prefer to keep the single map, please document:
- whether sending {} means “no change” or “clear all,”
- whether NullValue deletes a key or sets it to null,
- and how nested StructValue merges vs overwrites.
Happy to draft the proto change if you want to go this route.
963-966: CollectionConfig.metadata in responses/telemetry: avoid accidental PII leakage. Surfacing arbitrary user metadata in CollectionConfig means it may be logged, exported via telemetry, or scraped by tooling. Recommend:
- Redact or opt-out: ensure telemetry export paths either redact metadata by default or gate it behind a config/feature flag.
- Size caps in responses: consider truncating oversized metadata in CollectionInfo/Config responses to protect memory usage and client UX.
- Document the risk in API docs, and discourage storing secrets/tokens under metadata.
Please verify where CollectionConfig is serialized to JSON or exported (e.g., telemetry). If it is, add a redaction path there.
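As an illustration of the kind of redaction pass such an export path could apply, here is a standalone Python sketch (the function name `redact_metadata` and the placeholder string are hypothetical):

```python
def redact_metadata(value):
    """Recursively replace leaf values with a placeholder, keeping structure."""
    if isinstance(value, dict):
        return {k: redact_metadata(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_metadata(v) for v in value]
    return "<redacted>"

config = {"metadata": {"owner": "alice@example.com", "tags": ["pii", 42]}}
# Telemetry sees the shape of the metadata but none of its values.
safe = {**config, "metadata": redact_metadata(config["metadata"])}
```

Keeping structure while dropping values lets telemetry still answer "how many keys, how deeply nested" without carrying user data.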
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (33)
- docs/grpc/docs.md (6 hunks)
- docs/redoc/master/openapi.json (5 hunks)
- lib/api/src/grpc/proto/collections.proto (4 hunks)
- lib/api/src/grpc/qdrant.rs (4 hunks)
- lib/collection/benches/batch_query_bench.rs (1 hunks)
- lib/collection/benches/batch_search_bench.rs (1 hunks)
- lib/collection/src/collection/collection_ops.rs (2 hunks)
- lib/collection/src/collection/state_management.rs (1 hunks)
- lib/collection/src/config.rs (2 hunks)
- lib/collection/src/operations/conversions.rs (4 hunks)
- lib/collection/src/operations/types.rs (3 hunks)
- lib/collection/src/operations/verification/mod.rs (1 hunks)
- lib/collection/src/shards/local_shard/shard_ops.rs (1 hunks)
- lib/collection/src/shards/replica_set/update.rs (1 hunks)
- lib/collection/src/shards/shard_holder/shard_mapping.rs (1 hunks)
- lib/collection/src/telemetry.rs (4 hunks)
- lib/collection/src/tests/fixtures.rs (1 hunks)
- lib/collection/src/tests/points_dedup.rs (1 hunks)
- lib/collection/src/tests/query_prefetch_offset_limit.rs (1 hunks)
- lib/collection/src/tests/snapshot_test.rs (1 hunks)
- lib/collection/tests/integration/common/mod.rs (1 hunks)
- lib/collection/tests/integration/multi_vec_test.rs (1 hunks)
- lib/collection/tests/integration/snapshot_recovery_test.rs (1 hunks)
- lib/segment/src/types.rs (1 hunks)
- lib/storage/src/content_manager/collection_meta_ops.rs (6 hunks)
- lib/storage/src/content_manager/conversions.rs (5 hunks)
- lib/storage/src/content_manager/mod.rs (1 hunks)
- lib/storage/src/content_manager/toc/collection_meta_ops.rs (2 hunks)
- lib/storage/src/content_manager/toc/create_collection.rs (2 hunks)
- lib/storage/tests/integration/alias_tests.rs (1 hunks)
- src/consensus.rs (1 hunks)
- src/migrations/single_to_cluster.rs (5 hunks)
- tests/openapi/test_collection_metadata.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/collection/src/config.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/collection/src/config.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/collection/src/config.rs
📚 Learning: 2025-08-10T18:26:33.017Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: docs/redoc/master/openapi.json:10393-10401
Timestamp: 2025-08-10T18:26:33.017Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated. Any description changes must be made in the source (e.g., lib/api/src/rest/schema.rs and/or lib/api/src/grpc/proto/points.proto), not edited directly in the JSON.
Applied to files:
lib/api/src/grpc/qdrant.rs
🧬 Code graph analysis (6)
lib/collection/src/shards/local_shard/shard_ops.rs (1)
lib/collection/src/operations/types.rs (15)
`from` (97-104), `from` (188-210), `from` (259-280), `from` (671-673), `from` (691-693), `from` (1185-1190), `from` (1194-1199), `from` (1203-1253), `from` (1257-1259), `from` (1263-1268), `from` (1272-1277), `from` (1281-1286), `from` (1290-1295), `from` (1299-1304), `from` (1308-1313)
tests/openapi/test_collection_metadata.py (2)
tests/openapi/helpers/helpers.py (1)
`request_with_validation` (41-116)
tests/openapi/conftest.py (1)
on_disk_vectors(11-12)
src/migrations/single_to_cluster.rs (4)
lib/collection/src/shards/shard_holder/mod.rs (2)
`new` (77-110), `shards` (284-284)
lib/collection/src/collection/mod.rs (1)
`new` (103-200)
lib/collection/src/collection/state_management.rs (1)
`shards` (190-190)
lib/collection/src/shards/shard_holder/shard_mapping.rs (2)
`shard_key` (71-76), `shard_ids` (50-59)
lib/storage/src/content_manager/conversions.rs (1)
lib/api/src/conversions/json.rs (1)
proto_to_payloads(49-55)
lib/collection/src/operations/conversions.rs (1)
lib/api/src/conversions/json.rs (2)
`payload_to_proto` (9-14), `proto_to_payloads` (49-55)
lib/api/src/grpc/qdrant.rs (1)
lib/collection/src/operations/conversions.rs (1)
map(1930-1932)
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
29-29: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
88-88: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: lint
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
🔇 Additional comments (35)
lib/collection/src/shards/local_shard/shard_ops.rs (2)
165-168: Switch to explicit `From()` is correct and side-effect free. Functionally equivalent to the prior `.into()` and makes the target type explicit at the call site. No behavioral change; a good fit given `impl From<ShardInfoInternal> for CollectionInfo` exists.
165-168: Metadata propagation confirmed via `CollectionConfig.metadata`. The `metadata` field lives on `CollectionConfigInternal` and is correctly carried through in the `From<CollectionConfigInternal> for CollectionConfig` implementation (mapping `metadata` → `CollectionConfig.metadata`), and then included in `CollectionInfo` via its `config` field (see `config: CollectionConfig::from(config)`). There is intentionally no top-level `metadata` on `CollectionInfo`; it surfaces under `config.metadata` in API responses.

No further changes required here.
lib/collection/src/shards/replica_set/update.rs (2)
772-773: LGTM: initialize new metadata field in the test fixture. Explicitly setting `metadata: None` keeps the fixture aligned with the updated `CollectionConfigInternal` and avoids unintended behavior changes in this test. No issues found.
772-773: Follow-up: add a targeted test for metadata merge and persistence semantics. Given the PR introduces in-memory merge behavior for collection-level metadata, consider adding a small unit/integration test that:

- Creates a collection with `metadata: None`.
- Applies an update that sets metadata via the update path.
- Asserts the new metadata is visible from memory.
- Reloads the collection (or simulates) and asserts whether metadata persists or not, per intended design.
This will lock in the contract between API, in-memory config, and on-disk persistence.
lib/collection/src/tests/query_prefetch_offset_limit.rs (1)
62-63: LGTM: metadata default added to config literal. Adding `metadata: None` keeps the fixture compiling against the new struct without affecting the query/prefetch assertions. Looks good.

lib/collection/tests/integration/multi_vec_test.rs (1)
66-67: LGTM: metadata field initialized
`metadata: None` is correctly added to the `CollectionConfigInternal` literal. Multi-vector tests remain unaffected by this addition.

lib/collection/src/shards/shard_holder/shard_mapping.rs (1)
189-189: LGTM: test config updated for new field. Including `metadata: None` in the test's config ensures compatibility with the expanded struct. No behavior changes to shard key mapping or migration logic.

lib/collection/benches/batch_query_bench.rs (1)
71-72: LGTM: benchmark fixture updated
`metadata: None` is added to the bench setup config. Bench behavior and performance measurements remain unchanged.

lib/collection/benches/batch_search_bench.rs (1)
88-90: LGTM; bench updated for the new field. Setting `metadata: None` here is appropriate for a perf bench. If you later want to include a small constant metadata object to exercise that path too, it should not materially affect results.

lib/collection/src/config.rs (1)
14-17: Importing Payload into config module looks correct. Bringing segment::types::Payload into scope here is consistent with using it in CollectionConfigInternal and avoids re-qualifying the type throughout the module.
lib/storage/src/content_manager/toc/create_collection.rs (2)
35-52: Metadata field correctly plumbed from CreateCollection. Clean destructuring; ownership of `metadata` moves from `operation` as intended. No issues.
214-223: Metadata Persistence Verified
`Collection::new` immediately invokes `collection_config.save(path)`, which serializes the entire `CollectionConfigInternal` (including the `metadata` field) to `config.json` via `serde_json::to_vec(self)` and an atomic write (qdrant.tech). On restart, `Collection::load` uses `CollectionConfigInternal::load(path)` to deserialize the same file, restoring the `metadata`. Furthermore, collection-level snapshots bundle the `config.json` file, so the metadata will be present in any snapshot (qdrant.tech).

No changes required.
lib/storage/src/content_manager/mod.rs (1)
124-141: Added `metadata: None` to UpdateCollection in remove_replica — good catch.
Keeps the struct literal aligned with the expanded API so future serde/default changes don't bite. LGTM.
lib/collection/src/collection/collection_ops.rs (2)
7-7: Import of `Payload` is appropriate.
Needed for the new metadata update method; no concerns.
172-176: Merge semantics verified
The `Payload::merge` implementation delegates to `utils::merge_map`, which for each entry in the source:
- Removes the key if the incoming value is `null`.
- Otherwise calls `dest.insert(key, value)`, replacing any existing value under that key.

This ensures new values override existing ones and that nested JSON objects are replaced (rather than recursively merged), matching the test's assumption around overriding keys like `new_meta: "value3"`. There are no further changes needed here.
tests/openapi/test_collection_metadata.py (1)
1-14: Autouse fixture setup/teardown looks good.
Keeps tests isolated and exercises both in-memory and on-disk vectors via parametrization.
lib/storage/src/content_manager/toc/collection_meta_ops.rs (1)
138-139: Plumbs metadata through UpdateCollection destructuring correctly
Including `metadata` in the destructured `UpdateCollection` keeps this method future-proof and compiler-enforced for new fields. No issues spotted.
5-5: Necessary `Payload` import for telemetry wiring
Importing `Payload` here is expected for the new telemetry surface. Looks good.
89-93: Telemetry: metadata added and anonymized — good default
- `#[serde(default, skip_serializing_if = "Option::is_none")]` prevents noisy output.
- `#[anonymize(value = None)]` avoids leaking user-provided metadata in telemetry.

No changes requested.
105-116: Propagates metadata from internal config into telemetry
The `From<CollectionConfigInternal>` impl now mirrors the struct fields, including `metadata`. This keeps telemetry consistent with the internal source of truth.
180-185: Public API: adds collection-level metadata with appropriate serde behavior
- Field docs help position intended use.
- `Option<Payload>` with `skip_serializing_if` preserves backward compatibility.
No concerns on the surface API.
lib/api/src/grpc/proto/collections.proto (1)
6-7: Imports Value type for metadata
The `json_with_int.proto` import enables the `Value` type; this keeps consistency with other JSON-like fields. All good.
lib/collection/src/operations/conversions.rs (2)
391-399: Plumbs metadata through CollectionConfig conversion (input→output) correctly.
Destructuring `metadata` here prepares it for inclusion in the gRPC response. No issues spotted.
530-533: gRPC response includes metadata map; good defaulting.
Serializing `Option<Payload>` to `HashMap<String, Value>` with `unwrap_or_default()` is the right call; it keeps wire format stable (empty map instead of omitting the field).
lib/storage/src/content_manager/conversions.rs (1)
113-117: CreateCollection: empty metadata → None is acceptable.
Treating empty metadata as `None` on create is fine since the resulting state is the same. Keep as-is unless you later choose to preserve an explicit-empty semantic on creation too.
docs/grpc/docs.md (2)
1727-1738: Doc/code contract: "empty object clears metadata" requires preserving emptiness in server-side conversion.
The docs for UpdateCollection.metadata say an empty object clears stored metadata. Ensure the server conversion (and update handler) honor that. See my code comment proposing to pass `Some(empty)` through. If this behavior isn't implemented, either adjust the code (preferred) or amend the doc.
17-17: Markdown lint nit: unordered list indentation (MD007).
Autogenerated file; safe to ignore unless your pipeline enforces it.
Also applies to: 29-29, 88-88
lib/storage/src/content_manager/collection_meta_ops.rs (3)
185-190: Nicely scoped metadata field on CreateCollection.
The serde defaults and skip-when-none are correct; type aligns with `Payload` usage elsewhere.
289-291: Defaulting metadata to None in new_empty is appropriate.
Keeps the "no-op unless provided" semantics.
439-441: Propagates metadata from internal config to CreateCollection (for cloning/migration).
Good pass-through; keeps metadata intact across operations.
Also applies to: 469-470
src/migrations/single_to_cluster.rs (3)
41-49: Good: use collection State snapshot to avoid deep coupling.
Destructuring `State` here simplifies downstream usage and keeps responsibilities clear.
61-63: Migration builds CreateCollection with full config+metadata; looks correct.
Passing through `vectors`, `sparse_vectors`, config, and `metadata` ensures semantic parity post-migration.
Also applies to: 89-107
140-156: Replica state activation loop is correct.
Only marks local replicas Active; aligns with post-migration expectations.
docs/redoc/master/openapi.json (1)
7491-7497: Payload schema relocation verified—no duplicates or stale refs remain
- Exactly one `Payload` definition under `components/schemas` (line 7490)
- Nine exact `$ref: "#/components/schemas/Payload"` occurrences
- No leftover `#/definitions/Payload` or stray `#/components/Payload` references

All existing references point to the single relocated schema. No further action required.
lib/api/src/grpc/qdrant.rs (1)
2-101: Verify migration scope and JSON semantics
Please address the following remaining points:

Wire compatibility / migration
Confirm that the new `json_with_int.proto` types (i.e. `Value`/`Struct`/`ListValue`) are only used in newly added fields and aren't replacing existing `google.protobuf.Value`-based fields in any public RPC. For example, the imports in `points.proto` (e.g. `map<string, Value> payload`) and `collections.proto` (e.g. `map<string, Value> metadata`) should correspond solely to new fields, so that mixed-version clusters or SDKs won't experience decode failures.

JSON representation vs. `serde` derives (optional refactor)
The current `#[derive(serde::Serialize)]` impl will emit `{ "kind": { "IntegerValue": 42 } }` rather than a bare JSON number or string. If these types are ever serialized to telemetry or surfaced in JSON APIs, consider either:
- A custom `Serialize`/`Deserialize` impl (or newtype wrapper) for native JSON (null/number/string/bool/object/array), with a distinct path for integers.
- Or adjust the doc comments to reflect the `{ "kind": … }` shape.

Naming ergonomics (nitpick)
Defining `Struct`/`Value`/`ListValue` alongside the well-known types can confuse downstream users. You may want to publicly alias or re-export them (e.g. `pub use json_with_int::{Value as JsonValue}`) to reduce qualification boilerplate.

Bounds & validation (architecture advice)
Since these dynamic values underpin user-supplied metadata, ensure the service layer enforces sensible upper bounds on key count, total serialized size, and nesting depth to prevent trivial RAM DoS vectors.

Proto scan completed
A search confirmed no lingering imports or references to `google.protobuf.Value`/`Struct`/`ListValue` outside of timestamp/empty in other services. Everything in `lib/api/src/grpc/proto/json_with_int.proto` is self-contained.
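The bounds check suggested under "Bounds & validation" could take a shape like this sketch. The specific limits are illustrative, not values Qdrant uses:

```python
import json

MAX_KEYS = 64
MAX_BYTES = 16 * 1024
MAX_DEPTH = 8

def depth(value, level=1):
    """Nesting depth of a JSON-like value (a scalar counts as its own level)."""
    if isinstance(value, dict):
        return max((depth(v, level + 1) for v in value.values()), default=level)
    if isinstance(value, list):
        return max((depth(v, level + 1) for v in value), default=level)
    return level

def validate_metadata(metadata):
    """Reject metadata that could become a memory/abuse vector."""
    if len(metadata) > MAX_KEYS:
        raise ValueError("too many top-level keys")
    if len(json.dumps(metadata)) > MAX_BYTES:
        raise ValueError("serialized metadata too large")
    if depth(metadata) > MAX_DEPTH:
        raise ValueError("metadata nested too deeply")

validate_metadata({"env": "prod", "tags": ["a", "b"]})  # passes silently
```

Running such a check in the service layer, before the payload reaches consensus or disk, keeps oversized metadata from ever being persisted.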
```rust
metadata: if metadata.is_empty() {
    None
} else {
    Some(json::proto_to_payloads(metadata)?)
},
```
💡 Verification agent
🧩 Analysis chain
Bug: empty metadata on UpdateCollection cannot clear metadata (spec says it should).
Docs say “To remove metadata, set it to an empty object”, but this conversion turns an empty object into None (no-op). That makes it impossible to clear metadata via PATCH.
Fix by preserving emptiness: pass Some(empty) through so the update layer can interpret it as “clear all metadata”.
Apply this diff:

```diff
-            metadata: if metadata.is_empty() {
-                None
-            } else {
-                Some(json::proto_to_payloads(metadata)?)
-            },
+            // Preserve empty object to signal "clear all metadata"
+            metadata: Some(json::proto_to_payloads(metadata)?),
```

Follow-up: ensure the update handler treats `Some(Payload { .. empty .. })` as "clear" (or implement that behavior if missing). Add an e2e test: PATCH with `{ "metadata": {} }` then GET should return no metadata.
🏁 Script executed:
```shell
#!/bin/bash
# Verify how UpdateCollection.metadata is applied and whether empty means "clear".
# 1) Locate where UpdateCollection is handled to update collection state.
rg -nP -C3 --type=rust '\bUpdateCollection\b.*\{[^}]*metadata'
rg -nP -C3 --type=rust 'metadata\b' lib/collection src lib/storage \
  | rg -nP -C2 '(merge|clear|overwrite).*metadata|metadata\s*=\s*'
# 2) Look for semantics that merge or clear collection-level metadata.
rg -nP -C3 --type=rust 'merge.*metadata|clear.*metadata|set.*metadata|overwrite.*metadata'
```

Length of output: 9844
Review Conversion Logic: Preserve Empty Metadata in PATCH Requests
The conversion in lib/storage/src/content_manager/conversions.rs currently drops empty metadata (treating {} as “not provided”) instead of passing it through. This prevents a PATCH of { "metadata": {} } from ever clearing existing metadata.
Specifically, the code at lines 202–206 reads:
```rust
metadata: if metadata.is_empty() {
    None
} else {
    Some(json::proto_to_payloads(metadata)?)
},
```

By returning None when the incoming metadata map is empty, we skip the update entirely instead of signaling a clear.
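The effect of that branch can be modeled in a few lines of Python. This is a hypothetical mirror of the Rust above, not actual Qdrant code:

```python
def convert_update_metadata(metadata):
    """Mirror of the conversion: an empty map degrades to None ("not provided")."""
    if not metadata:
        return None  # update layer sees "metadata untouched", never "clear"
    return dict(metadata)

print(convert_update_metadata({}))             # → None
print(convert_update_metadata({"key": None}))  # → {'key': None} (per-key delete survives)
```

Note that per-key deletion markers pass through untouched; only the whole-object-clear signal is lost in this mapping.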
To align with the docs (“To remove metadata, set it to an empty object”) and enable clearing metadata, replace that snippet with:
```diff
-    metadata: if metadata.is_empty() {
-        None
-    } else {
-        Some(json::proto_to_payloads(metadata)?)
-    },
+    // Preserve empty object to signal "clear all metadata"
+    metadata: Some(json::proto_to_payloads(metadata)?),
```

- Apply this change in `lib/storage/src/content_manager/conversions.rs` around lines 202–206.
- Follow up: verify in `lib/storage/src/content_manager/collection_meta_ops.rs` and corresponding handlers that `Some(empty_payload)` is interpreted as "clear metadata." If not, implement the logic to treat an empty `Payload` as a clear signal.
- Test: add an end-to-end test:
  - PATCH `/collections/{name}` with `{ "metadata": {} }`.
  - Then GET `/collections/{name}` should return no metadata (i.e. `{}`).
> This prevents a PATCH of `{ "metadata": {} }` from ever clearing existing metadata.
That is correct; however, it is possible to clear a specific metadata key by doing
`{ "metadata": {"key": null} }`
@generall Thank you for the clarification! You're absolutely right that individual metadata keys can be cleared using {"key": null} syntax - that's a more granular approach to metadata management.
This raises an important design question: Is the current behavior for empty metadata objects {} intentional?
Currently:
- `{"metadata": {"key": null}}` → clears specific key ✅ (as you noted)
- `{"metadata": {}}` → no-op (treated as "not provided")
The documentation mentions "To remove metadata, set it to an empty object" - but if key-level clearing via null is the intended mechanism, then perhaps:
- The docs should be updated to clarify that clearing is done per-key with null values
- Or, both approaches should be supported (empty object = clear all, null values = clear specific keys)
What's the intended design here? Should empty metadata objects clear all metadata, or is the current key-by-key clearing approach the preferred pattern?
🐰💭
✏️ Learnings added
Learnt from: generall
PR: qdrant/qdrant#7123
File: lib/storage/src/content_manager/conversions.rs:202-206
Timestamp: 2025-08-25T09:23:06.556Z
Learning: In Qdrant's collection metadata system, individual metadata keys can be cleared by setting them to null in PATCH requests, e.g., `{"metadata": {"key": null}}` will remove the specific "key" from the collection's metadata.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
We can't have `{ "metadata": {} }` clear the whole object, as there is no difference between an empty map and null in gRPC, so the API would be inconsistent.
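This constraint can be illustrated without a protobuf dependency. A plain-Python model contrasts the states a REST JSON body can express with what a proto3 map can (illustrative only):

```python
import json

def rest_metadata_state(body_json):
    """REST: absent, null, and {} are three distinguishable states."""
    body = json.loads(body_json)
    if "metadata" not in body:
        return "absent"
    if body["metadata"] is None:
        return "null"
    return "empty" if body["metadata"] == {} else "present"

def grpc_metadata_state(metadata_map):
    """proto3 map: an unset field and an empty map decode identically."""
    return "empty-or-absent" if not metadata_map else "present"

print(rest_metadata_state('{}'))                # → absent
print(rest_metadata_state('{"metadata": {}}'))  # → empty
print(grpc_metadata_state({}))                  # → empty-or-absent
```

Since gRPC collapses "absent" and "{}" into one state, only per-key null deletion works identically across both protocols.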
@generall Ah, that's a crucial architectural constraint I missed! You're absolutely right - gRPC doesn't distinguish between an empty map and null/unset, so using {"metadata": {}} to clear all metadata would create inconsistency between the gRPC and REST APIs.
The current design using {"metadata": {"key": null}} for individual key clearing is the correct approach that works consistently across both protocols. My suggestion to change the conversion logic was based on a misunderstanding of this fundamental limitation.
Thank you for clarifying this - it makes perfect sense why the API is designed this way to maintain consistency across different protocol endpoints.
✏️ Learnings added
Learnt from: generall
PR: qdrant/qdrant#7123
File: lib/storage/src/content_manager/conversions.rs:202-206
Timestamp: 2025-08-25T09:26:13.854Z
Learning: In Qdrant's API design, gRPC protocol limitations prevent using empty objects `{}` to clear all metadata because gRPC doesn't distinguish between empty maps and null/unset values. This constraint requires API consistency between gRPC and REST endpoints, making per-key clearing with null values (`{"key": null}`) the standard approach for metadata management.
🧠 Learnings used
Learnt from: generall
PR: qdrant/qdrant#7123
File: lib/storage/src/content_manager/conversions.rs:202-206
Timestamp: 2025-08-25T09:23:06.556Z
Learning: In Qdrant's collection metadata system, individual metadata keys can be cleared by setting them to null in PATCH requests, e.g., `{"metadata": {"key": null}}` will remove the specific "key" from the collection's metadata.
```python
response = request_with_validation(
    api="/collections/{collection_name}",
    method="PATCH",
    path_params={"collection_name": collection_name},
    body={"metadata": {"new_meta": "value3"}}
)
assert response.ok

response = request_with_validation(
    api='/collections/{collection_name}',
    method="GET",
    path_params={'collection_name': collection_name},
)
assert response.ok
result = response.json()["result"]

assert result["config"]["metadata"] == {"new_meta": "value3", "new_meta_2": "value2"}
```
As far as I can see, there is no way to unset keys from the metadata.
How about making the patch request replace the whole metadata object, not merging keys? That would allow unsetting keys.
I understand that isn't exactly great either, because it still doesn't allow us to unset the whole object. But I'd argue it's a bit better.
there is `{"metadata": {"key": null}}`
That doesn't work through our Python client with None. That's what I tried in the included test 😃
> That doesn't work through our Python client with None. That's what I tried in the included test 😃

why not?
🤷♂️ Looks like it works now. I've added it to the test.
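Where the thread landed can be replayed as a self-contained simulation of the PATCH sequence, with no Qdrant server involved; the merge rule is the null-deletes-key behavior discussed above:

```python
class FakeCollection:
    """In-memory stand-in for a collection's metadata, applying PATCH-merge rules."""
    def __init__(self):
        self.metadata = {}

    def patch(self, metadata):
        for key, value in metadata.items():
            if value is None:
                self.metadata.pop(key, None)  # None (JSON null) unsets a specific key
            else:
                self.metadata[key] = value    # other values replace/add

c = FakeCollection()
c.patch({"new_meta": "value1", "new_meta_2": "value2"})
c.patch({"new_meta": "value3"})   # merge keeps new_meta_2
c.patch({"new_meta_2": None})     # null clears just that key
print(c.metadata)  # → {'new_meta': 'value3'}
```

This mirrors the request sequence in the test above, including the per-key unset that the Python client now supports.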
📝 Walkthrough

Adds collection-level metadata support across the stack. gRPC/proto gains metadata maps on CreateCollection, UpdateCollection, and CollectionConfig (`map<string, Value>`) with JSON-like Value types added/consolidated in Rust bindings. OpenAPI introduces a central Payload schema and references it from multiple entities, adding metadata fields and a nullable uuid in telemetry. Internal structs now include Option metadata; conversions and storage paths are updated to carry and merge metadata. A new Collection::update_metadata method persists changes; state/apply paths detect metadata updates. Migrations and shard ops adapt to new config shape. Tests and benches update initializers; a new OpenAPI test validates PATCH merge semantics.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/collection/src/operations/verification/mod.rs (1)
506-515: Add positive metadata coverage test
A search of the repository found no existing tests that exercise `metadata` on `CollectionConfigInternal`. To avoid regressions, please add at least one new test for a non-None metadata payload. For example, in `lib/collection/src/operations/verification/mod.rs` (or in a new integration test under `tests/`), create a collection config with `metadata = Some(your_payload)` and assert that:
- It survives a save/load round‐trip unchanged.
- It does not change the outcome of strict‐mode verification.
- When metadata is `None`, it is omitted from telemetry/info outputs.
- lib/collection/src/operations/verification/mod.rs (adjacent to existing fixture tests)
- Or a new file: tests/collection_metadata_roundtrip.rs
Test skeleton suggestion:
#[tokio::test] async fn metadata_roundtrip_preserved() -> CollectionResult<()> { // 1. Build config with Some(metadata) // 2. Save and load collection // 3. Assert loaded_config.metadata == original metadata // 4. Run strict‐mode verification and assert same result as metadata=None // 5. Fetch telemetry/info and assert metadata field is absent when set to None Ok(()) }
♻️ Duplicate comments (1)
lib/collection/src/collection/collection_ops.rs (1)
166-180: update_metadata persists to disk and drops the write guard — looks correct
- Correctly merges into existing metadata or sets it when absent.
- Drops the write lock before saving, matching save patterns elsewhere.
- Addresses the earlier persistence concern raised in review threads.
Optional polish:
- Add a doc comment clarifying merge semantics (null removes keys).
- Minor variable naming consistency (use “config” like other methods).
Suggested inline doc + naming tweak:
```diff
-    pub async fn update_metadata(&self, metadata: Payload) -> CollectionResult<()> {
-        let mut collection_config_guard: tokio::sync::RwLockWriteGuard<
+    /// Merge collection metadata and persist.
+    ///
+    /// Semantics:
+    /// - Keys present are merged into existing metadata.
+    /// - Values set to JSON null remove the corresponding keys.
+    pub async fn update_metadata(&self, metadata: Payload) -> CollectionResult<()> {
+        let mut config: tokio::sync::RwLockWriteGuard<
             '_,
             crate::config::CollectionConfigInternal,
         > = self.collection_config.write().await;
-        if let Some(current_metadata) = collection_config_guard.metadata.as_mut() {
+        if let Some(current_metadata) = config.metadata.as_mut() {
             current_metadata.merge(&metadata);
         } else {
-            collection_config_guard.metadata = Some(metadata);
+            config.metadata = Some(metadata);
         }
-        drop(collection_config_guard);
+        drop(config);
         self.collection_config.read().await.save(&self.path)?;
         Ok(())
     }
```
🧹 Nitpick comments (35)
lib/collection/src/shards/shard_holder/shard_mapping.rs (1)
180-190: Add a test for non-empty `metadata` in shard mapping
I wasn't able to find any existing tests that instantiate `CollectionConfigInternal` with a non-None `metadata` field, including in the shard-mapping integration tests. Please add a test case under `lib/collection/src/shards/shard_holder` (or in your integration suite) that:
- Constructs `CollectionConfigInternal` with `metadata: Some(json!({"env": "test"}))`
- Calls the shard-mapping logic (via `to_bytes`/`load` or through the public API)
- Asserts that the serialized bytes contain the `"env": "test"` payload
- Verifies that shard-key mapping behavior is unchanged
This will ensure that arbitrary JSON metadata is correctly round-tripped without side-effects.
src/migrations/single_to_cluster.rs (2)
61-63: Defaulting sharding_method via unwrap_or_default — confirm default matches desired migration semantics
Minor: ensure `ShardingMethod::default` aligns with how you expect legacy (single-node) collections to migrate (typically Auto). If not, consider making the fallback explicit for readability.
91-103: Auto sharding: distribution filters to local replicas only — guard against empty distribution
If, for any reason, this peer holds no replicas (edge case), you'd propose an empty distribution. Consider warning or short-circuiting to avoid a silent no-op.
Apply this localized guard:
```diff
-        collection_create_operation.set_distribution(ShardDistributionProposal {
-            distribution: shards
+        let distribution: std::collections::HashMap<_, _> = shards
             .iter()
             .filter_map(|(shard_id, shard_info)| {
                 if shard_info.replicas.contains_key(&this_peer_id) {
                     Some((*shard_id, vec![this_peer_id]))
                 } else {
                     None
                 }
-            })
-            .collect(),
-        });
+            })
+            .collect();
+        if distribution.is_empty() {
+            tracing::warn!("No local replicas found during single→cluster migration for collection {}", collection_name);
+        }
+        collection_create_operation.set_distribution(ShardDistributionProposal { distribution });
```

lib/api/src/grpc/qdrant.rs (4)
24-61: Clarify serde JSON shape for oneof Value before relying on it externally
These types derive serde::Serialize, including the oneof value::Kind. If any code (telemetry, logs, debug dumps, etc.) relies on the JSON produced by serde here, be aware it won't match "native" google.protobuf JSON for Value unless you add custom serializers. If this is only used server-side or for gRPC adapters, you're fine; otherwise, please document or add tests for the emitted JSON shape.
Would you like me to sketch a small serializer that converts this Value to serde_json::Value with the expected null/number/string/bool/array/object semantics?
811-814: CreateCollection.metadata: add usage notes and bounds; align terminology
Nice addition. Two suggestions:
- Document intended semantics at create time (e.g., nested “StructValue” allowed, max size, allowed key set). Consider bounding the total serialized size or nesting depth to avoid abuse.
- Wording nits: UpdateCollection says “JSON-like”; CreateCollection/CollectionConfig say “Arbitrary JSON”. Pick one term across the API to avoid confusion.
Apply this comment-only tweak for consistency:
```diff
-    /// Arbitrary JSON metadata for the collection
+    /// Arbitrary JSON-like metadata for the collection.
+    /// Nested objects (StructValue) and arrays (ListValue) are allowed.
+    /// Note: Consider size/nesting limits to protect against overly large metadata.
```
858-861: UpdateCollection.metadata cannot express "clear all" and makes deletes ambiguous
A proto3 map field is always present as an empty map when omitted, so the server cannot distinguish:
- no-op (caller didn’t touch metadata) vs
- “clear all metadata”.
Also, delete semantics for individual keys are not defined. If NullValue is intended to mean “delete key”, please document and test it; otherwise, consider a diff message with explicit set/unset.
Proto-level shape (reflected here after regeneration) that avoids ambiguity:
```diff
-    /// Arbitrary JSON-like metadata for the collection, will be merged with already stored metadata
-    #[prost(map = "string, message", tag = "10")]
-    pub metadata: ::std::collections::HashMap<::prost::alloc::string::String, Value>,
+    /// Metadata changes: keys in `set` are upserted/merged; keys in `unset` are removed.
+    #[prost(message, optional, tag = "10")]
+    pub metadata: ::core::option::Option<CollectionMetadataDiff>,
```
#[derive(serde::Serialize, Clone, PartialEq, ::prost::Message)] pub struct CollectionMetadataDiff { /// Upsert/merge these keys #[prost(map = "string, message", tag = "1")] pub set: ::std::collections::HashMap<::prost::alloc::string::String, Value>, /// Remove these keys (if present) #[prost(string, repeated, tag = "2")] pub unset: ::prost::alloc::vec::Vec<::prost::alloc::string::String>, }If you prefer to keep the map, explicitly state and test the rules:
- empty map = no-op,
- NullValue = delete the key,
- StructValue = deep-merge, ListValue = replace (or append?) — document which.
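If the `CollectionMetadataDiff` route is taken, applying it is unambiguous. A sketch of the proposed set/unset semantics (the message shape is the reviewer's proposal above, not an existing Qdrant API):

```python
def apply_metadata_diff(metadata, set_entries, unset_keys):
    """Apply a set/unset diff: an empty diff is an unambiguous no-op."""
    for key in unset_keys:
        metadata.pop(key, None)   # explicit removals, no null sentinel needed
    metadata.update(set_entries)  # explicit upserts
    return metadata

meta = {"a": 1, "b": 2}
apply_metadata_diff(meta, set_entries={"c": 3}, unset_keys=["a"])
print(meta)  # → {'b': 2, 'c': 3}
```

Because "clear" is spelled out as keys in `unset`, the empty-map-vs-unset ambiguity of a plain proto3 map disappears.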
858-861: Deep-merge semantics need to be unambiguous and tested
Because Value supports nested structs and lists, merge behavior matters:
- StructValue: deep-merge (recursive) or replace?
- ListValue: replace, append, or dedupe?
- IntegerValue/DoubleValue/StringValue/BoolValue: replace.
- NullValue: delete vs literal null?
Please encode these rules in doc comments and add unit/integration tests (gRPC and REST) that cover:
- no-op update,
- add, replace, delete,
- deep merge vs replace at nested levels,
- list behavior.
I can draft test vectors and a merge helper that enforces one canonical policy (e.g., deep-merge for StructValue, replace for ListValue, NullValue deletes).
Also applies to: 811-814, 963-966
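One canonical policy from the list above (deep-merge for StructValue, replace for ListValue and scalars, null deletes), as a hedged sketch rather than the implemented behavior:

```python
def deep_merge(dest, patch):
    """Deep-merge policy: dicts recurse, lists/scalars replace, None deletes."""
    for key, value in patch.items():
        if value is None:
            dest.pop(key, None)                                   # NullValue deletes
        elif isinstance(value, dict) and isinstance(dest.get(key), dict):
            deep_merge(dest[key], value)                          # StructValue recurses
        else:
            dest[key] = value                                     # ListValue/scalars replace
    return dest

meta = {"a": {"x": 1, "y": 2}, "tags": [1, 2]}
deep_merge(meta, {"a": {"y": 3}, "tags": [9], "gone": None})
print(meta)  # → {'a': {'x': 1, 'y': 3}, 'tags': [9]}
```

Whichever policy is chosen, the test vectors requested above (no-op, add, replace, delete, nested merge vs replace, list behavior) pin it down against regressions.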
docs/grpc/docs.md (5)
17-17: Keep ToC indentation consistent or silence MD007 for generated docs
markdownlint flags MD007 (ul-indent) on these lines. The entire ToC uses 4-space indentation for nested items, so fixing only the new entries would make the list inconsistent. Prefer either:
- adjust the generator to emit 2-space indents everywhere, or
- add a markdownlint override for this generated file/folder to silence MD007.
Also applies to: 29-29, 88-88
519-528: Optional: clarify that this is a proto map rendered as repeated MapEntry
To reduce confusion for readers unfamiliar with protoc's doc style, add a short note that MetadataEntry represents a `map<string, Value>` in the proto. Apply this diff to append a brief hint to the section heading:

```diff
-### CollectionConfig.MetadataEntry
+### CollectionConfig.MetadataEntry
+Note: Rendered as a repeated entry in this documentation, but defined in proto as `map<string, Value>`.
```
736-745: Optional: add the same "proto map" hint here for consistency
Mirror the explanatory note you add for CollectionConfig.MetadataEntry.

```diff
-### CreateCollection.MetadataEntry
+### CreateCollection.MetadataEntry
+Note: Rendered as a repeated entry in this documentation, but defined in proto as `map<string, Value>`.
```
1737-1738: Document merge semantics and null-based key removal for UpdateCollection.metadata
The current description says metadata "will be merged," but doesn't define the merge strategy or how to remove keys. Based on the implementation, setting a metadata value to null removes that key. Also, keep wording consistent ("JSON" vs "JSON-like").

Apply this diff to clarify semantics and unify wording:

```diff
-| metadata | [UpdateCollection.MetadataEntry](#qdrant-UpdateCollection-MetadataEntry) | repeated | Arbitrary JSON-like metadata for the collection, will be merged with already stored metadata |
+| metadata | [UpdateCollection.MetadataEntry](#qdrant-UpdateCollection-MetadataEntry) | repeated | Arbitrary JSON metadata for the collection. Merge is shallow by key with existing metadata; setting a key's value to `null` removes that key. |
```

Note: The null-removal behavior is confirmed by the team's learnings for this PR ("setting `{"metadata": {"key": null}}` removes the key"). If desired, I can add a cross-link or a short example snippet in the HTTP docs for parity.
1746-1753: Optional: cross-reference Value(null) here to make deletion semantics discoverable
A tiny inline hint here will help users discover how to delete keys when reading the entry schema alone.

```diff
-### UpdateCollection.MetadataEntry
+### UpdateCollection.MetadataEntry
+Tip: Use `value = NullValue` (JSON `null`) to delete an existing metadata key during an UpdateCollection request.
```

docs/redoc/master/openapi.json (4)
6725-6734: Add punctuation and tighten description for metadata (and keep semantics consistent with other sections).
Minor wording/punctuation issue and opportunity to align phrasing with other occurrences.
Apply this diff to polish the description:
```diff
-      "metadata": {
-        "description": "Arbitrary JSON metadata for the collection This can be used to store application-specific information such as creation time, migration data, inference model info, etc.",
+      "metadata": {
+        "description": "Arbitrary JSON metadata for the collection. Use this to store application-specific information such as creation time, migration data, inference model info, etc.",
```

Optionally add an example to improve discoverability:

```json
"metadata": {
  "description": "Arbitrary JSON metadata for the collection. Use this to store application-specific information such as creation time, migration data, inference model info, etc.",
  "anyOf": [
    { "$ref": "#/components/schemas/Payload" },
    { "nullable": true }
  ],
  "example": { "team": "search", "purpose": "experiments" }
}
```
7490-7497: Centralized Payload schema: good consolidation; consider (optional) stronger type shape for client codegen.
The new shared Payload is a solid improvement and will reduce duplication. If you want stricter typing for SDKs, consider constraining values to JSON primitives/arrays/objects recursively instead of bare `additionalProperties: true`. Not required, but it can improve generated types.

For example:

```diff
-    "Payload": {
-      "type": "object",
-      "additionalProperties": true,
+    "Payload": {
+      "type": "object",
+      "additionalProperties": {
+        "oneOf": [
+          { "type": "string" },
+          { "type": "number", "format": "double" },
+          { "type": "boolean" },
+          { "type": "array", "items": { "$ref": "#/components/schemas/Payload" } },
+          { "$ref": "#/components/schemas/Payload" }
+        ]
+      },
       "example": { "city": "London", "color": "green" }
     },
```

Note: this keeps values fully JSON-like while remaining recursive. Feel free to keep current version if you prefer simplicity.
9735-9744: CreateCollection.metadata: align copy, add example for clarity.
Small text polish and a concrete example helps usage.

```diff
-      "metadata": {
-        "description": "Arbitrary JSON metadata for the collection This can be used to store application-specific information such as creation time, migration data, inference model info, etc.",
+      "metadata": {
+        "description": "Arbitrary JSON metadata for the collection. Use this to store application-specific information such as creation time, migration data, inference model info, etc.",
         "anyOf": [
           { "$ref": "#/components/schemas/Payload" },
           { "nullable": true }
         ]
-      }
+        ,
+        "example": { "owner": "data-platform", "created_by": "provisioner", "created_at": "2025-08-24T21:10:00Z" }
+      }
```
11975-11984: CollectionConfigTelemetry.metadata: reuse the same phrasing as input fields for consistency.
Telemetry exposure looks good. Mirror the phrasing used for `CollectionConfig.metadata` to make behavior clear to readers scanning schemas.

```diff
-      "metadata": {
-        "description": "Arbitrary JSON metadata for the collection",
+      "metadata": {
+        "description": "Arbitrary JSON metadata for the collection.",
         "anyOf": [
           { "$ref": "#/components/schemas/Payload" },
           { "nullable": true }
         ]
       }
```

Optional: add a short note that this is read-only observational data (since it's telemetry), e.g., "Reported value of collection metadata (read-only)."
lib/collection/tests/integration/common/mod.rs (1)

58-59: Explicit None is fine; consider a metadata-aware fixture for upcoming tests

The added field correctly satisfies the new struct shape. To make it easy to write metadata-focused tests without duplicating setup, consider introducing a variant of this fixture that accepts an `Option<Payload>` and forwards it here, keeping this one as a convenience wrapper that passes `None`. This keeps existing call sites untouched while enabling targeted metadata tests.

I can draft a small helper like:

```rust
use segment::types::Payload;

#[cfg(test)]
pub async fn simple_collection_fixture_with_metadata(
    collection_path: &Path,
    shard_number: u32,
    metadata: Option<Payload>,
) -> Collection {
    // ... same setup ...
    let collection_config = CollectionConfigInternal {
        // ...
        uuid: None,
        metadata,
    };
    // ...
}
```

And keep the current `simple_collection_fixture` calling the new one with `None`.

lib/collection/src/tests/query_prefetch_offset_limit.rs (1)
62-63: LGTM – no behavioral change for prefetch tests

Setting `metadata: None` in the existing tests aligns with the new field and doesn’t affect query/prefetch logic. I verified that there are currently no tests initializing `metadata: Some(...)` in the suite (the only match is in production code: `lib/storage/src/content_manager/consensus_manager.rs:1041`), so we lack coverage for non-empty metadata.

• Consider adding a test under `lib/collection/src/tests/` (e.g. in `query_prefetch_offset_limit.rs` or a new file) that initializes your request with `metadata: Some(my_meta)` and asserts it’s preserved in the result.

lib/collection/src/tests/snapshot_test.rs (1)
58-59: Add snapshot round‑trip coverage for non-empty metadata

Right now we snapshot/restore with `metadata: None`. It would be valuable to assert that custom metadata survives snapshot and restore. Suggest adding a sibling test (or parametrizing this one) that sets a small payload and then checks it on the recovered collection.

Happy to add a test that sets, snapshots, restores, and verifies something like:

```rust
use segment::types::Payload;
use serde_json::{Map, Value};

let meta = Payload(Map::from_iter([("any_values".to_string(), Value::from("go-like-this"))]));
// set metadata during config init or via update method if available
```

Then assert the recovered collection exposes the same metadata via the appropriate API.

lib/collection/src/tests/points_dedup.rs (1)
64-65: LGTM; optional: one dedup run with non-empty metadata

The explicit `None` is correct. To harden guarantees that metadata does not affect point deduplication, consider one variant of the fixture initialized with a small non-empty metadata payload and rerun one of the dedup assertions unchanged. This keeps behavior coverage strong without changing production code.

I can wire a tiny metadata payload here with:

```rust
use segment::types::Payload;
use serde_json::{Map, Value};

let config = CollectionConfigInternal {
    // ...
    uuid: None,
    metadata: Some(Payload(Map::from_iter([(
        "any_values".to_string(),
        Value::from("go-like-this"),
    )]))),
};
```

lib/collection/tests/integration/snapshot_recovery_test.rs (1)
48-49: Cover metadata persistence across snapshot recovery (integration path)

I ran a repo-wide search for non-empty `metadata` in tests and only found one occurrence in production code (`lib/storage/src/content_manager/consensus_manager.rs`), with no tests exercising `Some(meta)` in integration tests. To strengthen this snapshot recovery test, let’s:

• In `lib/collection/tests/integration/snapshot_recovery_test.rs`, set a small non-empty metadata payload before snapshotting, for example:

```rust
let collection_desc = CollectionDescription {
    name: "my_collection".into(),
    metadata: Some(json!({"version": "1"})),
    // …
};
```

• After performing recovery, assert that the recovered metadata matches the original:

```rust
assert_eq!(recovered_collection.metadata, Some(json!({"version": "1"})));
```

This ensures collection-level metadata is correctly persisted through both snapshot and recovery paths. Please let me know if you’d like me to draft the exact code snippet.
lib/segment/src/types.rs (1)
1672-1675: Deriving Hash for Payload is appropriate but confirm usage expectations

Adding `Hash` to `Payload(Map<String, Value>)` is likely required so enums/structs containing `Payload` (e.g., UpdateCollection -> ConsensusOperations) can keep deriving `Hash`. Please double-check two things:

- That downstream hashing is not relied upon for cross-node determinism or persistence semantics (use `StableHash` if that’s needed).
- That our current serde_json/indexmap versions guarantee a sensible `Hash` implementation for `Map<String, Value>` in our toolchain.

If both are satisfied, this change looks good.
129-141: Prefer new_empty constructor to reduce boilerplate and future churnYou’re constructing
UpdateCollectionwith all Nones just to attach replica changes. Using the helper avoids repeating fields and protects this site from future field additions.Apply:
- let mut operation = UpdateCollectionOperation::new( - collection_name, - UpdateCollection { - vectors: None, - optimizers_config: None, - params: None, - hnsw_config: None, - quantization_config: None, - sparse_vectors: None, - strict_mode_config: None, - metadata: None, - }, - ); + let mut operation = UpdateCollectionOperation::new_empty(collection_name);tests/openapi/test_collection_metadata.py (3)
16-23: Rename test to reflect behavior (metadata patch/merge)

Current name is misleading. Consider renaming to improve discoverability.

```diff
-def test_collection_exists(collection_name):
+def test_collection_metadata_patch_merge_and_delete(collection_name):
```
33-47: Per-key deletion via JSON null is exercised — but note this bypasses the Python client

This validates the contract of setting a key to null to remove it. The earlier reviewer concern was about the Python SDK translating None; this test uses raw HTTP via requests, not the qdrant-client SDK. If the SDK behavior was the original pain point, consider adding (in the client repo) an SDK test ensuring None serializes to JSON null for metadata values.
I can draft a minimal SDK test case or open an issue with repro steps if helpful.
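If it helps, the crux of that SDK check is simply that a Python `None` inside the metadata map must reach the wire as JSON `null` rather than being dropped. A minimal sketch (function name illustrative, not taken from the qdrant-client suite):

```python
import json

def build_patch_body(metadata):
    # A key set to None must serialize as JSON null so the server
    # reads it as "delete this key", not as an omitted field.
    return json.dumps({"metadata": metadata})

body = build_patch_body({"stale_key": None, "owner": "data-platform"})
assert '"stale_key": null' in body  # None survived serialization as null
```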
49-58: Add a test that empty object patch is a no-op (documenting gRPC constraint)

Given gRPC can’t distinguish empty map vs unset, PATCH with `{"metadata": {}}` should not clear all keys. Add a test to lock this behavior and reference the API constraint.
Example additional test:

```python
def test_metadata_empty_patch_is_noop(collection_name):
    # Seed a key
    r = request_with_validation(
        api="/collections/{collection_name}",
        method="PATCH",
        path_params={"collection_name": collection_name},
        body={"metadata": {"k": "v"}}
    )
    assert r.ok

    # Empty object patch should do nothing
    r = request_with_validation(
        api="/collections/{collection_name}",
        method="PATCH",
        path_params={"collection_name": collection_name},
        body={"metadata": {}}
    )
    assert r.ok

    # Verify key remains
    r = request_with_validation(
        api="/collections/{collection_name}",
        method="GET",
        path_params={"collection_name": collection_name},
    )
    assert r.ok
    assert r.json()["result"]["config"]["metadata"] == {"k": "v"}
```

This also aligns with the learnings that null clears individual keys, while empty objects do not clear all metadata.
lib/collection/src/config.rs (1)

235-240: Optional metadata on internal config is backward-compatible — add explicit null semantics to docs

Serde defaults and skip when None preserve compatibility with existing config.json files. Consider extending the docstring to state that JSON null deletes a key during PATCH, and that clearing all keys via `{}` is intentionally unsupported due to gRPC map semantics.

```diff
-    /// Arbitrary JSON metadata for the collection
-    /// This can be used to store application-specific information
-    /// such as creation time, migration data, inference model info, etc.
+    /// Arbitrary JSON metadata for the collection.
+    /// This can be used to store application-specific information
+    /// such as creation time, migration data, inference model info, etc.
+    ///
+    /// Update semantics (via PATCH/UpdateCollection):
+    /// - Keys are merged into existing metadata.
+    /// - Setting a key to JSON `null` removes that key.
+    /// - Sending an empty object `{}` does not clear all metadata (gRPC maps
+    ///   cannot express “unset vs empty” distinctly; per-key nulls are the supported pattern).
```

lib/storage/src/content_manager/toc/collection_meta_ops.rs (1)
178-180: Applying metadata updates in update_collection path

Calling `collection.update_metadata(metadata).await?` integrates cleanly with the existing update flow. No optimizer recreation needed — correct.
Optional micro-optimization: if the provided payload is empty, you could skip the write/save to avoid an unnecessary disk flush.
lib/collection/src/operations/conversions.rs (1)
1948-2047: Proto→internal conversion: empty-map-as-None preserves gRPC semantics

The logic

- empty map -> None (no-op / “not provided”)
- non-empty map -> Some(payload)

is consistent with the gRPC limitation (empty map indistinguishable from unset). This also matches Update semantics elsewhere. Consider adding an inline comment to deter future regressions.

Apply this diff to document intent inline:

```diff
-            metadata: if metadata.is_empty() {
+            // gRPC does not distinguish between "unset" and an empty map.
+            // Treat empty map as "not provided" (no-op); only propagate when non-empty.
+            metadata: if metadata.is_empty() {
                 None
             } else {
                 Some(api::conversions::json::proto_to_payloads(metadata)?)
             },
```

lib/collection/src/operations/types.rs (1)
180-185: Public config gains metadata with correct serde behavior; plumbing is consistent
- Field is optional, skipped when None; comments explain intended usage well.
- From forwards metadata as-is, aligning REST/telemetry views.
Looks good. One minor thought: if you foresee redacting metadata in any anonymized public dumps, mirror telemetry’s #[anonymize(value = None)] here; otherwise keeping it visible in user-facing config is reasonable.
Also applies to: 198-209
lib/api/src/grpc/proto/collections.proto (1)
413-414: Clarify merge/clear semantics for metadata in proto docs

Excellent: CreateCollection/UpdateCollection/CollectionConfig expose metadata as `map<string, Value>`. Given gRPC’s “no null vs empty map” limitation, please make the per-key clearing rule explicit in comments to avoid client confusion.

Apply the doc tweak:

```diff
 message CreateCollection {
-  map<string, Value> metadata = 18; // Arbitrary JSON metadata for the collection
+  map<string, Value> metadata = 18; // Arbitrary JSON metadata for the collection.
+  // Note: gRPC cannot distinguish between an empty map and an unset field.
+  // Clients should not rely on "{}" to "clear" metadata here.
 }

 message UpdateCollection {
-  map<string, Value> metadata = 10; // Arbitrary JSON-like metadata for the collection, will be merged with already stored metadata
+  map<string, Value> metadata = 10; // Arbitrary JSON-like metadata for the collection; merged with existing.
+  // To remove a specific key, set it to null, e.g. {"key": null}. Sending "{}" is a no-op in gRPC.
 }

 message CollectionConfig {
-  map<string, Value> metadata = 7; // Arbitrary JSON metadata for the collection
+  map<string, Value> metadata = 7; // Arbitrary JSON metadata for the collection (empty map when absent).
 }
```
lib/storage/src/content_manager/conversions.rs (2)
202-206: UpdateCollection: retain empty-map-as-no-op; add guardrail commentThe empty => None mapping matches the intended PATCH semantics (per-key null to delete). Consider an inline comment explaining why "{}" is a no-op in gRPC to prevent future “clear-all” regressions.
Apply this diff:
- metadata: if metadata.is_empty() { + // gRPC maps cannot signal "clear all" with an empty object; + // keep empty as None (no update). Use {"key": null} to remove keys. + metadata: if metadata.is_empty() { None } else { Some(json::proto_to_payloads(metadata)?) },
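Expressed outside Rust, the normalization rule this diff documents can be sketched as follows (illustrative only; the server-side code is the Rust match above):

```python
def normalize_grpc_metadata(metadata):
    # gRPC cannot distinguish an unset map from an empty one, so an empty
    # map means "metadata not provided" (a no-op update), never "clear all".
    return metadata if metadata else None

assert normalize_grpc_metadata({}) is None                 # no-op, not clear-all
assert normalize_grpc_metadata({"k": "v"}) == {"k": "v"}   # propagate as-is
```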
4-4: Nit: import alias optional

Minor: if only `proto_to_payloads` is used, importing it directly (`use api::conversions::json::proto_to_payloads`) could cut the `json::` prefix. Pure style; feel free to ignore.
lib/storage/src/content_manager/collection_meta_ops.rs (2)
185-190: CreateCollection.metadata: Solid addition; add a size guard to prevent unbounded config growth

Arbitrary JSON at collection scope is useful, but without limits it can bloat snapshots, state diffs, and network payloads. Consider a soft cap (e.g., 64–256 KiB) with validation.

Apply this minimal change to add a validator on the field:

```diff
     #[serde(default, skip_serializing_if = "Option::is_none")]
+    #[validate(custom(function = "validate_metadata_size"))]
     pub metadata: Option<Payload>,
```

Add the helper in this module (or a shared validators module):

```rust
// Outside the struct, same file or a shared validators module
use validator::ValidationError;

const MAX_METADATA_BYTES: usize = 128 * 1024; // adjust as needed

fn validate_metadata_size(metadata: &Option<Payload>) -> Result<(), ValidationError> {
    if let Some(m) = metadata {
        // serialize to estimate size; fallback to key count if serialization fails
        let size = serde_json::to_vec(m).map(|v| v.len()).unwrap_or_else(|_| m.len());
        if size > MAX_METADATA_BYTES {
            return Err(ValidationError::new("collection_metadata_too_large"));
        }
    }
    Ok(())
}
```
263-266: Clarify metadata merge semantics and key deletion behavior

Proposed update to the `metadata` field documentation in `lib/storage/src/content_manager/collection_meta_ops.rs`:

```diff
-    /// Metadata to update for the collection. If provided, this will merge with existing metadata.
-    /// To remove metadata, set it to an empty object.
+    /// Metadata to update for the collection.
+    ///
+    /// Semantics:
+    /// - Omitted (`None`): leave existing metadata unchanged.
+    /// - Provided as an empty object `{}`: clear _all_ existing metadata.
+    /// - Provided as a non-empty map:
+    ///   • For each key with a non-null value: set or overwrite that key.
+    ///   • For each key with a `null` value: remove that key from existing metadata.
+    /// - Merge is _shallow_: top-level keys are merged; nested object values are replaced outright, not merged recursively.
     #[serde(default, skip_serializing_if = "Option::is_none")]
     pub metadata: Option<Payload>,
```

This makes it explicit:

- how omission is handled
- how full-clear vs. per-key deletion works
- that the merge is shallow rather than recursive
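A reference sketch of those semantics (illustrative Python, not the server’s Rust implementation; note the REST path can express the empty-object clear-all case that gRPC cannot):

```python
def apply_metadata_patch(existing, patch):
    if patch is None:    # omitted: leave existing metadata unchanged
        return dict(existing)
    if patch == {}:      # empty object: clear all metadata (REST only)
        return {}
    merged = dict(existing)
    for key, value in patch.items():
        if value is None:    # null value: remove that key
            merged.pop(key, None)
        else:                # shallow merge: nested objects replaced whole
            merged[key] = value
    return merged

current = {"owner": "a", "tags": {"x": 1}}
assert apply_metadata_patch(current, None) == current
assert apply_metadata_patch(current, {}) == {}
assert apply_metadata_patch(current, {"owner": None, "env": "prod"}) == {"tags": {"x": 1}, "env": "prod"}
```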
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (33)
- docs/grpc/docs.md (6 hunks)
- docs/redoc/master/openapi.json (5 hunks)
- lib/api/src/grpc/proto/collections.proto (4 hunks)
- lib/api/src/grpc/qdrant.rs (4 hunks)
- lib/collection/benches/batch_query_bench.rs (1 hunks)
- lib/collection/benches/batch_search_bench.rs (1 hunks)
- lib/collection/src/collection/collection_ops.rs (2 hunks)
- lib/collection/src/collection/state_management.rs (1 hunks)
- lib/collection/src/config.rs (2 hunks)
- lib/collection/src/operations/conversions.rs (4 hunks)
- lib/collection/src/operations/types.rs (3 hunks)
- lib/collection/src/operations/verification/mod.rs (1 hunks)
- lib/collection/src/shards/local_shard/shard_ops.rs (1 hunks)
- lib/collection/src/shards/replica_set/update.rs (1 hunks)
- lib/collection/src/shards/shard_holder/shard_mapping.rs (1 hunks)
- lib/collection/src/telemetry.rs (4 hunks)
- lib/collection/src/tests/fixtures.rs (1 hunks)
- lib/collection/src/tests/points_dedup.rs (1 hunks)
- lib/collection/src/tests/query_prefetch_offset_limit.rs (1 hunks)
- lib/collection/src/tests/snapshot_test.rs (1 hunks)
- lib/collection/tests/integration/common/mod.rs (1 hunks)
- lib/collection/tests/integration/multi_vec_test.rs (1 hunks)
- lib/collection/tests/integration/snapshot_recovery_test.rs (1 hunks)
- lib/segment/src/types.rs (1 hunks)
- lib/storage/src/content_manager/collection_meta_ops.rs (6 hunks)
- lib/storage/src/content_manager/conversions.rs (5 hunks)
- lib/storage/src/content_manager/mod.rs (1 hunks)
- lib/storage/src/content_manager/toc/collection_meta_ops.rs (2 hunks)
- lib/storage/src/content_manager/toc/create_collection.rs (2 hunks)
- lib/storage/tests/integration/alias_tests.rs (1 hunks)
- src/consensus.rs (1 hunks)
- src/migrations/single_to_cluster.rs (5 hunks)
- tests/openapi/test_collection_metadata.py (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-25T09:23:06.556Z
Learnt from: generall
PR: qdrant/qdrant#7123
File: lib/storage/src/content_manager/conversions.rs:202-206
Timestamp: 2025-08-25T09:23:06.556Z
Learning: In Qdrant's collection metadata system, individual metadata keys can be cleared by setting them to null in PATCH requests, e.g., `{"metadata": {"key": null}}` will remove the specific "key" from the collection's metadata.
Applied to files:
- tests/openapi/test_collection_metadata.py
- lib/storage/src/content_manager/conversions.rs
- docs/grpc/docs.md
📚 Learning: 2025-08-25T09:26:13.854Z
Learnt from: generall
PR: qdrant/qdrant#7123
File: lib/storage/src/content_manager/conversions.rs:202-206
Timestamp: 2025-08-25T09:26:13.854Z
Learning: In Qdrant's API design, gRPC protocol limitations prevent using empty objects `{}` to clear all metadata because gRPC doesn't distinguish between empty maps and null/unset values. This constraint requires API consistency between gRPC and REST endpoints, making per-key clearing with null values (`{"key": null}`) the standard approach for metadata management.
Applied to files:
- tests/openapi/test_collection_metadata.py
- lib/storage/src/content_manager/conversions.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/collection/src/config.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/collection/src/config.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/collection/src/config.rs
🧬 Code graph analysis (7)
tests/openapi/test_collection_metadata.py (2)
tests/openapi/helpers/helpers.py (1)
request_with_validation (41-116)

tests/openapi/conftest.py (1)
on_disk_vectors(11-12)
lib/collection/src/telemetry.rs (1)
lib/segment/src/common/anonymize.rs (13)
anonymize (45-45), anonymize (49-51), anonymize (55-57), anonymize (61-63), anonymize (67-71), anonymize (75-79), anonymize (115-119), anonymize (123-127), anonymize (131-135), anonymize (139-148), anonymize (152-154), anonymize (158-162), anonymize (166-181)
lib/collection/src/shards/local_shard/shard_ops.rs (1)
lib/collection/src/operations/types.rs (15)
from (97-104), from (188-210), from (259-280), from (671-673), from (691-693), from (1185-1190), from (1194-1199), from (1203-1253), from (1257-1259), from (1263-1268), from (1272-1277), from (1281-1286), from (1290-1295), from (1299-1304), from (1308-1313)
lib/api/src/grpc/qdrant.rs (1)
lib/collection/src/operations/conversions.rs (1)
map(1930-1932)
src/migrations/single_to_cluster.rs (3)
lib/collection/src/shards/shard_holder/mod.rs (2)
new (77-110), shards (284-284)

lib/collection/src/collection/mod.rs (1)

new (103-200)

lib/collection/src/shards/shard_holder/shard_mapping.rs (2)

shard_key (71-76), shard_ids (50-59)
lib/collection/src/operations/conversions.rs (1)
lib/api/src/conversions/json.rs (2)
payload_to_proto (9-14), proto_to_payloads (49-55)
lib/storage/src/content_manager/conversions.rs (1)
lib/api/src/conversions/json.rs (1)
proto_to_payloads(49-55)
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
29-29: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
88-88: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: storage-compat-test
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: e2e-tests
- GitHub Check: test-consistency
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: lint
🔇 Additional comments (33)
lib/collection/src/shards/local_shard/shard_ops.rs (1)
165-168: Switch to explicit From conversion is clear and type-safe (LGTM)

Using `CollectionInfo::from(...)` makes the intent explicit and avoids inference pitfalls with `Into`. No functional change.
src/migrations/single_to_cluster.rs (5)
3-5: Importing State and internal config types is appropriate for the new state-based migration path

This aligns the migration with the new collection state API. Looks correct.

41-49: State destructuring is precise; ignored fields are documented (good)

The explicit comments for resharding/transfers/payload_index_schema make the intent clear during single→cluster migration.

67-83: CreateCollection fields sourced from params/config look consistent; metadata inclusion is correct

Fields mapped from params.* and *_config.into() are coherent. The deprecated init_from is intentionally set to None — OK. No functional issues spotted.

140-156: Ordering of meta ops: ensure SetShardReplicaState runs after Create operations are applied

You await each submit, but depending on dispatcher semantics, the create may be enqueued rather than fully applied when SetShardReplicaState is submitted. If the dispatcher guarantees in-order, durable application before returning, you’re fine; otherwise, consider chaining on applied notifications or batching ops.
If unsure, please point to dispatcher submit semantics or add a comment to document the guarantee.
50-59: Verify metadata propagation end-to-end

I wasn’t able to locate a `CreateCollection` definition in the `lib/collection` crate, so please double-check that the consensus op still carries the new `metadata` field and that it’s applied on followers. I have, however, confirmed that your migration does surface `metadata` in the public types:

• In `lib/collection/src/operations/types.rs`, `impl From<CollectionConfigInternal> for CollectionConfig` preserves `metadata` unchanged.
• In `lib/collection/src/telemetry.rs`, `impl From<CollectionConfigInternal> for CollectionConfigTelemetry` also destructures and re-emits `metadata`.

Please ensure:

- The consensus `CreateCollection` op (wherever it is defined) declares a `metadata: Option<Payload>` field.
- The follower’s apply/execute logic for `CreateCollection` persists that field into `CollectionConfigInternal`.
- The telemetry or API-facing “collection info” type (e.g. your `CollectionInfo` or equivalent) includes `metadata` and returns it after migration.

lib/api/src/grpc/qdrant.rs (2)
2-101: Consolidated JSON-like Value types look good

Moving Struct/Value/ListValue/NullValue to a single, public place and reusing them across messages removes duplication and clarifies the API surface. Tag ranges and oneof layout look consistent and non‑overlapping.
963-966: Metadata round-trip verified

All three sanity checks pass, confirming that `CollectionConfig.metadata` is fully wired through:

- The gRPC definition in `lib/api/src/grpc/qdrant.rs` defines exactly one `Struct`, one `Value`, and one `ListValue` — no duplicates remain in that file.
- The server-side handler in `lib/collection/src/collection/collection_ops.rs` implements `update_metadata`, which either merges into existing metadata or sets it anew (`current_metadata.merge(&metadata)` or `metadata = Some(…)`).
- gRPC JSON values are converted into internal payloads via the `api::conversions::json::proto_to_payloads` helper (and the accompanying `TryFrom`/`TryInto` impls), ensuring any `map<string, Value>` round-trips correctly into the storage layer.

No further changes required.
512-513: LGTM: CollectionConfig.metadata field is correctly documentedThe field type and link to Value are accurate, and the description is clear. No issues.
512-513: LGTM: CollectionConfig.metadata field is correctly documented

The field type and link to Value are accurate, and the description is clear. No issues.

729-730: LGTM: CreateCollection.metadata field is correctly documented

Matches the proto and mirrors CollectionConfig usage. No issues.
53-54: LGTM: fixtures now initialize the new metadata field explicitlySetting
metadata: Nonekeeps test fixtures aligned with the newCollectionConfigInternalshape and avoids accidental default differences. No further action needed.lib/collection/src/collection/state_management.rs (1)
138-139: Correctly treat metadata changes as config updates without touching optimizersThe destructuring pattern forces compile-time coverage for new fields, and adding
is_metadata_updatedintois_config_updatedensures metadata-only updates are persisted while avoiding unnecessary optimizer recreation viais_core_config_updated. This matches expected semantics for user-provided collection metadata.Also applies to: 146-147, 151-155
lib/collection/tests/integration/multi_vec_test.rs (1)
66-67: LGTM: test config updated for new fieldExplicit
metadata: Nonekeeps the integration test’sCollectionConfigInternalliteral in sync with the struct definition. Good to go.lib/collection/src/shards/replica_set/update.rs (1)
772-773: LGTM: config literal includes metadataThe test helper config now includes
metadata: None, consistent with the new internal config schema. No issues spotted.src/consensus.rs (1)
1541-1542: LGTM: consensus test constructs CreateCollection with metadataAdding
metadata: NonetoCreateCollectionkeeps the test aligned with the extended API surface. This also documents the field’s optionality in a core-path test.lib/collection/benches/batch_search_bench.rs (1)
89-90: Metadata field initialization is correctSetting
metadata: Nonekeeps the bench aligned with the newCollectionConfigInternalshape without changing behavior. LGTM.lib/storage/tests/integration/alias_tests.rs (1)
124-126: Test update matches API surfaceAdding
metadata: Nonekeeps the integration test in sync with the newCreateCollectionshape. No behavior change. LGTM.lib/collection/benches/batch_query_bench.rs (1)
71-72: Metadata set to None — consistent with config evolutionBench remains behaviorally identical while tracking the new field. Looks good.
tests/openapi/test_collection_metadata.py (1)
7-14: Good autouse fixture with reliable teardownCollection setup/teardown via yield looks correct and safe across failures.
lib/collection/src/collection/collection_ops.rs (1)
7-7: Importing Payload here is correctRequired for new metadata update flow. No issues.
lib/collection/src/config.rs (1)
14-17: Payload import is necessary for metadata fieldLooks good.
lib/storage/src/content_manager/toc/create_collection.rs (2)
51-52: Plumbing metadata through CreateCollection operationDestructuring includes metadata — expected.
214-223: Persisting initial metadata on collection creation – mapping verifiedStoring the provided metadata into
CollectionConfigInternalis correct, and the public API surfaces it back to clients as expected:
- In
lib/collection/src/operations/types.rs(around line 184), the publicCollectionConfigstruct declares
pub metadata: Option<Payload>with the appropriate serde annotations.- In the same file (around lines 196–199), the
From<CollectionConfigInternal>conversion destructures and includesmetadatawhen buildingCollectionConfig.- In
lib/collection/src/operations/conversions.rs(sections starting at line 396 and again at 2040),metadatais correctly mapped to and from the gRPC/Proto representations viapayload_to_protoandproto_to_payloads.All mappings ensure that any metadata provided at collection creation flows through the internal config into the public-facing types and across the wire. No further changes are needed.
lib/storage/src/content_manager/toc/collection_meta_ops.rs (1)
131-140: Including metadata in UpdateCollection structDestructuring the new optional field is correct.
lib/collection/src/telemetry.rs (2)
89-93: Telemetry: metadata field is correctly optional and anonymizedUsing Option with serde(default, skip_serializing_if) avoids noisy output, and #[anonymize(value = None)] prevents leaking app metadata in telemetry. Good call.
105-116: Plumbing metadata through telemetry From<...> is completeForwarding metadata from CollectionConfigInternal into CollectionConfigTelemetry ensures consistency with public API responses. No gaps spotted.
lib/collection/src/operations/conversions.rs (1)
398-399: Expose metadata in gRPC CollectionInfo.config; defaulting to empty map is correct
- Mapping Option to map<string, Value> via payload_to_proto and unwrap_or_default() is the right shape for gRPC (no null vs empty distinction).
- This keeps GetCollectionInfo responses backward compatible while surfacing metadata when present.
No action needed.
Also applies to: 530-533
lib/api/src/grpc/proto/collections.proto (1)
6-7: Verify Protobuf Include Paths for json_with_int.proto

Confirmed that `json_with_int.proto` exists and is imported where needed:

- File: `lib/api/src/grpc/proto/json_with_int.proto`
- Referenced in:
  - `lib/api/src/grpc/proto/points.proto` (line 8)
  - `lib/api/src/grpc/proto/collections.proto` (line 6)

Please ensure that every
protoc/BUILD invocation or Gradle/Maven source-set for your gRPC targets includes thelib/api/src/grpc/protodirectory (or otherwise referencesjson_with_int.proto) so thatmap<string, Value>fields compile correctly across all languages and targets.lib/storage/src/content_manager/conversions.rs (1)
113-118: CreateCollection: empty-map-as-None is intentional and consistentTreating an empty metadata map as None keeps REST/gRPC behavior aligned and avoids accidental full clears. Good.
lib/storage/src/content_manager/collection_meta_ops.rs (4)
288-291: new_empty defaults: LGTM

Initializing `metadata: None` ensures PATCH requests don’t accidentally wipe metadata. Good.

467-470: Passing `metadata` through the CreateCollection builder: LGTM

Including `metadata` in the constructed `CreateCollection` preserves config fidelity. No issues spotted.
439-441: Metadata propagation verified across conversion implementations

I’ve checked all relevant `From`/`TryFrom` implementations that transform between `CollectionConfigInternal` and the various create/update operation types:

- `lib/storage/src/content_manager/collection_meta_ops.rs` – `impl From<CollectionConfigInternal> for CreateCollection` (lines 429–431) includes `metadata`.
- `lib/storage/src/content_manager/conversions.rs` – both the `From<CreateCollectionOperation>` and `TryFrom<UpdateCollectionOperation>` blocks destructure and handle `metadata` (lines 83–86 and 179–182).
- `lib/storage/src/content_manager/toc/create_collection.rs` – operation destructuring (lines 49–52 and 220–223) correctly captures `metadata`.
- All other collection and telemetry conversions (e.g., in `operations/conversions.rs`, `collection/src/operations/types.rs`, and related modules) also destructure and propagate `metadata` consistently.

No instances were found where `metadata` was omitted. Everything is in parity, so no further changes are needed.
18-20: ✅ Confirmed: `Payload` implements `JsonSchema`

I verified that the `segment::types::Payload` type derives `JsonSchema`, so it does indeed implement the trait required for OpenAPI schema generation:

- In `segment/src/types.rs` (confirmed via ripgrep):

  ```rust
  #[derive(Clone, Debug, Eq, PartialEq, Deserialize, Serialize, JsonSchema, Hash)]
  #[schemars(example = "payload_example")]
  pub struct Payload(pub Map<String, Value>);
  ```

Since `CreateCollection` (in `collection_meta_ops.rs`) derives `JsonSchema`, and any reference to `Payload` in this module will rely on that implementation, no additional schema override is needed.
```json
"metadata": {
  "description": "Metadata to update for the collection. If provided, this will merge with existing metadata. To remove metadata, set it to an empty object.",
  "anyOf": [
    {
      "$ref": "#/components/schemas/Payload"
    },
    {
      "nullable": true
    }
  ]
```
💡 Verification agent
🧩 Analysis chain
Clarify PATCH semantics: merge vs replace vs delete are ambiguous.
The description says “merge with existing metadata” and “to remove metadata, set it to an empty object.” If the server performs a shallow merge (RFC 7386-like), sending {} won’t remove existing keys; it’s a no-op. If the server replaces, then merge wording is misleading.
Please explicitly specify:
- whether it’s a shallow object merge (unspecified keys left intact),
- how to replace the entire object,
- how to delete the metadata completely.
Proposed wording below assumes shallow merge + explicit null to clear:
```diff
-  "metadata": {
-    "description": "Metadata to update for the collection. If provided, this will merge with existing metadata. To remove metadata, set it to an empty object.",
+  "metadata": {
+    "description": "Metadata to update for the collection. If provided, keys are shallow-merged into the existing metadata: provided keys overwrite existing ones; unspecified keys remain unchanged. To replace the entire metadata object, send the full object you want stored. To clear all metadata, set this field to null.",
     "anyOf": [
       { "$ref": "#/components/schemas/Payload" },
       { "nullable": true }
     ]
   }
```

If the backend behavior differs (e.g., replace-on-PATCH), adjust the text accordingly and consider adding an explicit x-notes or example:
```diff
   "metadata": {
     "description": "...",
     "anyOf": [ { "$ref": "#/components/schemas/Payload" }, { "nullable": true } ],
+    "examples": [
+      { "summary": "Merge keys", "value": { "ingestion_version": "v2" } },
+      { "summary": "Clear all metadata", "value": null },
+      { "summary": "Replace entire object", "value": { "owner": "ml" } }
+    ]
   }
```

Would you like me to add an OpenAPI test/example request under the PATCH endpoint showing these three cases?
I’ll verify how the server handles metadata in PATCH by inspecting the relevant Rust code:
🏁 Script executed:
```bash
#!/bin/bash
# Locate the PatchCollection handler to see how metadata fields are applied
rg -nP --type=rust "PatchCollection" -g "lib/storage/src/**" -C3
# Inspect the merging logic in conversions.rs where null vs empty object is handled
sed -n '180,240p' lib/storage/src/content_manager/conversions.rs
```

Length of output: 2280
To confirm the exact metadata-handling semantics, let's inspect the top of conversions.rs (where PatchCollection is converted) and how the raw gRPC metadata field is flattened:

🏁 Script executed:

```bash
#!/bin/bash
# 1) Show the file header & beginning of conversions.rs to find how `metadata` is mapped
sed -n '1,200p' lib/storage/src/content_manager/conversions.rs
# 2) Search specifically for any mapping from the gRPC Struct to an internal map
rg -nP '\.metadata' lib/storage/src/content_manager/conversions.rs
```

Length of output: 9160
Clarify collection metadata PATCH semantics
The OpenAPI description for metadata is currently inaccurate:
- By default, only the keys you include are shallow-merged into existing metadata; unspecified keys are left untouched.
- Sending `{}` or `null` does not clear existing metadata; it's treated as "no update."
- To delete individual metadata keys, set their values to `null` (e.g. `{ "table": null }`).
- There is no built-in way to wholesale replace or clear the entire metadata object in one shot.
Please update the schema and description accordingly. For example:
```diff
   "metadata": {
-    "description": "Metadata to update for the collection. If provided, this will merge with existing metadata. To remove metadata, set it to an empty object.",
+    "description": "Metadata update semantics: only provided keys are shallow-merged into existing metadata; unspecified keys remain unchanged. To delete a key, set its value to `null`; sending `{}` or `null` performs no update. There is no single-step clear-all operation.",
     "anyOf": [
       { "$ref": "#/components/schemas/Payload" },
       { "nullable": true }
     ],
+    "examples": [
+      {
+        "summary": "Merge keys",
+        "value": { "ingestion_version": "v2" }
+      },
+      {
+        "summary": "Delete a key",
+        "value": { "table": null }
+      },
+      {
+        "summary": "No update",
+        "value": {}
+      }
+    ]
+  }
```

Would you like to add a note or example clarifying that "clear all" must be done by explicitly nulling each existing key?
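The verified shallow-merge-with-null-deletion semantics can be sketched in a few lines of Python. This is a minimal illustration of the behavior described in this thread, not Qdrant's actual Rust implementation:

```python
def patch_metadata(existing, patch):
    """Apply collection-metadata PATCH semantics to an existing metadata map.

    Only provided keys are shallow-merged; a key with value None is deleted;
    an empty or missing patch is a no-op.
    """
    if not patch:  # None or {} -> no update
        return dict(existing)
    merged = dict(existing)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)  # per-key deletion
        else:
            merged[key] = value  # overwrite or add
    return merged
```

Note that under these semantics there is no single request that clears everything: each existing key must be nulled explicitly.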
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current:

```json
"metadata": {
  "description": "Metadata to update for the collection. If provided, this will merge with existing metadata. To remove metadata, set it to an empty object.",
  "anyOf": [
    {
      "$ref": "#/components/schemas/Payload"
    },
    {
      "nullable": true
    }
  ]
```

Suggested:

```json
"metadata": {
  "description": "Metadata update semantics: only provided keys are shallow-merged into existing metadata; unspecified keys remain unchanged. To delete a key, set its value to `null`; sending `{}` or `null` performs no update. There is no single-step clear-all operation.",
  "anyOf": [
    {
      "$ref": "#/components/schemas/Payload"
    },
    {
      "nullable": true
    }
  ],
  "examples": [
    { "summary": "Merge keys", "value": { "ingestion_version": "v2" } },
    { "summary": "Delete a key", "value": { "table": null } },
    { "summary": "No update", "value": {} }
  ]
}
```
```rust
        metadata: if metadata.is_empty() {
            None
        } else {
            Some(json::proto_to_payloads(metadata)?)
        },
    },
```
💡 Verification agent
❓ Verification inconclusive
Add tests to lock semantics: per-key null clears; empty map no-ops
To prevent regressions, add e2e tests for both REST and gRPC:
- PATCH with {"metadata":{"k":null}} removes key k.
- PATCH with {"metadata":{}} leaves metadata unchanged.
Also applies to: 202-206
🏁 Script executed:
```bash
#!/bin/bash
# Search existing tests for metadata update/clear semantics
rg -nP -C3 --type=rust --type=py '(PATCH|UpdateCollection).*metadata' tests lib | sed -n '1,200p'
rg -nP -C2 --type=py '"metadata"\s*:\s*\{\s*\}' tests
rg -nP -C2 --type=py '"metadata"\s*:\s*\{[^}]*null' tests
```

Length of output: 213
Secure metadata patch semantics with end-to-end tests
To prevent regressions in both REST and gRPC APIs, we need to explicitly lock in the intended behavior when clients PATCH collection metadata:

- PATCH with `{"metadata":{"key":null}}` must remove the `key` entry from the existing metadata map.
- PATCH with `{"metadata":{}}` must be a no-op and leave all existing metadata unchanged.

Please add end-to-end tests covering both cases, targeting the conversion logic in:

- `lib/storage/src/content_manager/conversions.rs` (around lines 113–118 and 202–206)

Include tests for:

- REST: HTTP PATCH against `/collections/{name}/points` to verify per-key null clears and empty-map no-ops.
- gRPC: `UpdateCollection` calls with `metadata` maps containing null values and empty maps, asserting identical behavior.
These tests are mandatory to ensure future changes to metadata handling cannot introduce breaking regressions.
🤖 Prompt for AI Agents
In lib/storage/src/content_manager/conversions.rs around lines 113–118 and
202–206, add end-to-end tests to lock in PATCH semantics for point metadata:
create REST tests that send HTTP PATCH to /collections/{name}/points with body
{"metadata":{"key":null}} and assert the specific key is removed from the
existing metadata map, and a second REST test that sends {"metadata":{}} and
asserts existing metadata remains unchanged; likewise add gRPC tests that call
UpdateCollection with metadata maps containing null values (to verify per-key
deletion) and with empty maps (to verify no-op), and assert the resulting stored
metadata matches the REST behavior exactly. Ensure tests exercise the conversion
logic path in the specified file (covering both code locations) and include
setup/teardown so they run end-to-end against the server instance.
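The two regression cases above can be roughed out as follows. `FakeCollections` is an illustrative in-memory stand-in that mimics the PATCH semantics under test; a real e2e test would issue the same two PATCH bodies against a running Qdrant instance instead:

```python
class FakeCollections:
    """In-memory stand-in for the collection metadata store,
    mimicking the PATCH semantics this review asks to lock in."""

    def __init__(self):
        self.metadata = {}

    def patch(self, body):
        patch = body.get("metadata")
        if not patch:  # {} or missing -> no-op
            return
        for key, value in patch.items():
            if value is None:  # per-key deletion
                self.metadata.pop(key, None)
            else:
                self.metadata[key] = value


def test_null_clears_key():
    c = FakeCollections()
    c.patch({"metadata": {"k": 1, "keep": "x"}})
    c.patch({"metadata": {"k": None}})
    assert c.metadata == {"keep": "x"}


def test_empty_map_is_noop():
    c = FakeCollections()
    c.patch({"metadata": {"k": 1}})
    c.patch({"metadata": {}})
    assert c.metadata == {"k": 1}
```

The gRPC variant would exercise the same two bodies through `UpdateCollection` and assert the stored metadata matches the REST result.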
```diff
 for (shard_key, shard_ids) in shards_key_mapping.iter() {
     let mut placement = Vec::new();

-    for shard_id in shards {
-        let shard_info = collection_state.shards.get(shard_id).unwrap();
+    for shard_id in shard_ids {
+        let shard_info = shards.get(shard_id).unwrap();
         placement.push(shard_info.replicas.keys().copied().collect());
     }
```
🛠️ Refactor suggestion
Avoid unwrap on shards.get(shard_id) to prevent migration-time panics
Unwrap will panic if a shard_key_mapping references a shard absent from shards (possible after prior manual edits or partial failures). Handle gracefully and log.
Apply this safe access:
```diff
-for shard_id in shard_ids {
-    let shard_info = shards.get(shard_id).unwrap();
-    placement.push(shard_info.replicas.keys().copied().collect());
-}
+for shard_id in shard_ids {
+    if let Some(shard_info) = shards.get(shard_id) {
+        placement.push(shard_info.replicas.keys().copied().collect());
+    } else {
+        tracing::warn!(?shard_id, "Shard id missing in shards map during migration; skipping placement entry");
+        continue;
+    }
+}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current:

```diff
 for (shard_key, shard_ids) in shards_key_mapping.iter() {
     let mut placement = Vec::new();
-    for shard_id in shards {
-        let shard_info = collection_state.shards.get(shard_id).unwrap();
+    for shard_id in shard_ids {
+        let shard_info = shards.get(shard_id).unwrap();
         placement.push(shard_info.replicas.keys().copied().collect());
     }
```

Suggested:

```rust
for (shard_key, shard_ids) in shards_key_mapping.iter() {
    let mut placement = Vec::new();
    for shard_id in shard_ids {
        if let Some(shard_info) = shards.get(shard_id) {
            placement.push(shard_info.replicas.keys().copied().collect());
        } else {
            tracing::warn!(
                ?shard_id,
                "Shard id missing in shards map during migration; skipping placement entry"
            );
            continue;
        }
    }
    // ...rest of the migration logic...
}
```
🤖 Prompt for AI Agents
In src/migrations/single_to_cluster.rs around lines 115-121, avoid using
unwrap() on shards.get(shard_id) — replace it with a safe lookup (if let
Some(...) / match) that logs a warning including the shard_id and shard_key when
the shard is missing and then continues (or skips adding placement for that
shard_id) instead of panicking; use the existing logging facility (e.g., warn!
or process logger) to record context so the migration can proceed safely without
crashing.
* implement custom collection metadata
* persist metadata change
* Also unset a key in the test

---------

Co-authored-by: timvisee <tim@visee.me>
#3957