Skip to content

Move SerdeWal from collection into shard crate#7146

Merged
ffuugoo merged 4 commits intodevfrom
qdrant-edge-extract-serde-wal
Aug 26, 2025
Merged

Move SerdeWal from collection into shard crate#7146
ffuugoo merged 4 commits intodevfrom
qdrant-edge-extract-serde-wal

Conversation

@ffuugoo
Copy link
Contributor

@ffuugoo ffuugoo commented Aug 26, 2025

Another one for Qdrant on Edge. Only moving files and minimal cleanup, no code changes.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you successfully ran tests with your changes locally?

@ffuugoo ffuugoo requested review from generall and timvisee August 26, 2025 11:28
@ffuugoo ffuugoo added this to the Qdrant on Edge milestone Aug 26, 2025
@coderabbitai

This comment was marked as resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/shard/src/wal.rs (1)

58-58: Doc fix: “does not guarantee durability”

The sentence currently says “does guarantee durability”, which reverses the meaning.

Apply this diff:

-    /// Write a record to the WAL but does guarantee durability.
+    /// Write a record to the WAL but does not guarantee durability.
lib/collection/src/update_handler.rs (1)

831-842: Flush error handling only catches thread panics, not IO errors from flush

flush_async() returns a std::thread::JoinHandle<std::io::Result<()>>. The current code checks only join()’s panic path (Err(_)) and ignores Ok(Err(io_err)), potentially proceeding after a failed WAL flush.

Apply this diff to handle both failure modes and also fix a small variable naming typo:

-            trace!("Attempting flushing");
-            let wal_flash_job = wal.lock().await.flush_async();
-
-            if let Err(err) = wal_flash_job.join() {
-                error!("Failed to flush wal: {err:?}");
-                segments
-                    .write()
-                    .report_optimizer_error(WalError::WriteWalError(format!(
-                        "WAL flush error: {err:?}"
-                    )));
-                continue;
-            }
+            trace!("Attempting flushing");
+            let wal_flush_job = wal.lock().await.flush_async();
+
+            match wal_flush_job.join() {
+                Ok(Ok(())) => {} // all good
+                Ok(Err(err)) => {
+                    error!("Failed to flush wal: {err:?}");
+                    segments
+                        .write()
+                        .report_optimizer_error(WalError::WriteWalError(format!(
+                            "WAL flush IO error: {err:?}"
+                        )));
+                    continue;
+                }
+                Err(err) => {
+                    error!("WAL flush thread panicked: {err:?}");
+                    segments
+                        .write()
+                        .report_optimizer_error(WalError::WriteWalError(format!(
+                            "WAL flush thread panic: {err:?}"
+                        )));
+                    continue;
+                }
+            }
🧹 Nitpick comments (5)
lib/shard/Cargo.toml (1)

25-25: Prefer workspace-managed dependency for consistency

Most deps use { workspace = true }. Consider aligning rmp-serde to the workspace to avoid drift and simplify upgrades.

Apply this diff if rmp-serde is already defined in the workspace:

-rmp-serde = "~1.3"
+rmp-serde = { workspace = true }
lib/shard/src/wal.rs (3)

60-65: Avoid panic on serialization failure in write()

serde_cbor::to_vec(...).unwrap() will panic on malformed/unsupported data. Prefer returning WalError::WriteWalError instead of panicking.

Apply this diff:

-        // ToDo: Replace back to faster rmp, once this https://github.com/serde-rs/serde/issues/2055 solved
-        let binary_entity = serde_cbor::to_vec(&entity).unwrap();
+        // ToDo: Replace back to faster rmp, once this https://github.com/serde-rs/serde/issues/2055 solved
+        let binary_entity = serde_cbor::to_vec(&entity)
+            .map_err(|err| WalError::WriteWalError(format!("failed to serialize record: {err}")))?;
         self.wal
             .append(&binary_entity)
             .map_err(|err| WalError::WriteWalError(format!("{err:?}")))

85-93: Deserialization failures currently hard-panic

Both entry(...).expect(...) and from_slice(...).expect(...) will crash on IO or format errors (e.g., partial writes, corruption, or version skew not covered by the fallback). Consider mapping to WalError and letting callers decide how to handle bad records.

If you want a minimally invasive tweak without changing external behavior, at least switch the expect messages to include the index and path for more actionable crash reports. I can provide a full error-propagating diff if desired.


166-172: Error variant mismatch when persisting first-index file

A failure to write first-index is classified as TruncateWalError. That’s misleading; it’s an IO/write error.

Apply this diff:

-        .map_err(|err| {
-            WalError::TruncateWalError(format!("failed to write first-index file: {err:?}"))
-        })?;
+        .map_err(|err| {
+            WalError::WriteWalError(format!("failed to write first-index file: {err:?}"))
+        })?;
lib/collection/src/update_handler.rs (1)

831-833: Nit: rename wal_flash_job -> wal_flush_job

Typo in variable name; rename improves readability and greppability.

(Handled by the diff above.)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9b1bbac and 7a64ff3.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • lib/collection/src/lib.rs (0 hunks)
  • lib/collection/src/operations/types.rs (1 hunks)
  • lib/collection/src/shards/local_shard/mod.rs (1 hunks)
  • lib/collection/src/shards/local_shard/shard_ops.rs (1 hunks)
  • lib/collection/src/update_handler.rs (1 hunks)
  • lib/collection/src/wal_delta.rs (5 hunks)
  • lib/shard/Cargo.toml (1 hunks)
  • lib/shard/src/lib.rs (1 hunks)
  • lib/shard/src/wal.rs (5 hunks)
💤 Files with no reviewable changes (1)
  • lib/collection/src/lib.rs
🧰 Additional context used
🧬 Code graph analysis (1)
lib/shard/src/wal.rs (2)
lib/collection/src/shards/local_shard/mod.rs (2)
  • new (168-250)
  • new (1229-1234)
lib/collection/src/wal_delta.rs (1)
  • new (33-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test-consistency
  • GitHub Check: e2e-tests
  • GitHub Check: lint
  • GitHub Check: storage-compat-test
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
🔇 Additional comments (12)
lib/shard/Cargo.toml (1)

28-28: LGTM: required deps added for moved WAL module

serde_cbor, thiserror, and wal additions make sense for the new shard::wal surface.

Also applies to: 31-31, 34-34

lib/shard/src/wal.rs (2)

136-159: Visibility change of ack() is appropriate for cross-crate usage

Making ack public is necessary now that collection calls it through shard::wal. No concerns.


24-24: Underscored PhantomData field rename is a nice cleanup

Renaming to _record removes the unused-field lint without altering behavior.

Also applies to: 54-55

lib/collection/src/update_handler.rs (1)

18-18: Import hop to shard::wal is correct

WalError now lives under shard::wal; this update matches the crate move.

lib/collection/src/operations/types.rs (1)

41-41: LGTM: WalError path updated

Importing WalError from shard::wal aligns with the module move. The existing From<WalError> for CollectionError continues to work unchanged.

lib/shard/src/lib.rs (1)

6-6: Good: expose shard::wal as a public module

This is the right place for WAL now that it conceptually belongs to shards.

lib/collection/src/shards/local_shard/shard_ops.rs (1)

75-85: Updated match arm for ClockRejected is correct after the move.

Catching Err(shard::wal::WalError::ClockRejected) preserves existing behavior and surfaces UpdateStatus::ClockRejected as before.

Given the Err(err) => return Err(err.into()) path now receives shard::wal::WalError, please confirm a From<WalError> for CollectionError exists (otherwise this won’t compile). You can reuse the script from the other comment (step 3).

lib/collection/src/wal_delta.rs (5)

4-4: Import path updated to shard::wal::SerdeWal — aligns with crate reorg.

This matches the new ownership of the WAL implementation in the shard crate.


63-64: Returning WalError::ClockRejected at the boundary is correct.

The early rejection on clock drift continues to be enforced before writing to WAL. No behavioral change, just the new path.


132-144: Test helper append_from signature migrated to shard::wal::Result — OK.

The test-only API tracks the new result type; downstream tests already unwrap on it.


272-272: Tests import SerdeWal from shard::wal — consistent with the move.

All test fixtures here now use the new path; looks good.


53-70: Verification Complete: lock_and_write call-sites updated

All usages of the revised lock_and_write signature have been adjusted to the new shard::wal::Result return type:

  • lib/collection/src/shards/local_shard/shard_ops.rs – the main call in LocalShard::update now matches on
    Err(shard::wal::WalError::ClockRejected) as intended.
  • lib/collection/src/wal_delta.rs – internal operations and test scaffolding invoke .await.unwrap() on the Result, aligning with the updated API.

No remaining call-sites lack the proper handling or unwrapping of the new Result type. The implementation and tests are consistent with the API change.

coderabbitai[bot]

This comment was marked as off-topic.

@ffuugoo ffuugoo force-pushed the qdrant-edge-extract-serde-wal branch from 24272f6 to e2dc630 Compare August 26, 2025 12:47
coderabbitai[bot]

This comment was marked as off-topic.

@ffuugoo ffuugoo merged commit 6fa24e1 into dev Aug 26, 2025
16 checks passed
@ffuugoo ffuugoo deleted the qdrant-edge-extract-serde-wal branch August 26, 2025 14:27
timvisee pushed a commit that referenced this pull request Aug 26, 2025
@ffuugoo ffuugoo mentioned this pull request Aug 26, 2025
9 tasks
@coderabbitai coderabbitai bot mentioned this pull request Sep 3, 2025
9 tasks
@coderabbitai coderabbitai bot mentioned this pull request Oct 20, 2025
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants