Skip to content

Dont lock WAL while serialization#8093

Merged
IvanPleshkov merged 3 commits intodevfrom
dont-lock-WAL-while-serialization
Feb 12, 2026
Merged

Dont lock WAL while serialization#8093
IvanPleshkov merged 3 commits intodevfrom
dont-lock-WAL-while-serialization

Conversation

@IvanPleshkov
Copy link
Contributor

@IvanPleshkov IvanPleshkov commented Feb 9, 2026

This PR is a try to fix an empty update queue issue, where the upate queue is empty under pressure on some configurations.

Local tests didnt show differences but it seems this changes are nice to merge anyway.

The core idea is making WAL lock more efficient. Currently, we lock WAL for IO and for serialization.

Actually, it's not needed to lock the WAL for expensive serialization; this PR fixes it by introducing the serializable record entity,

@IvanPleshkov IvanPleshkov added this to the Update queue milestone Feb 9, 2026
@IvanPleshkov IvanPleshkov changed the title dont lock WAL while serialization Dont lock WAL while serialization Feb 10, 2026
@IvanPleshkov IvanPleshkov marked this pull request as ready for review February 10, 2026 11:28
@qdrant qdrant deleted a comment from coderabbitai bot Feb 10, 2026
/// (this is used to extend recoverable history and allow WAL shard transfers)
const INCREASED_RETENTION_FACTOR: usize = 10;

pub struct WalSerializedRecord<R> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd like WalRawRecord a bit better

impl<R: DeserializeOwned + Serialize> WalSerializedRecord<R> {
pub fn new(record: &R) -> Result<Self> {
// ToDo: Replace back to faster rmp, once this https://github.com/serde-rs/serde/issues/2055 solved
let record = serde_cbor::to_vec(record).map_err(|err| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This vec seems to over allocate (by powers of two). But we only keep these in memory for a short period, don't we?

@timvisee
Copy link
Member

It makes using the WAL slightly more difficult, but I understand the benefit. Not locking the WAL during (de)serialization can be significant with very large operations.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@lib/shard/src/wal.rs`:
- Around line 40-46: The error message in Wal::new (the new function) is
misleading because serde_cbor::to_vec failing is a serialization error before
any WAL I/O; update the WalError::WriteWalError constructed in the map_err for
serde_cbor::to_vec to state that serialization of the provided record (type R)
failed — e.g., "Failed to serialize record for WAL (Serialize impl error):
{err:?}" — and keep including the original err details to aid debugging; locate
the serde_cbor::to_vec(...) map_err in pub fn new(record: &R) -> Result<Self>
and replace the misleading "corrupted WAL or version mismatch" text accordingly.
🧹 Nitpick comments (2)
lib/shard/src/wal.rs (2)

34-37: WalRawRecord lacks common trait derives.

Consider deriving Debug (and possibly Clone) for WalRawRecord<R> to aid diagnostics and flexibility. Without Debug, it can't be used in debug-format assertions or error messages.

Proposed change
+#[derive(Debug, Clone)]
 pub struct WalRawRecord<R> {
     record: Vec<u8>,
     _phantom: PhantomData<R>,
 }

53-58: Redundant trait bound on deserialize.

The where R: DeserializeOwned clause is already specified on the impl block (line 39). The additional bound on this method is unnecessary.

Proposed fix
-    pub fn deserialize(&self) -> Result<R>
-    where
-        R: DeserializeOwned,
-    {
+    pub fn deserialize(&self) -> Result<R> {
         Self::deserialize_from(&self.record)
     }

Same applies to deserialize_from at line 60-62.

Comment on lines +40 to +46
pub fn new(record: &R) -> Result<Self> {
// ToDo: Replace back to faster rmp, once this https://github.com/serde-rs/serde/issues/2055 solved
let record = serde_cbor::to_vec(record).map_err(|err| {
WalError::WriteWalError(format!(
"Can't serialize entry, probably corrupted WAL or version mismatch: {err:?}"
))
})?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Misleading error message on the serialization (write) path.

The error says "corrupted WAL or version mismatch" but this is the serialization path — the data hasn't touched the WAL yet. A serialization failure here is more likely a bug in the operation type's Serialize impl, not a corrupted WAL.

Suggested fix
         let record = serde_cbor::to_vec(record).map_err(|err| {
             WalError::WriteWalError(format!(
-                "Can't serialize entry, probably corrupted WAL or version mismatch: {err:?}"
+                "Can't serialize WAL entry: {err:?}"
             ))
         })?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn new(record: &R) -> Result<Self> {
// ToDo: Replace back to faster rmp, once this https://github.com/serde-rs/serde/issues/2055 solved
let record = serde_cbor::to_vec(record).map_err(|err| {
WalError::WriteWalError(format!(
"Can't serialize entry, probably corrupted WAL or version mismatch: {err:?}"
))
})?;
pub fn new(record: &R) -> Result<Self> {
// ToDo: Replace back to faster rmp, once this https://github.com/serde-rs/serde/issues/2055 solved
let record = serde_cbor::to_vec(record).map_err(|err| {
WalError::WriteWalError(format!(
"Can't serialize WAL entry: {err:?}"
))
})?;
🤖 Prompt for AI Agents
In `@lib/shard/src/wal.rs` around lines 40 - 46, The error message in Wal::new
(the new function) is misleading because serde_cbor::to_vec failing is a
serialization error before any WAL I/O; update the WalError::WriteWalError
constructed in the map_err for serde_cbor::to_vec to state that serialization of
the provided record (type R) failed — e.g., "Failed to serialize record for WAL
(Serialize impl error): {err:?}" — and keep including the original err details
to aid debugging; locate the serde_cbor::to_vec(...) map_err in pub fn
new(record: &R) -> Result<Self> and replace the misleading "corrupted WAL or
version mismatch" text accordingly.

@qdrant qdrant deleted a comment from coderabbitai bot Feb 12, 2026
@timvisee
Copy link
Member

@IvanPleshkov there still seems to be a merge conflict.

@IvanPleshkov IvanPleshkov force-pushed the dont-lock-WAL-while-serialization branch from d2aaaac to a3b7490 Compare February 12, 2026 13:47
@IvanPleshkov IvanPleshkov merged commit d8c8ba6 into dev Feb 12, 2026
15 checks passed
@IvanPleshkov IvanPleshkov deleted the dont-lock-WAL-while-serialization branch February 12, 2026 15:34
@qdrant qdrant deleted a comment from coderabbitai bot Feb 12, 2026
timvisee pushed a commit that referenced this pull request Feb 13, 2026
* dont lock WAL while serialization

* review remarks

* remove error log, its already logged in update worker
@timvisee timvisee mentioned this pull request Feb 17, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants