WAL replay honors applied_seq #8008

Merged
IvanPleshkov merged 3 commits into dev from wal-replay-applied-seq on Feb 3, 2026

Conversation

@agourlay (Member) commented Jan 28, 2026

INFO storage::content_manager::toc: Loading collection: benchmark
DEBUG collection::optimizers_builder: Removing temp_segments directory: "./storage/collections/benchmark/0/temp_segments"
DEBUG collection::shards::local_shard: Recovering shard ./storage/collections/benchmark/0 starting reading WAL from 267948 up to 274626 (last_applied_seq:Some(274561))
Recovering collection benchmark [00:00:00] █████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 6678/6678 (eta:0s)
INFO collection::shards::local_shard: Loading remaining 774673 WAL entries from:274626 into update queue

How to test

Run Qdrant with QDRANT__STORAGE__UPDATE_QUEUE_SIZE=1000000

PUT collections/benchmark
{
  "vectors": {
    "size": 256,
    "distance": "Dot"
  },
  "optimizers_config": {
    "prevent_unoptimized": true
  }
}
./bfb -n 20M -d 256 -b 1 --skip-create

Wait a bit until there are:

  • a large amount of pending updates in the WAL
  • a large amount of pending updates in the update channel

No timeouts or slowdowns should be observed by bfb.

Then just restart to trigger the WAL replay, which should be very fast because it leverages the last applied_seq.
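The replay-window arithmetic behind the log lines above can be sketched with plain integers. This is an illustrative sketch, not the actual Qdrant code; the function name and signature are hypothetical, and the numbers are taken from the recovery log excerpt:

```rust
// Sketch of the replay-window computation: replay only WAL entries up to
// the persisted applied_seq bound, and enqueue the rest into the update
// queue. All names here are illustrative, not the real Qdrant API.
fn replay_window(first_index: u64, wal_len: u64, applied_upper_bound: Option<u64>) -> (u64, u64, u64) {
    let wal_end = first_index + wal_len; // exclusive end of the WAL
    // Without a persisted bound, fall back to replaying the whole WAL.
    let to = applied_upper_bound.unwrap_or(wal_end).min(wal_end);
    let entries_to_replay = to - first_index;
    let entries_to_enqueue = wal_end - to;
    (to, entries_to_replay, entries_to_enqueue)
}

fn main() {
    // Numbers from the log: WAL starts at 267948, replay bound is 274626,
    // 6678 entries are replayed and 774673 are enqueued.
    let (to, replay, enqueue) = replay_window(267948, 6678 + 774673, Some(274626));
    assert_eq!((to, replay, enqueue), (274626, 6678, 774673));
    println!("replay {replay} entries up to {to}, enqueue {enqueue}");
}
```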

@agourlay agourlay added this to the Update queue milestone Jan 28, 2026
@agourlay agourlay force-pushed the wal-replay-applied-seq branch from f56d1ce to e4883ac Compare January 30, 2026 16:25
@IvanPleshkov (Contributor)

Added a test for this feature.
The test performs uploads into the shard, drops it, reloads it, and checks that the update queue is not empty.
There is a trick inside: the test changes the persisted seq num manually if it is too big (which reproduces every time in the test).

@IvanPleshkov IvanPleshkov marked this pull request as ready for review February 3, 2026 12:16
@IvanPleshkov IvanPleshkov changed the title (WIP) WAL replay honors applied_seq WAL replay honors applied_seq Feb 3, 2026
@coderabbitai coderabbitai bot (Contributor) left a comment
Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@lib/collection/src/shards/local_shard/mod.rs`:
- Around line 684-693: The code treats last_wal_index as an exclusive upper
bound (computed via from + wal.len(false)) but later uses an inclusive range,
producing an off-by-one; update the logic so the upper bound is handled
consistently: either make last_wal_index the inclusive last index (compute as
from + wal.len(false) - 1) or keep it exclusive and change any inclusive ranges
to exclusive (e.g., use to..last_wal_index instead of to..=last_wal_index);
adjust/remove any matching off-by-one arithmetic around wal_entries_to_replay,
from, last_wal_index, and the enqueue loop (also in the similar block around the
808-827 region) so the logged count matches the actual enqueued entries.

In `@lib/collection/src/update_workers/applied_seq.rs`:
- Around line 54-63: The op_num_upper_bound method can overflow when adding
APPLIED_SEQ_SAVE_INTERVAL + 1 to self.op_num (u64); update the calculation in
op_num_upper_bound to use saturating_add on the loaded op_num (e.g., call
self.op_num.load(...).saturating_add(...)) so the adjusted value never wraps,
while keeping the existing file.is_some() check and returning Some(adjusted) or
None as before.
🧹 Nitpick comments (1)
lib/collection/src/tests/wal_recovery_test.rs (1)

384-395: Consider a longer queue-drain timeout to reduce flakes.

A 2s limit can be tight on slower CI; consider increasing it (e.g., 10s) or wrapping the loop in tokio::time::timeout for clearer failure behavior.

Proposed tweak
-    let timeout = std::time::Duration::from_secs(2);
+    let timeout = std::time::Duration::from_secs(10);

Comment on lines +684 to +693

let from = wal.first_index();
let last_wal_index = from + wal.len(false);
let to = self
    .applied_seq_handler
    .op_num_upper_bound()
    .unwrap_or(last_wal_index);
let to = std::cmp::min(to, last_wal_index);
let wal_entries_to_replay = to - from;


⚠️ Potential issue | 🟠 Major

Fix off-by-one when enqueueing remaining WAL entries.

last_wal_index = from + wal.len(false) is the exclusive upper bound. Using to..=last_wal_index enqueues one extra op_num and makes the logged count inconsistent with the actual number of entries.

Proposed fix (use exclusive end)
-        let last_wal_index = from + wal.len(false);
+        let wal_end_exclusive = from + wal.len(false);
         let to = self
             .applied_seq_handler
             .op_num_upper_bound()
-            .unwrap_or(last_wal_index);
-        let to = std::cmp::min(to, last_wal_index);
+            .unwrap_or(wal_end_exclusive);
+        let to = std::cmp::min(to, wal_end_exclusive);
         let wal_entries_to_replay = to - from;
@@
-        if to < last_wal_index {
+        if to < wal_end_exclusive {
             log::info!(
                 "Loading remaining {} WAL entries from:{to} into update queue",
-                last_wal_index - to
+                wal_end_exclusive - to
             );
             let update_sender = self.update_sender.load();
             // TODO use proper collection's hardware measurement
             let hw_measurements = HwMeasurementAcc::disposable();
-            for op_num in to..=last_wal_index {
+            for op_num in to..wal_end_exclusive {
                 update_sender
                     .send(UpdateSignal::Operation(OperationData {
                         op_num,
                         operation: None,
                         sender: None,
                         hw_measurements: hw_measurements.clone(),
                     }))
                     .await?;
             }
         }

Also applies to: 808-827
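The off-by-one flagged above comes down to mixing an exclusive bound with an inclusive range. A minimal standalone demonstration (values are illustrative, not from the PR):

```rust
// `end` computed as start + len is an exclusive bound; iterating with
// ..= (inclusive) visits one op_num too many, while .. (exclusive)
// visits exactly `len` entries.
fn main() {
    let start: u64 = 10;
    let len: u64 = 3; // entries 10, 11, 12
    let end = start + len; // 13, exclusive upper bound

    let inclusive: Vec<u64> = (start..=end).collect(); // buggy: 4 entries
    let exclusive: Vec<u64> = (start..end).collect(); // correct: 3 entries

    assert_eq!(inclusive.len() as u64, len + 1); // one extra op_num
    assert_eq!(exclusive, vec![10, 11, 12]);
    println!("inclusive={} exclusive={}", inclusive.len(), exclusive.len());
}
```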


Comment on lines +54 to +63
/// Get the op_num upper bound for the last_applied_seq adjusted to the persistence interval
///
/// Returns None if the handler is not active.
pub fn op_num_upper_bound(&self) -> Option<u64> {
    if self.file.is_some() {
        let adjusted = self.op_num.load(Ordering::Relaxed) + APPLIED_SEQ_SAVE_INTERVAL + 1;
        Some(adjusted)
    } else {
        None
    }
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, locate and read the file
cat -n lib/collection/src/update_workers/applied_seq.rs | head -100

🏁 Script executed:

# Check how op_num_upper_bound is used and if there are similar patterns
rg -A 3 -B 3 "op_num_upper_bound" lib/collection/src/

🏁 Script executed:

# Check if there are other arithmetic operations on atomic u64s that might overflow
rg "\.load\(.*Ordering" lib/collection/src/update_workers/ | head -20

🏁 Script executed:

# Check the context around APPLIED_SEQ_SAVE_INTERVAL usage
rg -B 5 -A 5 "APPLIED_SEQ_SAVE_INTERVAL" lib/collection/src/

🏁 Script executed:

# Verify the method is new and check for any tests that might exercise edge cases
git diff HEAD~1 lib/collection/src/update_workers/applied_seq.rs | head -80


Use saturating_add to prevent u64 overflow in op_num_upper_bound.

Adding 65 to an arbitrary op_num value can wrap, which would violate the upper bound contract (result should be ≥ current value). Use saturating addition to preserve correctness.

Proposed fix
-        let adjusted = self.op_num.load(Ordering::Relaxed) + APPLIED_SEQ_SAVE_INTERVAL + 1;
+        let adjusted = self
+            .op_num
+            .load(Ordering::Relaxed)
+            .saturating_add(APPLIED_SEQ_SAVE_INTERVAL + 1);
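The wrap the review describes is easy to show in isolation. In the sketch below, the APPLIED_SEQ_SAVE_INTERVAL value of 64 is an assumption inferred from the "+65" in the comment, not read from the source:

```rust
// Shows why saturating_add matters near u64::MAX: plain addition wraps
// and the "upper bound" ends up smaller than the current value.
const APPLIED_SEQ_SAVE_INTERVAL: u64 = 64; // assumed value, see lead-in

fn main() {
    let op_num = u64::MAX - 10;
    // Wrapping addition (what `+` does in release builds): wraps past zero.
    let wrapped = op_num.wrapping_add(APPLIED_SEQ_SAVE_INTERVAL + 1);
    // Saturating addition: clamps at u64::MAX, preserving the contract
    // that the bound is >= the current value.
    let saturated = op_num.saturating_add(APPLIED_SEQ_SAVE_INTERVAL + 1);

    assert!(wrapped < op_num); // the bug: the bound went backwards
    assert_eq!(saturated, u64::MAX);
    println!("wrapped={wrapped} saturated={saturated}");
}
```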

@timvisee timvisee mentioned this pull request Feb 3, 2026
@qdrant qdrant deleted a comment from coderabbitai bot Feb 3, 2026
@IvanPleshkov IvanPleshkov force-pushed the wal-replay-applied-seq branch from 79b7d04 to b4be85b Compare February 3, 2026 17:42
@coderabbitai coderabbitai bot (Contributor) commented Feb 3, 2026

📝 Walkthrough

The pull request refactors WAL recovery in the local shard to use dynamic range-based replay instead of reading the whole WAL. It computes a replay window from the first WAL index to a bound derived from the applied sequence state, reads only that range via a new read_range() method on SerdeWal, and queues any remaining WAL entries beyond the window as UpdateSignal operations. Supporting changes extend AppliedSeqHandler with an upper-bound query method and a test-only persistence helper, and add a test validating that pending WAL operations are enqueued on recovery.
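The range-based replay the walkthrough describes can be sketched as follows. This is a hypothetical stand-in for SerdeWal::read_range(), not the real implementation; the struct and its fields are illustrative:

```rust
// Illustrative sketch: read only [from, to) from the WAL, then enqueue
// the remaining op_nums without loading their payloads.
struct Wal {
    first: u64,
    entries: Vec<&'static str>,
}

impl Wal {
    // Stand-in for a read_range()-style method: returns (op_num, entry)
    // pairs for op_nums inside the given half-open range.
    fn read_range(&self, range: std::ops::Range<u64>) -> Vec<(u64, &'static str)> {
        self.entries
            .iter()
            .enumerate()
            .map(|(i, &e)| (self.first + i as u64, e))
            .filter(|(n, _)| range.contains(n))
            .collect()
    }
}

fn main() {
    let wal = Wal { first: 100, entries: vec!["a", "b", "c", "d", "e"] };
    let to = 103; // bound derived from applied_seq state

    // Replay only the window below the bound.
    let replayed: Vec<u64> = wal.read_range(wal.first..to).into_iter().map(|(n, _)| n).collect();
    // Enqueue the rest as bare op_nums (operation payloads loaded later).
    let enqueued: Vec<u64> = (to..wal.first + wal.entries.len() as u64).collect();

    assert_eq!(replayed, vec![100, 101, 102]);
    assert_eq!(enqueued, vec![103, 104]);
    println!("replayed={replayed:?} enqueued={enqueued:?}");
}
```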

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • timvisee
  • generall
🚥 Pre-merge checks: ✅ 3 passed

  • Title check — ✅ Passed: The title 'WAL replay honors applied_seq' directly and precisely describes the main change: modifying WAL replay behavior to respect the applied_seq value.
  • Description check — ✅ Passed: The description provides concrete logs, testing instructions, and expected behavior directly related to the changeset's core functionality of optimizing WAL replay using applied_seq.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


@IvanPleshkov IvanPleshkov merged commit 52e995d into dev Feb 3, 2026
15 checks passed
@IvanPleshkov IvanPleshkov deleted the wal-replay-applied-seq branch February 3, 2026 19:48
generall pushed a commit that referenced this pull request Feb 9, 2026
* WAL replay honors applied_seq

fixes

* add integration test

* restore test after rebase

---------

Co-authored-by: Ivan Pleshkov <pleshkov.ivan@gmail.com>