Skip to content

fix(lite): initialize durability notifier from close status#312

Merged
shikhar merged 1 commit intomainfrom
codex/fix-durability-notifier-closed-db
Mar 6, 2026
Merged

fix(lite): initialize durability notifier from close status#312
shikhar merged 1 commit intomainfrom
codex/fix-durability-notifier-closed-db

Conversation

@shikhar
Copy link
Member

@shikhar shikhar commented Mar 6, 2026

Summary

  • initialize DurabilityNotifier state from the subscribed DbStatus, including close_reason
  • avoid spawning the notifier task when the database is already closed, so subscribers see the closed state immediately
  • add a regression test covering a notifier spawned after Db::close()

Closes #307

Testing

  • cargo test -p s2-lite durability_notifier --lib
  • cargo nextest run -p s2-lite --all-features

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR fixes a bug in DurabilityNotifier::spawn where the initial state was unconditionally set with closed_reason: None, causing any subscriber created after Db::close() to hang indefinitely rather than receiving an immediate error.

Key changes:

  • initial_status (the full DbStatus snapshot) is now cloned from status_rx.borrow() instead of only reading durable_seq, allowing close_reason to be propagated into the initial State.
  • tokio::spawn(run_notifier(...)) is now guarded by if initial_status.close_reason.is_none(), skipping the background task entirely when the DB is already closed — subscribers on a pre-closed notifier receive Err synchronously via the existing subscribe() fast-path.
  • A regression test spawn_on_closed_db_fails_subscribers_immediately is added that builds a real slatedb::Db, closes it, creates a notifier, and asserts that the very first subscriber gets an immediate Err(CloseReason::Clean).

The fix is minimal, logically sound, and does not introduce any race conditions: the close_with_reason assertion (assert!(prev.is_none())) remains safe because run_notifier is only spawned when close_reason starts as None, ensuring close_with_reason is called at most once per notifier instance.

Confidence Score: 5/5

  • This PR is safe to merge — it is a targeted bug fix with a focused regression test and no API surface changes.
  • The change is small (one file, ~15 logical lines changed), well-reasoned, and comes with a concrete regression test against a real slatedb instance. The fix correctly handles the race between DB closure and notifier creation, the existing assert!(prev.is_none()) invariant in close_with_reason remains unviolated, and the subscribe() fast-path's priority order (closed_reason checked before durable_seq) ensures the test for subscribe(0, ...) on a closed DB yields Err as expected.
  • No files require special attention.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant DN as DurabilityNotifier
    participant Task as run_notifier task
    participant DB as slatedb watch channel

    Caller->>DN: spawn(db)
    DN->>DB: subscribe() returns status_rx
    DN->>DB: borrow() returns initial_status
    alt initial_status.close_reason is None
        DN->>Task: tokio::spawn(run_notifier)
        DN-->>Caller: Self { state: closed_reason=None }
        loop on each status change
            Task->>DB: changed().await
            Task->>Task: notify_waiters(durable_seq)
            opt close_reason is Some
                Task->>Task: close_with_reason then exit
            end
        end
    else initial_status.close_reason is Some
        Note over Task: Task NOT spawned
        DN-->>Caller: Self { state: closed_reason=Some }
    end

    Caller->>DN: subscribe(target_seq, callback)
    alt closed_reason is Some
        DN->>Caller: callback(Err(reason)) synchronously
    else last_durable_seq >= target_seq
        DN->>Caller: callback(Ok(durable_seq)) synchronously
    else pending
        DN->>DN: enqueue sorted Waiter
    end
Loading

Last reviewed commit: e4f731c

@shikhar shikhar merged commit 9036086 into main Mar 6, 2026
18 checks passed
@shikhar shikhar deleted the codex/fix-durability-notifier-closed-db branch March 6, 2026 18:03
@s2-release-plz s2-release-plz bot mentioned this pull request Mar 6, 2026
shikhar pushed a commit that referenced this pull request Mar 6, 2026
## 🤖 New release

* `s2-lite`: 0.29.20 -> 0.29.21 (✓ API compatible changes)
* `s2-sdk`: 0.24.7 -> 0.24.8 (✓ API compatible changes)
* `s2-cli`: 0.29.20 -> 0.29.21

<details><summary><i><b>Changelog</b></i></summary><p>

## `s2-lite`

<blockquote>

## [0.29.21] - 2026-03-06

### Bug Fixes

- Allow http endpoints for S3-compatible object stores
([#303](#303))
- Preserve follow wait budget after lagged recovery
([#311](#311))
- Initialize durability notifier from close status
([#312](#312))

<!-- generated by git-cliff -->
</blockquote>

## `s2-sdk`

<blockquote>

## [0.24.8] - 2026-03-06

### Bug Fixes

- Compression and FrameSignal ordering
([#313](#313))

<!-- generated by git-cliff -->
</blockquote>

## `s2-cli`

<blockquote>

## [0.29.21] - 2026-03-06

<!-- generated by git-cliff -->
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: s2-release-plz[bot] <262023388+s2-release-plz[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Detail Bug] Durability notifications can report writes as durable after DB is already closed

1 participant