
state: pass state db handles through consumers #20561

Merged
euroelessar merged 18 commits into main from ruslan/state-db-handle-plumbing on May 4, 2026


@euroelessar (Collaborator) commented on May 1, 2026

Why

SQLite state was still being opened from consumer paths, including lazy OnceCell-backed thread-store call sites. That let a single process open multiple state DB connections for the same Codex home, which makes SQLite lock contention and "database is locked" failures much easier to hit.
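To make the failure mode concrete, here is a minimal Rust sketch of the lazy pattern described above. It assumes std::sync::OnceLock as a stand-in for the OnceCell the thread store used; StateDb, open_state_db, LazyConsumer, and opens are hypothetical names, and the counter only records how many times a connection would be opened. Because each consumer owns its own cell, initialization happens once per consumer rather than once per process, so two consumers pointed at the same Codex home end up with two connections.

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

// Hypothetical counter: how many SQLite connections would have been opened.
static OPENS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an open SQLite state connection.
#[derive(Debug)]
pub struct StateDb {
    pub codex_home: PathBuf,
}

/// Hypothetical opener: each call would open a fresh connection to the same file.
pub fn open_state_db(codex_home: &Path) -> Arc<StateDb> {
    OPENS.fetch_add(1, Ordering::SeqCst);
    Arc::new(StateDb { codex_home: codex_home.to_path_buf() })
}

/// Anti-pattern: every consumer lazily opens its own connection on first use.
pub struct LazyConsumer {
    codex_home: PathBuf,
    db: OnceLock<Arc<StateDb>>,
}

impl LazyConsumer {
    pub fn new(codex_home: &Path) -> Self {
        Self { codex_home: codex_home.to_path_buf(), db: OnceLock::new() }
    }

    /// Initialized once per consumer, not once per process.
    pub fn db(&self) -> &Arc<StateDb> {
        self.db.get_or_init(|| open_state_db(&self.codex_home))
    }
}

/// Number of connections opened so far.
pub fn opens() -> usize {
    OPENS.load(Ordering::SeqCst)
}
```

Repeated calls on one consumer reuse its cell, but a second consumer for the same home still opens its own connection, which is exactly the contention the PR removes by passing one handle down from the entrypoint.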

State DB lifetime should be chosen by main-like entrypoints and tests, then passed through explicitly. Consumers should use the supplied Option<StateDbHandle> or StateDbHandle and keep their existing filesystem fallback or error behavior when no handle is available.
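A minimal sketch of the consumer side of this rule, assuming StateDbHandle is a cheap, cloneable Arc around the runtime. The real type definitions are not shown in this PR, so StateRuntime's field, ThreadLookup, and scan_filesystem here are hypothetical. The consumer never opens SQLite itself: it uses the handle when one was supplied and otherwise keeps its filesystem fallback.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the state runtime: thread id -> rollout path.
#[derive(Debug, Default)]
pub struct StateRuntime {
    pub rollout_paths: HashMap<String, String>,
}

/// Hypothetical alias: one shared handle, cloned into every consumer.
pub type StateDbHandle = Arc<StateRuntime>;

/// A consumer that receives its handle from the entrypoint instead of opening one.
pub struct ThreadLookup {
    state_db: Option<StateDbHandle>,
}

impl ThreadLookup {
    pub fn new(state_db: Option<StateDbHandle>) -> Self {
        Self { state_db }
    }

    /// Prefer the DB when a handle was supplied; otherwise keep the filesystem fallback.
    pub fn find(&self, thread_id: &str) -> Option<String> {
        if let Some(db) = &self.state_db {
            if let Some(path) = db.rollout_paths.get(thread_id) {
                return Some(path.clone());
            }
        }
        scan_filesystem(thread_id)
    }
}

/// Hypothetical filesystem fallback (here: a fixed naming scheme, no real I/O).
fn scan_filesystem(thread_id: &str) -> Option<String> {
    Some(format!("sessions/rollout-{thread_id}.jsonl"))
}
```

Passing None is the explicit way to request filesystem-only behavior, which mirrors how the PR updates tests and call sites.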

The startup path also needs to keep the rollout crate in charge of SQLite state initialization. Opening codex_state::StateRuntime directly bypasses rollout metadata backfill, so entrypoints should initialize through codex_rollout::state_db and receive a handle only after required rollout backfills have completed.
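The ordering requirement can be sketched as follows. This is an illustration under stated assumptions, not codex_rollout::state_db::init itself: StateRuntime's flags, init_state_db, backfill_rollout_metadata, and start_app are hypothetical stand-ins. The point is that the only startup path that returns a handle runs migrations and rollout metadata backfill, and checks that backfill completed, before the caller ever sees the handle; a failed init leaves consumers with None.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the SQLite runtime, tracking startup progress.
#[derive(Debug, Default)]
pub struct StateRuntime {
    pub migrated: bool,
    pub rollout_backfill_complete: bool,
}

pub type StateDbHandle = Arc<StateRuntime>;

/// Hypothetical rollout-side init: the only startup path that hands out a handle.
pub fn init_state_db() -> Result<StateDbHandle, String> {
    // Stand-in for opening and migrating SQLite.
    let mut runtime = StateRuntime { migrated: true, rollout_backfill_complete: false };
    backfill_rollout_metadata(&mut runtime)?;
    // Verify completion before anyone can use the handle.
    if !runtime.rollout_backfill_complete {
        return Err("rollout backfill did not complete".to_string());
    }
    Ok(Arc::new(runtime))
}

/// Stand-in for scanning rollout JSONL files and writing metadata rows.
fn backfill_rollout_metadata(runtime: &mut StateRuntime) -> Result<(), String> {
    runtime.rollout_backfill_complete = true;
    Ok(())
}

/// Hypothetical main-like entrypoint: owns initialization, then passes the handle down.
pub fn start_app() -> Option<StateDbHandle> {
    // On init failure, consumers receive None and keep their filesystem fallback.
    init_state_db().ok()
}
```

Opening the runtime directly, as app-server previously did with codex_state::StateRuntime::init, would skip the backfill step in the middle.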

What Changed

  • Initialize the state DB in main-like entrypoints for CLI, TUI, app-server, exec, MCP server, and the thread-manager sample.
  • Pass Option<StateDbHandle> through ThreadManager, LocalThreadStore, app-server processors, TUI app wiring, rollout listing/recording, personality migration, shell snapshot cleanup, session-name lookup, and memory/device-key consumers.
  • Remove the lazy local state DB wrapper from the thread store so non-test consumers use only the supplied handle or their existing fallback path.
  • Make codex_rollout::state_db::init the local state startup path: it opens/migrates SQLite, runs rollout metadata backfill when needed, waits for concurrent backfill workers up to a bounded timeout, verifies completion, and then returns the initialized handle.
  • Keep optional/non-owning SQLite helpers, such as remote TUI local reads, as open-only paths that do not run startup backfill.
  • Switch app-server startup from direct codex_state::StateRuntime::init to the rollout state initializer so app-server cannot skip rollout backfill.
  • Collapse split rollout lookup/list APIs so callers use the normal methods with an optional state handle instead of _with_state_db variants.
  • Restore getConversationSummary(ThreadId) to delegate through ThreadStore::read_thread instead of a LocalThreadStore-specific rollout path special case.
  • Keep DB-backed rollout path lookup keyed on the DB row and file existence, without imposing the filesystem filename convention on existing DB rows.
  • Verify readable DB-backed rollout paths against session_meta.id before returning them, so a stale SQLite row that points at another thread's JSONL falls back to filesystem search and read-repairs the DB row.
  • Keep debug prompt-input filesystem-only so a one-off debug command does not initialize or backfill SQLite state just to print prompt input.
  • Keep goal-session test Codex homes alive only in the goal-specific helper, rather than leaking tempdirs from the shared session test helper.
  • Update tests and call sites to pass explicit state handles where DB behavior is expected and explicit None where filesystem-only behavior is intended.
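The bounded wait in codex_rollout::state_db::init can be illustrated with a Condvar-based sketch. The PR describes a backfill lease held in SQLite; here it is modeled, as an assumption, with an in-process Mutex and Condvar, and Backfill, BackfillGate, mark_complete, and wait_for_completion are hypothetical names. Returning an error on timeout is also an assumption about what the caller does next; the PR only states that startup waits for another worker instead of disabling the handle, and must not hang on a stuck lease.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical backfill status, standing in for a lease row in SQLite.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backfill {
    Running,
    Complete,
}

#[derive(Debug)]
pub struct BackfillGate {
    status: Mutex<Backfill>,
    changed: Condvar,
}

impl BackfillGate {
    pub fn new(status: Backfill) -> Arc<Self> {
        Arc::new(Self { status: Mutex::new(status), changed: Condvar::new() })
    }

    /// Called by the worker holding the lease once its backfill finishes.
    pub fn mark_complete(&self) {
        *self.status.lock().unwrap() = Backfill::Complete;
        self.changed.notify_all();
    }

    /// Startup path: wait for another worker's backfill, but never longer than `timeout`.
    pub fn wait_for_completion(&self, timeout: Duration) -> Result<(), String> {
        let guard = self.status.lock().unwrap();
        let (guard, _timeout_result) = self
            .changed
            .wait_timeout_while(guard, timeout, |status| *status == Backfill::Running)
            .unwrap();
        // Still Running here means the timeout elapsed (for example, a stuck lease).
        if *guard == Backfill::Complete {
            Ok(())
        } else {
            Err(format!("rollout backfill still running after {timeout:?}"))
        }
    }
}
```

Using wait_timeout_while rather than a bare wait_timeout re-checks the condition after spurious wakeups while still bounding the total wait.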

Validation

  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout -p codex-thread-store -p codex-app-server -p codex-core -p codex-tui -p codex-exec -p codex-cli --tests
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout state_db_
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout find_thread_path
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout find_thread_path -- --nocapture
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout try_init_ -- --nocapture
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo clippy -p codex-rollout --lib -- -D warnings
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-thread-store read_thread_falls_back_when_sqlite_path_points_to_another_thread -- --nocapture
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-thread-store
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core shell_snapshot
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all personality_migration
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find
  • RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find::find_prefers_sqlite_path_by_id -- --nocapture
  • RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find -- --nocapture
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core interrupt_accounts_active_goal_before_pausing
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-app-server get_auth_status -- --test-threads=1
  • CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-app-server --lib
  • CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout -p codex-app-server --tests
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p codex-thread-store -p codex-core -p codex-app-server -p codex-tui -p codex-exec -p codex-cli
  • CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p codex-app-server
  • CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout
  • CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-core
  • just argument-comment-lint -p codex-core
  • just argument-comment-lint -p codex-rollout

Focused coverage added in codex-rollout:

  • recorder::tests::state_db_init_backfills_before_returning verifies the rollout metadata row exists before startup init returns.
  • state_db::tests::try_init_waits_for_concurrent_startup_backfill verifies startup waits for another worker to finish backfill instead of disabling the handle for the process.
  • state_db::tests::try_init_times_out_waiting_for_stuck_startup_backfill verifies startup does not hang indefinitely on a stuck backfill lease.
  • tests::find_thread_path_accepts_existing_state_db_path_without_canonical_filename verifies DB-backed lookup accepts valid existing rollout paths even when the filename does not include the thread UUID.
  • tests::find_thread_path_falls_back_when_db_path_points_to_another_thread verifies DB-backed lookup ignores a stale row whose existing path belongs to another thread and read-repairs the row after filesystem fallback.
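The DB-preferred lookup exercised by these tests can be sketched with in-memory maps. StateRows, RolloutFiles, and find_rollout_path are hypothetical and do not reflect the real codex-rollout signatures: rows map a thread id to a path, and files map a path to the session_meta.id recorded in that rollout. A row is trusted only if its file exists and belongs to the requested thread, with no filename convention imposed; otherwise the lookup searches by session_meta.id and, when a handle is available, read-repairs the row.

```rust
use std::collections::HashMap;

/// Hypothetical SQLite rows: thread id -> rollout path.
pub type StateRows = HashMap<String, String>;
/// Hypothetical rollout files: path -> session_meta.id stored in that JSONL file.
pub type RolloutFiles = HashMap<String, String>;

/// DB-preferred lookup with validation, filesystem fallback, and read-repair.
pub fn find_rollout_path(
    mut state_db: Option<&mut StateRows>,
    files: &RolloutFiles,
    thread_id: &str,
) -> Option<String> {
    if let Some(rows) = state_db.as_deref() {
        if let Some(path) = rows.get(thread_id) {
            // Only existence and a matching session_meta.id matter, not the filename.
            if files.get(path).map(String::as_str) == Some(thread_id) {
                return Some(path.clone());
            }
        }
    }
    // Missing or stale row: fall back to searching files by session_meta.id.
    let found = files
        .iter()
        .find(|(_, meta_id)| meta_id.as_str() == thread_id)
        .map(|(path, _)| path.clone())?;
    if let Some(rows) = state_db.as_deref_mut() {
        // Read-repair: point the row at the file that actually belongs to this thread.
        rows.insert(thread_id.to_string(), found.clone());
    }
    Some(found)
}
```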

Focused coverage updated in codex-core:

  • rollout_list_find::find_prefers_sqlite_path_by_id now uses a DB-preferred rollout file with matching session_meta.id, so it still verifies that valid SQLite paths win without depending on stale/empty rollout contents.

cargo test -p codex-app-server thread_list_respects_search_term_filter -- --test-threads=1 --nocapture was attempted locally, but it timed out waiting for the app-server test harness's initialize response before reaching the changed thread-list code path.

bazel test //codex-rs/thread-store:thread-store-unit-tests --test_output=errors was attempted locally after the thread-store fix, but in this container it failed before target analysis while fetching v8+ through BuildBuddy/direct GitHub. The equivalent local crate coverage, including cargo test -p codex-thread-store, passes.

A plain local cargo check -p codex-rollout -p codex-app-server --tests also requires system libcap.pc for codex-linux-sandbox; the follow-up app-server check above used CODEX_SKIP_VENDORED_BWRAP=1 in this container.

euroelessar force-pushed the ruslan/state-db-handle-plumbing branch from e4a3f2d to e541ebe on May 1, 2026 05:39
euroelessar marked this pull request as ready for review on May 1, 2026 08:19
euroelessar requested a review from a team as a code owner on May 1, 2026 08:19

chatgpt-codex-connector Bot (Contributor) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 830b771d06

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on codex-rs/app-server/src/codex_message_processor.rs (outdated)
owenlin0 requested a review from jif-oai on May 1, 2026 15:56
euroelessar force-pushed the ruslan/state-db-handle-plumbing branch 3 times, most recently from ad70b50 to 3d5fd73 on May 1, 2026 22:59
euroelessar force-pushed the ruslan/state-db-handle-plumbing branch from afd33e8 to 4ee203f on May 4, 2026 15:48
euroelessar force-pushed the ruslan/state-db-handle-plumbing branch from 4ee203f to 3ef0154 on May 4, 2026 18:24
euroelessar merged commit 4d201e3 into main on May 4, 2026
26 checks passed
euroelessar deleted the ruslan/state-db-handle-plumbing branch on May 4, 2026 18:46
github-actions Bot locked and limited conversation to collaborators on May 4, 2026

2 participants