[codex] Route live thread writes through ThreadStore#18882
Conversation
0fe48a3 to
833bc89
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 833bc89e61
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| async fn create_thread(&self, _params: CreateThreadParams) -> ThreadStoreResult<()> { | ||
| Err(not_implemented("create_thread")) | ||
| } | ||
|
|
||
| async fn resume_thread_recorder( | ||
| &self, | ||
| _params: ResumeThreadRecorderParams, | ||
| ) -> ThreadStoreResult<Box<dyn ThreadRecorder>> { | ||
| Err(not_implemented("resume_thread_recorder")) | ||
| async fn resume_thread(&self, _params: ResumeThreadParams) -> ThreadStoreResult<()> { | ||
| Err(not_implemented("resume_thread")) | ||
| } |
There was a problem hiding this comment.
Guard remote store selection until write APIs are implemented
Core session startup now requires ThreadStore::create_thread/resume_thread. When experimental_thread_store_endpoint is set, ThreadManager selects RemoteThreadStore, but these methods currently return not_implemented, causing non-ephemeral thread startup to fail immediately under that config.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
yeah the remote thread store in non-functional, not worrying too much about it yet
833bc89 to
792c0f8
Compare
|
@codex review |
1 similar comment
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
codex/codex-rs/thread-store/src/local/update_thread_metadata.rs
Lines 152 to 153 in 124eabb
persist_thread_memory_mode_update now depends on update_thread_metadata, which resolves rollouts by scanning codex_home only. Threads resumed from an explicit rollout path outside codex_home (supported by resume_thread_from_rollout) are not found, so memory-mode updates fail with thread not found despite an active resumed thread.
codex/codex-rs/core/src/thread_manager.rs
Line 794 in 124eabb
remove_thread now only drops the in-memory CodexThread. Live writers are owned by shared LocalThreadStore.live_recorders, and this path does not call shutdown_thread or discard_thread. Removing a thread after abnormal termination can therefore leave recorder entries and writer tasks alive indefinitely, leaking resources across the process lifetime.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 414ee8107f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| LiveThread::create( | ||
| Arc::clone(&thread_store), | ||
| CreateThreadParams { | ||
| thread_id: conversation_id, | ||
| forked_from_id, | ||
| source: session_source, | ||
| base_instructions: BaseInstructions { | ||
| text: session_configuration.base_instructions.clone(), | ||
| }, | ||
| dynamic_tools: session_configuration.dynamic_tools.clone(), | ||
| event_persistence_mode, | ||
| }, | ||
| ) | ||
| .await? | ||
| } | ||
| InitialHistory::Resumed(resumed_history) => { | ||
| LiveThread::resume( | ||
| Arc::clone(&thread_store), | ||
| ResumeThreadParams { | ||
| thread_id: resumed_history.conversation_id, | ||
| rollout_path: Some(resumed_history.rollout_path.clone()), | ||
| history: Some(resumed_history.history.clone()), | ||
| include_archived: true, | ||
| event_persistence_mode, | ||
| }, | ||
| ) | ||
| .await? |
There was a problem hiding this comment.
Keep thread startup working with configured remote stores
Session::new now always creates/resumes persistence through the configured ThreadStore for non-ephemeral sessions. With experimental_thread_store_endpoint set, ThreadManager injects RemoteThreadStore, whose live-writer methods are still unimplemented; startup now fails immediately instead of launching the thread.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ab42ef34c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| async fn create_thread(&self, _params: CreateThreadParams) -> ThreadStoreResult<()> { | ||
| Err(not_implemented("create_thread")) | ||
| } | ||
|
|
||
| async fn resume_thread_recorder( | ||
| &self, | ||
| _params: ResumeThreadRecorderParams, | ||
| ) -> ThreadStoreResult<Box<dyn ThreadRecorder>> { | ||
| Err(not_implemented("resume_thread_recorder")) | ||
| async fn resume_thread(&self, _params: ResumeThreadParams) -> ThreadStoreResult<()> { | ||
| Err(not_implemented("resume_thread")) |
There was a problem hiding this comment.
Implement remote thread lifecycle methods
RemoteThreadStore::create_thread/resume_thread now return not_implemented, but session startup now routes live persistence through ThreadStore. With experimental_thread_store_endpoint set, ThreadManager picks RemoteThreadStore, and non-ephemeral Session::new calls these methods during init. This makes remote-backed sessions fail to start.
Useful? React with 👍 / 👎.
6ab42ef to
0a97cbd
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
codex/codex-rs/core/src/session/handlers.rs
Lines 974 to 978 in 0a97cbd
shutdown() closes thread persistence and then emits ShutdownComplete via send_event_raw, which always tries to persist. Because the live writer was already removed, this path hits ThreadNotFound and logs a persistence error on normal shutdowns. That creates false-positive error logs and obscures real persistence issues.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| restriction_product, | ||
| )); | ||
| let skills_watcher = build_skills_watcher(Arc::clone(&skills_manager)); | ||
| let thread_store = configured_thread_store(config); |
There was a problem hiding this comment.
nit: should this be selected at even higher level?
| otel.name = "session_init.thread_persistence", | ||
| session_init.ephemeral = config.ephemeral, | ||
| )); | ||
| let state_db_fut = async { |
There was a problem hiding this comment.
worth having on the abstraction? at least for now?
There was a problem hiding this comment.
downcast doesn't make it much better
There was a problem hiding this comment.
It went in and out a couple times as I revised.... agreed the downcast isn't much better but I wanted to push the failure / issue as early in the code flow as possible. I.e. make sure its clear that accessing state_db_fut in the session.rs code is not something the code out here should be doing; otherwise it's easy to just keep accessing it...
There was a problem hiding this comment.
to make it more obvious, I removed the default case that was still instantiating the state db. If you're running a local thread store you get one, otherwise (ephemeral or remote thread store) it's just None / we'll burn down what fails
7233b25 to
e338f77
Compare
58068b9 to
070c747
Compare
Begin migrating the thread write codepaths to ThreadStore.
This starts using ThreadStore inside of core session code, not only in the app server code.
Rework the interfaces around thread recording/persistence. We're left with the following:
ThreadManager: owns the process-level registry of loaded threads and handles cross-thread orchestration: start, resume, fork, lookup, remove, and route ops to running CodexThreads.CodexThread: represents one loaded/running thread from the outside. It is the handle app-server and callers use to submit ops, inspect session metadata, and shut the thread down.LiveThread: session-owned persistence lifecycle handle for one active thread. Core session code uses it to append rollout items, materialize lazy persistence, flush, shutdown, discard init-failed writers, and load that thread’s persisted history.ThreadStore: storage backend abstraction. It answers “how are threads persisted, read, listed, updated, archived?” Local and remote implementations live behind this trait.LocalThreadStore: local ThreadStore implementation. It owns the file/sqlite-specific details and keeps RolloutRecorder as a local implementation detail.This is a few too many Thread abstractions for my liking, but they do all represent different concepts / needs / layers.
Migration note: in places where the core code explicitly requires a path, rather than a thread ID, throw an error if we're running with a remote store.
Cover the new local live-writer lifecycle with focused tests and preserve app-server thread-start behavior, including ephemeral pathless sessions.