Persist and prewarm agent tasks per thread#17978
Conversation
|
Stack navigation for this slice:
|
b65f7d4 to
dfc5e05
Compare
## Summary Stack PR3 for feature-gated agent identity support. This PR adds per-thread agent task registration behind `features.use_agent_identity`. Tasks are minted on the first real user turn and cached in thread runtime state for later turns. ## Stack - PR1: #17385 - add `features.use_agent_identity` - PR2: #17386 - register agent identities when enabled - PR3: #17387 - this PR, original task registration slice - PR3.1: #17978 - persist and prewarm registered tasks per thread - PR4: #17980 - use `AgentAssertion` downstream when enabled ## Validation Covered as part of the local stack validation pass: - `just fmt` - `cargo test -p codex-core --lib agent_identity` - `cargo test -p codex-core --lib agent_assertion` - `cargo test -p codex-core --lib websocket_agent_task` - `cargo test -p codex-api api_bridge` - `cargo build -p codex-cli --bin codex` ## Notes The full local app-server E2E path is still being debugged after PR creation. The current branch stack is directionally ready for review while that follow-up continues.
ea5ec0a to
56f4b38
Compare
56f4b38 to
4c0010f
Compare
| // These tests start full app-server processes; keep headroom for concurrent debug startup. | ||
| const DEFAULT_READ_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(30); |
There was a problem hiding this comment.
Are we doing this only for debug? Would like to understand better what/if the code change here is causing longer run loop or something.
There was a problem hiding this comment.
Good catch! This has been updated.
|
@codex review |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
3dcdb8d to
67cff31
Compare
| } | ||
|
|
||
| pub(crate) async fn task_matches_current_binding(&self, task: &RegisteredAgentTask) -> bool { | ||
| pub(crate) async fn task_matches_current_identity(&self, task: &RegisteredAgentTask) -> bool { |
There was a problem hiding this comment.
This is going to be a catch 22 situation - probably will be answered in a later PR and I will come back to comment on this. What happens if there is a mismatch on the agent runtime id between the session vs the task? Agent identity has ttl that can expire mid session and be reacquired?
There was a problem hiding this comment.
Okay I am back - my concern still stands and will leave a PR level comment.
There was a problem hiding this comment.
Discussed directly - generally this should not happen in the current implementation.
| fn latest_persisted_agent_task( | ||
| rollout_items: &[RolloutItem], | ||
| ) -> Option<Option<SessionAgentTask>> { | ||
| rollout_items.iter().rev().find_map(|item| match item { | ||
| RolloutItem::SessionState(update) => Some(update.agent_task.clone()), | ||
| _ => None, | ||
| }) | ||
| } |
There was a problem hiding this comment.
I am not 100% sure if we want to actually persist agent task as a part of the rollout? Any downside for this to be in memory and limited the usage to the current session and on future resume we generate a new one or conceptually it is incorrect?
There was a problem hiding this comment.
Discussed in person - we want the task to be persisted so that we can keep permissions the same across the entirety of the rollout, and not start from scratch every so often.
| RolloutItem::EventMsg(_) | RolloutItem::SessionMeta(_) => {} | ||
| RolloutItem::EventMsg(_) | ||
| | RolloutItem::SessionMeta(_) | ||
| | RolloutItem::SessionState(_) => {} |
There was a problem hiding this comment.
Ties to if we want to store sessionState at all.
There was a problem hiding this comment.
Still think we do :)
|
|
||
| // record_initial_history can emit events. We record only after the SessionConfiguredEvent is emitted. | ||
| sess.record_initial_history(initial_history).await; | ||
| sess.start_agent_identity_registration(); |
There was a problem hiding this comment.
DO we want to await on this?
There was a problem hiding this comment.
[codex] We intentionally do not await this at startup. record_initial_history(...) needs to run first so resumed SessionState can restore/clear the persisted task before prewarm can cache a new one; after that, startup registration is best-effort prewarm. The actual user-turn path awaits ensure_agent_task_registered() in session/turn.rs, emits an error for that thread if registration fails, and retries on the next turn rather than blocking session startup or shutting down other threads.
I think I pushed for this originally, but I think you'd have better intuition here @shijie-oai. Core idea was to keep the task lifecycle very close to a session lifecycle (e.g. we can have multiple tasks per rollout now), but i'm g either way. |
shijie-oai
left a comment
There was a problem hiding this comment.
I am not seeing a clear failure handling if an agent or task validation is outdated when checking against backend.
If you are agent identity only (for enterprise use case), it is okay to hard fail as we need to strictly enforce defined TTL. But if you are chatgpt auth and we assign agent identity on the fly, I wonder what is the default TTL for that agent identity is and how we should best recover. Maybe that agent identity has the same life cycle as chatgpt refresh token?
e6f7ce3 to
0000662
Compare
Summary
Testing