feat(app-server): experimental flag to persist extended history#11227
Conversation
1fc7780 to
5b1e1c1
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e242eb1fe
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 792262b0b7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f8a9dde16
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
6243cfb to
a6208e8
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3c1eea654a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
f0ab0e8 to
b987497
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b987497096
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
0fcdb76 to
4f6b90a
Compare
|
@codex review |
783d384 to
21db6ac
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 783d38409f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
206e215 to
a419a13
Compare
a419a13 to
2a39cf5
Compare
celia-oai
left a comment
There was a problem hiding this comment.
what happens when a user try to resume a session without extended history? Shall we throw some warning? similar with user try to resume a session with extended history but flag off. mostly curious about the mismatch between persistence & resume
| RolloutCmd::AddItems(items) => { | ||
| let mut persisted_items = Vec::new(); | ||
| for item in items { | ||
| if is_persisted_response_item(&item) { |
There was a problem hiding this comment.
why is this one changed?
There was a problem hiding this comment.
it happens earlier in record_items() at line 474, so by the time we get here it's already filtered
2a39cf5 to
c7415c4
Compare
c7415c4 to
32eef61
Compare
@celia-oai I don't think we should throw a warning, we'll just return what we have stored on disk. the client needs to be able to handle the full set of items anyway (when streaming), so client-rendering should be compatible with either |
sg! |
This PR adds an experimental
persist_extended_historybool flag to app-server thread APIs so rollout logs can retain a richer set of EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e. onthread/resume).Motivation
Today, our rollout recorder only persists a small subset (e.g. user message, reasoning, assistant message) of
EventMsgtypes, dropping a good number (like command exec, file change, etc.) that are important for reconstructing full item history forthread/resume,thread/read, andthread/fork.Some clients want to be able to resume a thread without lossiness. This lossiness is primarily a UI thing, since what the model sees are
ResponseItemand notEventMsg.Approach
This change introduces an opt-in
persist_full_historyflag to preserve those events when you start/resume/fork a thread (defaults tofalse).This is done by adding an
EventPersistenceModeto the rollout recorder:Limited(existing behavior, default)Extended(new opt-in behavior)In
Extendedmode, persist additionalEventMsgvariants needed for non-lossy app-serverThreadItemreconstruction. We now store the following ThreadItems that we didn't before:For command executions in particular, we truncate the output using the existing
truncate_textfrom core to store an upper bound of 10,000 bytes, which is also the default value for truncating tool outputs shown to the model. This keeps the size of the rollout file and command execution items returned over the wire reasonable.And we also persist
EventMsg::Errorwhich we can now map back to the Turn's status and populates the Turn's error metadata.Updates to EventMsgs
To truly make
thread/resumenon-lossy, we also needed to persist thestatusonEventMsg::CommandExecutionEndEventandEventMsg::PatchApplyEndEvent. Previously it was not obvious whether a command failed or was declined (similar for apply_patch). These EventMsgs were never persisted before so I made it a required field.