[codex-analytics] report compaction request token counts#25946
Conversation
47337b1 to
3692f74
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 593c5c88dd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| attempt.subtract_active_context_tokens_before( | ||
| estimated_deleted_tokens.min(max_local_deleted_tokens), | ||
| ); |
There was a problem hiding this comment.
Set v1 before-tokens from the built prompt
For a successful /responses/compact run with no context-window trimming, attempt still reports the sess.get_total_token_usage() value captured before the compact request; even when trimming occurs these lines only subtract some locally-added tokens. The actual compact request is the Prompt built below, so responses_compact analytics continue to report the previous session/response total instead of the post-trim compaction request estimate this change is trying to measure.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
no, then we would have to estimate tokens across the entire prompt, substantially increasing uncertainty of token accounting since we lose the last server-side usage
| attempt.subtract_active_context_tokens_before( | ||
| estimated_deleted_tokens.min(max_local_deleted_tokens), |
There was a problem hiding this comment.
Add integration coverage for v1 token analytics
This changes the full remote-compaction path by subtracting locally trimmed history before reporting responses_compact active-context tokens, but the commit only adds a helper-level unit test and no core/tests/suite coverage that drives a compaction through test_codex and verifies the emitted analytics payload. The test-authoring rule says agent changes must add integration tests under the core suite (root AGENTS.md lines 103-106); without that, regressions in the wiring from trimming through attempt.track would not be caught.
Useful? React with 👍 / 👎.
| if let Some(token_usage) = token_usage { | ||
| attempt.set_active_context_tokens_before(token_usage.input_tokens); |
There was a problem hiding this comment.
Add integration coverage for v2 server token usage
This new branch makes the reported responses_compaction_v2 before-token count depend on the server input_tokens from the compaction stream, but the added coverage only calls collect_compaction_output directly and does not exercise a real remote-compaction turn or assert the analytics event. The test-authoring rule requires integration tests for agent changes (root AGENTS.md lines 103-106); an integration test should cover both the server-usage path and the local-estimate fallback through the full compaction flow.
Useful? React with 👍 / 👎.
| #[cfg(test)] | ||
| mod tests { |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
0c1f04c to
625f101
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 625f101395
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| personality: turn_context.personality, | ||
| ..Default::default() | ||
| }; | ||
| attempt.set_active_context_tokens_before(estimate_compaction_request_tokens(&prompt)); |
There was a problem hiding this comment.
Use completed usage for Responses compaction analytics
When a normal /responses compaction succeeds, drain_to_completed receives the server's token_usage and updates the session with it, but this new assignment fixes active_context_tokens_before to the local heuristic estimate and never overwrites it from the completed response. In threads with images, encrypted items, or tool schemas where estimate_compaction_request_tokens() diverges from actual tokenizer/server accounting, the new analytics field reports an inaccurate request size even though exact usage was available on the success path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
leaving this implementation out due to essentially being legacy
| &base_instructions, | ||
| ); | ||
| let (deleted_items, estimated_deleted_tokens) = | ||
| trim_function_call_history_to_fit_context_window( |
There was a problem hiding this comment.
hm, I find it a bit confusing to have a mutable CompactionAnalyticsAttempt and then do prompt trimming + token arithmetic based on deleted content, instead of just estimating the token size of the prompt after trimming. Any reason we need to do it this way?
Could we keep CompactionAnalyticsAttempt immutable and pass the final active_context_tokens_before into track(...) instead?
Suggestion:
Right now the inner compaction paths take
&mut CompactionAnalyticsAttemptonly to refine one metric after the actual request shape is known. That makes the analytics lifecycle object double as mutable request-accounting state.A cleaner shape would be to keep a local
request_tokens_beforein each compaction path, update that as the request is finalized/retried, and then call:attempt.track(sess.as_ref(), status, error, request_tokens_before).await;This also seems simpler for legacy remote compaction: once the trimmed Prompt is built, we can compute estimate_compaction_request_tokens(&prompt) directly instead of starting from session usage and subtracting estimated deleted tokens.
There was a problem hiding this comment.
cleaned up the compact analytics attempt shape!
regarding the arithmetic based on the deleted content, i agree that it is somewhat confusing, but it is more accurate and bounds the error because we only have to estimate the delta rather than the entire prompt. since active tokens before is tokens reported from server-side token accounting, re-estimating the entire prompt loses that accuracy, which can be significant. lmk if this makes sense
514b238 to
dc50345
Compare
9e93ea1 to
306936c
Compare
306936c to
58979a2
Compare
Why
Compaction analytics need token counts that better represent the request being compacted. The existing session snapshot can diverge from the actual remote compaction request after output rewriting, and remote v2 can use server-side Responses usage when available.
What changed
active_context_tokens_beforeoverride toCompactionAnalyticsAttempt::track(...)for remote compaction when it has a better before-token value than the begin-time session snapshot. The local/compactpath passes no override.responses_compact, subtract the estimated token delta from pre-compaction output rewriting from the session snapshot, capped by locally-added tokens since the last successful API response.responses_compaction_v2, use the same bounded output-rewrite fallback as remote v1, then overwriteactive_context_tokens_beforewith servertoken_usage.input_tokensfrom theresponse.completedevent when present.collect_compaction_output.Verification
just fmtjust test -p codex-core collect_compaction_output_accepts_additional_output_itemsgit diff --check