Skip to content

[codex-analytics] report compaction request token counts#25946

Merged
rhan-oai merged 1 commit into
mainfrom
rhan/compaction-request-tokens
Jun 4, 2026
Merged

[codex-analytics] report compaction request token counts#25946
rhan-oai merged 1 commit into
mainfrom
rhan/compaction-request-tokens

Conversation

@rhan-oai

@rhan-oai rhan-oai commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Why

Compaction analytics need token counts that better represent the request being compacted. The existing session snapshot can diverge from the actual remote compaction request after output rewriting, and remote v2 can use server-side Responses usage when available.

What changed

  • Add an optional active_context_tokens_before override to CompactionAnalyticsAttempt::track(...) for remote compaction when it has a better before-token value than the begin-time session snapshot. The local /compact path passes no override.
  • For remote v1 responses_compact, subtract the estimated token delta from pre-compaction output rewriting from the session snapshot, capped by locally-added tokens since the last successful API response.
  • For remote v2 responses_compaction_v2, use the same bounded output-rewrite fallback as remote v1, then overwrite active_context_tokens_before with server token_usage.input_tokens from the response.completed event when present.
  • Keep the existing v2 compaction-output validation while carrying the completed response token usage through collect_compaction_output.

Verification

  • just fmt
  • just test -p codex-core collect_compaction_output_accepts_additional_output_items
  • git diff --check

@rhan-oai rhan-oai force-pushed the rhan/compaction-request-tokens branch 2 times, most recently from 47337b1 to 3692f74 Compare June 2, 2026 22:32
@rhan-oai

rhan-oai commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 593c5c88dd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread codex-rs/core/src/compact_remote.rs Outdated
Comment on lines +193 to +195
attempt.subtract_active_context_tokens_before(
estimated_deleted_tokens.min(max_local_deleted_tokens),
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Set v1 before-tokens from the built prompt

For a successful /responses/compact run with no context-window trimming, attempt still reports the sess.get_total_token_usage() value captured before the compact request; even when trimming occurs these lines only subtract some locally-added tokens. The actual compact request is the Prompt built below, so responses_compact analytics continue to report the previous session/response total instead of the post-trim compaction request estimate this change is trying to measure.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, then we would have to estimate tokens across the entire prompt, substantially increasing uncertainty of token accounting since we lose the last server-side usage

Comment thread codex-rs/core/src/compact_remote.rs Outdated
Comment on lines +193 to +194
attempt.subtract_active_context_tokens_before(
estimated_deleted_tokens.min(max_local_deleted_tokens),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add integration coverage for v1 token analytics

This changes the full remote-compaction path by subtracting locally trimmed history before reporting responses_compact active-context tokens, but the commit only adds a helper-level unit test and no core/tests/suite coverage that drives a compaction through test_codex and verifies the emitted analytics payload. The test-authoring rule says agent changes must add integration tests under the core suite (root AGENTS.md lines 103-106); without that, regressions in the wiring from trimming through attempt.track would not be caught.

Useful? React with 👍 / 👎.

Comment thread codex-rs/core/src/compact_remote_v2.rs Outdated
Comment on lines +265 to +266
if let Some(token_usage) = token_usage {
attempt.set_active_context_tokens_before(token_usage.input_tokens);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add integration coverage for v2 server token usage

This new branch makes the reported responses_compaction_v2 before-token count depend on the server input_tokens from the compaction stream, but the added coverage only calls collect_compaction_output directly and does not exercise a real remote-compaction turn or assert the analytics event. The test-authoring rule requires integration tests for agent changes (root AGENTS.md lines 103-106); an integration test should cover both the server-usage path and the local-estimate fallback through the full compaction flow.

Useful? React with 👍 / 👎.

Comment thread codex-rs/core/src/compact_remote.rs Outdated
Comment on lines +453 to +454
#[cfg(test)]
mod tests {

This comment was marked as outdated.

@rhan-oai rhan-oai changed the title Report compaction request token counts [codex-analytics] report compaction request token counts Jun 3, 2026
@rhan-oai rhan-oai force-pushed the rhan/compaction-request-tokens branch 2 times, most recently from 0c1f04c to 625f101 Compare June 3, 2026 17:10
@rhan-oai rhan-oai marked this pull request as ready for review June 3, 2026 18:31
@rhan-oai rhan-oai requested a review from a team as a code owner June 3, 2026 18:31

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 625f101395

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread codex-rs/core/src/compact.rs Outdated
personality: turn_context.personality,
..Default::default()
};
attempt.set_active_context_tokens_before(estimate_compaction_request_tokens(&prompt));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use completed usage for Responses compaction analytics

When a normal /responses compaction succeeds, drain_to_completed receives the server's token_usage and updates the session with it, but this new assignment fixes active_context_tokens_before to the local heuristic estimate and never overwrites it from the completed response. In threads with images, encrypted items, or tool schemas where estimate_compaction_request_tokens() diverges from actual tokenizer/server accounting, the new analytics field reports an inaccurate request size even though exact usage was available on the success path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving this implementation out due to essentially being legacy

&base_instructions,
);
let (deleted_items, estimated_deleted_tokens) =
trim_function_call_history_to_fit_context_window(

@owenlin0 owenlin0 Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I find it a bit confusing to have a mutable CompactionAnalyticsAttempt and then do prompt trimming + token arithmetic based on deleted content, instead of just estimating the token size of the prompt after trimming. Any reason we need to do it this way?

Could we keep CompactionAnalyticsAttempt immutable and pass the final active_context_tokens_before into track(...) instead?

Suggestion:

Right now the inner compaction paths take &mut CompactionAnalyticsAttempt only to refine one metric after the actual request shape is known. That makes the analytics lifecycle object double as mutable request-accounting state.

A cleaner shape would be to keep a local request_tokens_before in each compaction path, update that as the request is finalized/retried, and then call:

attempt.track(sess.as_ref(), status, error, request_tokens_before).await;

This also seems simpler for legacy remote compaction: once the trimmed Prompt is built, we can compute estimate_compaction_request_tokens(&prompt) directly instead of starting from session usage and subtracting estimated deleted tokens.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up the compact analytics attempt shape!

regarding the arithmetic based on the deleted content, i agree that it is somewhat confusing, but it is more accurate and bounds the error because we only have to estimate the delta rather than the entire prompt. since active tokens before is tokens reported from server-side token accounting, re-estimating the entire prompt loses that accuracy, which can be significant. lmk if this makes sense

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah makes sense, carry on

@rhan-oai rhan-oai force-pushed the rhan/compaction-request-tokens branch 2 times, most recently from 514b238 to dc50345 Compare June 3, 2026 21:45
@rhan-oai rhan-oai marked this pull request as draft June 3, 2026 21:45
@rhan-oai rhan-oai force-pushed the rhan/compaction-request-tokens branch 6 times, most recently from 9e93ea1 to 306936c Compare June 3, 2026 23:11
@rhan-oai rhan-oai requested a review from owenlin0 June 3, 2026 23:20
@rhan-oai rhan-oai force-pushed the rhan/compaction-request-tokens branch from 306936c to 58979a2 Compare June 4, 2026 02:39
@rhan-oai rhan-oai marked this pull request as ready for review June 4, 2026 02:41
@rhan-oai rhan-oai enabled auto-merge (squash) June 4, 2026 02:41
@rhan-oai rhan-oai merged commit c143a86 into main Jun 4, 2026
31 checks passed
@rhan-oai rhan-oai deleted the rhan/compaction-request-tokens branch June 4, 2026 03:00
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants