detail: "original" token/byte estimates are currently unbounded

## Summary

Codex locally estimates `detail: "original"` image token usage by counting raw 32px patches across the decoded source image, without applying the image sizing caps that GPT-5.4/GPT-5.5 enforce server-side.

The [Computer Use guide](https://developers.openai.com/api/docs/guides/tools-computer-use) states that `detail: "original"` preserves the full screenshot resolution up to 10.24 megapixels before images are downscaled.

The uncapped local estimate can greatly overestimate context usage for large images and materially affect local behavior such as auto-compaction and remote compaction trimming. In practice, uncapped estimates can lead to endless compaction loops when the byte estimate for an original-detail image diverges significantly from the true token cost of consuming that image.

## Research

The estimator in `codex-rs/core/src/context_manager/history.rs` replaces inline base64 image payload bytes with a model-visible byte estimate:

- non-original images use `RESIZED_IMAGE_BYTES_ESTIMATE = 7373`
- original-detail images call `estimate_original_image_bytes`
- `estimate_original_image_bytes` decodes the image, reads dimensions, computes:

```text
ceil(width / 32) * ceil(height / 32) patches
```

and returns:

```text
approx_bytes_for_tokens(patch_count) == patch_count * 4 bytes
```

Because the later token estimate is `ceil(bytes / 4)`, the original image estimate is effectively one token per raw 32px patch. However, the local estimator does not cap the token estimate, which is inconsistent with how tokens are counted on the server.

Example local estimates:

| Image | Local estimated image bytes | Local estimated image tokens |
|---|---:|---:|
| `6000x6000` | `141,376` | `35,344` |
| `10000x10000` | `391,876` | `97,969` |
| `12000x12000` | `562,500` | `140,625` |
| `20000x20000` | `1,562,500` | `390,625` |
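
The figures in the table above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the code in `history.rs`: the function names are hypothetical, and it assumes `ORIGINAL_IMAGE_PATCH_SIZE` is 32 and that the image dimensions are already known (the real estimator decodes the image first).

```rust
/// Patch edge length in pixels (assumed value of `ORIGINAL_IMAGE_PATCH_SIZE`).
const PATCH_SIZE: u64 = 32;
/// Mirrors `APPROX_BYTES_PER_TOKEN = 4`.
const BYTES_PER_TOKEN: u64 = 4;

/// Uncapped byte estimate: 4 bytes (one token) per raw 32px patch.
/// Hypothetical name; illustrates the arithmetic only.
fn uncapped_original_image_bytes(width: u64, height: u64) -> u64 {
    let patches = width.div_ceil(PATCH_SIZE) * height.div_ceil(PATCH_SIZE);
    patches * BYTES_PER_TOKEN
}

/// Token estimate derived from a byte count: ceil(bytes / 4).
fn tokens_from_bytes(bytes: u64) -> u64 {
    bytes.div_ceil(BYTES_PER_TOKEN)
}
```

For example, `uncapped_original_image_bytes(6000, 6000)` gives `141_376` bytes, and `tokens_from_bytes` turns that into `35_344` tokens, matching the first row.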

OpenAI's image sizing docs for GPT-5.4/GPT-5.5 say `original` allows up to 10,000 patches or a 6000px max dimension, and images over either limit are resized while preserving aspect ratio (https://developers.openai.com/api/docs/guides/images-vision#model-sizing-behavior). 
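
As a rough illustration of those two limits, the sketch below checks whether an image would be resized. It assumes patches are counted as `ceil(width / 32) * ceil(height / 32)`, the same way the local estimator counts them. The constant and function names are hypothetical, and the actual server-side resize algorithm is not reproduced here.

```rust
/// Patch edge length in pixels (assumption: same 32px patches as the local estimator).
const PATCH_SIZE: u64 = 32;
/// Documented patch budget for `detail: "original"`.
const MAX_PATCHES: u64 = 10_000;
/// Documented maximum dimension in pixels.
const MAX_DIMENSION: u64 = 6_000;

/// Number of 32px patches needed to cover the image.
fn patch_count(width: u64, height: u64) -> u64 {
    width.div_ceil(PATCH_SIZE) * height.div_ceil(PATCH_SIZE)
}

/// True when the image is over either documented limit and would be resized.
fn exceeds_original_limits(width: u64, height: u64) -> bool {
    width.max(height) > MAX_DIMENSION || patch_count(width, height) > MAX_PATCHES
}
```

Under this assumption a `3200x3200` image (10.24 megapixels, exactly 10,000 patches) is within both limits, while a `6000x6000` image satisfies the dimension limit but needs 35,344 patches, so it exceeds the patch budget.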

However, the documentation does not state the model's token multiplier, so I ran a series of API requests on a 10.24 megapixel image at various aspect ratios to find the maximum usage an image can incur. The numbers are the server-reported token cost of the vision request.

<img width="1221" height="417" alt="Image" src="https://github.com/user-attachments/assets/638b4831-140f-4bc1-928e-d9ca0f08fcfd" />

According to the table, `detail: "original"` vision payloads consume, at maximum, 12,000 tokens (which implies a 1.2x token multiplier on 10,000 patches).

Relevant code:

`codex-rs/core/src/context_manager/history.rs`
  - `RESIZED_IMAGE_BYTES_ESTIMATE`
  - `ORIGINAL_IMAGE_PATCH_SIZE`
  - `estimate_original_image_bytes`
  - `image_data_url_estimate_adjustment`
  
`codex-rs/utils/string/src/truncate.rs`
  - `APPROX_BYTES_PER_TOKEN = 4`
  - `approx_token_count`
  - `approx_bytes_for_tokens`
  - `approx_tokens_from_byte_count`


## Impact

- `ContextManager::get_total_token_usage` adds estimated tokens for items after the last model-generated item.
- `session/turn.rs` uses total token usage to decide pre-turn and mid-turn auto-compaction.
- `compact_remote.rs` trims function-call history while `estimate_token_count_with_base_instructions` exceeds the model context window.
- `recompute_token_usage` can synthesize token-count events from the estimate.

So one or more large original-detail images can make Codex believe the session is larger than what the API actually sees, causing premature compaction or trimming.


## Replication
Generate a 6000x6000 pixel image. Provide the path to that image and tell Codex: "Please use the view image tool 10 times on the same image, at original resolution". Codex will go into an endless compaction loop because the local byte estimates far exceed the compaction threshold.
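
To put numbers on this reproduction, the sketch below compares the local estimate for repeated views against an upper bound based on the 12,000-token maximum observed in the Research section. Function names are hypothetical, and the sketch assumes each view is estimated independently at one token per 32px patch.

```rust
/// Patch edge length in pixels (assumed value of `ORIGINAL_IMAGE_PATCH_SIZE`).
const PATCH_SIZE: u64 = 32;
/// Largest server-reported cost observed for one original-detail image.
const OBSERVED_MAX_TOKENS_PER_IMAGE: u64 = 12_000;

/// Local, uncapped token estimate for `views` copies of the same image.
fn local_tokens_for_views(width: u64, height: u64, views: u64) -> u64 {
    let patches = width.div_ceil(PATCH_SIZE) * height.div_ceil(PATCH_SIZE);
    patches * views
}

/// Upper bound on the server-side cost if no image exceeds the observed maximum.
fn observed_upper_bound_for_views(views: u64) -> u64 {
    OBSERVED_MAX_TOKENS_PER_IMAGE * views
}
```

For ten views of a 6000x6000 image, the local estimate is 353,440 tokens, against at most 120,000 tokens if each image costs no more than the observed maximum.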

## Expected behavior

`detail: "original"` image estimation should be capped at 12,000 tokens (~48,000 bytes).
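
One possible shape for such a cap is sketched below. It is not a patch against `estimate_original_image_bytes`: the constant and function names are hypothetical, and it simply clamps the per-patch estimate to the observed 12,000-token maximum rather than modelling the server's resize or the 1.2x multiplier for smaller images.

```rust
/// Patch edge length in pixels (assumed value of `ORIGINAL_IMAGE_PATCH_SIZE`).
const PATCH_SIZE: u64 = 32;
/// Mirrors `APPROX_BYTES_PER_TOKEN = 4`.
const BYTES_PER_TOKEN: u64 = 4;
/// Hypothetical cap: the largest server-reported cost observed for one image.
const ORIGINAL_IMAGE_MAX_TOKENS: u64 = 12_000;

/// Byte estimate for an original-detail image, clamped to the token cap.
fn capped_original_image_bytes(width: u64, height: u64) -> u64 {
    let patches = width.div_ceil(PATCH_SIZE) * height.div_ceil(PATCH_SIZE);
    // One token per patch, but never more than the observed maximum.
    let tokens = patches.min(ORIGINAL_IMAGE_MAX_TOKENS);
    tokens * BYTES_PER_TOKEN
}
```

With this clamp, every image from the example table (`6000x6000` through `20000x20000`) would be estimated at 48,000 bytes, while a small image such as `640x480` (300 patches) keeps its uncapped estimate of 1,200 bytes.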

