detail: "original" token/byte estimates are currently unbounded #19806

@miraclebakelaser

Summary

Codex locally estimates detail: "original" image token usage by counting raw 32px patches across the decoded source image, without applying the image sizing caps that GPT-5.4/GPT-5.5 enforce server-side.

The Computer Use guide states that detail: "original" preserves the full screenshot resolution of up to 10.24 megapixels before images are downscaled.

The uncapped estimate can greatly overestimate context usage for large images and materially affect local behavior such as auto-compaction and remote compaction trimming. In practice, it can lead to endless compaction loops when the estimated byte size of an original-detail image diverges significantly from the true token cost of consuming that image.

Research

The estimator in codex-rs/core/src/context_manager/history.rs replaces inline base64 image payload bytes with a model-visible byte estimate:

  • non-original images use RESIZED_IMAGE_BYTES_ESTIMATE = 7373
  • original-detail images call estimate_original_image_bytes
  • estimate_original_image_bytes decodes the image, reads its dimensions, and computes:

      ceil(width / 32) * ceil(height / 32) patches

    and returns:

      approx_bytes_for_tokens(patch_count) == patch_count * 4 bytes

Because the later token estimate is ceil(bytes / 4), the original image estimate is effectively one token per raw 32px patch. However, the local estimator does not cap the token estimate, which is inconsistent with how tokens are counted on the server.
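The arithmetic described above can be sketched in a few lines of Rust. This is a minimal illustration, not the code in history.rs: the function names are hypothetical, image decoding is skipped (dimensions are passed in directly), and the constants are assumed to be 32 and 4 as described in this issue.

```rust
// Assumed values, taken from the description in this issue.
const ORIGINAL_IMAGE_PATCH_SIZE: u64 = 32;
const APPROX_BYTES_PER_TOKEN: u64 = 4;

/// Hypothetical helper: one patch per 32x32 tile, partial tiles rounded up.
fn raw_patch_count(width: u64, height: u64) -> u64 {
    width.div_ceil(ORIGINAL_IMAGE_PATCH_SIZE) * height.div_ceil(ORIGINAL_IMAGE_PATCH_SIZE)
}

/// Hypothetical stand-in for the uncapped byte estimate: patch_count * 4.
fn uncapped_estimate_bytes(width: u64, height: u64) -> u64 {
    raw_patch_count(width, height) * APPROX_BYTES_PER_TOKEN
}

/// The later token estimate: ceil(bytes / 4), i.e. one token per patch.
fn tokens_from_bytes(bytes: u64) -> u64 {
    bytes.div_ceil(APPROX_BYTES_PER_TOKEN)
}
```

Nothing in this chain bounds the result, so the estimate grows with the pixel area of the source image.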

Example local estimates:

| Image | Local estimated image bytes | Local estimated image tokens |
|---|---|---|
| 6000x6000 | 141,376 | 35,344 |
| 10000x10000 | 391,876 | 97,969 |
| 12000x12000 | 562,500 | 140,625 |
| 20000x20000 | 1,562,500 | 390,625 |

OpenAI's image sizing docs for GPT-5.4/GPT-5.5 say original allows up to 10,000 patches or a 6000px max dimension, and images over either limit are resized while preserving aspect ratio (https://developers.openai.com/api/docs/guides/images-vision#model-sizing-behavior).
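As a rough illustration of those two limits, the check below flags images that would be resized server-side. It is a sketch under assumptions: the constant and function names are hypothetical, and the patch count is assumed to be computed the same way as in the local estimator (32px tiles, rounded up), which the docs may define differently.

```rust
// Limits as quoted from the image sizing docs; the names are hypothetical.
const PATCH_SIZE: u32 = 32;
const MAX_ORIGINAL_PATCHES: u64 = 10_000;
const MAX_ORIGINAL_DIMENSION: u32 = 6000;

/// Assumed patch count: 32px tiles, partial tiles rounded up.
fn raw_patches(width: u32, height: u32) -> u64 {
    u64::from(width.div_ceil(PATCH_SIZE)) * u64::from(height.div_ceil(PATCH_SIZE))
}

/// True when the image is over either limit and would be resized.
fn exceeds_original_limits(width: u32, height: u32) -> bool {
    width.max(height) > MAX_ORIGINAL_DIMENSION
        || raw_patches(width, height) > MAX_ORIGINAL_PATCHES
}
```

Under these assumptions a 6000x6000 image stays within the dimension limit but has 35,344 raw patches, so it is over the patch limit and would be downscaled before tokens are counted.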

However, the documentation does not state the token multiplier for these models, so I ran a series of API requests on a 10.24 megapixel image at various aspect ratios to find the maximum usage an image can incur. The numbers are the server-reported token cost of each vision request.

[Image: table of server-reported token costs for a 10.24 megapixel image at various aspect ratios]

According to the table, detail: "original" vision payloads consume, at maximum, 12,000 tokens (which implies a 1.2x token multiplier on 10,000 patches).

Relevant code:

codex-rs/core/src/context_manager/history.rs

  • RESIZED_IMAGE_BYTES_ESTIMATE
  • ORIGINAL_IMAGE_PATCH_SIZE
  • estimate_original_image_bytes
  • image_data_url_estimate_adjustment

codex-rs/utils/string/src/truncate.rs

  • APPROX_BYTES_PER_TOKEN = 4
  • approx_token_count
  • approx_bytes_for_tokens
  • approx_tokens_from_byte_count

Impact

  • ContextManager::get_total_token_usage adds estimated tokens for items after the last model-generated item.
  • session/turn.rs uses total token usage to decide pre-turn and mid-turn auto-compaction.
  • compact_remote.rs trims function-call history while estimate_token_count_with_base_instructions exceeds the model context window.
  • recompute_token_usage can synthesize token-count events from the estimate.

So one or more large, original-detail images can make Codex believe the session is larger than what the API actually sees, causing premature compaction or trimming.

Replication

Generate a 6000x6000 pixel image. Provide the path to that image and tell Codex: "Please use the view image tool 10 times on the same image, at original resolution". Codex will go into an endless compaction loop because the local byte estimates far exceed the compaction threshold.
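To make the scale of the divergence in this scenario concrete, the sketch below totals the local estimate for repeated views of one image and compares it with an upper bound based on the 12,000-token maximum observed in the API measurements above. The function and constant names are hypothetical, and the per-image maximum is an empirical observation rather than a documented figure.

```rust
// Empirical maximum from the API measurements in this issue (an assumption).
const OBSERVED_MAX_TOKENS_PER_IMAGE: u64 = 12_000;

/// Local estimate: effectively one token per raw 32px patch.
fn local_tokens_per_image(width: u64, height: u64) -> u64 {
    width.div_ceil(32) * height.div_ceil(32)
}

/// Returns (local estimate, upper bound on server-side cost) for `views`
/// copies of the same image in the session history.
fn session_token_totals(width: u64, height: u64, views: u64) -> (u64, u64) {
    let per_image = local_tokens_per_image(width, height);
    let local_total = per_image * views;
    let server_bound = per_image.min(OBSERVED_MAX_TOKENS_PER_IMAGE) * views;
    (local_total, server_bound)
}
```

For ten views of a 6000x6000 image this gives a local estimate of 353,440 tokens against a server-side bound of 120,000 tokens.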

Expected behavior

detail: "original" image estimation should be capped at 12,000 tokens (~48,000 bytes).
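One possible shape for the capped estimate is sketched below. It is not a patch against history.rs: the names are hypothetical, decoding is omitted, and the 12,000-token ceiling comes from the measurements in this issue rather than from documentation.

```rust
const APPROX_BYTES_PER_TOKEN: u64 = 4;
// Hypothetical constant: the proposed ceiling for one original-detail image.
const MAX_ORIGINAL_IMAGE_TOKENS: u64 = 12_000;

/// Hypothetical capped variant of the byte estimate for original-detail images.
fn capped_original_image_bytes(width: u64, height: u64) -> u64 {
    // Uncapped estimate: one token (4 bytes) per raw 32px patch.
    let patches = width.div_ceil(32) * height.div_ceil(32);
    let uncapped = patches * APPROX_BYTES_PER_TOKEN;
    // Clamp to the proposed ceiling of 12,000 tokens (48,000 bytes).
    uncapped.min(MAX_ORIGINAL_IMAGE_TOKENS * APPROX_BYTES_PER_TOKEN)
}
```

This applies only the ceiling proposed here; it does not model the apparent 1.2x multiplier for images below the cap, so small images keep their current estimate.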

Labels: CLI, bug, context, rate-limits
