Ensure Anthropic max_tokens clears the thinking budget #52

Merged
ScriptSmith merged 2 commits into main from worktree-anthropic-thinking-budget on Jun 8, 2026

Conversation

@ScriptSmith

Member

No description provided.

@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Contributor

Greptile Summary

This PR fixes a 400 error that occurs when Anthropic's budget_tokens (set via reasoning.effort) is not strictly below max_tokens. The mismatch is common because the two values are derived independently (e.g., medium effort yields a 16 000-token budget against the 4 096 default). A new max_tokens_with_thinking_headroom helper raises max_tokens to at least budget_tokens + THINKING_OUTPUT_MARGIN (8 000) whenever fixed-budget thinking is active.

  • Introduces max_tokens_with_thinking_headroom that only ever raises max_tokens — generous caller-supplied values, adaptive thinking, and disabled thinking are left untouched — and emits a tracing::debug! log when a raise occurs.
  • Applied consistently to both the chat-completions path (create_chat_completion) and the Responses API path (create_responses); the legacy create_completion stub is unaffected as it always returns an error.
  • Four unit tests cover the key boundary cases: below budget, above budget+margin (preserved), between budget and budget+margin (still raised), and non-fixed thinking modes (unchanged).
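
The behaviour described in the summary and bullets can be sketched as follows. This is a minimal illustration, not the code merged in the PR: the `Thinking` enum, the `u32` token type, and the function signature are assumptions inferred from the flowchart labels, and the `tracing::debug!` call is replaced by a comment so the snippet needs no external crates.

```rust
/// Hypothetical stand-in for the provider's thinking configuration.
/// Variant names follow the flowchart; the field type is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Thinking {
    Enabled { budget_tokens: u32 },
    Adaptive,
    Disabled,
}

/// Margin reserved for visible output on top of the thinking budget
/// (8 000 according to the PR summary).
pub const THINKING_OUTPUT_MARGIN: u32 = 8_000;

/// Returns a `max_tokens` value that leaves room for the thinking budget.
/// The value is only ever raised, never lowered.
pub fn max_tokens_with_thinking_headroom(max_tokens: u32, thinking: Option<Thinking>) -> u32 {
    match thinking {
        Some(Thinking::Enabled { budget_tokens }) => {
            // saturating_add clamps at u32::MAX instead of overflowing.
            let desired = budget_tokens.saturating_add(THINKING_OUTPUT_MARGIN);
            // The merged helper reportedly emits a tracing::debug! log
            // when it raises the value; that is omitted in this sketch.
            max_tokens.max(desired)
        }
        // Adaptive, disabled, or absent thinking: leave the caller's value alone.
        _ => max_tokens,
    }
}
```

Under these assumptions, the 4 096 default with a 16 000-token budget becomes 24 000, while a caller-supplied 64 000 is returned unchanged.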

Confidence Score: 5/5

Safe to merge — the change only raises max_tokens when needed and cannot reduce it, so existing behaviour for callers who already set a high value or use adaptive/disabled thinking is unchanged.

The helper is purely additive: it raises max_tokens or leaves it alone, never lowers it. Both active call sites are updated, the legacy stub is correctly unaffected, overflow is guarded with saturating_add, and the four unit tests cover all meaningful boundary conditions including the edge case between budget and budget+margin.
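
To illustrate why the overflow guard matters, the sketch below contrasts `saturating_add` with `checked_add` on the budget-plus-margin sum. The function names and the `u32` token type are hypothetical and chosen only for this example; the summary confirms only that `saturating_add` is used.

```rust
/// Margin from the PR summary; the u32 type is an assumption.
const THINKING_OUTPUT_MARGIN: u32 = 8_000;

/// Guarded sum: clamps at u32::MAX rather than wrapping or panicking.
fn desired_max_tokens(budget_tokens: u32) -> u32 {
    budget_tokens.saturating_add(THINKING_OUTPUT_MARGIN)
}

/// Alternative shown for contrast: reports overflow as None,
/// which would force every caller to handle an extra error case.
fn desired_max_tokens_checked(budget_tokens: u32) -> Option<u32> {
    budget_tokens.checked_add(THINKING_OUTPUT_MARGIN)
}
```

With a pathological budget near `u32::MAX`, the saturating version still yields a usable upper bound, whereas a plain `+` would panic in debug builds and wrap to a small number in release builds.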

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/providers/anthropic/mod.rs | Adds max_tokens_with_thinking_headroom helper with correct saturating_add overflow guard, applies it in both live call paths, and backs it with comprehensive boundary tests. No issues found. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming request with reasoning.effort] --> B{Resolve max_tokens from payload / config / default}
    B --> C[convert_*_reasoning_config produces thinking + output_config]
    C --> D{thinking variant?}
    D -- "Enabled { budget_tokens }" --> E["desired = budget_tokens + THINKING_OUTPUT_MARGIN (8 000)"]
    E --> F{"max_tokens >= desired?"}
    F -- "Yes (caller set generous value)" --> G[Keep max_tokens unchanged]
    F -- "No (too low)" --> H[Raise max_tokens to desired, emit tracing::debug!]
    D -- "Adaptive / Disabled / None" --> G
    G --> I[Build AnthropicRequest with final max_tokens]
    H --> I
    I --> J[POST /v1/messages]

Reviews (3): Last reviewed commit: "Review fixes"

Comment thread src/providers/anthropic/mod.rs
Comment thread src/providers/anthropic/mod.rs
@ScriptSmith

Member Author

@greptile-apps

ScriptSmith merged commit c322ff1 into main on Jun 8, 2026
20 checks passed