What happened?
Assistant turns on amazon-bedrock with adaptive-thinking Claude models (Opus 4.6, Opus 4.7, Sonnet 4.6) are silently truncated at exactly 4096 output tokens with stopReason: "length", regardless of the model's actual maxTokens (128000 in pi-ai's registry) or any user-configured modelOverrides.maxTokens.
Calling the same model directly via the AWS CLI with explicit maxTokens: 6000 returns cleanly with outputTokens: 6000. AWS will produce well past 4096 when the field is set. Cap is on Pi's side.
Steps to reproduce
- Run pi against a Bedrock adaptive-thinking model:
pi --provider amazon-bedrock --model us.anthropic.claude-opus-4-7
- Send a single prompt that elicits >4096 output tokens of plain text, no tool use
- Inspect the session JSONL — assistant turn ends with
stopReason: "length", usage.output: 4096.
Expected behavior
Adaptive anthropic models on Bedrock should not terminate at 4096 tokens
Version
0.75.4
What happened?
Assistant turns on amazon-bedrock with adaptive-thinking Claude models (Opus 4.6, Opus 4.7, Sonnet 4.6) are silently truncated at exactly 4096 output tokens with stopReason: "length", regardless of the model's actual maxTokens (128000 in pi-ai's registry) or any user-configured modelOverrides.maxTokens.
Calling the same model directly via the AWS CLI with explicit maxTokens: 6000 returns cleanly with outputTokens: 6000. AWS will produce well past 4096 when the field is set. Cap is on Pi's side.
Steps to reproduce
stopReason: "length", usage.output: 4096.Expected behavior
Adaptive anthropic models on Bedrock should not terminate at 4096 tokens
Version
0.75.4