Summary
Chat hard-fails when the assembled prompt exceeds the model's context window. The conversation context is not trimmed or summarized to fit, so long chats become unusable instead of degrading gracefully.
AI error in chat: Error: 413 ... {"type":"error","error":{"type":"invalid_request_error","message":"prompt is too long: 206134 tokens > 200000 maximum"}} ... code "413"
Reported on app 2.5.4 (macOS).
Expected
When the assembled context would exceed the model window, trim or summarize older turns (and cap injected history and retrieved context) so the request fits, rather than returning a hard 413 to the user.
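One possible shape for the trimming step, as a minimal sketch rather than a proposal for the actual implementation. `Turn`, `estimate_tokens`, and `fit_to_budget` are hypothetical names, and the 4-characters-per-token estimate is an assumption standing in for the model's real tokenizer. It keeps the newest turns and drops the oldest until the prompt fits the window.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str      # e.g. "user" or "assistant"
    content: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token). This is an assumption,
    # not the real tokenizer; a production guard should count exactly.
    return max(1, len(text) // 4)


def fit_to_budget(
    system_prompt: str,
    turns: List[Turn],
    max_tokens: int,
    reserve_for_reply: int = 0,
    count: Callable[[str], int] = estimate_tokens,
) -> List[Turn]:
    """Return the longest suffix of `turns` that fits in the token budget."""
    budget = max_tokens - reserve_for_reply - count(system_prompt)
    kept: List[Turn] = []
    # Walk from newest to oldest and stop at the first turn that does not fit,
    # so the kept history stays contiguous.
    for turn in reversed(turns):
        cost = count(turn.content)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    # Note: the result is empty if even the newest turn exceeds the budget;
    # the caller would need to handle that case (e.g. truncate or summarize).
    return kept
```

For the failure above, `max_tokens` would be 200000 minus whatever is reserved for the reply. Summarizing the dropped turns instead of discarding them would be a natural extension, but is not shown here.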
Note
The recent change to always inject full conversation history (#3636) increases prompt size and may make this more likely. A token-budget guard on the final assembled prompt would cover both the existing overflow and the extra growth from #3636.
Possibly related: closed #2570 (chat overflow errors).
Source: in-app feedback, Jun 5 2026.