
Responses API: misleading error on context overflow; must report that the token limit was exceeded #19

@marksverdhei

Bug Description

When the conversation context reaches the ctx-size limit during a /v1/responses tool-calling session, the API ends up returning a misleading HTTP 500 error that gives no indication of the actual cause (context overflow). This traps agentic clients such as Codex CLI in unrecoverable retry loops.

Reproduction

  1. Use Codex CLI (or any agentic client) against /v1/responses with a model like qwen3.5-27b at ctx-size=32768
  2. Let the conversation grow through tool call rounds until context approaches 32K tokens
  3. Observe the failure cascade

What happens

Phase 1: Silent truncation

When context reaches the limit, the model generates a truncated tool call (e.g., only 35 tokens before hitting the ceiling). The server returns this as HTTP 200 with:

  • "status": "completed" (should be "incomplete")
  • Truncated, invalid JSON in the tool call arguments (e.g., {"command": "cat s — cut off at column 20)
  • finish_reason: "length" is set, but there is no incomplete_details object
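Because the response still says "status": "completed", nothing downstream knows these arguments are unusable. Either the server or the client could screen for this case before the bad call enters the history. The following is a minimal C++ sketch, not code from llama.cpp or Codex CLI: the tool_call struct, json_structurally_complete() and drop_truncated_tool_calls() are hypothetical names. It only checks that strings and brackets are closed, which is enough to catch length truncation but is not a full JSON validator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a tool call in the conversation history.
struct tool_call {
    std::string name;
    std::string arguments;  // raw JSON text as emitted by the model
};

// Cheap structural check (NOT a full JSON validator): returns false when the
// text is empty, ends inside a string, or leaves '{' / '[' unclosed, which is
// what a length-truncated tool call looks like.
bool json_structurally_complete(const std::string & text) {
    if (text.find_first_not_of(" \t\r\n") == std::string::npos) {
        return false;  // nothing was generated at all
    }
    std::vector<char> stack;  // open brackets still waiting for a close
    bool in_string = false;
    bool escaped   = false;
    for (char c : text) {
        if (in_string) {
            if (escaped)        { escaped = false; }
            else if (c == '\\') { escaped = true; }
            else if (c == '"')  { in_string = false; }
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{' || c == '[') {
            stack.push_back(c);
        } else if (c == '}' || c == ']') {
            char open = (c == '}') ? '{' : '[';
            if (stack.empty() || stack.back() != open) {
                return false;  // mismatched closing bracket
            }
            stack.pop_back();
        }
    }
    return !in_string && stack.empty();
}

// Keeps only tool calls whose arguments look complete, so a truncated call
// is never re-sent as part of the next request's history.
std::vector<tool_call> drop_truncated_tool_calls(const std::vector<tool_call> & calls) {
    std::vector<tool_call> kept;
    for (const auto & call : calls) {
        if (json_structurally_complete(call.arguments)) {
            kept.push_back(call);
        }
    }
    return kept;
}
```

For the example above, {"command": "cat s ends inside an open string, so that call would be dropped instead of being replayed on the next turn.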

Phase 2: Misleading error on next request

The client includes the truncated tool call in conversation history for the next request. The func_args_not_string() function in common/chat.cpp:1407 tries to parse it and throws:

Failed to parse tool call arguments as JSON: [json.exception.parse_error.101] 
parse error at line 1, column 20: syntax error while parsing object - 
unexpected end of input; expected '}'

This is returned as a generic HTTP 500 error with no indication that context overflow was the root cause. The client retries endlessly: the truncated tool call stays in the history and the conversation only grows, so every retry fails the same way.

Expected behavior

Two things need to happen:

1. Truncated responses must signal "status": "incomplete"

Both to_json_oaicompat_resp() (line ~973) and to_json_oaicompat_resp_stream() (line ~1080) currently hardcode "status": "completed". When stop == STOP_TYPE_LIMIT, the response must:

  • Set "status": "incomplete"
  • Include "incomplete_details": {"reason": "max_output_tokens"} (per OpenAI spec)
  • Either omit the truncated tool call entirely, or mark it as incomplete so the client doesn't try to use it
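The mapping in point 1 can be sketched as follows. This is a hedged illustration rather than a patch: the enum only mirrors the STOP_TYPE_LIMIT name from the issue (the other enumerators are assumptions), and resp_status, status_for_stop() and status_fields_json() are hypothetical helpers. The idea is that both to_json_oaicompat_resp() and to_json_oaicompat_resp_stream() would derive the status from the stop type instead of hardcoding "completed".

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the server's stop reasons; only STOP_TYPE_LIMIT is
// named in the issue, the rest are assumed for illustration.
enum stop_type { STOP_TYPE_NONE, STOP_TYPE_EOS, STOP_TYPE_WORD, STOP_TYPE_LIMIT };

// What the Responses API payload should report for a given stop reason.
struct resp_status {
    std::string status;             // "completed" or "incomplete"
    std::string incomplete_reason;  // empty when the response is complete
    bool        emit_tool_calls;    // false: omit the truncated tool call
};

resp_status status_for_stop(stop_type stop) {
    if (stop == STOP_TYPE_LIMIT) {
        // Generation hit the token ceiling: say so instead of "completed".
        return {"incomplete", "max_output_tokens", false};
    }
    return {"completed", "", true};
}

// Renders the status fields as a JSON fragment. A real server would build
// these with its JSON library rather than by string concatenation.
std::string status_fields_json(const resp_status & s) {
    std::string out = "\"status\": \"" + s.status + "\", \"incomplete_details\": ";
    if (s.incomplete_reason.empty()) {
        out += "null";
    } else {
        out += "{\"reason\": \"" + s.incomplete_reason + "\"}";
    }
    return out;
}
```

Routing both serializers through one helper like this keeps the streaming and non-streaming paths from drifting apart again.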

2. Input validation must return a clear, actionable error

When func_args_not_string() fails to parse tool call arguments from input messages, the server should check whether the token count is near the context limit and return:

  • HTTP 400 (not 500 — the input is malformed, it's not a server error)
  • A message that explains the actual cause, e.g.:
    Tool call arguments in message history contain invalid JSON (truncated at column 20). 
    This typically happens when a previous response was truncated due to context length limits. 
    Consider reducing conversation history or increasing ctx-size.
    

At minimum, even without context-limit detection, the error should be an HTTP 400 with a message like:

Invalid tool call arguments in input messages: JSON parse error at column 20 (unexpected end of input). 
Check that all tool_calls in the conversation history contain valid JSON arguments.
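Point 2, including the fallback message, might look like this in outline. This is a sketch under stated assumptions: api_error, near_context_limit() and invalid_tool_args_error() are hypothetical, the caller is assumed to have a token count (or estimate) for the input, and the 10% margin is an arbitrary illustrative threshold. The generic 400 message is always returned; the context-length hint is appended only when the heuristic fires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical error shape; the real server has its own error helpers.
struct api_error {
    int         http_status;
    std::string message;
};

// Heuristic from the issue: is the input close enough to ctx-size that a
// previous response was probably cut off? The 10% margin is an assumption.
bool near_context_limit(int32_t n_prompt_tokens, int32_t n_ctx, double margin = 0.10) {
    assert(n_ctx > 0);
    return n_prompt_tokens >= static_cast<int32_t>(n_ctx * (1.0 - margin));
}

// Builds a 400 (client error, not 500) for unparseable tool-call arguments in
// the input, adding the context-length hint only when the heuristic fires.
api_error invalid_tool_args_error(std::size_t column, const std::string & parse_detail,
                                  int32_t n_prompt_tokens, int32_t n_ctx) {
    std::string msg = "Invalid tool call arguments in input messages: JSON parse error at column "
                    + std::to_string(column) + " (" + parse_detail + "). "
                    + "Check that all tool_calls in the conversation history contain valid JSON arguments.";
    if (near_context_limit(n_prompt_tokens, n_ctx)) {
        msg += " This typically happens when a previous response was truncated due to context length limits."
               " Consider reducing conversation history or increasing ctx-size.";
    }
    return {400, msg};
}
```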

Relevant code

  • common/chat.cpp:1396-1414: func_args_not_string() throws the misleading error
  • tools/server/server-task.cpp:~973: to_json_oaicompat_resp() hardcodes "status": "completed"
  • tools/server/server-task.cpp:~1080: to_json_oaicompat_resp_stream() has the same issue
  • tools/server/server-context.cpp:1250-1253: truncation sets STOP_TYPE_LIMIT, but this is not propagated to the Responses API status

Impact

This causes agentic tool-calling loops (Codex CLI, etc.) to enter infinite retry cycles when context is exhausted, with no way for the client to diagnose or recover from the issue.
