Skip to content

fix(server): return 400 (not 500) for truncated tool-call args in input (#19)#109

Open
marksverdhei wants to merge 1 commit into
htfrom
fix/responses-truncated-toolcall-400
Open

fix(server): return 400 (not 500) for truncated tool-call args in input (#19)#109
marksverdhei wants to merge 1 commit into
htfrom
fix/responses-truncated-toolcall-400

Conversation

@marksverdhei

Copy link
Copy Markdown

Summary

Closes #19 (Part 2). Truncated tool-call arguments in the input message history now produce an actionable HTTP 400 instead of a misleading 500.

The bug

When a /v1/responses or /v1/chat/completions conversation hits the --ctx-size limit, the model can emit a tool call whose arguments JSON is cut off mid-string. The client echoes that truncated tool call back in the next request's history. func_args_not_string() (common/chat.cpp) then fails to json::parse it and threw std::runtime_error, which the server's ex_wrapper (server.cpp) maps to a generic 500 with no cause. Agentic clients (Codex CLI, etc.) cannot diagnose this and retry forever — the history only grows, so it never recovers.

The fix

Throw std::invalid_argument instead. ex_wrapper already maps that to 400 (ERROR_TYPE_INVALID_REQUEST), and the message now names the likely root cause and the remedy:

Invalid tool call arguments in input messages: <json parse error w/ column>.
This usually means a previous tool call was truncated because the conversation
reached the context-size limit; reduce the conversation history or increase
--ctx-size, then retry.

This matches the existing idiom in the file — OAI message-shape validation already throws std::invalid_argument for malformed input.

Part 1 was already done

The other half of #19 (status:"incomplete" + incomplete_details:{reason:"max_output_tokens"} when stop == STOP_TYPE_LIMIT) is already implemented in server-task.cpp (to_json_oaicompat_resp / _stream). This PR completes the remaining Part 2.

Test

Adds test_input_with_truncated_tool_call_arguments_returns_400 to test_tool_call.py. Notable: the truncated call must sit mid-history (followed by a tool result + user turn) — a trailing assistant tool_call is stripped by the continue-final-message path before the parse runs, so it wouldn't exercise the bug. Uses the Hermes-2-Pro tool_use template, which advertises object-arguments support (verified via llama-debug-template-parser).

unit/test_tool_call.py::test_input_with_truncated_tool_call_arguments_returns_400 PASSED
7 passed, 196 deselected   # full non-slow tool-call suite, no regressions

Backend rebuilt + smoke-tested live (llama-server + Hermes-2-Pro template): malformed mid-history args → HTTP 400; valid history unaffected.

When a tool_call in the input message history carries an `arguments`
string that is invalid JSON — which happens when a prior /v1/responses or
/v1/chat/completions reply was truncated because the conversation hit the
context-size limit — func_args_not_string() threw std::runtime_error,
which the server's ex_wrapper maps to a generic HTTP 500. Agentic clients
(Codex CLI etc.) cannot diagnose the cause and retry endlessly because the
history only grows.

Throw std::invalid_argument instead, which ex_wrapper already maps to a
400 with an actionable message pointing at context-size truncation. This
matches the existing idiom in this file (oai message validation already
throws std::invalid_argument for malformed input).

Adds a server regression test on the Hermes-2-Pro tool_use template (which
advertises object-arguments support). The truncated call must sit
mid-history followed by a tool result + user turn: a trailing assistant
tool_call is stripped by the continue-final-message path before the parse
runs, so it would not exercise the bug.

Part 1 of #19 (status:"incomplete" + incomplete_details on STOP_TYPE_LIMIT)
already shipped in server-task.cpp; this completes Part 2.

Closes #19.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Responses API: misleading error on context overflow, must communicate token limit exceeded

1 participant