fix(server): return 400 (not 500) for truncated tool-call args in input (#19)#109
Open
marksverdhei wants to merge 1 commit into
Open
fix(server): return 400 (not 500) for truncated tool-call args in input (#19)#109marksverdhei wants to merge 1 commit into
marksverdhei wants to merge 1 commit into
Conversation
When a tool_call in the input message history carries an `arguments` string that is invalid JSON — which happens when a prior /v1/responses or /v1/chat/completions reply was truncated because the conversation hit the context-size limit — func_args_not_string() threw std::runtime_error, which the server's ex_wrapper maps to a generic HTTP 500. Agentic clients (Codex CLI etc.) cannot diagnose the cause and retry endlessly because the history only grows. Throw std::invalid_argument instead, which ex_wrapper already maps to a 400 with an actionable message pointing at context-size truncation. This matches the existing idiom in this file (oai message validation already throws std::invalid_argument for malformed input). Adds a server regression test on the Hermes-2-Pro tool_use template (which advertises object-arguments support). The truncated call must sit mid-history followed by a tool result + user turn: a trailing assistant tool_call is stripped by the continue-final-message path before the parse runs, so it would not exercise the bug. Part 1 of #19 (status:"incomplete" + incomplete_details on STOP_TYPE_LIMIT) already shipped in server-task.cpp; this completes Part 2. Closes #19.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #19 (Part 2). Truncated tool-call arguments in the input message history now produce an actionable HTTP 400 instead of a misleading 500.
The bug
When a
/v1/responsesor/v1/chat/completionsconversation hits the--ctx-sizelimit, the model can emit a tool call whoseargumentsJSON is cut off mid-string. The client echoes that truncated tool call back in the next request's history.func_args_not_string()(common/chat.cpp) then fails tojson::parseit and threwstd::runtime_error, which the server'sex_wrapper(server.cpp) maps to a generic 500 with no cause. Agentic clients (Codex CLI, etc.) cannot diagnose this and retry forever — the history only grows, so it never recovers.The fix
Throw
std::invalid_argumentinstead.ex_wrapperalready maps that to 400 (ERROR_TYPE_INVALID_REQUEST), and the message now names the likely root cause and the remedy:This matches the existing idiom in the file — OAI message-shape validation already throws
std::invalid_argumentfor malformed input.Part 1 was already done
The other half of #19 (
status:"incomplete"+incomplete_details:{reason:"max_output_tokens"}whenstop == STOP_TYPE_LIMIT) is already implemented inserver-task.cpp(to_json_oaicompat_resp/_stream). This PR completes the remaining Part 2.Test
Adds
test_input_with_truncated_tool_call_arguments_returns_400totest_tool_call.py. Notable: the truncated call must sit mid-history (followed by a tool result + user turn) — a trailing assistant tool_call is stripped by the continue-final-message path before the parse runs, so it wouldn't exercise the bug. Uses the Hermes-2-Protool_usetemplate, which advertises object-arguments support (verified viallama-debug-template-parser).Backend rebuilt + smoke-tested live (
llama-server+ Hermes-2-Pro template): malformed mid-history args →HTTP 400; valid history unaffected.