fix(server): Responses API emits incomplete status on truncation (#19)#82

Merged
marksverdhei merged 1 commit into ht from fix/responses-api-truncation-status
Jun 12, 2026
@marksverdhei

Summary

Closes Phase 1 of #19. When generation hit STOP_TYPE_LIMIT (max_output_tokens / ctx-size cap), the OAI Responses code paths hardcoded "status": "completed" on the top-level response, on all output items, and on the streaming response.completed SSE event. Agentic clients (Codex CLI, etc.) couldn't tell a finished response from a truncated one, so they fed partial output back into conversation history, which triggered JSON-parse-failure 500s on the next request and infinite retry loops.

The fix follows the OAI Responses spec:

  • Top-level status flips to "incomplete" on truncation.
  • New top-level incomplete_details: { reason: "max_output_tokens" } field.
  • Per-item status (message / reasoning / function_call) inherits the same value, so clients can detect partial tool_calls / partial messages at the per-item level.
  • Streaming variant: final SSE event becomes response.incomplete (instead of response.completed), with the same payload shape.
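The mapping in the list above can be sketched in a few lines. This is a Python illustration only, not the C++ in server-task.cpp: the string values of the stop-type constants and the helper names `status_fields`, `final_event_type`, and `build_response` are hypothetical. It also assumes `incomplete_details` is simply omitted on the happy path, which the description above does not state.

```python
from typing import Any

# Hypothetical stand-ins for the server's stop types; the values are invented.
STOP_TYPE_LIMIT = "limit"
STOP_TYPE_EOS = "eos"


def status_fields(stop_type: str) -> dict[str, Any]:
    """Top-level status fields for a Responses payload."""
    truncated = stop_type == STOP_TYPE_LIMIT
    fields: dict[str, Any] = {"status": "incomplete" if truncated else "completed"}
    if truncated:
        # Assumption: the field is only present when the response was truncated.
        fields["incomplete_details"] = {"reason": "max_output_tokens"}
    return fields


def final_event_type(stop_type: str) -> str:
    """Name of the terminal SSE event in the streaming variant."""
    if stop_type == STOP_TYPE_LIMIT:
        return "response.incomplete"
    return "response.completed"


def build_response(stop_type: str, items: list[dict[str, Any]]) -> dict[str, Any]:
    """Apply the top-level status to every output item as well."""
    top = status_fields(stop_type)
    output = [{**item, "status": top["status"]} for item in items]
    return {**top, "output": output}
```

Calling `build_response(STOP_TYPE_LIMIT, [{"type": "message"}])` yields a payload whose top-level and per-item `status` are both `"incomplete"`.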

What this does not do

Phase 2 of #19 (HTTP 400 + actionable message from func_args_not_string) is intentionally out of scope. That requires typed-exception plumbing through common/chat.cpp into the server error path — a separate, bigger change. Phase 1 alone prevents the cascade in the first place: once clients see truncation as truncation, they don't retry with malformed history.
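To illustrate why Phase 1 is enough to stop the cascade, here is a rough sketch of a client-side guard. It is not taken from Codex CLI or any real client; `truncation_reason` and `append_to_history` are hypothetical helpers, and the response is assumed to be an already-parsed JSON dict.

```python
from typing import Any, Optional


def truncation_reason(response: dict[str, Any]) -> Optional[str]:
    """Return the truncation reason, or None if the response completed."""
    if response.get("status") != "incomplete":
        return None
    details = response.get("incomplete_details") or {}
    return details.get("reason", "unknown")


def append_to_history(history: list[dict[str, Any]], response: dict[str, Any]) -> bool:
    """Append output items to history only when the response is complete.

    Returns False (and leaves history untouched) for a truncated response,
    so partial tool calls are never replayed on the next request.
    """
    if truncation_reason(response) is not None:
        return False
    history.extend(response.get("output", []))
    return True
```

A client written this way can raise `max_output_tokens` or surface the truncation to the user instead of retrying with malformed history.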

Test plan

  • tools/server/tests/unit/test_compat_oai_responses.py::test_responses_truncation_emits_incomplete_status — non-streaming repro with max_output_tokens: 2 on tinyllama2 (reliably trips STOP_TYPE_LIMIT). Asserts top-level status=incomplete + incomplete_details.reason=max_output_tokens + per-item status.
  • test_responses_truncation_stream_emits_incomplete_event — streaming repro. Verifies a response.incomplete event arrives with the same payload shape.
  • The two pre-existing tests, test_responses_with_openai_library and test_responses_stream_with_openai_library, still pass (no happy-path regression).
  • (manual, post-merge) re-run the original Codex CLI repro from the issue to confirm the retry loop is broken.
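For the streaming check, the core idea is to find the terminal event and inspect its payload. The sketch below is standalone and is not the code in test_compat_oai_responses.py. It assumes the stream uses `event:` / `data:` line pairs separated by blank lines and that the terminal payload nests the response under a `response` key; `SAMPLE`, `parse_sse`, and `terminal_event` are made up for illustration.

```python
import json
from typing import Any, Optional

# Invented sample stream; a real stream carries many more events.
SAMPLE = (
    "event: response.created\n"
    'data: {"type": "response.created"}\n\n'
    "event: response.incomplete\n"
    'data: {"type": "response.incomplete", "response": {"status": "incomplete", '
    '"incomplete_details": {"reason": "max_output_tokens"}}}\n\n'
)


def parse_sse(raw: str) -> list[tuple[str, dict[str, Any]]]:
    """Split raw SSE text into (event name, JSON payload) pairs."""
    events = []
    for block in raw.strip().split("\n\n"):
        name, data_lines = None, []
        for line in block.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if name and data_lines:
            events.append((name, json.loads("\n".join(data_lines))))
    return events


def terminal_event(
    events: list[tuple[str, dict[str, Any]]],
) -> Optional[tuple[str, dict[str, Any]]]:
    """Return the last completed/incomplete event, if any."""
    for name, payload in reversed(events):
        if name in ("response.completed", "response.incomplete"):
            return name, payload
    return None
```

With a stream truncated by `max_output_tokens`, `terminal_event(parse_sse(raw))` should return a `response.incomplete` event rather than `response.completed`.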

Files touched

  • tools/server/server-task.cpp — both to_json_oaicompat_resp and to_json_oaicompat_resp_stream
  • tools/server/tests/unit/test_compat_oai_responses.py — 2 new test cases

🤖 Generated with Claude Code

When generation hits `STOP_TYPE_LIMIT` (max_output_tokens / ctx-size cap),
the OAI Responses code paths hardcoded `"status": "completed"` everywhere
— top-level response, per-message output items, function_call items, and
the streaming `response.completed` SSE event. Agentic clients (Codex CLI,
etc.) couldn't tell a finished response from a truncated one and ended
up feeding partial output back into conversation history, triggering
infinite retry loops on JSON-parse failures (issue #19, Phase 2).

Per the OAI Responses spec, branch on the stop type in:

* `server_task_result_cmpl_final::to_json_oaicompat_resp()` — emit
  `"status": "incomplete"` on the top-level response, all output items
  inherit the same status, plus `"incomplete_details": {"reason":
  "max_output_tokens"}` at the top level.
* `to_json_oaicompat_resp_stream()` — same mapping on the per-item
  statuses, plus the final SSE event becomes `response.incomplete` (vs
  `response.completed`) with `incomplete_details` on the payload.

Doesn't address Phase 2 of the issue (HTTP 400 + actionable message
from `func_args_not_string`) — that requires typed exception plumbing
through common/chat.cpp into the server error path. Phase 1 alone
prevents the cascade in the first place: clients see truncation as
truncation, not as a malformed completed response.

Test coverage in test_compat_oai_responses.py:

* `test_responses_truncation_emits_incomplete_status` — non-streaming:
  `max_output_tokens: 2` on tinyllama2 reliably trips STOP_TYPE_LIMIT;
  assert status=incomplete + incomplete_details + per-item status.
* `test_responses_truncation_stream_emits_incomplete_event` — streaming:
  same setup, verify a `response.incomplete` event arrives with the
  same payload shape.