
fix(responses): handle response.incomplete streaming event in Responses->Chat transform#27266

Open
VANDRANKI wants to merge 1 commit into BerriAI:main from VANDRANKI:fix/responses-incomplete-event-handler

@VANDRANKI

Summary

The Responses API streaming transform (LiteLLMResponsesAPIStreamingIterator) did not handle the response.incomplete event type, which is sent by Azure OpenAI when generation ends due to max_output_tokens being reached or a content filter trigger. The event fell through to the else: pass branch, silently discarding incomplete_details and content_filters.

Root cause

In litellm/completion_extras/litellm_responses_transformation/transformation.py, the _handle_event method handled response.completed, response.failed, and response.cancelled but had no branch for response.incomplete.

Fix

Add an elif event_type == "response.incomplete": handler that:

  • Maps incomplete_details.reason to a standard finish_reason:
    • "max_output_tokens" → "length"
    • "content_filter" → "content_filter"
    • anything else → "stop"
  • Forwards content_filters and incomplete_details via provider_specific_fields so callers can inspect the raw values
  • Extracts usage from the event if present
  • Returns a terminal ModelResponseStream with the correct finish_reason, matching the pattern already used by response.failed and response.cancelled
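
The mapping in the list above can be sketched on plain dicts, without LiteLLM's types. This is only an illustration: the payload shape follows what the PR's handler reads (`response.incomplete_details.reason` and `response.content_filters`), while the function name `summarize_incomplete_event` and the sample event are hypothetical and not part of LiteLLM.

```python
from typing import Any, Dict, Optional, Tuple

# Mapping from the list above; any other reason falls back to "stop" in this PR.
_REASON_TO_FINISH_REASON = {
    "max_output_tokens": "length",
    "content_filter": "content_filter",
}


def summarize_incomplete_event(
    event: Dict[str, Any],
) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (finish_reason, provider_specific_fields) for a response.incomplete event."""
    response = event.get("response") or {}
    details = response.get("incomplete_details") or {}
    finish_reason = _REASON_TO_FINISH_REASON.get(details.get("reason"), "stop")

    # Forward the raw values only when they are present and non-empty.
    provider_specific: Dict[str, Any] = {}
    if details:
        provider_specific["incomplete_details"] = details
    if response.get("content_filters"):
        provider_specific["content_filters"] = response["content_filters"]
    return finish_reason, provider_specific or None


# Hypothetical sample event, shaped like the payload the handler reads.
sample_event = {
    "type": "response.incomplete",
    "response": {
        "status": "incomplete",
        "incomplete_details": {"reason": "max_output_tokens"},
    },
}
print(summarize_incomplete_event(sample_event))
```

The real handler additionally converts `usage` and wraps the result in a `ModelResponseStream`; both are left out here to keep the sketch self-contained.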

Fixes #27186

…es->Chat transform

The responses->chat streaming transformer handled response.completed but had
no branch for response.incomplete. When Azure OpenAI (or any Responses-API
compatible provider) returned a response.incomplete event (e.g. due to a
content filter or max_output_tokens limit), the code fell through to the
"Unhandled event" path, logged a debug line, and returned an empty chunk.
This caused the terminal metadata (content_filters, incomplete_details) to
be silently dropped and the stream ended without a proper finish_reason.

Fix: add an explicit handler for response.incomplete that:
- Maps incomplete_details.reason to finish_reason (max_output_tokens -> length,
  content_filter -> content_filter, anything else -> stop)
- Forwards content_filters and incomplete_details via provider_specific_fields
  so downstream custom loggers and guardrail hooks can inspect them
- Extracts and transforms usage if present

Fixes BerriAI#27186

codspeed-hq Bot commented May 6, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing VANDRANKI:fix/responses-incomplete-event-handler (40bd8d5) with main (6ff668c)

greptile-apps Bot commented May 6, 2026

Greptile Summary

Adds handling for the response.incomplete streaming event in OpenAiResponsesToChatCompletionStreamIterator.translate_responses_chunk_to_openai_stream, which previously fell through to an else: pass branch and silently discarded incomplete_details and content_filters from Azure OpenAI responses.

  • The new handler maps incomplete_details.reason to a finish_reason and forwards raw fields via provider_specific_fields, following the shape of the response.completed handler.
  • The default fallback finish_reason = "stop" is inconsistent with the existing non-streaming _map_responses_status_to_finish_reason method, which maps any "incomplete" status to "length", and could mislead callers into treating a truncated response as a normal completion.
  • No unit tests are added for the new streaming branch, leaving the finish-reason mapping and provider_specific_fields logic unverified by CI.

Confidence Score: 3/5

The change correctly fixes the silent discard of incomplete events, but the fallback finish_reason disagrees with the non-streaming code path and there are no tests to catch regressions.

The new handler introduces a finish_reason of 'stop' as default for unrecognised incomplete_details.reason values, contradicting the established non-streaming behaviour of mapping every incomplete response to 'length'. Callers relying on finish_reason to detect truncation would silently receive the wrong signal for any novel Azure reason codes. Additionally, the fix ships with no unit tests, so the mapping logic, provider_specific_fields passthrough, and usage extraction are all invisible to CI.

litellm/completion_extras/litellm_responses_transformation/transformation.py — specifically the new response.incomplete handler and its fallback finish_reason value

Important Files Changed

Filename: litellm/completion_extras/litellm_responses_transformation/transformation.py
Overview: Adds a response.incomplete branch in the streaming chunk translator. The finish_reason fallback defaults to "stop", which contradicts the existing non-streaming mapping of "incomplete" → "length", and no tests cover the new path.

Reviews (1): Last reviewed commit: "fix(responses): handle response.incomple..."

Comment on lines +1372 to +1377
if reason == "max_output_tokens":
    finish_reason = "length"
elif reason == "content_filter":
    finish_reason = "content_filter"
else:
    finish_reason = "stop"

P1 The default fallback finish_reason of "stop" is inconsistent with both the existing non-streaming code and the semantics of a response.incomplete event. The static _map_responses_status_to_finish_reason method (line 1046) maps any "incomplete" status to "length". A caller receiving finish_reason="stop" when the real cause is an unknown truncation reason will incorrectly conclude the generation completed normally, suppressing retry or truncation-handling logic.

Suggested change
- if reason == "max_output_tokens":
-     finish_reason = "length"
- elif reason == "content_filter":
-     finish_reason = "content_filter"
- else:
-     finish_reason = "stop"
+ if reason == "max_output_tokens":
+     finish_reason = "length"
+ elif reason == "content_filter":
+     finish_reason = "content_filter"
+ else:
+     finish_reason = "length"
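
To make the concern in this comment concrete, here is a small sketch of how the two fallbacks look to a caller that keys truncation handling off `finish_reason`. Both `map_reason` and `is_truncated` are hypothetical helpers written for this illustration, and `"some_future_reason"` is an invented value standing in for any reason code the handler does not recognise.

```python
def map_reason(reason, fallback):
    """Mirror the handler's mapping, with the fallback made a parameter."""
    if reason == "max_output_tokens":
        return "length"
    if reason == "content_filter":
        return "content_filter"
    return fallback


def is_truncated(finish_reason):
    """Hypothetical caller-side check: only "length" triggers truncation handling."""
    return finish_reason == "length"


unknown = "some_future_reason"  # invented reason code, not a documented value

# With the PR's fallback the caller sees a normal completion.
print(is_truncated(map_reason(unknown, fallback="stop")))    # False
# With the suggested fallback the caller can still detect the early stop.
print(is_truncated(map_reason(unknown, fallback="length")))  # True
```

Whether `"length"` is the right signal for every unrecognised reason is a judgment call; the sketch only shows that `"stop"` makes an incomplete response indistinguishable from a finished one for this kind of caller.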

Comment on lines +1358 to +1408
elif event_type == "response.incomplete":
    # Response ended early (e.g. content_filter or max_output_tokens).
    # Map incomplete_details.reason to a finish_reason so downstream
    # callbacks and guardrails receive a terminal chunk instead of an
    # empty unhandled-event chunk.
    response_data = parsed_chunk.get("response", {})
    incomplete_details = (
        response_data.get("incomplete_details") if response_data else None
    )
    reason = (
        incomplete_details.get("reason") if incomplete_details else None
    )
    # Map Responses API reason -> Chat Completions finish_reason
    finish_reason: str
    if reason == "max_output_tokens":
        finish_reason = "length"
    elif reason == "content_filter":
        finish_reason = "content_filter"
    else:
        finish_reason = "stop"

    # Surface content_filters and incomplete_details via provider_specific_fields
    # so that custom loggers and guardrail hooks can inspect them.
    provider_specific: Dict[str, Any] = {}
    if incomplete_details:
        provider_specific["incomplete_details"] = incomplete_details
    content_filters = (
        response_data.get("content_filters") if response_data else None
    )
    if content_filters:
        provider_specific["content_filters"] = content_filters

    usage = None
    if response_data and response_data.get("usage"):
        from litellm.responses.utils import ResponseAPILoggingUtils

        usage = ResponseAPILoggingUtils._transform_response_api_usage_to_chat_usage(
            response_data.get("usage")
        )

    return ModelResponseStream(
        choices=[
            StreamingChoices(
                index=0,
                delta=Delta(content=""),
                finish_reason=finish_reason,
            )
        ],
        usage=usage,
        provider_specific_fields=provider_specific if provider_specific else None,
    )

P2 No tests added for this handler

The PR fixes a silent data-loss bug (issue #27186) but adds no unit tests to test_completion_extras_litellm_responses_transformation_transformation.py covering the new response.incomplete branch in translate_responses_chunk_to_openai_stream. Without a test, the finish-reason mapping, provider_specific_fields population, and usage extraction paths are all unverified and invisible to CI. A regression that reverts this fix would not be caught.

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.

File with missing lines: ...litellm_responses_transformation/transformation.py
Patch %: 0.00%
Lines: 20 Missing ⚠️

Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request May 6, 2026
- BerriAI/litellm#27266 handle response.incomplete in Responses->Chat transform [merge-after-nits]
- BerriAI/litellm#27259 add module docstring + regression test for render smoke [merge-as-is]
- google-gemini/gemini-cli#26559 implement OIDC auth provider for A2A remote agents [merge-after-nits]
- QwenLM/qwen-code#3861 preserve comments via comment-json on settings migration [merge-after-nits]