fix(responses): handle response.incomplete streaming event in Responses->Chat transform by VANDRANKI · Pull Request #27266 · BerriAI/litellm

VANDRANKI · 2026-05-06T01:25:05Z

Summary

The Responses API streaming transform (LiteLLMResponsesAPIStreamingIterator) did not handle the response.incomplete event type, which is sent by Azure OpenAI when generation ends due to max_output_tokens being reached or a content filter trigger. The event fell through to the else: pass branch, silently discarding incomplete_details and content_filters.

Root cause

In litellm/completion_extras/litellm_responses_transformation/transformation.py, the _handle_event method handled response.completed, response.failed, and response.cancelled but had no branch for response.incomplete.
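The fall-through can be illustrated with a minimal sketch. The dispatcher below is a hypothetical stand-in that works on plain dicts, not the actual litellm method; the `type` key and the returned dict shape are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Terminal event types the pre-fix code is described as handling.
HANDLED_TERMINAL_EVENTS = {
    "response.completed",
    "response.failed",
    "response.cancelled",
}


def handle_event_before_fix(parsed_chunk: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical stand-in for the pre-fix dispatch (no response.incomplete branch)."""
    event_type = parsed_chunk.get("type")
    if event_type in HANDLED_TERMINAL_EVENTS:
        # Simplified: the real handlers build a terminal chunk per event type.
        return {"terminal": True, "event": event_type}
    else:
        # response.incomplete lands here, so incomplete_details is never read.
        return None


incomplete_event = {
    "type": "response.incomplete",
    "response": {"incomplete_details": {"reason": "max_output_tokens"}},
}
result = handle_event_before_fix(incomplete_event)
print(result)  # None
```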

Fix

Add an elif event_type == "response.incomplete": handler that:

- Maps incomplete_details.reason to a standard finish_reason:
  - "max_output_tokens" → "length"
  - "content_filter" → "content_filter"
  - anything else → "stop"
- Forwards content_filters and incomplete_details via provider_specific_fields so callers can inspect the raw values
- Extracts usage from the event if present
- Returns a terminal ModelResponseStream with the correct finish_reason, matching the pattern already used by response.failed and response.cancelled
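The steps above can be sketched on plain dicts. This is a simplified illustration, not the PR's code: `map_incomplete_reason` and `build_incomplete_chunk` are hypothetical names, the returned dict stands in for ModelResponseStream, and usage is passed through untransformed rather than converted as in the actual diff.

```python
from typing import Any, Dict, Optional


def map_incomplete_reason(reason: Optional[str]) -> str:
    """Map a Responses API incomplete reason to a Chat Completions finish_reason."""
    if reason == "max_output_tokens":
        return "length"
    if reason == "content_filter":
        return "content_filter"
    return "stop"  # fallback used in this PR's diff


def build_incomplete_chunk(parsed_chunk: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical dict-based stand-in for the terminal ModelResponseStream."""
    response_data = parsed_chunk.get("response") or {}
    incomplete_details = response_data.get("incomplete_details") or {}

    # Forward only the raw fields that are actually present.
    provider_specific: Dict[str, Any] = {}
    if incomplete_details:
        provider_specific["incomplete_details"] = incomplete_details
    if response_data.get("content_filters"):
        provider_specific["content_filters"] = response_data["content_filters"]

    return {
        "finish_reason": map_incomplete_reason(incomplete_details.get("reason")),
        "usage": response_data.get("usage"),  # passed through untransformed here
        "provider_specific_fields": provider_specific or None,
    }


chunk = build_incomplete_chunk(
    {
        "type": "response.incomplete",
        "response": {"incomplete_details": {"reason": "content_filter"}},
    }
)
print(chunk["finish_reason"])  # content_filter
```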

Fixes #27186

…es->Chat transform

The responses->chat streaming transformer handled response.completed but had no branch for response.incomplete. When Azure OpenAI (or any Responses-API compatible provider) returned a response.incomplete event (e.g. due to a content filter or max_output_tokens limit), the code fell through to the "Unhandled event" path, logged a debug line, and returned an empty chunk. This caused the terminal metadata (content_filters, incomplete_details) to be silently dropped and the stream ended without a proper finish_reason.

Fix: add an explicit handler for response.incomplete that:
- Maps incomplete_details.reason to finish_reason (max_output_tokens -> length, content_filter -> content_filter, anything else -> stop)
- Forwards content_filters and incomplete_details via provider_specific_fields so downstream custom loggers and guardrail hooks can inspect them
- Extracts and transforms usage if present

Fixes BerriAI#27186

codspeed-hq · 2026-05-06T01:27:24Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

Comparing VANDRANKI:fix/responses-incomplete-event-handler (40bd8d5) with main (6ff668c)

greptile-apps · 2026-05-06T01:27:57Z

Greptile Summary

Adds handling for the response.incomplete streaming event in OpenAiResponsesToChatCompletionStreamIterator.translate_responses_chunk_to_openai_stream, which previously fell through to an else: pass branch and silently discarded incomplete_details and content_filters from Azure OpenAI responses.

- The new handler maps incomplete_details.reason to a finish_reason and forwards raw fields via provider_specific_fields, following the shape of the response.completed handler.
- The default fallback finish_reason = "stop" is inconsistent with the existing non-streaming _map_responses_status_to_finish_reason method, which maps any "incomplete" status to "length", and could mislead callers into treating a truncated response as a normal completion.
- No unit tests are added for the new streaming branch, leaving the finish-reason mapping and provider_specific_fields logic unverified by CI.

Confidence Score: 3/5

The change correctly fixes the silent discard of incomplete events, but the fallback finish_reason disagrees with the non-streaming code path and there are no tests to catch regressions.

The new handler introduces a finish_reason of 'stop' as default for unrecognised incomplete_details.reason values, contradicting the established non-streaming behaviour of mapping every incomplete response to 'length'. Callers relying on finish_reason to detect truncation would silently receive the wrong signal for any novel Azure reason codes. Additionally, the fix ships with no unit tests, so the mapping logic, provider_specific_fields passthrough, and usage extraction are all invisible to CI.

litellm/completion_extras/litellm_responses_transformation/transformation.py — specifically the new response.incomplete handler and its fallback finish_reason value

Important Files Changed

Filename	Overview
litellm/completion_extras/litellm_responses_transformation/transformation.py	Adds a `response.incomplete` branch in the streaming chunk translator; the finish_reason fallback defaults to "stop" which contradicts the existing non-streaming mapping of "incomplete" → "length", and no tests cover the new path.

Reviews (1): Last reviewed commit: "fix(responses): handle response.incomple..."

greptile-apps · 2026-05-06T01:28:01Z

+            if reason == "max_output_tokens":
+                finish_reason = "length"
+            elif reason == "content_filter":
+                finish_reason = "content_filter"
+            else:
+                finish_reason = "stop"


The default fallback finish_reason of "stop" is inconsistent with both the existing non-streaming code and the semantics of a response.incomplete event. The static _map_responses_status_to_finish_reason method (line 1046) maps any "incomplete" status to "length". A caller receiving finish_reason="stop" when the real cause is an unknown truncation reason will incorrectly conclude the generation completed normally, suppressing retry or truncation-handling logic.

Suggested change

-            if reason == "max_output_tokens":
-                finish_reason = "length"
-            elif reason == "content_filter":
-                finish_reason = "content_filter"
-            else:
-                finish_reason = "stop"
+            if reason == "max_output_tokens":
+                finish_reason = "length"
+            elif reason == "content_filter":
+                finish_reason = "content_filter"
+            else:
+                finish_reason = "length"
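The effect of the fallback on a caller can be shown with a small sketch. Both `map_reason` and `should_handle_truncation` are hypothetical helpers, and `some_future_reason` is a made-up value standing in for any reason code the handler does not recognise; the sketch assumes a caller that keys its truncation handling on finish_reason being "length".

```python
from typing import Optional


def map_reason(reason: Optional[str], fallback: str) -> str:
    """Map an incomplete reason, with the fallback value left configurable."""
    known = {"max_output_tokens": "length", "content_filter": "content_filter"}
    return known.get(reason, fallback)


def should_handle_truncation(finish_reason: str) -> bool:
    """Hypothetical caller-side check that treats 'length' as truncated output."""
    return finish_reason == "length"


novel_reason = "some_future_reason"  # made-up value for illustration

with_stop = map_reason(novel_reason, fallback="stop")
with_length = map_reason(novel_reason, fallback="length")

print(should_handle_truncation(with_stop))    # False
print(should_handle_truncation(with_length))  # True
```

With the "stop" fallback the caller's truncation path is skipped for the unrecognised reason, which is the behaviour the review comment warns about.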

greptile-apps · 2026-05-06T01:28:02Z

+        elif event_type == "response.incomplete":
+            # Response ended early (e.g. content_filter or max_output_tokens).
+            # Map incomplete_details.reason to a finish_reason so downstream
+            # callbacks and guardrails receive a terminal chunk instead of an
+            # empty unhandled-event chunk.
+            response_data = parsed_chunk.get("response", {})
+            incomplete_details = (
+                response_data.get("incomplete_details") if response_data else None
+            )
+            reason = (
+                incomplete_details.get("reason") if incomplete_details else None
+            )
+            # Map Responses API reason -> Chat Completions finish_reason
+            finish_reason: str
+            if reason == "max_output_tokens":
+                finish_reason = "length"
+            elif reason == "content_filter":
+                finish_reason = "content_filter"
+            else:
+                finish_reason = "stop"
+
+            # Surface content_filters and incomplete_details via provider_specific_fields
+            # so that custom loggers and guardrail hooks can inspect them.
+            provider_specific: Dict[str, Any] = {}
+            if incomplete_details:
+                provider_specific["incomplete_details"] = incomplete_details
+            content_filters = (
+                response_data.get("content_filters") if response_data else None
+            )
+            if content_filters:
+                provider_specific["content_filters"] = content_filters
+
+            usage = None
+            if response_data and response_data.get("usage"):
+                from litellm.responses.utils import ResponseAPILoggingUtils
+
+                usage = ResponseAPILoggingUtils._transform_response_api_usage_to_chat_usage(
+                    response_data.get("usage")
+                )
+
+            return ModelResponseStream(
+                choices=[
+                    StreamingChoices(
+                        index=0,
+                        delta=Delta(content=""),
+                        finish_reason=finish_reason,
+                    )
+                ],
+                usage=usage,
+                provider_specific_fields=provider_specific if provider_specific else None,
+            )


No tests added for this handler

The PR fixes a silent data-loss bug (issue #27186) but adds no unit tests to test_completion_extras_litellm_responses_transformation_transformation.py covering the new response.incomplete branch in translate_responses_chunk_to_openai_stream. Without a test, the finish-reason mapping, provider_specific_fields population, and usage extraction paths are all unverified and invisible to CI. A regression that reverts this fix would not be caught.
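A test for this branch might look roughly like the sketch below. It is pytest-style but uses only plain asserts, and it exercises a hypothetical dict-based stand-in (`translate_incomplete_event`) rather than the real translate_responses_chunk_to_openai_stream, whose fixtures and call signature are not shown in this PR. The `content_filters` payload shape is also made up.

```python
from typing import Any, Dict, Optional


def translate_incomplete_event(parsed_chunk: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the new branch, returning a plain dict."""
    response_data = parsed_chunk.get("response") or {}
    details = response_data.get("incomplete_details") or {}
    mapping = {"max_output_tokens": "length", "content_filter": "content_filter"}
    fields = {
        key: response_data[key]
        for key in ("incomplete_details", "content_filters")
        if response_data.get(key)
    }
    return {
        "finish_reason": mapping.get(details.get("reason"), "stop"),
        "provider_specific_fields": fields or None,
    }


def make_event(reason: Optional[str], **extra: Any) -> Dict[str, Any]:
    """Build a response.incomplete event with an optional reason and extra fields."""
    response = dict(extra)
    if reason is not None:
        response["incomplete_details"] = {"reason": reason}
    return {"type": "response.incomplete", "response": response}


def test_finish_reason_mapping() -> None:
    cases = [
        ("max_output_tokens", "length"),
        ("content_filter", "content_filter"),
        (None, "stop"),  # fallback as written in this PR's diff
    ]
    for reason, expected in cases:
        chunk = translate_incomplete_event(make_event(reason))
        assert chunk["finish_reason"] == expected


def test_provider_specific_fields_passthrough() -> None:
    filters = [{"blocked": True}]  # made-up payload shape
    chunk = translate_incomplete_event(
        make_event("content_filter", content_filters=filters)
    )
    assert chunk["provider_specific_fields"] == {
        "incomplete_details": {"reason": "content_filter"},
        "content_filters": filters,
    }


def test_no_details_yields_no_provider_fields() -> None:
    chunk = translate_incomplete_event(make_event(None))
    assert chunk["provider_specific_fields"] is None
```

If the fallback is changed to "length" as suggested in the earlier review comment, the `(None, "stop")` case would need to change with it.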

Rule Used: What: Ensure that any PR claiming to fix an issue ...

codecov · 2026-05-06T01:28:33Z

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...litellm_responses_transformation/transformation.py	0.00%	20 Missing ⚠️

- BerriAI/litellm#27266 handle response.incomplete in Responses->Chat transform [merge-after-nits]
- BerriAI/litellm#27259 add module docstring + regression test for render smoke [merge-as-is]
- google-gemini/gemini-cli#26559 implement OIDC auth provider for A2A remote agents [merge-after-nits]
- QwenLM/qwen-code#3861 preserve comments via comment-json on settings migration [merge-after-nits]

github-advanced-security AI found potential problems May 6, 2026

Comment thread litellm/completion_extras/litellm_responses_transformation/transformation.py

                usage = ResponseAPILoggingUtils._transform_response_api_usage_to_chat_usage(
                    response_data.get("usage")
                )

VANDRANKI wants to merge 1 commit into BerriAI:main from VANDRANKI:fix/responses-incomplete-event-handler
