fix(responses): handle response.incomplete streaming event in Responses->Chat transform #27266
VANDRANKI wants to merge 1 commit
…es->Chat transform

The responses->chat streaming transformer handled response.completed but had no branch for response.incomplete. When Azure OpenAI (or any Responses-API compatible provider) returned a response.incomplete event (e.g. due to a content filter or max_output_tokens limit), the code fell through to the "Unhandled event" path, logged a debug line, and returned an empty chunk. This caused the terminal metadata (content_filters, incomplete_details) to be silently dropped, and the stream ended without a proper finish_reason.

Fix: add an explicit handler for response.incomplete that:
- Maps incomplete_details.reason to finish_reason (max_output_tokens -> length, content_filter -> content_filter, anything else -> stop)
- Forwards content_filters and incomplete_details via provider_specific_fields so downstream custom loggers and guardrail hooks can inspect them
- Extracts and transforms usage if present

Fixes BerriAI#27186
Greptile Summary

Adds handling for the

Confidence Score: 3/5

The change correctly fixes the silent discard of incomplete events, but the fallback finish_reason disagrees with the non-streaming code path and there are no tests to catch regressions.

The new handler introduces a finish_reason of 'stop' as the default for unrecognised incomplete_details.reason values, contradicting the established non-streaming behaviour of mapping every incomplete response to 'length'. Callers relying on finish_reason to detect truncation would silently receive the wrong signal for any novel Azure reason codes. Additionally, the fix ships with no unit tests, so the mapping logic, provider_specific_fields passthrough, and usage extraction are all invisible to CI.

Files needing attention: litellm/completion_extras/litellm_responses_transformation/transformation.py, specifically the new response.incomplete handler and its fallback finish_reason value.

| Filename | Overview |
|---|---|
| litellm/completion_extras/litellm_responses_transformation/transformation.py | Adds a response.incomplete branch in the streaming chunk translator; the finish_reason fallback defaults to "stop" which contradicts the existing non-streaming mapping of "incomplete" → "length", and no tests cover the new path. |
Reviews (1): Last reviewed commit: "fix(responses): handle response.incomple..."

    if reason == "max_output_tokens":
        finish_reason = "length"
    elif reason == "content_filter":
        finish_reason = "content_filter"
    else:
        finish_reason = "stop"

The default fallback finish_reason of "stop" is inconsistent with both the existing non-streaming code and the semantics of a response.incomplete event. The static _map_responses_status_to_finish_reason method (line 1046) maps any "incomplete" status to "length". A caller receiving finish_reason="stop" when the real cause is an unknown truncation reason will incorrectly conclude that the generation completed normally, suppressing retry or truncation-handling logic.

Suggested change:

      if reason == "max_output_tokens":
          finish_reason = "length"
      elif reason == "content_filter":
          finish_reason = "content_filter"
      else:
    -     finish_reason = "stop"
    +     finish_reason = "length"

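The reviewer's point can be sketched as a small lookup with a truncation-safe default. This is only an illustration: `map_incomplete_reason` and `_REASON_TO_FINISH_REASON` are hypothetical names, not functions or constants in litellm, and the fallback of "length" follows the suggestion above rather than the code as submitted.

```python
from typing import Optional

# Hypothetical lookup table for illustration; not part of litellm.
_REASON_TO_FINISH_REASON = {
    "max_output_tokens": "length",
    "content_filter": "content_filter",
}


def map_incomplete_reason(reason: Optional[str]) -> str:
    """Map incomplete_details.reason to a Chat Completions finish_reason.

    Unknown or missing reasons fall back to "length", so an incomplete
    response is never reported to the caller as a normal "stop".
    """
    if reason is None:
        return "length"
    return _REASON_TO_FINISH_REASON.get(reason, "length")
```

With a table like this, supporting a new reason code is a one-line change, and the default stays aligned with the non-streaming mapping the reviewer cites.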

    elif event_type == "response.incomplete":
        # Response ended early (e.g. content_filter or max_output_tokens).
        # Map incomplete_details.reason to a finish_reason so downstream
        # callbacks and guardrails receive a terminal chunk instead of an
        # empty unhandled-event chunk.
        response_data = parsed_chunk.get("response", {})
        incomplete_details = (
            response_data.get("incomplete_details") if response_data else None
        )
        reason = (
            incomplete_details.get("reason") if incomplete_details else None
        )
        # Map Responses API reason -> Chat Completions finish_reason
        finish_reason: str
        if reason == "max_output_tokens":
            finish_reason = "length"
        elif reason == "content_filter":
            finish_reason = "content_filter"
        else:
            finish_reason = "stop"

        # Surface content_filters and incomplete_details via provider_specific_fields
        # so that custom loggers and guardrail hooks can inspect them.
        provider_specific: Dict[str, Any] = {}
        if incomplete_details:
            provider_specific["incomplete_details"] = incomplete_details
        content_filters = (
            response_data.get("content_filters") if response_data else None
        )
        if content_filters:
            provider_specific["content_filters"] = content_filters

        usage = None
        if response_data and response_data.get("usage"):
            from litellm.responses.utils import ResponseAPILoggingUtils

            usage = ResponseAPILoggingUtils._transform_response_api_usage_to_chat_usage(
                response_data.get("usage")
            )

        return ModelResponseStream(
            choices=[
                StreamingChoices(
                    index=0,
                    delta=Delta(content=""),
                    finish_reason=finish_reason,
                )
            ],
            usage=usage,
            provider_specific_fields=provider_specific if provider_specific else None,
        )

No tests added for this handler
The PR fixes a silent data-loss bug (issue #27186) but adds no unit tests to test_completion_extras_litellm_responses_transformation_transformation.py covering the new response.incomplete branch in translate_responses_chunk_to_openai_stream. Without a test, the finish-reason mapping, provider_specific_fields population, and usage extraction paths are all unverified and invisible to CI. A regression that reverts this fix would not be caught.
Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)
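As a rough idea of what such coverage could look like, the sketch below tests a stand-in for the new branch. Everything here is an assumption made for illustration: `translate_incomplete_event` is a hypothetical plain-dict imitation of the handler as submitted (including its "stop" fallback), usage is passed through untransformed, and the shape of the `content_filters` payload is invented. A real test would instead call translate_responses_chunk_to_openai_stream and assert on the returned ModelResponseStream.

```python
from typing import Any, Dict

# Hypothetical stand-in for the response.incomplete branch. It returns a plain
# dict instead of a ModelResponseStream and does not transform usage.
def translate_incomplete_event(parsed_chunk: Dict[str, Any]) -> Dict[str, Any]:
    response_data = parsed_chunk.get("response") or {}
    incomplete_details = response_data.get("incomplete_details") or {}
    reason = incomplete_details.get("reason")
    finish_reason = {
        "max_output_tokens": "length",
        "content_filter": "content_filter",
    }.get(reason, "stop")  # fallback as submitted in the PR

    provider_specific: Dict[str, Any] = {}
    if incomplete_details:
        provider_specific["incomplete_details"] = incomplete_details
    if response_data.get("content_filters"):
        provider_specific["content_filters"] = response_data["content_filters"]

    return {
        "finish_reason": finish_reason,
        "provider_specific_fields": provider_specific or None,
        "usage": response_data.get("usage") or None,
    }


def test_reason_mapping() -> None:
    cases = {
        "max_output_tokens": "length",
        "content_filter": "content_filter",
        "something_new": "stop",
    }
    for reason, expected in cases.items():
        chunk = {
            "type": "response.incomplete",
            "response": {"incomplete_details": {"reason": reason}},
        }
        assert translate_incomplete_event(chunk)["finish_reason"] == expected


def test_metadata_passthrough() -> None:
    filters = [{"blocked": True}]  # invented payload shape
    usage = {"input_tokens": 3, "output_tokens": 5}
    chunk = {
        "type": "response.incomplete",
        "response": {
            "incomplete_details": {"reason": "content_filter"},
            "content_filters": filters,
            "usage": usage,
        },
    }
    out = translate_incomplete_event(chunk)
    assert out["provider_specific_fields"] == {
        "incomplete_details": {"reason": "content_filter"},
        "content_filters": filters,
    }
    assert out["usage"] == usage


def test_missing_response_payload() -> None:
    out = translate_incomplete_event({"type": "response.incomplete"})
    assert out["finish_reason"] == "stop"
    assert out["provider_specific_fields"] is None
    assert out["usage"] is None
```

The three cases correspond to the three paths the review calls out: the finish-reason mapping, the provider_specific_fields population, and the behaviour when the event carries no response payload or usage.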
- BerriAI/litellm#27266 handle response.incomplete in Responses->Chat transform [merge-after-nits]
- BerriAI/litellm#27259 add module docstring + regression test for render smoke [merge-as-is]
- google-gemini/gemini-cli#26559 implement OIDC auth provider for A2A remote agents [merge-after-nits]
- QwenLM/qwen-code#3861 preserve comments via comment-json on settings migration [merge-after-nits]
Summary

The Responses API streaming transform (LiteLLMResponsesAPIStreamingIterator) did not handle the response.incomplete event type, which is sent by Azure OpenAI when generation ends due to max_output_tokens being reached or a content filter trigger. The event fell through to the else: pass branch, silently discarding incomplete_details and content_filters.

Root cause

In litellm/completion_extras/litellm_responses_transformation/transformation.py, the _handle_event method handled response.completed, response.failed, and response.cancelled but had no branch for response.incomplete.

Fix

Add an elif event_type == "response.incomplete": handler that:
- Maps incomplete_details.reason to a standard finish_reason:
  - "max_output_tokens" → "length"
  - "content_filter" → "content_filter"
  - anything else → "stop"
- Forwards content_filters and incomplete_details via provider_specific_fields so callers can inspect the raw values
- Returns a ModelResponseStream with the correct finish_reason, matching the pattern already used by response.failed and response.cancelled

Fixes #27186
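
From the caller's side, the effect of the fix is that the last chunk of a truncated stream now carries a finish_reason and, when available, the raw provider details. The following is a minimal sketch of a consumer, assuming chunks are plain dicts with the same field names as the ModelResponseStream built in the diff (choices, finish_reason, provider_specific_fields); `summarize_stream_end` is a hypothetical helper, not a litellm API.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def summarize_stream_end(
    chunks: Iterable[Dict[str, Any]],
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Return the final finish_reason and any provider-specific fields.

    Hypothetical helper: scans dict-shaped chunks and remembers the last
    chunk whose choice carries a non-empty finish_reason.
    """
    finish_reason: Optional[str] = None
    provider_fields: Dict[str, Any] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
                provider_fields = chunk.get("provider_specific_fields") or {}
    return finish_reason, provider_fields


# Example stream: one content delta followed by the terminal chunk that the
# response.incomplete handler would emit for a content filter trigger.
example_chunks = [
    {"choices": [{"delta": {"content": "Hel"}, "finish_reason": None}]},
    {
        "choices": [{"delta": {"content": ""}, "finish_reason": "content_filter"}],
        "provider_specific_fields": {
            "incomplete_details": {"reason": "content_filter"}
        },
    },
]
reason, details = summarize_stream_end(example_chunks)
```

Before the fix, a stream like this would have ended with an empty chunk, so a consumer written this way would have seen no finish_reason and no details at all.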