fix(llm): guard OpenAI backend against malformed provider responses#776
fix(llm): guard OpenAI backend against malformed provider responses#776sanidhyasin wants to merge 2 commits into
Conversation
OpenAI-compatible gateways (OpenRouter, vLLM, DashScope, LiteLLM, ...) can return a technically-successful HTTP response whose body is missing `choices`, carries an empty list, or has a null `message`/`usage`. The backend indexed `response.choices[0].message` and read `response.usage` directly, so those shapes crashed with a raw AttributeError/IndexError/ TypeError instead of a controlled provider error. Add `_first_choice`/`_first_message` helpers that raise a ValidationException on empty/missing choices or messages, and route the `complete()` parse path, `_normalize_response`, and `_parse_or_repair_structured_content` through them. Missing `usage` now degrades to zero token counts instead of raising. Closes plastic-labs#676
|
Worried about impact? Review this PR in Change Stack to explore blast radius before you approve or request changes. No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughAdds defensive helpers to safely extract the first choice/message from OpenAI-compatible responses and applies them across response normalization, structured completion parsing/repair, and tests to convert malformed response shapes into controlled ValidationException errors. ChangesOpenAI response parsing robustness
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/llm/test_backends/test_openai.py (1)
326-352: 💤 Low valueTest correctly verifies graceful degradation for missing usage.
The test validates that missing
usagedefaults token counts to 0. Optionally, you could also assert thatcache_creation_input_tokensandcache_read_input_tokensare 0 for comprehensive coverage, but the current assertions are sufficient for the primary objective.Optional: Add cache token assertions
assert result.content == "ok" assert result.input_tokens == 0 assert result.output_tokens == 0 +assert result.cache_creation_input_tokens == 0 +assert result.cache_read_input_tokens == 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_backends/test_openai.py` around lines 326 - 352, The test test_openai_backend_tolerates_missing_usage should also assert that cache token metrics default to 0 when usage is missing; after calling OpenAIBackend.complete and obtaining result, add assertions that result.cache_creation_input_tokens == 0 and result.cache_read_input_tokens == 0 so the test verifies graceful degradation for all token fields produced by OpenAIBackend.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/llm/test_backends/test_openai.py`:
- Around line 326-352: The test test_openai_backend_tolerates_missing_usage
should also assert that cache token metrics default to 0 when usage is missing;
after calling OpenAIBackend.complete and obtaining result, add assertions that
result.cache_creation_input_tokens == 0 and result.cache_read_input_tokens == 0
so the test verifies graceful degradation for all token fields produced by
OpenAIBackend.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f4ac9441-c2a6-4057-8e9f-07732363ea7e
📒 Files selected for processing (2)
src/llm/backends/openai.pytests/llm/test_backends/test_openai.py
Address CodeRabbit review on plastic-labs#776: - assert cache_creation/cache_read default to 0 when usage is missing - add docstrings to the touched complete/_normalize_response/ _parse_or_repair_structured_content methods and the test's _Schema so all functions changed in this PR are documented (docstring coverage)
Description
Closes #676.
The OpenAI chat backend (
src/llm/backends/openai.py) assumed every response is a well-formedChatCompletionwith at least one choice and ausageattribute. OpenAI-compatible gateways (OpenRouter, vLLM, DashScope, LiteLLM, new-api, Groq-compatible layers, ...) sometimes return a technically-successful HTTP response whose body is missingchoices, carries an empty list, or has a nullmessage/usage. In those cases the backend crashed with a rawAttributeError/IndexError/TypeErrordeep insideresponse.choices[0].message/response.usageinstead of surfacing a controlled provider error.Changes
_first_choice()and_first_message()— that raise a controlledValidationExceptionwhenchoicesisNone/empty or the first choice has nomessage.complete()structuredparse()path (.message.parsed/.content/.refusal)_normalize_response()(response.usage,choices[0].finish_reason,choices[0].message)_parse_or_repair_structured_content()usagenow degrades to zero token counts viagetattr(...)rather than raising (theCompletionResultalready handled a falsyusage).The malformed/empty case now fails as a
ValidationExceptionthat points at the provider response, so callers can treat it as a provider failure (and fall back) instead of crashing.Scope
This intentionally covers the robustness/crash-defense half of #676. The broader structured-output fallback-chain expansion (
json_schema → json_object → no response_format) overlaps with the JSON-mode work in #663/#751 and is left out to keep this change focused and easy to review.Tests
Added cases to
tests/llm/test_backends/test_openai.pycovering the response shapes called out in the issue:choices=[],choices=None, nousageattribute, andmessage=None→ controlledValidationExceptionusageentirely → returns a result with token counts defaulted to0parse()) response with emptychoices→ controlledValidationExceptionuv run pytest tests/llm/test_backends/test_openai.py→ 27 passed.ruff check,ruff format --check, andbasedpyrighton the touched files are clean.Summary by CodeRabbit
Bug Fixes
Tests