Skip to content

fix(llm): guard OpenAI backend against malformed provider responses#776

Open
sanidhyasin wants to merge 2 commits into
plastic-labs:mainfrom
sanidhyasin:fix/openai-backend-malformed-response-guard
Open

fix(llm): guard OpenAI backend against malformed provider responses#776
sanidhyasin wants to merge 2 commits into
plastic-labs:mainfrom
sanidhyasin:fix/openai-backend-malformed-response-guard

Conversation

@sanidhyasin

@sanidhyasin sanidhyasin commented Jun 6, 2026

Copy link
Copy Markdown

Description

Closes #676.

The OpenAI chat backend (src/llm/backends/openai.py) assumed every response is a well-formed ChatCompletion with at least one choice and a usage attribute. OpenAI-compatible gateways (OpenRouter, vLLM, DashScope, LiteLLM, new-api, Groq-compatible layers, ...) sometimes return a technically-successful HTTP response whose body is missing choices, carries an empty list, or has a null message/usage. In those cases the backend crashed with a raw AttributeError/IndexError/TypeError deep inside response.choices[0].message / response.usage instead of surfacing a controlled provider error.

Changes

  • Add two small helpers — _first_choice() and _first_message() — that raise a controlled ValidationException when choices is None/empty or the first choice has no message.
  • Route the three trusting spots through them:
    • complete() structured parse() path (.message.parsed / .content / .refusal)
    • _normalize_response() (response.usage, choices[0].finish_reason, choices[0].message)
    • _parse_or_repair_structured_content()
  • Missing usage now degrades to zero token counts via getattr(...) rather than raising (the CompletionResult already handled a falsy usage).

The malformed/empty case now fails as a ValidationException that points at the provider response, so callers can treat it as a provider failure (and fall back) instead of crashing.

Scope

This intentionally covers the robustness/crash-defense half of #676. The broader structured-output fallback-chain expansion (json_schema → json_object → no response_format) overlaps with the JSON-mode work in #663/#751 and is left out to keep this change focused and easy to review.

Tests

Added cases to tests/llm/test_backends/test_openai.py covering the response shapes called out in the issue:

  • choices=[], choices=None, no usage attribute, and message=None → controlled ValidationException
  • response missing usage entirely → returns a result with token counts defaulted to 0
  • malformed structured (parse()) response with empty choices → controlled ValidationException

uv run pytest tests/llm/test_backends/test_openai.py → 27 passed. ruff check, ruff format --check, and basedpyright on the touched files are clean.

Summary by CodeRabbit

  • Bug Fixes

    • Tightened parsing of OpenAI-style responses to reliably detect missing or empty choices/messages and raise controlled validation errors.
    • Preserved existing fallback/repair behavior when parsed output is absent but raw content exists; improved safe extraction of finish reasons and content overrides.
    • Improved structured-output repair to handle malformed inputs more robustly.
  • Tests

    • Added tests covering malformed responses, missing usage/token defaults, and structured-output validation error paths.

OpenAI-compatible gateways (OpenRouter, vLLM, DashScope, LiteLLM, ...)
can return a technically-successful HTTP response whose body is missing
`choices`, carries an empty list, or has a null `message`/`usage`. The
backend indexed `response.choices[0].message` and read `response.usage`
directly, so those shapes crashed with a raw AttributeError/IndexError/
TypeError instead of a controlled provider error.

Add `_first_choice`/`_first_message` helpers that raise a
ValidationException on empty/missing choices or messages, and route the
`complete()` parse path, `_normalize_response`, and
`_parse_or_repair_structured_content` through them. Missing `usage` now
degrades to zero token counts instead of raising.

Closes plastic-labs#676
@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Worried about impact? Review this PR in Change Stack to explore blast radius before you approve or request changes.

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 19a0263f-c198-40e2-9dd0-15f3d39aab26

📥 Commits

Reviewing files that changed from the base of the PR and between 2fbdeb1 and 4951863.

📒 Files selected for processing (2)
  • src/llm/backends/openai.py
  • tests/llm/test_backends/test_openai.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/llm/test_backends/test_openai.py
  • src/llm/backends/openai.py

Walkthrough

Adds defensive helpers to safely extract the first choice/message from OpenAI-compatible responses and applies them across response normalization, structured completion parsing/repair, and tests to convert malformed response shapes into controlled ValidationException errors.

Changes

OpenAI response parsing robustness

Layer / File(s) Summary
Defensive response access helpers
src/llm/backends/openai.py
Introduces _first_choice() and _first_message() functions that safely extract the first choice and message from an OpenAI response, raising controlled ValidationException when choices are missing, empty, or message is None.
Structured completion and repair hardening
src/llm/backends/openai.py
Replaces direct indexing in the structured parse path with _first_message(), uses attribute-safe extraction for parsed, raw content, and refusal, and updates the structured-content repair flow to operate on safe raw content before attempting JSON repair.
Response normalization hardening
src/llm/backends/openai.py
Updates _normalize_response() to use _first_choice() for safe choice extraction, raises ValidationException when the first choice lacks a message, and uses getattr() for resilient content override extraction.
Test coverage for malformed responses
tests/llm/test_backends/test_openai.py
Adds BaseModel and ValidationException imports and three new tests: parameterized malformed-chat assertions (empty/missing choices, missing message), a missing-usage token-count test, and a structured-output test asserting ValidationException when parse() returns empty choices.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Choices hid in thickets, messages astray,
I hopped through responses to chase errors away,
Helpers now peer where the wild shapes hide,
Catching the gaps with a validating stride,
Safe paths for replies — hop, guard, and guide.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(llm): guard OpenAI backend against malformed provider responses' accurately describes the main change: hardening the OpenAI backend against malformed responses by adding validation guards and helper functions.
Linked Issues check ✅ Passed The PR fully implements all coding requirements from issue #676: adds helper functions for safe choice/message extraction [#676], raises ValidationException for missing/empty choices or None message [#676], guards against AttributeError/IndexError/TypeError via safe accessors [#676], degrades missing usage gracefully to zero token counts [#676], and includes comprehensive unit tests covering malformed response shapes [#676].
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #676 objectives: helper functions (_first_choice, _first_message) for safe extraction, validation guards in three code paths (complete, _normalize_response, _parse_or_repair_structured_content), graceful degradation for missing usage, and tests covering the specified malformed response shapes.
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/llm/test_backends/test_openai.py (1)

326-352: 💤 Low value

Test correctly verifies graceful degradation for missing usage.

The test validates that missing usage defaults token counts to 0. Optionally, you could also assert that cache_creation_input_tokens and cache_read_input_tokens are 0 for comprehensive coverage, but the current assertions are sufficient for the primary objective.

Optional: Add cache token assertions
 assert result.content == "ok"
 assert result.input_tokens == 0
 assert result.output_tokens == 0
+assert result.cache_creation_input_tokens == 0
+assert result.cache_read_input_tokens == 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_openai.py` around lines 326 - 352, The test
test_openai_backend_tolerates_missing_usage should also assert that cache token
metrics default to 0 when usage is missing; after calling OpenAIBackend.complete
and obtaining result, add assertions that result.cache_creation_input_tokens ==
0 and result.cache_read_input_tokens == 0 so the test verifies graceful
degradation for all token fields produced by OpenAIBackend.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/llm/test_backends/test_openai.py`:
- Around line 326-352: The test test_openai_backend_tolerates_missing_usage
should also assert that cache token metrics default to 0 when usage is missing;
after calling OpenAIBackend.complete and obtaining result, add assertions that
result.cache_creation_input_tokens == 0 and result.cache_read_input_tokens == 0
so the test verifies graceful degradation for all token fields produced by
OpenAIBackend.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f4ac9441-c2a6-4057-8e9f-07732363ea7e

📥 Commits

Reviewing files that changed from the base of the PR and between 9f26fdd and 2fbdeb1.

📒 Files selected for processing (2)
  • src/llm/backends/openai.py
  • tests/llm/test_backends/test_openai.py

Address CodeRabbit review on plastic-labs#776:
- assert cache_creation/cache_read default to 0 when usage is missing
- add docstrings to the touched complete/_normalize_response/
  _parse_or_repair_structured_content methods and the test's _Schema so
  all functions changed in this PR are documented (docstring coverage)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] OpenAI-compatible chat responses can crash backend when choices or usage are missing

1 participant