fix(llm): guard OpenAI backend against malformed provider responses by sanidhyasin · Pull Request #776 · plastic-labs/honcho

sanidhyasin · 2026-06-06T07:44:44Z

Description

Closes #676.

The OpenAI chat backend (src/llm/backends/openai.py) assumed every response is a well-formed ChatCompletion with at least one choice and a usage attribute. OpenAI-compatible gateways (OpenRouter, vLLM, DashScope, LiteLLM, new-api, Groq-compatible layers, ...) sometimes return a technically-successful HTTP response whose body is missing choices, carries an empty list, or has a null message/usage. In those cases the backend crashed with a raw AttributeError/IndexError/TypeError deep inside response.choices[0].message / response.usage instead of surfacing a controlled provider error.

Changes

Add two small helpers — _first_choice() and _first_message() — that raise a controlled ValidationException when choices is None/empty or the first choice has no message.
Route the three trusting spots through them:
- complete() structured parse() path (.message.parsed / .content / .refusal)
- _normalize_response() (response.usage, choices[0].finish_reason, choices[0].message)
- _parse_or_repair_structured_content()
Missing usage now degrades to zero token counts via getattr(...) rather than raising (the CompletionResult already handled a falsy usage).

The malformed/empty case now fails as a ValidationException that points at the provider response, so callers can treat it as a provider failure (and fall back) instead of crashing.

Scope

This intentionally covers the robustness/crash-defense half of #676. The broader structured-output fallback-chain expansion (json_schema → json_object → no response_format) overlaps with the JSON-mode work in #663/#751 and is left out to keep this change focused and easy to review.

Tests

Added cases to tests/llm/test_backends/test_openai.py covering the response shapes called out in the issue:

choices=[], choices=None, no usage attribute, and message=None → controlled ValidationException
response missing usage entirely → returns a result with token counts defaulted to 0
malformed structured (parse()) response with empty choices → controlled ValidationException

uv run pytest tests/llm/test_backends/test_openai.py → 27 passed. ruff check, ruff format --check, and basedpyright on the touched files are clean.

Summary by CodeRabbit

Bug Fixes
- Tightened parsing of OpenAI-style responses to reliably detect missing or empty choices/messages and raise controlled validation errors.
- Preserved existing fallback/repair behavior when parsed output is absent but raw content exists; improved safe extraction of finish reasons and content overrides.
- Improved structured-output repair to handle malformed inputs more robustly.
Tests
- Added tests covering malformed responses, missing usage/token defaults, and structured-output validation error paths.

OpenAI-compatible gateways (OpenRouter, vLLM, DashScope, LiteLLM, ...) can return a technically-successful HTTP response whose body is missing `choices`, carries an empty list, or has a null `message`/`usage`. The backend indexed `response.choices[0].message` and read `response.usage` directly, so those shapes crashed with a raw AttributeError/IndexError/ TypeError instead of a controlled provider error. Add `_first_choice`/`_first_message` helpers that raise a ValidationException on empty/missing choices or messages, and route the `complete()` parse path, `_normalize_response`, and `_parse_or_repair_structured_content` through them. Missing `usage` now degrades to zero token counts instead of raising. Closes plastic-labs#676

coderabbitai · 2026-06-06T07:44:55Z

Worried about impact? Review this PR in Change Stack to explore blast radius before you approve or request changes.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 19a0263f-c198-40e2-9dd0-15f3d39aab26

📥 Commits

Reviewing files that changed from the base of the PR and between 2fbdeb1 and 4951863.

📒 Files selected for processing (2)

src/llm/backends/openai.py
tests/llm/test_backends/test_openai.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/llm/test_backends/test_openai.py
src/llm/backends/openai.py

Walkthrough

Adds defensive helpers to safely extract the first choice/message from OpenAI-compatible responses and applies them across response normalization, structured completion parsing/repair, and tests to convert malformed response shapes into controlled ValidationException errors.

Changes

OpenAI response parsing robustness

Layer / File(s)	Summary
Defensive response access helpers `src/llm/backends/openai.py`	Introduces `_first_choice()` and `_first_message()` functions that safely extract the first choice and message from an OpenAI response, raising controlled `ValidationException` when choices are missing, empty, or message is None.
Structured completion and repair hardening `src/llm/backends/openai.py`	Replaces direct indexing in the structured parse path with `_first_message()`, uses attribute-safe extraction for `parsed`, raw `content`, and `refusal`, and updates the structured-content repair flow to operate on safe raw content before attempting JSON repair.
Response normalization hardening `src/llm/backends/openai.py`	Updates `_normalize_response()` to use `_first_choice()` for safe choice extraction, raises `ValidationException` when the first choice lacks a `message`, and uses `getattr()` for resilient content override extraction.
Test coverage for malformed responses `tests/llm/test_backends/test_openai.py`	Adds `BaseModel` and `ValidationException` imports and three new tests: parameterized malformed-chat assertions (empty/missing choices, missing message), a missing-`usage` token-count test, and a structured-output test asserting `ValidationException` when `parse()` returns empty `choices`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Choices hid in thickets, messages astray,
I hopped through responses to chase errors away,
Helpers now peer where the wild shapes hide,
Catching the gaps with a validating stride,
Safe paths for replies — hop, guard, and guide.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(llm): guard OpenAI backend against malformed provider responses' accurately describes the main change: hardening the OpenAI backend against malformed responses by adding validation guards and helper functions.
Linked Issues check	✅ Passed	The PR fully implements all coding requirements from issue `#676`: adds helper functions for safe choice/message extraction [`#676`], raises ValidationException for missing/empty choices or None message [`#676`], guards against AttributeError/IndexError/TypeError via safe accessors [`#676`], degrades missing usage gracefully to zero token counts [`#676`], and includes comprehensive unit tests covering malformed response shapes [`#676`].
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#676` objectives: helper functions (_first_choice, _first_message) for safe extraction, validation guards in three code paths (complete, _normalize_response, _parse_or_repair_structured_content), graceful degradation for missing usage, and tests covering the specified malformed response shapes.
Docstring Coverage	✅ Passed	Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

tests/llm/test_backends/test_openai.py (1)
326-352: 💤 Low value

Test correctly verifies graceful degradation for missing usage.

The test validates that missing usage defaults token counts to 0. Optionally, you could also assert that cache_creation_input_tokens and cache_read_input_tokens are 0 for comprehensive coverage, but the current assertions are sufficient for the primary objective.
Optional: Add cache token assertions
 assert result.content == "ok"
 assert result.input_tokens == 0
 assert result.output_tokens == 0
+assert result.cache_creation_input_tokens == 0
+assert result.cache_read_input_tokens == 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_openai.py` around lines 326 - 352, The test
test_openai_backend_tolerates_missing_usage should also assert that cache token
metrics default to 0 when usage is missing; after calling OpenAIBackend.complete
and obtaining result, add assertions that result.cache_creation_input_tokens ==
0 and result.cache_read_input_tokens == 0 so the test verifies graceful
degradation for all token fields produced by OpenAIBackend.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/llm/test_backends/test_openai.py`:
- Around line 326-352: The test test_openai_backend_tolerates_missing_usage
should also assert that cache token metrics default to 0 when usage is missing;
after calling OpenAIBackend.complete and obtaining result, add assertions that
result.cache_creation_input_tokens == 0 and result.cache_read_input_tokens == 0
so the test verifies graceful degradation for all token fields produced by
OpenAIBackend.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f4ac9441-c2a6-4057-8e9f-07732363ea7e

📥 Commits

Reviewing files that changed from the base of the PR and between 9f26fdd and 2fbdeb1.

📒 Files selected for processing (2)

src/llm/backends/openai.py
tests/llm/test_backends/test_openai.py

Address CodeRabbit review on plastic-labs#776: - assert cache_creation/cache_read default to 0 when usage is missing - add docstrings to the touched complete/_normalize_response/ _parse_or_repair_structured_content methods and the test's _Schema so all functions changed in this PR are documented (docstring coverage)

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(llm): guard OpenAI backend against malformed provider responses#776

fix(llm): guard OpenAI backend against malformed provider responses#776
sanidhyasin wants to merge 2 commits into
plastic-labs:mainfrom
sanidhyasin:fix/openai-backend-malformed-response-guard

sanidhyasin commented Jun 6, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sanidhyasin commented Jun 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Scope

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sanidhyasin commented Jun 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 6, 2026 •

edited

Loading