fix(openai-backend): fallback json_schema→json_object for providers without structured output support #751
…hat don't support structured output

DeepSeek, Ollama, vLLM, Together, Qwen, and other OpenAI-compatible providers do not support response_format type 'json_schema'. Only native OpenAI models support this. Honcho uses chat.completions.parse() which sends response_format as a Pydantic model, converted to json_schema by the SDK, causing 400 BadRequestError on these providers.

Fix: when _create_structured_response() gets BadRequestError, fall back to json_object mode and inject the JSON Schema into the last user message as a prompt instruction. This preserves structured output behavior on providers that only support json_object.

Closes: plastic-labs#716 (Ollama/OpenAI-compatible providers deriver silently produces zero observations)
Refs: plastic-labs#716, plastic-labs#670
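The retry flow in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FakeBadRequest`, `fake_create`, and `create_structured` are hypothetical stand-ins for the SDK error, the provider call, and `_create_structured_response()`, and the sketch appends the schema as a separate user message rather than editing the last one, purely to stay short.

```python
import asyncio
import copy
import json
from typing import Any


class FakeBadRequest(Exception):
    """Stand-in for the SDK's 400 error (hypothetical, for this sketch only)."""


async def fake_create(**params: Any) -> dict[str, Any]:
    """Hypothetical provider endpoint that rejects json_schema but accepts json_object."""
    if params["response_format"]["type"] == "json_schema":
        raise FakeBadRequest("This response_format type is unavailable now")
    return {"content": json.dumps({"observations": []}), "sent": params}


async def create_structured(params: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Try json_schema first; on rejection, retry with json_object plus a schema prompt."""
    structured = {
        **params,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "response", "schema": schema},
        },
    }
    try:
        return await fake_create(**structured)
    except FakeBadRequest:
        # Copy the messages so the caller's list and dicts are left untouched.
        messages = copy.deepcopy(params["messages"])
        messages.append(
            {
                "role": "user",
                "content": "Reply with JSON matching this schema:\n" + json.dumps(schema),
            }
        )
        fallback = {
            **params,
            "messages": messages,
            "response_format": {"type": "json_object"},
        }
        return await fake_create(**fallback)


request = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Extract observations."}],
}
schema = {
    "type": "object",
    "properties": {"observations": {"type": "array"}},
    "required": ["observations"],
}
result = asyncio.run(create_structured(request, schema))
print(result["sent"]["response_format"])
```

Note that json_object mode only asks for valid JSON; the schema in the prompt is an instruction, so the caller still has to parse and validate the reply.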
No actionable comments were generated in the recent review.
Walkthrough
Attempts json_schema with a precomputed model schema; if the provider rejects it (BadRequestError), it retries using response_format json_object while injecting the same schema as a user-message instruction. Also enforces json_object + schema-injection for typed completions when json_mode is enabled.
…arams

When extra_params has json_mode=True, skip the json_schema attempt entirely and directly use json_object with the schema injected into the prompt. This avoids a wasted round-trip to providers that don't support json_schema (DeepSeek, Ollama, vLLM, Together, Qwen, etc.). The fallback in _create_structured_response still catches json_schema failures for callers that don't set json_mode explicitly.
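A rough sketch of the json_mode short-circuit described above. The helper names (`build_schema_prompt`, `to_json_object_params`, `choose_params`), the prompt wording, and the hand-written `schema` dict are assumptions made for illustration; the PR derives its schema from a Pydantic model and its actual helpers may look different.

```python
import copy
import json
from typing import Any


def build_schema_prompt(schema: dict[str, Any]) -> str:
    """Render a JSON Schema as a plain-text instruction (wording is illustrative)."""
    return (
        "Respond with a single JSON object that conforms to this JSON Schema:\n"
        + json.dumps(schema, indent=2)
    )


def to_json_object_params(params: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Return new request params using json_object mode with the schema in the prompt."""
    new_params = dict(params)
    new_params["response_format"] = {"type": "json_object"}
    messages = copy.deepcopy(params.get("messages", []))
    prompt = build_schema_prompt(schema)
    last = messages[-1] if messages else None
    if last and last.get("role") == "user" and isinstance(last.get("content"), str):
        # Plain-text user message: append the instruction to it.
        last["content"] = f"{last['content']}\n\n{prompt}"
    else:
        # No suitable message to extend: add the instruction as its own message.
        messages.append({"role": "user", "content": prompt})
    new_params["messages"] = messages
    return new_params


def choose_params(
    params: dict[str, Any], schema: dict[str, Any], extra_params: dict[str, Any]
) -> dict[str, Any]:
    """Skip the json_schema attempt entirely when json_mode is set."""
    if extra_params.get("json_mode"):
        return to_json_object_params(params, schema)
    return {
        **params,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "response", "schema": schema},
        },
    }


params = {"model": "example-model", "messages": [{"role": "user", "content": "hi"}]}
schema = {"type": "object", "properties": {"a": {"type": "string"}}, "required": ["a"]}
direct = choose_params(params, schema, {"json_mode": True})
default = choose_params(params, schema, {})
```

With `json_mode` set, the request never carries `json_schema`, so a provider that rejects it costs no extra round-trip.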
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 392-399: The BadRequestError handler around
self._client.chat.completions.create is too broad and mutates the original
messages; change it to inspect the caught BadRequestError and only retry when
the error explicitly indicates the provider rejected response_format/json_schema
(e.g., check error.code or error.message for keywords like "json_schema" or
"response_format"), otherwise re-raise the exception; also avoid unsafe shallow
copies and in-place edits of messages—create a deep copy (or build a new
messages list) and append a new system message containing the injected schema
string (convert any multimodal/content-parts to a safe single string for the
injection) so you never mutate messages[-1]["content"] and then call
self._client.chat.completions.create with the new messages and json_object
format.
- Around line 406-413: The code mutates caller-owned nested message dicts and
unsafely coerces message content to string; fix by making a deep copy of the
messages (e.g., copy.deepcopy(json_params.get("messages", []))) and assign that
copy back into json_params["messages"], then inject the schema into the last
message only after validating its shape: if last_msg["role"] == "user" and
isinstance(last_msg["content"], str) append "\n\n" + schema_prompt, if it's a
list or other supported structured type handle appropriately or otherwise raise
a typed ValidationException to fail fast for unsupported shapes; ensure no
in-place modifications touch the original caller objects and preserve other
message types unchanged.
    try:
        return await self._client.chat.completions.create(**structured_params)
    except BadRequestError:
        # Some providers (e.g. DeepSeek) don't support json_schema type.
        # Fall back to json_object + inject schema into the system prompt.
        logger.info(
            "json_schema not supported by provider, falling back to json_object for %s",
            response_format.__name__,
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/llm/backends/openai.py"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE" >&2
exit 1
fi
echo "=== Function search ==="
rg -n "def _create_structured_response|_create_structured_response" "$FILE"
echo "=== Context around lines ~360-440 ==="
nl -ba "$FILE" | sed -n '340,460p'
echo "=== Search for json_schema / json_object handling ==="
rg -n "json_schema|json_object|response_format" "$FILE" | head -n 200
Repository: plastic-labs/honcho
Length of output: 314
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/llm/backends/openai.py"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE" >&2
exit 1
fi
echo "=== Context around lines ~340-460 ==="
sed -n '340,460p' "$FILE" | awk '{printf "%6d:%s\n", NR+339, $0}'
echo "=== Search for json_schema / json_object handling ==="
rg -n "json_schema|json_object|response_format|BadRequestError" "$FILE"
Repository: plastic-labs/honcho
Length of output: 8385
🌐 Web query:
openai python BadRequestError attributes body message how to detect json_schema not supported response_format json_schema
💡 Result:
When encountering a BadRequestError while using Structured Outputs in the OpenAI Python client, the error often stems from violations of strict JSON Schema requirements. The OpenAI API expects a specific structure for Structured Outputs, and common issues include missing 'additionalProperties: false' attributes, incorrect schema definitions, or unsupported model versions [1][2][3][4]. To detect and troubleshoot these issues effectively:
1. Identify the Error Cause: The 'body' attribute of the BadRequestError typically contains a dictionary with an 'error' key [1][2]. Inspect this dictionary, specifically the 'message' and 'param' fields, to pinpoint the validation failure [1][3][5].
   - If the error message mentions 'additionalProperties', it means your schema is missing "additionalProperties": false [2][5].
   - If the error is 'not supported', you are likely using a model version that does not support Structured Outputs (e.g., 'json_schema' is only supported on gpt-4o-2024-08-06 and later, or gpt-4o-mini and later) [3][6][7].
   - If a 'type' key is missing, ensure your Pydantic model or manual JSON schema is correctly defined with object types [4].
2. Handling Pydantic Models: When passing a Pydantic BaseModel, the OpenAI SDK attempts to generate a strict JSON schema automatically [4]. If you receive schema-related errors:
   - Ensure all fields are explicitly defined.
   - If using Pydantic, the error might occur because 'additionalProperties' is not being set correctly for nested models or when 'extra="allow"' is enabled [2][5].
   - You can manually inspect the generated schema by using the SDK's internal to_strict_json_schema utility if available in your version, or by debugging the model's schema generation [2].
3. Best Practices for Validation:
   - Use Structured Outputs: Prioritize using Structured Outputs (via response_format={"type": "json_schema", ...}) over simple JSON mode for schema adherence [6][7].
   - Model Compatibility: Verify that the model you are using is explicitly listed as supporting Structured Outputs (e.g., gpt-4o-2024-08-06 or gpt-4o-mini) [3][8][6].
   - Schema Enforcement: The API strictly enforces that all keys listed in 'required' must be present, and 'additionalProperties' must be set to 'false' for all object types [1][2][5].
By examining the e.body of the caught openai.BadRequestError, you can programmatically extract the specific error code and message to provide better feedback or trigger retries [1][9].
Citations:
- 1: Schema validation error openai/openai-python#2533
- 2: Improper handling of pydantic extra="allow" openai/openai-python#2740
- 3: https://stackoverflow.com/questions/79039544/openai-api-error-invalid-parameter-response-format-of-type-json-schema-is
- 4: https://community.openai.com/t/invalid-schema-for-response-format-schema-must-have-a-type-key/1147207
- 5: Pydantic Field metadata causes invalid JSON schema in OpenAI Structured Outputs openai/openai-python#2024
- 6: https://developers.openai.com/api/docs/guides/structured-outputs?api-mode=responses
- 7: https://developers.openai.com/api/docs/guides/structured-outputs
- 8: Getting response_format 'json_schema' not supported error when using assistant with GPT-4o openai/openai-dotnet#276
- 9: https://dev.to/137foundry/python-custom-exception-classes-when-and-how-to-define-your-own-17m3
Restrict json_schema fallback retry and avoid unsafe message mutation (src/llm/backends/openai.py:392-417)
- The `except BadRequestError:` fallback retries on all 400s, masking unrelated validation errors; gate the retry to only the cases where the provider rejects `response_format`/`json_schema`, otherwise re-raise.
- The fallback uses a shallow `messages` copy and mutates `messages[-1]["content"]` via `str(...)`, which can corrupt OpenAI message `content` when it's a multimodal/content-parts array.
Suggested guard
-        except BadRequestError:
+        except BadRequestError as exc:
+            error_text = str(exc).lower()
+            if "json_schema" not in error_text and "response_format" not in error_text:
+                raise
             # Some providers (e.g. DeepSeek) don't support json_schema type.
             # Fall back to json_object + inject schema into the system prompt.
             logger.info(
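One way to gate the retry is to look at the structured error payload as well as the message text. The sketch below is a heuristic under assumptions: `is_response_format_rejection` and `FakeBadRequest` are hypothetical names, and the exact shape of an error's `body` varies by provider and SDK version, so the helper accepts either a payload wrapped in an `error` key or the inner error object itself.

```python
from typing import Any

# Keywords that suggest the provider rejected the response_format itself.
_REJECTION_KEYWORDS = ("json_schema", "response_format")


def is_response_format_rejection(exc: Exception) -> bool:
    """Heuristically decide whether a 400 error is about response_format."""
    parts: list[str] = [str(exc)]
    body: Any = getattr(exc, "body", None)
    if isinstance(body, dict):
        # Accept both {"error": {...}} and a bare error object.
        error = body.get("error", body)
        if isinstance(error, dict):
            for key in ("message", "param", "code"):
                value = error.get(key)
                if isinstance(value, str):
                    parts.append(value)
    text = " ".join(parts).lower()
    return any(keyword in text for keyword in _REJECTION_KEYWORDS)


class FakeBadRequest(Exception):
    """Stand-in for a provider 400 error; not the SDK class."""

    def __init__(self, message: str, body: Any = None) -> None:
        super().__init__(message)
        self.body = body


rejected = FakeBadRequest("This response_format type is unavailable now")
unrelated = FakeBadRequest(
    "max_tokens too large",
    {"error": {"message": "max_tokens too large", "param": "max_tokens"}},
)
print(is_response_format_rejection(rejected), is_response_format_rejection(unrelated))
```

Keyword matching can still misfire on providers with unusual error wording, so an unmatched 400 should be re-raised rather than retried.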
    json_params = dict(params)
    json_params["response_format"] = {"type": "json_object"}
    messages = list(json_params.get("messages", []))
    # Inject schema into the last user message or append a system instruction
    if messages and messages[-1]["role"] == "user":
        messages[-1]["content"] = (
            str(messages[-1]["content"]) + "\n\n" + schema_prompt
        )
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the function and inspect around the reported lines
rg -n "def _create_structured_response" -S src/llm/backends/openai.py
nl -ba src/llm/backends/openai.py | sed -n '360,460p'
# Check exception definition for ValidationException and imports in this file
rg -n "ValidationException" -S src/llm/backends/openai.py src/exceptions.py src -S
nl -ba src/exceptions.py | sed -n '1,220p'
# Find all references to "_create_structured_response" to understand how messages are constructed/used
rg -n "_create_structured_response" -S src/llm/backends/openai.py src | head -n 50
# Search for places where multimodal message content is represented as a list/part array
rg -n "\"content\"\s*:\s*\[" -S src/llm | head -n 50
rg -n "type\"\\s*:\\s*\"text\"|\"text\"\\s*:\\s*schema_prompt" -S src/llm/backends/openai.py src/llm | head -n 50
Repository: plastic-labs/honcho
Length of output: 171
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/llm/backends/openai.py"
# Show function header and surrounding lines with line numbers
awk 'NR>=340 && NR<=460 {printf "%d:%s\n", NR, $0}' "$FILE"
# Show exact portion around the reported snippet (406-413)
awk 'NR>=390 && NR<=430 {printf "%d:%s\n", NR, $0}' "$FILE"
# Check where ValidationException is defined and how it's imported/used
rg -n "class ValidationException|ValidationException" src/exceptions.py src/llm/backends/openai.py src | head -n 200
awk 'NR>=1 && NR<=260 {printf "%d:%s\n", NR, $0}' src/exceptions.py
# Check for multimodal content conventions in the codebase (OpenAI message parts)
rg -n "\"role\"\\s*:\\s*\"user\"|\"type\"\\s*:\\s*\"text\"|\"content\"\\s*:\\s*\\[" src/llm/backends/openai.py src/llm | head -n 200
Repository: plastic-labs/honcho
Length of output: 17855
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect how multimodal/user message content is represented
FILE_HA="src/llm/history_adapters.py"
awk 'NR>=1 && NR<=140 {printf "%d:%s\n", NR, $0}' "$FILE_HA"
# Inspect tool_loop message construction around where it sets {"role":"user","content": ...}
FILE_TL="src/llm/tool_loop.py"
rg -n "messages\.copy\(\) if messages else|content:\s*prompt|synthesis_prompt" "$FILE_TL"
awk 'NR>=280 && NR<=660 {printf "%d:%s\n", NR, $0}' "$FILE_TL"
# Inspect how OpenAIBackend builds/uses messages and calls _create_structured_response
FILE_OAI="src/llm/backends/openai.py"
rg -n "_create_structured_response" "$FILE_OAI"
awk 'NR>=1 && NR<=260 {printf "%d:%s\n", NR, $0}' "$FILE_OAI"
awk 'NR>=260 && NR<=520 {printf "%d:%s\n", NR, $0}' "$FILE_OAI"
Repository: plastic-labs/honcho
Length of output: 42784
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for OpenAI-style multimodal message content arrays (parts)
rg -n "\"content\"\\s*:\\s*\\[" src/llm src | head -n 200
# Search for common multimodal part keys
rg -n "image_url|type\"\\s*:\\s*\"image|input_image|base64" src/llm src | head -n 200
# Search for explicit OpenAI message part shapes
rg -n "\"type\"\\s*:\\s*\"text\"|\"text\"\\s*:\\s*\"" src/llm src | head -n 200
# Look for calls into OpenAIBackend.complete with non-str content usage
rg -n "OpenAIBackend\\(|response_format.*structured|_create_structured_response\\(" src/llm src | head -n 200
Repository: plastic-labs/honcho
Length of output: 2506
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "backend\\.complete\\(" -S src/llm src | head -n 200
rg -n "OpenAIBackend\\(" -S src/llm src
# Find honcho_llm_call_inner usage and how it handles messages for retries
rg -n "def honcho_llm_call_inner|honcho_llm_call_inner\\(" -S src/llm | head -n 200
rg -n "messages=.*copy\\(|messages\\.copy\\(" -S src/llm src | head -n 200
# Inspect llm/api.py around retry logic that might reuse messages
awk 'NR>=250 && NR<=520 {printf "%d:%s\n", NR, $0}' src/llm/api.py
Repository: plastic-labs/honcho
Length of output: 11398
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find log_reasoning_trace implementation/usage
rg -n "def log_reasoning_trace|log_reasoning_trace\\(" -S src | head -n 50
rg -n "messages=" -S src/llm | head -n 50
# Inspect request_builder / where messages come from for OpenAI calls
rg -n "request_builder|build_messages|messages:" src/llm/request_builder.py src/llm/api.py src/llm/executor.py src/llm/tool_loop.py | head -n 200
# Search for OpenAI multimodal content arrays specifically with role/content list pattern near OpenAIHistoryAdapter usage
rg -n "OpenAIHistoryAdapter|role\"\\s*:\\s*\"user\"\\s*,\\s*\"content\"\\s*:\\s*\\[" -S src/llm | head -n 200
Repository: plastic-labs/honcho
Length of output: 2776
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect message building for OpenAI calls
sed -n '1,220p' src/llm/request_builder.py
# Search for OpenAI multimodal keys/payloads
rg -n "image_url|input_image|base64|data:image" -S src/llm src | head -n 200
# Search for OpenAI-style message content arrays (role/content where content is a list)
rg -n "\"role\"\\s*:\\s*\"(user|assistant|system)\"[\\s\\S]{0,120}\"content\"\\s*:\\s*\\[" -S src/llm src | head -n 200
# Search for generic "content": [ across OpenAI-related adapters
rg -n "OpenAIHistoryAdapter|format_assistant_tool_message" -S src/llm/history_adapters.py
Repository: plastic-labs/honcho
Length of output: 5023
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find where the history adapter is selected/used
rg -n "HistoryAdapter|history_adapters|OpenAIHistoryAdapter|history_adapter" -S src/llm | head -n 200
# Inspect the relevant parts in llm/api.py and registry
sed -n '1,260p' src/llm/registry.py
sed -n '1,520p' src/llm/api.py
# Inspect conversation assembly in tool_loop for where it calls format_assistant_tool_message / append_tool_results
rg -n "format_assistant_tool_message|append_tool_results|append_tool_results\\(" -S src/llm/tool_loop.py
sed -n '140,260p' src/llm/tool_loop.py
sed -n '260,360p' src/llm/tool_loop.py
sed -n '360,520p' src/llm/tool_loop.py
Repository: plastic-labs/honcho
Length of output: 41922
🏁 Script executed:
#!/bin/bash
set -euo pipefail
awk 'NR>=1 && NR<=40 {printf "%d:%s\n", NR, $0}' src/llm/backends/openai.py
Repository: plastic-labs/honcho
Length of output: 1449
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "honcho_llm_call\\([\\s\\S]{0,400}tools=" src | head -n 200
rg -n "honcho_llm_call\\([\\s\\S]{0,400}response_model=" src | head -n 200
# Narrow: response_model together with tools on same call
rg -n "honcho_llm_call\\([\\s\\S]{0,600}tools=[\\s\\S]{0,200}response_model=" src | head -n 200
rg -n "honcho_llm_call\\([\\s\\S]{0,600}response_model=[\\s\\S]{0,200}tools=" src | head -n 200
# Search for execute_tool_loop called with response_model not None
rg -n "execute_tool_loop\\([\\s\\S]{0,400}response_model=" src/llm | head -n 200
Repository: plastic-labs/honcho
Length of output: 45
Avoid mutating caller-owned OpenAI chat messages (and unsafe content coercion) in structured-output fallback.
messages = list(...) only copies the outer list; assigning messages[-1]["content"] = ... mutates the original nested message dict from the caller. Also str(messages[-1]["content"]) can corrupt non-string OpenAI message content (e.g., content-part arrays). Deep-copy the messages and handle content by type; fail fast with a typed ValidationException for unsupported shapes.
Safer fallback rewrite
+import copy

-        json_params = dict(params)
+        json_params = dict(params)
         json_params["response_format"] = {"type": "json_object"}
-        messages = list(json_params.get("messages", []))
+        messages = copy.deepcopy(
+            cast(list[dict[str, Any]], json_params.get("messages", []))
+        )
         # Inject schema into the last user message or append a system instruction
         if messages and messages[-1]["role"] == "user":
-            messages[-1]["content"] = (
-                str(messages[-1]["content"]) + "\n\n" + schema_prompt
-            )
+            content = messages[-1].get("content")
+            if isinstance(content, str):
+                messages[-1]["content"] = f"{content}\n\n{schema_prompt}"
+            elif isinstance(content, list):
+                messages[-1]["content"] = [
+                    *content,
+                    {"type": "text", "text": schema_prompt},
+                ]
+            else:
+                raise ValidationException(
+                    "Unsupported user message content type for schema fallback"
+                )
         else:
             messages.append({"role": "user", "content": schema_prompt})
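The mutation concern comes down to Python copy semantics, which a short standalone snippet can show. The message contents and the `"SCHEMA"` placeholder here are made up for illustration and are not taken from the codebase.

```python
import copy

original = [{"role": "user", "content": "Summarize this."}]

# list(...) creates a new outer list, but the inner dicts are shared.
shallow = list(original)
shallow[-1]["content"] += "\n\nSCHEMA"
leaked = original[-1]["content"]  # the caller's message changed too

original = [{"role": "user", "content": "Summarize this."}]

# copy.deepcopy creates a new outer list and new inner dicts.
deep = copy.deepcopy(original)
deep[-1]["content"] += "\n\nSCHEMA"
preserved = original[-1]["content"]  # the caller's message is intact

# str() on a content-parts list yields a Python repr, not the original text.
coerced = str([{"type": "text", "text": "hi"}])

print(repr(leaked))
print(repr(preserved))
print(coerced)
```

The same sharing applies if the caller reuses `messages` for a later retry, which is where a leaked schema instruction would surface.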
Problem
Honcho uses `openai.chat.completions.parse()` for structured output, which sends `response_format` as a Pydantic model. The OpenAI SDK converts this to `{"type": "json_schema", "json_schema": {...}}`.

DeepSeek, Ollama, vLLM, Together, Qwen, and most OpenAI-compatible providers do not support `response_format: {"type": "json_schema"}`. Only native OpenAI models support this.

Result: all deriver tasks requiring structured output fail with `400 BadRequestError: "This response_format type is unavailable now"`. The deriver silently produces zero observations on these providers.

Closes #716.

Fix
When `_create_structured_response()` gets a `BadRequestError` from `json_schema`, it falls back to:
- `response_format: {"type": "json_object"}`

This is transparent: no config changes needed, no breaking changes. Providers that support `json_schema` (native OpenAI) keep working as before.

Testing
`json_schema not supported by provider, falling back to json_object for PromptRepresentation`

References