Skip to content

feat(openai): extra_body passthrough + embeddings dimensions + json_object default#670

Closed
ZVin-Chen wants to merge 2 commits into
plastic-labs:mainfrom
ZVin-Chen:feature/openai-extra-body-passthrough
Closed

feat(openai): extra_body passthrough + embeddings dimensions + json_object default#670
ZVin-Chen wants to merge 2 commits into
plastic-labs:mainfrom
ZVin-Chen:feature/openai-extra-body-passthrough

Conversation

@ZVin-Chen

@ZVin-Chen ZVin-Chen commented May 11, 2026

Copy link
Copy Markdown

Summary

Three small, additive improvements to the OpenAI backend (src/llm/backends/openai.py + src/embedding_client.py) that broaden compatibility with OpenAI-compatible providers (DeepSeek, Alibaba Bailian, vLLM-hosted models, etc.) without changing behaviour on OpenAI itself.

1. provider_params → OpenAI SDK extra_body passthrough

ConfiguredModelSettings.overrides.provider_params is documented as a free-form passthrough for backend-specific knobs, but the OpenAI backend's _build_params was cherry-picking a small whitelist (top_p, frequency_penalty, presence_penalty, seed, verbosity) and silently dropping the rest. Any unrecognised key now flows through to the OpenAI SDK's extra_body parameter, which is the canonical way to pass provider-specific fields.

This unblocks e.g.:

[dialectic.levels.minimal.model_config.overrides]
provider_params = { thinking = { type = "disabled" } }

…on DeepSeek's v4 family, which defaults thinking.type=enabled (returns reasoning_content + rejects tool_choice=required).

2. Forward dimensions on embedding calls

embedding_client.py already validates that returned embedding length matches vector_dimensions, but it doesn't pass dimensions= on the wire. With Alibaba Bailian's text-embedding-v4, default output is 1024-d but the deployment's pgvector schema typically uses 1536. Calling with dimensions=self.vector_dimensions lets the operator-configured EMBEDDING_VECTOR_DIMENSIONS actually take effect — OpenAI's own text-embedding-3-* accepts the same parameter, so this is portable.

Applied at all three OpenAI embedding call sites (single embed, batch embed, batch embed with text ids).

3. Default structured-output to json_object + schema-in-prompt

When a Pydantic class is passed as response_format, the OpenAI backend was first trying chat.completions.parse() (Structured Outputs, json_schema mode), then falling back to a json_schema payload via _create_structured_response. Both paths use json_schema, which DeepSeek-v4 family and several Bailian / vLLM models reject (This response_format type is unavailable now / Failed to deserialize the JSON body).

Default now goes straight to {"type": "json_object"} (universally supported) with the target Pydantic schema injected as a system-message instruction. The existing repair_response_model_json machinery already handles minor JSON shape drift, so the looser enforcement is acceptable in exchange for portability. Streaming path (stream) gets the same treatment.

Trade-off: on OpenAI itself, this gives up strict-mode json_schema in favour of json_object + prompted schema. In practice the parsed output is virtually identical because gpt-* models comply with schema descriptions reliably; the repair logic catches any drift.

Why these three together

They surfaced together while wiring honcho into a stack that uses DeepSeek as the dialectic / deriver LLM and Alibaba Bailian (text-embedding-v4) for embeddings. Each change is independent and small but they share the same motivation: make the OpenAI backend gracefully cover the long tail of OpenAI-compatible providers.

End-to-end verification

Tested against:

  • LLM: DeepSeek v4-flash via https://api.deepseek.com/v1 with thinking={type:disabled} injected through provider_params
  • Embeddings: Alibaba Bailian text-embedding-v4 via https://dashscope.aliyuncs.com/compatible-mode/v1, dimensions=1536

Full dialectic chat across all 5 reasoning levels (minimal / low / medium / high / max) returns synthesized answers; deriver creates observations end-to-end; dreamer specialists (deduction + induction) complete a full update_peer_card cycle.

Notes

  • No new dependencies.
  • No behaviour change on OpenAI's own endpoints other than (a) embeddings now explicitly request dimensions (matches OpenAI's documented param) and (b) structured outputs route through json_object instead of json_schema (still produces valid Pydantic-parseable JSON via repair logic).
  • The submodule is wired into a downstream consumer (ZVin-Chen/emotional_agent#17) that exercises the stack end-to-end.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Fixed embedding request dimension consistency across all operation types to ensure proper vector generation.
    • Improved structured JSON output handling with better validation, error detection, and automatic correction mechanisms.
  • Improvements

    • Enhanced provider parameter compatibility by expanding support for additional configuration options.

Review Change Stack

ZVin-Chen and others added 2 commits May 11, 2026 17:21
Provider-specific OpenAI-compatible knobs (DeepSeek's `thinking`,
vLLM/SGLang options, etc.) currently can't reach the wire — `_build_params`
hard-codes a small whitelist (top_p / freq / presence / seed / verbosity)
and silently drops everything else from ModelConfig.provider_params.

Forward any unrecognised provider_params key through to the OpenAI SDK's
`extra_body` parameter so operators can inject provider-specific fields
without backend changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t to json_object

Two related portability improvements for OpenAI-compatible providers
that don't implement OpenAI's newest API surface:

1. embedding_client.py: forward `vector_dimensions` to the embeddings
   endpoint as the `dimensions` parameter on every OpenAI call. Without
   it, providers like Alibaba Bailian's text-embedding-v4 default to a
   different output size than the operator-configured
   EMBEDDING_VECTOR_DIMENSIONS, breaking the pgvector schema match.

2. backends/openai.py: switch structured-output (`response_format=<Pydantic
   class>`) to `{"type": "json_object"}` and inject the target schema
   as a system-message instruction, instead of OpenAI Structured Outputs'
   `json_schema`. Reasons:
     * DeepSeek's v4 family rejects `json_schema` outright.
     * Several Bailian / vLLM-hosted models only implement OpenAI's older
       JSON mode (`json_object`).
     * OpenAI itself accepts the new shape gracefully.
   Schema enforcement is a bit looser; `repair_response_model_json`
   already handles minor drift downstream, so the trade-off favours
   portability. Applies to both blocking complete() and streaming
   stream() paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 65f4d609-0bd2-41cc-a4f0-4997ec204dd8

📥 Commits

Reviewing files that changed from the base of the PR and between a4ae372 and f927a30.

📒 Files selected for processing (2)
  • src/embedding_client.py
  • src/llm/backends/openai.py

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.


Walkthrough

This PR introduces two independent improvements to OpenAI integration: embedding API calls now explicitly pass the configured dimensionality parameter, and structured output handling is refactored from SDK-specific parsing to a portable schema-injection mechanism that works across OpenAI-compatible providers.

Changes

Embedding Dimensions Parameter

Layer / File(s) Summary
Embedding API Call Sites
src/embedding_client.py
Three OpenAI embeddings.create call sites now include dimensions=self.vector_dimensions parameter: single-query embedding, simple batch embedding, and batch processing.

OpenAI Structured Output Portability

Layer / File(s) Summary
Backend Imports
src/llm/backends/openai.py
Imports reworked to remove parse/validation utilities and retain only Pydantic BaseModel and local repair/exception utilities needed for the new portability mechanism.
Structured Response Creation
src/llm/backends/openai.py
_create_structured_response() now forces json_object format, generates Pydantic model JSON schema, and injects a schema instruction into the system message.
Complete Method Structured Path
src/llm/backends/openai.py
complete() method with response_format as BaseModel now calls _create_structured_response(), repairs JSON via _parse_or_repair_structured_content(), and normalizes the result, replacing chat.completions.parse and validate_structured_output.
Stream Method Structured Path
src/llm/backends/openai.py
stream() method now forces json_object and injects JSON-schema instruction into the first system message (creating or updating as needed).
Extra Parameters Pass-through
src/llm/backends/openai.py
_build_params() recognizes common OpenAI top-level fields and routes unrecognized extra_params keys to params["extra_body"] for provider-specific option pass-through.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Poem

🐰 A rabbit hops through vectors bright,
With dimensions tuned just right!
And JSON schemas, now portably borne,
Flow through the system, carefully sworn,
To compatible providers with nary a frown! 🎯

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ZVin-Chen

Copy link
Copy Markdown
Author

Sorry, opened against the wrong base. Re-targeting to my fork's main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant