feat(openai): extra_body passthrough + embeddings dimensions + json_object default#670
feat(openai): extra_body passthrough + embeddings dimensions + json_object default#670ZVin-Chen wants to merge 2 commits into
Conversation
Provider-specific OpenAI-compatible knobs (DeepSeek's `thinking`, vLLM/SGLang options, etc.) currently can't reach the wire — `_build_params` hard-codes a small whitelist (top_p / freq / presence / seed / verbosity) and silently drops everything else from ModelConfig.provider_params. Forward any unrecognised provider_params key through to the OpenAI SDK's `extra_body` parameter so operators can inject provider-specific fields without backend changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t to json_object
Two related portability improvements for OpenAI-compatible providers
that don't implement OpenAI's newest API surface:
1. embedding_client.py: forward `vector_dimensions` to the embeddings
endpoint as the `dimensions` parameter on every OpenAI call. Without
it, providers like Alibaba Bailian's text-embedding-v4 default to a
different output size than the operator-configured
EMBEDDING_VECTOR_DIMENSIONS, breaking the pgvector schema match.
2. backends/openai.py: switch structured-output (`response_format=<Pydantic
class>`) to `{"type": "json_object"}` and inject the target schema
as a system-message instruction, instead of OpenAI Structured Outputs'
`json_schema`. Reasons:
* DeepSeek's v4 family rejects `json_schema` outright.
* Several Bailian / vLLM-hosted models only implement OpenAI's older
JSON mode (`json_object`).
* OpenAI itself accepts the new shape gracefully.
Schema enforcement is a bit looser; `repair_response_model_json`
already handles minor drift downstream, so the trade-off favours
portability. Applies to both blocking complete() and streaming
stream() paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
Disabled knowledge base sources:
WalkthroughThis PR introduces two independent improvements to OpenAI integration: embedding API calls now explicitly pass the configured dimensionality parameter, and structured output handling is refactored from SDK-specific parsing to a portable schema-injection mechanism that works across OpenAI-compatible providers. ChangesEmbedding Dimensions Parameter
OpenAI Structured Output Portability
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Sorry, opened against the wrong base. Re-targeting to my fork's main. |
Summary
Three small, additive improvements to the OpenAI backend (
src/llm/backends/openai.py+src/embedding_client.py) that broaden compatibility with OpenAI-compatible providers (DeepSeek, Alibaba Bailian, vLLM-hosted models, etc.) without changing behaviour on OpenAI itself.1.
provider_params→ OpenAI SDKextra_bodypassthroughConfiguredModelSettings.overrides.provider_paramsis documented as a free-form passthrough for backend-specific knobs, but the OpenAI backend's_build_paramswas cherry-picking a small whitelist (top_p,frequency_penalty,presence_penalty,seed,verbosity) and silently dropping the rest. Any unrecognised key now flows through to the OpenAI SDK'sextra_bodyparameter, which is the canonical way to pass provider-specific fields.This unblocks e.g.:
…on DeepSeek's v4 family, which defaults
thinking.type=enabled(returnsreasoning_content+ rejectstool_choice=required).2. Forward
dimensionson embedding callsembedding_client.pyalready validates that returned embedding length matchesvector_dimensions, but it doesn't passdimensions=on the wire. With Alibaba Bailian'stext-embedding-v4, default output is 1024-d but the deployment's pgvector schema typically uses 1536. Calling withdimensions=self.vector_dimensionslets the operator-configuredEMBEDDING_VECTOR_DIMENSIONSactually take effect — OpenAI's owntext-embedding-3-*accepts the same parameter, so this is portable.Applied at all three OpenAI embedding call sites (single embed, batch embed, batch embed with text ids).
3. Default structured-output to
json_object+ schema-in-promptWhen a Pydantic class is passed as
response_format, the OpenAI backend was first tryingchat.completions.parse()(Structured Outputs,json_schemamode), then falling back to ajson_schemapayload via_create_structured_response. Both paths usejson_schema, which DeepSeek-v4 family and several Bailian / vLLM models reject (This response_format type is unavailable now/Failed to deserialize the JSON body).Default now goes straight to
{"type": "json_object"}(universally supported) with the target Pydantic schema injected as a system-message instruction. The existingrepair_response_model_jsonmachinery already handles minor JSON shape drift, so the looser enforcement is acceptable in exchange for portability. Streaming path (stream) gets the same treatment.Trade-off: on OpenAI itself, this gives up strict-mode
json_schemain favour ofjson_object+ prompted schema. In practice the parsed output is virtually identical becausegpt-*models comply with schema descriptions reliably; the repair logic catches any drift.Why these three together
They surfaced together while wiring honcho into a stack that uses DeepSeek as the dialectic / deriver LLM and Alibaba Bailian (
text-embedding-v4) for embeddings. Each change is independent and small but they share the same motivation: make the OpenAI backend gracefully cover the long tail of OpenAI-compatible providers.End-to-end verification
Tested against:
https://api.deepseek.com/v1withthinking={type:disabled}injected through provider_paramstext-embedding-v4viahttps://dashscope.aliyuncs.com/compatible-mode/v1,dimensions=1536Full dialectic chat across all 5 reasoning levels (
minimal/low/medium/high/max) returns synthesized answers; deriver creates observations end-to-end; dreamer specialists (deduction + induction) complete a fullupdate_peer_cardcycle.Notes
dimensions(matches OpenAI's documented param) and (b) structured outputs route throughjson_objectinstead ofjson_schema(still produces valid Pydantic-parseable JSON via repair logic).🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Improvements