
fix(vision): Z.AI (智谱 GLM) vision model compatibility#19346

Closed
agilejava wants to merge 1 commit into
NousResearch:mainfrom
agilejava:fix/zai-vision-compat


@agilejava


Problem

Z.AI (智谱 GLM) vision models (glm-4v-flash, glm-4v-plus, glm-4v, etc.) fail when used as the auxiliary.vision provider due to two compatibility issues:

Issue 1: Error 1210 — max_tokens rejected on multimodal calls

Z.AI rejects the max_tokens parameter for vision model requests with error code 1210 ("API 调用参数有误", roughly "invalid API call parameters"). The error string does not contain "max_tokens", so the existing unsupported-parameter retry logic in call_llm() never fires, and the call fails hard with no recovery.
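To make the intended recovery concrete, here is a minimal sketch of a retry that recognizes the 1210 code instead of looking for "max_tokens" in the error text. The names `is_zai_param_error` and `call_with_max_tokens_fallback` are hypothetical and are not the helpers in this PR; the sketch also assumes the code 1210 appears in the stringified exception.

```python
import re
from typing import Any, Callable

# Assumption: Z.AI's generic parameter-error code shows up in str(exc).
_ZAI_PARAM_ERROR_RE = re.compile(r"\b1210\b")


def is_zai_param_error(exc: Exception) -> bool:
    """Return True when the error text carries Z.AI's error code 1210."""
    return bool(_ZAI_PARAM_ERROR_RE.search(str(exc)))


def call_with_max_tokens_fallback(send: Callable[..., Any], **kwargs: Any) -> Any:
    """Call send(**kwargs); on a 1210 error, retry once without max_tokens.

    `send` stands in for whatever function performs the request
    (hypothetical; the real code path is call_llm()).
    """
    try:
        return send(**kwargs)
    except Exception as exc:
        # Only retry when there is a max_tokens to strip and the error matches.
        if "max_tokens" not in kwargs or not is_zai_param_error(exc):
            raise
        retry_kwargs = {k: v for k, v in kwargs.items() if k != "max_tokens"}
        return send(**retry_kwargs)
```

The retry fires at most once, so a second 1210 for an unrelated parameter still surfaces to the caller.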

Issue 2: Wrong endpoint inheritance (silent failure)

When the main runtime provider uses Z.AI's Anthropic-compatible endpoint (open.bigmodel.cn/api/anthropic), the vision client inherits this endpoint via resolve_vision_provider_client(). Z.AI's Anthropic wire cannot properly handle image content: the model receives the base64 data but cannot process it through the Anthropic translation layer, and replies with something like "I can't see the image". No HTTP error is raised, which makes the failure extremely hard to diagnose.
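The endpoint switch can be illustrated with a small URL rewrite. This is a sketch under stated assumptions, not the code in this PR: `to_zai_openai_base_url` is a hypothetical name, and only the host and the two paths quoted in this description are taken from the source.

```python
from urllib.parse import urlsplit, urlunsplit

# Host and paths as quoted in this PR description.
ZAI_HOST = "open.bigmodel.cn"
ZAI_ANTHROPIC_PATH = "/api/anthropic"
ZAI_OPENAI_PATH = "/api/paas/v4"


def to_zai_openai_base_url(base_url: str) -> str:
    """Map Z.AI's Anthropic-compatible base URL to the OpenAI-compatible one.

    Any other URL is returned unchanged.
    """
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    is_anthropic_wire = path == ZAI_ANTHROPIC_PATH or path.startswith(
        ZAI_ANTHROPIC_PATH + "/"
    )
    if parts.hostname == ZAI_HOST and is_anthropic_wire:
        # Keep scheme and netloc, swap the path, drop query and fragment.
        return urlunsplit((parts.scheme, parts.netloc, ZAI_OPENAI_PATH, "", ""))
    return base_url
```

Matching on host and path prefix, rather than a plain string replace, keeps non-Z.AI base URLs that happen to contain "anthropic" untouched.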

Changes

  1. resolve_vision_provider_client() — Force Z.AI vision to use the OpenAI-compatible endpoint (open.bigmodel.cn/api/paas/v4) instead of inheriting the Anthropic wire from the main runtime provider.

  2. _build_call_kwargs() — Skip max_tokens for Z.AI vision models (detected by 4v/5v/-v suffix in model name).

  3. _AnthropicCompletionsAdapter — Support _skip_zai_max_tokens flag for cases where the adapter is still used.

  4. _to_openai_base_url() — Rewrite Z.AI Anthropic URLs to OpenAI-compatible path (handles edge case URL rewriting).

  5. call_llm() retry — Detect Z.AI-specific error 1210 and strip max_tokens before retrying.
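As an illustration of change 2, the sketch below builds request kwargs and leaves out max_tokens for Z.AI vision models. The regex is an assumption about how the "4v/5v/-v" check might look, and `is_zai_vision_model` and `build_call_kwargs` are hypothetical stand-ins for the real _build_call_kwargs() logic.

```python
import re
from typing import Any, Dict, List, Optional

# Assumed detection rule: a "4v" or "5v" token, or a trailing "-v".
_ZAI_VISION_RE = re.compile(r"(?:[45]v\b|-v$)", re.IGNORECASE)


def is_zai_vision_model(provider: str, model: str) -> bool:
    """Heuristic check for a Z.AI vision model based on its name."""
    return provider.lower() == "zai" and bool(_ZAI_VISION_RE.search(model))


def build_call_kwargs(
    provider: str,
    model: str,
    messages: List[Dict[str, Any]],
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build chat-completion kwargs, omitting max_tokens for Z.AI vision."""
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    if max_tokens is not None and not is_zai_vision_model(provider, model):
        kwargs["max_tokens"] = max_tokens
    return kwargs
```

Skipping the parameter up front avoids the failed first request; the error-1210 retry in call_llm() then acts only as a safety net for model names the heuristic misses.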

Testing

  • Verified with glm-4v-flash on Z.AI free tier — vision analysis works correctly
  • Main runtime (GLM-4/GLM-5 text models) unaffected — changes only activate for Z.AI vision models
  • vision_analyze tool returns correct descriptions of test images

Config used for testing

auxiliary:
  vision:
    provider: zai
    model: glm-4v-flash

Main provider: zai with base_url: https://open.bigmodel.cn/api/anthropic

…ax_tokens handling

Z.AI (智谱 GLM) vision models (glm-4v-flash, glm-4v-plus, etc.) have two
compatibility issues when used through the Anthropic-compatible endpoint:

1. **Error 1210: max_tokens rejected on multimodal calls**: Z.AI rejects
   the max_tokens parameter for vision model requests with error code 1210
   ("API 调用参数有误", roughly "invalid API call parameters"). The error
   string does not contain "max_tokens", so the existing
   unsupported-parameter retry logic never fires.

2. **Wrong endpoint inheritance**: When the main runtime provider uses Z.AI's
   Anthropic-compatible endpoint (open.bigmodel.cn/api/anthropic), the vision
   client inherits this endpoint. But Z.AI's Anthropic wire cannot properly
   handle image content — models silently fail ("I can't see the image") or
   reject max_tokens.

Changes:
- resolve_vision_provider_client(): force Z.AI vision to use OpenAI-compatible
  endpoint (open.bigmodel.cn/api/paas/v4) instead of inheriting Anthropic wire
- _build_call_kwargs(): skip max_tokens for Z.AI vision models (4v/5v/-v suffix)
- _AnthropicCompletionsAdapter: support _skip_zai_max_tokens flag
- _to_openai_base_url(): rewrite Z.AI Anthropic URLs to OpenAI-compatible path
- call_llm() retry: detect Z.AI error 1210 and strip max_tokens before retry
@alt-glitch added the following labels on May 3, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/zai (ZAI provider), tool/vision (Vision analysis and image generation).