feat: add GLM-4.5V model support #739
Conversation
Walkthrough
Adds GLM-4.5V (multimodal) model entries to the default settings, the Zhipu provider-specific settings, and the Zhipu provider's fetchOpenAIModels list. No changes to control flow, signatures, or error handling.
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
src/main/presenter/configPresenter/modelDefaultSettings.ts (1 hunks)
src/main/presenter/configPresenter/providerModelSettings.ts (1 hunks)
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
**/*.{ts,tsx,js,jsx,vue}
📄 CodeRabbit Inference Engine (CLAUDE.md)
Use English for logs and comments
Files:
src/main/presenter/configPresenter/modelDefaultSettings.ts
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
src/main/presenter/configPresenter/providerModelSettings.ts
**/*.{ts,tsx}
📄 CodeRabbit Inference Engine (CLAUDE.md)
Strict type checking enabled for TypeScript
**/*.{ts,tsx}: Always use try-catch to handle potential errors
Provide meaningful error messages
Log detailed error information
Degrade gracefully
Logs should include timestamp, log level, error code, error description, stack trace (if applicable), and relevant context
Log levels should include ERROR, WARN, INFO, DEBUG
Do not swallow errors
Provide user-friendly error messages
Implement an error retry mechanism
Avoid logging sensitive information
Use structured logging
Set appropriate log levels
Files:
src/main/presenter/configPresenter/modelDefaultSettings.ts
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
src/main/presenter/configPresenter/providerModelSettings.ts
src/main/**/*.ts
📄 CodeRabbit Inference Engine (CLAUDE.md)
Main to Renderer: Use EventBus to broadcast events via mainWindow.webContents.send()
Use Electron's built-in APIs for file system and native dialogs
Files:
src/main/presenter/configPresenter/modelDefaultSettings.ts
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
src/main/presenter/configPresenter/providerModelSettings.ts
src/main/presenter/**/*.ts
📄 CodeRabbit Inference Engine (CLAUDE.md)
One presenter per functional domain
Files:
src/main/presenter/configPresenter/modelDefaultSettings.ts
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
src/main/presenter/configPresenter/providerModelSettings.ts
src/main/presenter/configPresenter/**/*.ts
📄 CodeRabbit Inference Engine (CLAUDE.md)
Centralize configuration in configPresenter/
Files:
src/main/presenter/configPresenter/modelDefaultSettings.ts
src/main/presenter/configPresenter/providerModelSettings.ts
**/*.{js,jsx,ts,tsx}
📄 CodeRabbit Inference Engine (.cursor/rules/development-setup.mdc)
**/*.{js,jsx,ts,tsx}: Use OxLint for code linting
Write logs and comments in English
Files:
src/main/presenter/configPresenter/modelDefaultSettings.ts
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
src/main/presenter/configPresenter/providerModelSettings.ts
src/{main,renderer}/**/*.ts
📄 CodeRabbit Inference Engine (.cursor/rules/electron-best-practices.mdc)
src/{main,renderer}/**/*.ts: Use context isolation for improved security
Implement proper inter-process communication (IPC) patterns
Optimize application startup time with lazy loading
Implement proper error handling and logging for debugging
Files:
src/main/presenter/configPresenter/modelDefaultSettings.ts
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
src/main/presenter/configPresenter/providerModelSettings.ts
src/main/**/*.{ts,js,tsx,jsx}
📄 CodeRabbit Inference Engine (.cursor/rules/project-structure.mdc)
Main process code goes in src/main
Files:
src/main/presenter/configPresenter/modelDefaultSettings.ts
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
src/main/presenter/configPresenter/providerModelSettings.ts
src/main/presenter/llmProviderPresenter/providers/*.ts
📄 CodeRabbit Inference Engine (CLAUDE.md)
src/main/presenter/llmProviderPresenter/providers/*.ts: Create provider file in src/main/presenter/llmProviderPresenter/providers/ when adding a new LLM provider
Implement coreStream method following standardized event interface in LLM provider files
src/main/presenter/llmProviderPresenter/providers/*.ts: Each file in src/main/presenter/llmProviderPresenter/providers/*.ts should handle interaction with a specific LLM API, including request/response formatting, tool definition conversion, native/non-native tool call management, and standardizing output streams to a common event format.
Provider implementations must use a coreStream method that yields standardized stream events to decouple the main loop from provider-specific details.
The coreStream method in each Provider must perform a single streaming API request per conversation round and must not contain multi-round tool call loop logic.
Provider files should implement helper methods such as formatMessages, convertToProviderTools, parseFunctionCalls, and prepareFunctionCallPrompt as needed for provider-specific logic.
All provider implementations must parse provider-specific data chunks and yield standardized events for text, reasoning, tool calls, usage, errors, stop reasons, and image data.
When a provider does not support native function calling, it must prepare messages using prompt wrapping (e.g., prepareFunctionCallPrompt) before making the API call.
When a provider supports native function calling, MCP tools must be converted to the provider's format (e.g., using convertToProviderTools) and included in the API request.
Provider implementations should aggregate and yield usage events as part of the standardized stream.
Provider implementations should yield image data events in the standardized format when applicable.
Provider implementations should yield reasoning events in the standardized format when applicable.
Provider implementations should yield tool call events (`tool_call_star...
Files:
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
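As a rough illustration of the coreStream contract described in the guidelines above, a provider's generator might look like the following. This is a minimal sketch only; the event shapes, type names, and class name here are hypothetical and not the repo's actual definitions.

// Hypothetical standardized event union; the real types live in the repo's llmProviderPresenter code.
type StandardStreamEvent =
  | { type: 'text'; content: string }
  | { type: 'reasoning'; content: string }
  | { type: 'tool_call_start'; id: string; name: string }
  | { type: 'usage'; promptTokens: number; completionTokens: number }
  | { type: 'stop'; reason: string }

// Hypothetical provider class: real providers wrap their specific SDK or HTTP client.
class ExampleProvider {
  // One streaming API request per conversation round; no multi-round tool-call loop here.
  async *coreStream(_messages: unknown[], _modelId: string): AsyncGenerator<StandardStreamEvent> {
    // ...send the provider-specific streaming request and parse chunks here...
    yield { type: 'text', content: 'partial text chunk' }
    yield { type: 'usage', promptTokens: 120, completionTokens: 40 }
    yield { type: 'stop', reason: 'end_turn' }
  }
}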
🧠 Learnings (1)
📚 Learning: 2025-07-21T01:45:33.790Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T01:45:33.790Z
Learning: Applies to src/main/presenter/configPresenter/providers.ts : Add provider configuration in configPresenter/providers.ts when adding a new LLM provider
Applied to files:
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
src/main/presenter/configPresenter/providerModelSettings.ts
🔇 Additional comments (4)
src/main/presenter/configPresenter/modelDefaultSettings.ts (1)
1648-1658: GLM-4.5V default settings added — consistent and well-scoped
The new entry aligns with the Zhipu provider settings and the provider's model list (contextLength 65536, maxTokens 8192, vision true). No control flow or typing issues here.
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts (1)
142-150: Added GLM-4.5V to Zhipu catalog — matches defaults/provider config
The catalog entry is consistent with default and provider settings (contextLength 65536, maxTokens 8192). Placement under the multimodal section is correct.
src/main/presenter/configPresenter/providerModelSettings.ts (2)
310-320: Zhipu provider model settings for GLM-4.5V — consistent with defaults
The provider-specific entry mirrors defaults (temperature 0.7, maxTokens 8192, contextLength 65536, vision true, functionCall true, reasoning true). Looks good.
310-320: Confirmed: keep functionCall: true for GLM-4.5V
GLM-4.5V supports native OpenAI-compatible function/tool calling (docs/examples show the OpenAI-style "tools" parameter and tool_call responses). No change required.
- File: src/main/presenter/configPresenter/providerModelSettings.ts — glm-4.5v entry (lines ~310–320): keep functionCall: true. Caveat: ensure your chosen deployment/SDK honors the OpenAI-style tools schema; if it doesn't, use your prompt-wrapping path instead.
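To make the caveat concrete: "honoring the OpenAI-style tools schema" means the request carries a tools array of function declarations and the assistant message comes back with tool_calls. The sketch below shows one way to probe that; the endpoint URL, the ZHIPU_API_KEY environment variable, and the get_weather tool are illustrative assumptions, not values taken from this repo or Zhipu's docs.

// Sketch only: probes whether an OpenAI-compatible deployment of glm-4.5v honors the "tools" schema.
async function probeNativeToolCalling(): Promise<void> {
  const body = {
    model: 'glm-4.5v',
    messages: [{ role: 'user', content: 'What is the weather in Beijing right now?' }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather', // hypothetical tool, for illustration only
          description: 'Look up current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city']
          }
        }
      }
    ]
  }

  // Assumed placeholder endpoint; substitute your deployment's actual OpenAI-compatible base URL.
  const res = await fetch('https://example-openai-compatible-endpoint/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.ZHIPU_API_KEY}`
    },
    body: JSON.stringify(body)
  })
  const data = await res.json()

  // A deployment with native tool calling should return a tool_calls array on the assistant message;
  // if it does not, route glm-4.5v through the prompt-wrapping fallback instead.
  console.log(data?.choices?.[0]?.message?.tool_calls ?? 'no tool_calls returned')
}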
      {
        id: 'glm-4.5v',
        name: 'GLM-4.5V',
        temperature: 0.7,
        maxTokens: 8192,
        contextLength: 65536,
        match: ['glm-4.5v'],
        vision: true,
        functionCall: true,
        reasoning: true
      },
💡 Verification agent
🧩 Analysis chain
Confirm GLM-4.5V capabilities (function calling, context window) and naming
Please double-check with Zhipu’s latest docs that:
- Official model ID is exactly glm-4.5v
- It supports native function/tool calling
- Context window should be 65,536 (vs 128k like text-only 4.5 variants)
If any of these differ, update this block and provider-specific settings accordingly.
Optionally, verify codebase references to ensure no other places need updates:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -nC2 -i '\bglm[- ]?4\.5v\b|GLM-4\.5V'

Length of output: 2336
🌐 Web query:
What is the official model identifier for Zhipu's GLM-4.5V, its context window size, and does it support native function (tool) calling compatible with OpenAI "tools"?
💡 Result:
The official model identifier for Zhipu's GLM-4.5V is glm-4.5v[7]. Its context window size is up to 64K tokens for multimodal (i.e., both text and visual) inputs[2][6][9]. Current documentation and developer guides do not confirm official support for direct native function/tool calling in the way that OpenAI's "tools" (previously called "functions") works, nor do they state explicit compatibility with the OpenAI "tools" API schema.
Details and Supporting Information
- Official model identifier:
  - As shown in Zhipu's Python SDK API example, the invocation looks like this:
    response = client.chat.completions.create(model="glm-4.5v", ...)
  - This indicates the canonical model name for API access is glm-4.5v[7].
- Context window size:
  - Zhipu GLM-4.5V supports a 64K token multimodal context window (i.e., it can process up to 64,000 tokens of combined text, image, or video per request)[2][6][9]. This is confirmed in technical summaries and product announcements.
- Function (Tool) Calling Support:
  - There is no explicit mention in the available official documentation, technical reports, or SDK examples of a native implementation for function calling (i.e., OpenAI "tools" schema compatibility for structured API endpoints as actions, also called "functions")[7][3].
  - While GLM-4.5V is designed for strong multimodal and agent scenarios and can be integrated into GUI agent prompts and desktop tools, the APIs provided show only classic chat/inference patterns (prompt completion and image/video inputs)[2][3][7]. No evidence is given of a dedicated function calling interface or adherence to the OpenAI "tools" JSON schema for function calling.
  - Available developer demos show classic multimodal inference, visual grounding, and GUI agent tasks using prompt engineering, not a schema-augmented or structured function call format as standardized by OpenAI's "tools"[3].
Summary Table
| Aspect | GLM-4.5V Specification |
|---|---|
| Official Model Name | glm-4.5v |
| Context Window | 64,000 tokens (64K), multimodal |
| OpenAI "tools" Support | Not officially documented as supported |
If your use case requires structured function/tool calling as defined by OpenAI's "tools" schema (functions with input/output schemas handled natively by the model API), there is no direct evidence of support for this in GLM-4.5V as of August 2025. Integration would likely require custom prompt handling and downstream orchestration code on your part.
Citations:
- 1: https://news.aibase.com/news/20407
- 2: https://news.aibase.com/news/20404
- 3: https://github.com/zai-org/GLM-V
- 4: https://news.smol.ai/issues/25-08-12-not-much
- 5: https://eu.36kr.com/en/p/3418614402207366
- 6: https://www.youtube.com/watch?v=vTuXQkQWgqA
- 7: https://www.kdjingpai.com/en/glm-45v/
- 8: https://news.smol.ai/issues/25-08-11-ioi-gold
- 9: https://hyper.ai/en/headlines/532a7a0f1b1768a229a19fa2b061a0b7
- 10: https://www.barchart.com/story/news/34130142/zai-launches-glm45v-opensource-visionlanguage-model-sets-new-bar-for-multimodal-reasoning
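The "custom prompt handling" the result above alludes to is essentially the prompt-wrapping fallback described in the provider guidelines. A minimal sketch of that idea follows; the helper names are hypothetical and this is not the repo's actual prepareFunctionCallPrompt implementation.

interface ToolSpec {
  name: string
  description: string
  parameters: Record<string, unknown>
}

// Wrap tool definitions into the prompt so a model without native tool calling
// can still emit a parseable call, e.g. a single JSON object.
function wrapToolsIntoPrompt(userText: string, tools: ToolSpec[]): string {
  const toolList = tools
    .map((t) => `- ${t.name}: ${t.description} (args schema: ${JSON.stringify(t.parameters)})`)
    .join('\n')
  return [
    'You can call the following tools:',
    toolList,
    'If a tool is needed, reply with ONLY a JSON object like {"tool": "<name>", "arguments": {...}}.',
    `User request: ${userText}`
  ].join('\n\n')
}

// Parse the model reply; returns null when the reply is a normal answer rather than a tool call.
function parseWrappedToolCall(reply: string): { tool: string; arguments: unknown } | null {
  try {
    const parsed = JSON.parse(reply.trim())
    return typeof parsed?.tool === 'string' ? parsed : null
  } catch {
    return null
  }
}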
Action: Set GLM-4.5V functionCall to false (ID & context verified)
Short summary: Verified the official model ID is glm-4.5v and the multimodal context window is 64K tokens (65536). There is no official documentation showing native function/tool-calling support compatible with OpenAI "tools", so the code should not mark this model as supporting function calls.
Files/locations to update:
- src/main/presenter/configPresenter/modelDefaultSettings.ts — entry at ~lines 1649–1656: change functionCall: true
- src/main/presenter/configPresenter/providerModelSettings.ts — entry at ~lines 311–318: change functionCall: true
- src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts — model entry around line ~143: verify and update if it marks function calling
Suggested change (replace the single line in each block):
- from: functionCall: true,
- to: functionCall: false, // no official native function/tool-calling documented for glm-4.5v
Keep vision: true and contextLength: 65536 as-is.
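Applied to the provider-settings block quoted above, the corrected entry would read roughly as follows (same fields, only the functionCall flag flipped); a sketch of the suggested end state, not a verified excerpt of the repo.

      {
        id: 'glm-4.5v',
        name: 'GLM-4.5V',
        temperature: 0.7,
        maxTokens: 8192,
        contextLength: 65536,
        match: ['glm-4.5v'],
        vision: true,
        functionCall: false, // no official native function/tool-calling documented for glm-4.5v
        reasoning: true
      },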
🤖 Prompt for AI Agents
In src/main/presenter/configPresenter/modelDefaultSettings.ts around lines
1648–1658, change the glm-4.5v model entry to set functionCall to false (it
currently is true) because there's no official native function/tool-calling
support; likewise update
src/main/presenter/configPresenter/providerModelSettings.ts around lines 311–318
to set functionCall: false for the glm-4.5v entry, and inspect
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts near line
~143 to verify any glm-4.5v model entry does not mark functionCall true and
change it to false if present; keep vision: true and contextLength: 65536
unchanged and add a short inline comment like "// no official native
function/tool-calling documented for glm-4.5v" where you flip the flag.
      {
        id: 'glm-4.5v',
        name: 'GLM-4.5V',
        group: 'zhipu',
        providerId: this.provider.id,
        isCustom: false,
        contextLength: 65536,
        maxTokens: 8192
      },
💡 Verification agent
🧩 Analysis chain
Sanity check: vision handling path works for GLM-4.5V
Since this model is multimodal, ensure the OpenAI-compatible code path properly formats image inputs for Zhipu (as it already does for glm-4v*). No code changes required here if the existing vision path is reused; just verify end-to-end.
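For reference, the OpenAI-compatible content-parts shape the reused vision path would be expected to produce for an image turn looks roughly like this. This is a sketch of the standard OpenAI-style format; the repo's actual formatting helper and field handling may differ.

// OpenAI-style multimodal user message: a text part plus an image_url part.
// Base64 data URLs or https URLs are both commonly accepted by OpenAI-compatible endpoints.
const visionMessage = {
  role: 'user' as const,
  content: [
    { type: 'text', text: 'Describe this screenshot.' },
    {
      type: 'image_url',
      image_url: { url: 'data:image/png;base64,iVBORw0KGgo...' } // truncated sample payload
    }
  ]
}

// The upper layer only builds this shape when the model entry is flagged vision: true,
// which is why the GLM-4.5V catalog entry needs that flag.
console.log(JSON.stringify(visionMessage, null, 2))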
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check for existing vision handling in Zhipu/OpenAI-compatible paths
rg -n -C3 -i 'glm-4v|image|vision' src/main/presenter/llmProviderPresenter/providers

Length of output: 121472
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts"
echo "=== File: $file ==="
nl -ba "$file" | sed -n '1,420p'

Length of output: 449
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts"
if [[ ! -f "$file" ]]; then
echo "FILE_NOT_FOUND: $file"
exit 0
fi
echo "=== Showing $file (first 400 lines) ==="
sed -n '1,400p' "$file"
echo "=== Searching for inheritance and vision/image handling within the file ==="
rg -n -C2 "class |extends|handleImg|image_url|image:|image_url|image" "$file" || true

Length of output: 6476
Action: mark GLM-4.5V as vision-enabled in zhipuProvider
OpenAICompatibleProvider already implements the image/vision handling path and ZhipuProvider inherits it, but the GLM-4.5V model entry in src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts is not flagged as vision:true — the upper layer uses that flag to decide whether to insert image_url. Add the vision flag so the multimodal path is used end-to-end (and do the same for other glm-4v* variants if they are multimodal).
Files to update:
- src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts — add vision: true to the GLM-4.5V model object (and optionally glm-4v*, glm-4v-plus-0111, glm-4v-flash if they support vision).
Suggested diff:
@@
       {
         id: 'glm-4.5v',
         name: 'GLM-4.5V',
         group: 'zhipu',
         providerId: this.provider.id,
         isCustom: false,
+        vision: true,
         contextLength: 65536,
         maxTokens: 8192
       },
🤖 Prompt for AI Agents
In src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts around
lines 142 to 150, the GLM-4.5V model entry is missing the vision flag so the
multimodal image path is not used; add vision: true to that model object (and
optionally add vision: true to other glm-4v* entries such as glm-4v,
glm-4v-plus-0111, glm-4v-flash if those models support vision) so the upper
layers will include image_url and route requests through the provider's vision
handling.
add GLM-4.5V model support
Summary by CodeRabbit