
Conversation

@yyhhyyyyyy yyhhyyyyyy (Collaborator) commented Aug 15, 2025

add GLM-4.5V model support

Summary by CodeRabbit

  • New Features
    • Added GLM-4.5V multimodal model to the Zhipu provider.
    • Supports image understanding (vision), function calling, and advanced reasoning.
    • Offers a 64k (65,536-token) context window with up to 8k generated tokens.
    • Available alongside existing GLM-4.5 options without affecting current selections.

@coderabbitai coderabbitai bot (Contributor) commented Aug 15, 2025

Walkthrough

Adds GLM-4.5V (multimodal) model entries to default settings, Zhipu provider-specific settings, and the Zhipu provider’s fetchOpenAIModels list. No control flow, signatures, or error handling changes.

Changes

  • Default model settings update (src/main/presenter/configPresenter/modelDefaultSettings.ts): Adds GLM-4.5V entry (id: glm-4.5v) with vision, functionCall, reasoning; temperature 0.7, maxTokens 8192, contextLength 65536.
  • Provider model settings update, Zhipu (src/main/presenter/configPresenter/providerModelSettings.ts): Adds GLM-4.5V config under Zhipu; same parameters and matching ['glm-4.5v']; positioned before glm-4.5.
  • Zhipu provider models list update (src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts): Extends fetchOpenAIModels with GLM-4.5V in the multimodal section; sets metadata (group zhipu, providerId, isCustom false).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • zerob13

Poem

I twitch my whiskers—new V in the mix,
GLM hops in with multimodal tricks.
In defaults, providers, the list it will stay,
A carrot of config to brighten the day.
Hop, save, merge—then bound away! 🥕🐇

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7174593 and fce0b8e.

📒 Files selected for processing (3)
  • src/main/presenter/configPresenter/modelDefaultSettings.ts (1 hunks)
  • src/main/presenter/configPresenter/providerModelSettings.ts (1 hunks)
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
**/*.{ts,tsx,js,jsx,vue}

📄 CodeRabbit Inference Engine (CLAUDE.md)

Use English for logs and comments

Files:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (CLAUDE.md)

Strict type checking enabled for TypeScript

**/*.{ts,tsx}: Always use try-catch to handle potential errors
Provide meaningful error messages
Log detailed error information
Degrade gracefully
Logs should include timestamp, log level, error code, error description, stack trace (if applicable), and relevant context
Log levels should include ERROR, WARN, INFO, DEBUG
Do not swallow errors
Provide user-friendly error messages
Implement error retry mechanisms
Avoid logging sensitive information
Use structured logging
Set appropriate log levels

Files:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
src/main/**/*.ts

📄 CodeRabbit Inference Engine (CLAUDE.md)

Main to Renderer: Use EventBus to broadcast events via mainWindow.webContents.send()

Use Electron's built-in APIs for file system and native dialogs

Files:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
src/main/presenter/**/*.ts

📄 CodeRabbit Inference Engine (CLAUDE.md)

One presenter per functional domain

Files:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
src/main/presenter/configPresenter/**/*.ts

📄 CodeRabbit Inference Engine (CLAUDE.md)

Centralize configuration in configPresenter/

Files:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/development-setup.mdc)

**/*.{js,jsx,ts,tsx}: Use OxLint for code linting
Write logs and comments in English

Files:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
src/{main,renderer}/**/*.ts

📄 CodeRabbit Inference Engine (.cursor/rules/electron-best-practices.mdc)

src/{main,renderer}/**/*.ts: Use context isolation for improved security
Implement proper inter-process communication (IPC) patterns
Optimize application startup time with lazy loading
Implement proper error handling and logging for debugging

Files:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
src/main/**/*.{ts,js,tsx,jsx}

📄 CodeRabbit Inference Engine (.cursor/rules/project-structure.mdc)

Main process code goes in src/main

Files:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
src/main/presenter/llmProviderPresenter/providers/*.ts

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/main/presenter/llmProviderPresenter/providers/*.ts: Create provider file in src/main/presenter/llmProviderPresenter/providers/ when adding a new LLM provider
Implement coreStream method following standardized event interface in LLM provider files

src/main/presenter/llmProviderPresenter/providers/*.ts: Each file in src/main/presenter/llmProviderPresenter/providers/*.ts should handle interaction with a specific LLM API, including request/response formatting, tool definition conversion, native/non-native tool call management, and standardizing output streams to a common event format.
Provider implementations must use a coreStream method that yields standardized stream events to decouple the main loop from provider-specific details.
The coreStream method in each Provider must perform a single streaming API request per conversation round and must not contain multi-round tool call loop logic.
Provider files should implement helper methods such as formatMessages, convertToProviderTools, parseFunctionCalls, and prepareFunctionCallPrompt as needed for provider-specific logic.
All provider implementations must parse provider-specific data chunks and yield standardized events for text, reasoning, tool calls, usage, errors, stop reasons, and image data.
When a provider does not support native function calling, it must prepare messages using prompt wrapping (e.g., prepareFunctionCallPrompt) before making the API call.
When a provider supports native function calling, MCP tools must be converted to the provider's format (e.g., using convertToProviderTools) and included in the API request.
Provider implementations should aggregate and yield usage events as part of the standardized stream.
Provider implementations should yield image data events in the standardized format when applicable.
Provider implementations should yield reasoning events in the standardized format when applicable.
Provider implementations should yield tool call events (`tool_call_star...

Files:

  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
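
For orientation, a minimal TypeScript sketch of the standardized-stream pattern described above; the event names and chunk fields here are assumptions for illustration, not the repository's actual types:

// Hypothetical sketch only: event names and chunk shape are assumed, not taken from the repo.
type StreamEvent =
  | { type: 'text'; content: string }
  | { type: 'reasoning'; content: string }
  | { type: 'stop'; reason: string }

async function* coreStreamSketch(
  chunks: AsyncIterable<{ delta?: string; reasoning?: string; finish?: string }>
): AsyncGenerator<StreamEvent> {
  // Single streaming pass per round: parse provider-specific chunks and emit the common event format.
  for await (const chunk of chunks) {
    if (chunk.reasoning) yield { type: 'reasoning', content: chunk.reasoning }
    if (chunk.delta) yield { type: 'text', content: chunk.delta }
    if (chunk.finish) yield { type: 'stop', reason: chunk.finish }
  }
}
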
🧠 Learnings (1)
📚 Learning: 2025-07-21T01:45:33.790Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T01:45:33.790Z
Learning: Applies to src/main/presenter/configPresenter/providers.ts : Add provider configuration in configPresenter/providers.ts when adding a new LLM provider

Applied to files:

  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
  • src/main/presenter/configPresenter/providerModelSettings.ts
🔇 Additional comments (4)
src/main/presenter/configPresenter/modelDefaultSettings.ts (1)

1648-1658: GLM-4.5V default settings added — consistent and well-scoped

The new entry aligns with the Zhipu provider settings and the provider’s model list (contextLength 65536, maxTokens 8192, vision true). No control flow or typing issues here.

src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts (1)

142-150: Added GLM-4.5V to Zhipu catalog — matches defaults/provider config

The catalog entry is consistent with default and provider settings (contextLength 65536, maxTokens 8192). Placement under the multimodal section is correct.

src/main/presenter/configPresenter/providerModelSettings.ts (2)

310-320: Zhipu provider model settings for GLM-4.5V — consistent with defaults

The provider-specific entry mirrors defaults (temperature 0.7, maxTokens 8192, contextLength 65536, vision true, functionCall true, reasoning true). Looks good.


310-320: Confirmed: keep functionCall: true for GLM-4.5V

GLM‑4.5V supports native OpenAI‑compatible function/tool calling (docs/examples show the OpenAI-style "tools" parameter and tool_call responses). No change required.

  • File: src/main/presenter/configPresenter/providerModelSettings.ts — glm-4.5v entry (lines ~310–320): keep functionCall: true.

Caveat: ensure your chosen deployment/SDK honors the OpenAI-style tools schema; if it doesn't, use your prompt-wrapping path instead.
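
For illustration, a minimal sketch of an OpenAI-compatible request carrying a native tools definition; the get_weather tool, the base URL, and the environment variable name are placeholders/assumptions, not part of this PR:

// Hypothetical sketch: the tool, base URL, and env var below are illustrative assumptions.
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.ZHIPU_API_KEY,
  baseURL: 'https://open.bigmodel.cn/api/paas/v4' // assumed Zhipu OpenAI-compatible endpoint
})

const response = await client.chat.completions.create({
  model: 'glm-4.5v',
  max_tokens: 8192,
  messages: [{ role: 'user', content: 'What is the weather in Beijing today?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Look up the current weather for a city',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city']
        }
      }
    }
  ]
})

// If native tool calling is honored, the assistant message carries tool_calls.
console.log(response.choices[0].message.tool_calls)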

Comment on lines +1648 to +1658
  {
    id: 'glm-4.5v',
    name: 'GLM-4.5V',
    temperature: 0.7,
    maxTokens: 8192,
    contextLength: 65536,
    match: ['glm-4.5v'],
    vision: true,
    functionCall: true,
    reasoning: true
  },

💡 Verification agent

🧩 Analysis chain

Confirm GLM-4.5V capabilities (function calling, context window) and naming

Please double-check with Zhipu’s latest docs that:

  • Official model ID is exactly glm-4.5v
  • It supports native function/tool calling
  • Context window should be 65,536 (vs 128k like text-only 4.5 variants)

If any of these differ, update this block and provider-specific settings accordingly.

Optionally, verify codebase references to ensure no other places need updates:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -nC2 -i '\bglm[- ]?4\.5v\b|GLM-4\.5V'

Length of output: 2336


🌐 Web query:

What is the official model identifier for Zhipu's GLM-4.5V, its context window size, and does it support native function (tool) calling compatible with OpenAI "tools"?

💡 Result:

The official model identifier for Zhipu's GLM-4.5V is glm-4.5v[7]. Its context window size is up to 64K tokens for multimodal (i.e., both text and visual) inputs[2][6][9]. Current documentation and developer guides do not confirm official support for direct native function/tool calling in the way that OpenAI's "tools" (previously called "functions") works, nor do they state explicit compatibility with the OpenAI "tools" API schema.

Details and Supporting Information

  • Official model identifier:

    • As shown in Zhipu's Python SDK API example, the invocation looks like this:
      response = client.chat.completions.create(model="glm-4.5v", ...)
      This indicates the canonical model name for API access is glm-4.5v[7].
  • Context window size:

    • Zhipu GLM-4.5V supports a 64K token multimodal context window (i.e., it can process up to 64,000 tokens of combined text, image, or video per request)[2][6][9]. This is confirmed in technical summaries and product announcements.
  • Function (Tool) Calling Support:

    • There is no explicit mention in the available official documentation, technical reports, or SDK examples of a native implementation for function calling (i.e., OpenAI "tools" schema compatibility for structured API endpoints as actions, also called "functions")[7][3].
    • While GLM-4.5V is designed for strong multimodal and agent scenarios and can be integrated into GUI agent prompts and desktop tools, the APIs provided show only classic chat/inference patterns (prompt completion and image/video inputs)[2][3][7]. No evidence is given of a dedicated function calling interface or adherence to the OpenAI "tools" JSON schema for function calling.
    • Available developer demos show classic multimodal inference, visual grounding, and GUI agent tasks using prompt engineering, not a schema-augmented or structured function call format as standardized by OpenAI's "tools"[3].

Summary Table

Aspect | GLM-4.5V Specification
Official Model Name | glm-4.5v
Context Window | 64,000 tokens (64K), multimodal
OpenAI "tools" Support | Not officially documented as supported

If your use case requires structured function/tool calling as defined by OpenAI's "tools" schema (functions with input/output schemas handled natively by the model API), there is no direct evidence of support for this in GLM-4.5V as of August 2025. Integration would likely require custom prompt handling and downstream orchestration code on your part.


Action: Set GLM-4.5V functionCall to false (ID & context verified)

Short summary: Verified the official model ID is glm-4.5v and the multimodal context window is 64K tokens (65536). There is no official documentation showing native function/tool-calling support compatible with OpenAI "tools", so the code should not mark this model as supporting function calls.

Files/locations to update:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts — entry at ~lines 1649–1656: change functionCall: true to false
  • src/main/presenter/configPresenter/providerModelSettings.ts — entry at ~lines 311–318: change functionCall: true to false
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts — model entry around line ~143: verify it does not mark functionCall: true, and change it to false if it does

Suggested change (replace the single line in each block):

  • from:
    functionCall: true,
  • to:
    functionCall: false, // no official native function/tool-calling documented for glm-4.5v

Keep vision: true and contextLength: 65536 as-is.

🤖 Prompt for AI Agents
In src/main/presenter/configPresenter/modelDefaultSettings.ts around lines
1648–1658, change the glm-4.5v model entry to set functionCall to false (it
currently is true) because there's no official native function/tool-calling
support; likewise update
src/main/presenter/configPresenter/providerModelSettings.ts around lines 311–318
to set functionCall: false for the glm-4.5v entry, and inspect
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts near line
~143 to verify any glm-4.5v model entry does not mark functionCall true and
change it to false if present; keep vision: true and contextLength: 65536
unchanged and add a short inline comment like "// no official native
function/tool-calling documented for glm-4.5v" where you flip the flag.

Comment on lines +142 to +150
  {
    id: 'glm-4.5v',
    name: 'GLM-4.5V',
    group: 'zhipu',
    providerId: this.provider.id,
    isCustom: false,
    contextLength: 65536,
    maxTokens: 8192
  },

💡 Verification agent

🧩 Analysis chain

Sanity check: vision handling path works for GLM-4.5V

Since this model is multimodal, ensure the OpenAI-compatible code path properly formats image inputs for Zhipu (as it already does for glm-4v*). No code changes required here if the existing vision path is reused; just verify end-to-end.
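
For reference, a minimal sketch of the OpenAI-style multimodal message shape the shared vision path is expected to produce for this model; the prompt and image data URL are placeholders, and field names follow the OpenAI-compatible schema:

// Hypothetical sketch: the prompt, base URL, and image data URL are illustrative placeholders.
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.ZHIPU_API_KEY,
  baseURL: 'https://open.bigmodel.cn/api/paas/v4' // assumed Zhipu OpenAI-compatible endpoint
})

const completion = await client.chat.completions.create({
  model: 'glm-4.5v',
  max_tokens: 8192,
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this screenshot.' },
        { type: 'image_url', image_url: { url: 'data:image/png;base64,...' } }
      ]
    }
  ]
})

console.log(completion.choices[0].message.content)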


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Check for existing vision handling in Zhipu/OpenAI-compatible paths
rg -n -C3 -i 'glm-4v|image|vision' src/main/presenter/llmProviderPresenter/providers

Length of output: 121472


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts"
echo "=== File: $file ==="
nl -ba "$file" | sed -n '1,420p'

Length of output: 449


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts"
if [[ ! -f "$file" ]]; then
  echo "FILE_NOT_FOUND: $file"
  exit 0
fi
echo "=== Showing $file (first 400 lines) ==="
sed -n '1,400p' "$file"
echo "=== Searching for inheritance and vision/image handling within the file ==="
rg -n -C2 "class |extends|handleImg|image_url|image:|image_url|image" "$file" || true

Length of output: 6476


Action: mark GLM-4.5V as vision-enabled in zhipuProvider

OpenAICompatibleProvider already implements the image/vision handling path and ZhipuProvider inherits it, but the GLM-4.5V model entry in src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts is not flagged as vision:true — the upper layer uses that flag to decide whether to insert image_url. Add the vision flag so the multimodal path is used end-to-end (and do the same for other glm-4v* variants if they are multimodal).

Files to update:

  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts — add vision: true to the GLM-4.5V model object (and optionally glm-4v*, glm-4v-plus-0111, glm-4v-flash if they support vision).

Suggested diff:
@@
   {
     id: 'glm-4.5v',
     name: 'GLM-4.5V',
     group: 'zhipu',
     providerId: this.provider.id,
     isCustom: false,
+    vision: true,
     contextLength: 65536,
     maxTokens: 8192
   },
🤖 Prompt for AI Agents
In src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts around
lines 142 to 150, the GLM-4.5V model entry is missing the vision flag so the
multimodal image path is not used; add vision: true to that model object (and
optionally add vision: true to other glm-4v* entries such as glm-4v,
glm-4v-plus-0111, glm-4v-flash if those models support vision) so the upper
layers will include image_url and route requests through the provider's vision
handling.

@zerob13 zerob13 merged commit f1b9111 into dev Aug 15, 2025
2 checks passed
@zerob13 zerob13 deleted the feat/add-glm-4.5v-support branch November 23, 2025 13:15