feat: add Moonshot (Kimi K2.5) native video understanding provider #12063

Closed
xiaoyaner0201 wants to merge 4 commits into openclaw:main from xiaoyaner0201:feat/moonshot-video-provider

Conversation

@xiaoyaner0201 xiaoyaner0201 commented Feb 8, 2026

Summary

Add Moonshot AI (Kimi K2.5) as a native video understanding provider for media-understanding.

K2.5 uses MoonViT — a native multimodal vision encoder trained on video tokens, providing true video understanding rather than frame extraction.

Changes

  • New provider: src/media-understanding/providers/moonshot/ (index + video)
  • Registration: Added to provider registry in providers/index.ts
  • Auto-detection: Added moonshot to AUTO_VIDEO_KEY_PROVIDERS in runner.ts

How it works

  • Uses OpenAI-compatible API at https://api.moonshot.ai/v1
  • Sends video as base64 data URI via video_url content type
  • Handles K2.5's thinking mode (reasoning_content fallback)
  • Auto-detects via MOONSHOT_API_KEY environment variable
  • Model: kimi-k2.5 (1T params, 32B activated, 256K context)
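
The request and response handling described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names `buildVideoRequestBody` and `extractText` are hypothetical, and the ordering of content parts (video first, then the text prompt) is an assumption. Only the `video_url` content type, the base64 data URI, the `kimi-k2.5` model name, and the `reasoning_content` fallback come from the description.

```typescript
// Hypothetical types and helpers for illustration only.
type VideoUrlPart = { type: "video_url"; video_url: { url: string } };
type TextPart = { type: "text"; text: string };

interface ChatRequestBody {
  model: string;
  messages: { role: "user"; content: Array<VideoUrlPart | TextPart> }[];
}

// Wrap an already base64-encoded video in a data URI and attach the prompt.
function buildVideoRequestBody(
  base64Video: string,
  mime: string,
  prompt: string,
  model: string = "kimi-k2.5",
): ChatRequestBody {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "video_url", video_url: { url: `data:${mime};base64,${base64Video}` } },
          { type: "text", text: prompt },
        ],
      },
    ],
  };
}

interface ChatResponse {
  choices?: {
    message?: { content?: string | null; reasoning_content?: string | null };
  }[];
}

// Prefer the normal answer; fall back to reasoning_content when the model
// is in thinking mode and the content field comes back empty.
function extractText(response: ChatResponse): string {
  const message = response.choices?.[0]?.message;
  const content = message?.content?.trim();
  if (content) {
    return content;
  }
  return message?.reasoning_content?.trim() ?? "";
}
```

The body would be sent as JSON in a POST to the `/chat/completions` path under the configured base URL, with the API key as a bearer token.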

Testing

  • Verified with demo video (264KB mp4 → 20K tokens)
  • K2.5 correctly identified video content using native vision
  • API authentication and response parsing working correctly

Greptile Overview

Greptile Summary

This PR wires in a new moonshot media-understanding provider with image+video capabilities. It registers the provider in the media provider registry, adds a Moonshot video implementation that calls an OpenAI-compatible /chat/completions endpoint with video_url as a base64 data URI, and extends auto-detection so video can be resolved via MOONSHOT_API_KEY.

Key integration points are src/media-understanding/providers/index.ts (registry) and src/media-understanding/runner.ts (auto video provider list + provider invocation).
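
As a rough sketch of what key-based auto-detection might look like: the list contents and the `PROVIDER_ENV_KEYS` map and `detectVideoProvider` function below are hypothetical. The real `AUTO_VIDEO_KEY_PROVIDERS` in `runner.ts` presumably contains other providers as well, and the PR notes that key resolution already lives in `model-auth.ts`.

```typescript
// Hypothetical stand-in for the real list in runner.ts, which is assumed
// to hold additional providers besides "moonshot".
const AUTO_VIDEO_KEY_PROVIDERS = ["moonshot"] as const;

// Hypothetical mapping from provider id to its API-key environment variable.
const PROVIDER_ENV_KEYS: Record<string, string> = {
  moonshot: "MOONSHOT_API_KEY",
};

// Return the first listed provider whose API key is set and non-blank.
function detectVideoProvider(
  env: Record<string, string | undefined>,
): string | undefined {
  return AUTO_VIDEO_KEY_PROVIDERS.find((id) => {
    const envName = PROVIDER_ENV_KEYS[id];
    return envName !== undefined && Boolean(env[envName]?.trim());
  });
}
```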

Confidence Score: 3/5

  • This PR is close to mergeable but has a couple of behavioral/security issues that should be fixed first.
  • Score reduced due to (1) video provider config (baseUrl/headers) being ignored, which will break intended configuration like Moonshot CN endpoint, and (2) SSRF policy being widened to allow private-network access based solely on providing a custom baseUrl.
  • src/media-understanding/runner.ts; src/media-understanding/providers/moonshot/video.ts

千乘妙 added 3 commits February 9, 2026 04:05
- New moonshot provider in media-understanding/providers/moonshot/
- Uses OpenAI-compatible chat completions API with video_url content type
- Base64 data URI for video input (native multimodal, not frame extraction)
- Default model: kimi-k2.5, base URL: api.moonshot.cn/v1
- Register in AUTO_VIDEO_KEY_PROVIDERS for auto-detection
- API key resolved via MOONSHOT_API_KEY env var (already in model-auth.ts)
@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

Comment on lines +25 to +27
const allowPrivate = Boolean(params.baseUrl?.trim());
const model = resolveModel(params.model);
const url = `${baseUrl}/chat/completions`;
SSRF policy bypass

allowPrivate is set based on whether params.baseUrl is non-empty, and when true it enables { ssrfPolicy: { allowPrivateNetwork: true } }. Since baseUrl is user-configurable via models/tool config, this effectively allows private-network requests whenever a custom base URL is set, even though the URL target is also controlled by that base URL.

This should follow the existing SSRF policy used by other providers (i.e., don’t widen to private networks just because the caller supplied a base URL), otherwise a misconfiguration or untrusted config source can turn this into a private-network fetch primitive.
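
One way to address this, sketched under assumptions: decouple the SSRF policy from the presence of a base URL and require a separate, explicit opt-in. The `allowPrivateNetwork` parameter and the `resolveSsrfPolicy` helper are hypothetical and may not match how other providers in the repository express this policy.

```typescript
interface SsrfPolicy {
  allowPrivateNetwork: boolean;
}

interface ProviderParams {
  baseUrl?: string;
  // Hypothetical explicit opt-in; not part of the reviewed code.
  allowPrivateNetwork?: boolean;
}

// The policy no longer depends on whether a custom baseUrl was supplied.
// Private-network access is granted only by the explicit flag.
function resolveSsrfPolicy(params: ProviderParams): SsrfPolicy {
  return { allowPrivateNetwork: params.allowPrivateNetwork === true };
}
```

With this shape, setting `baseUrl` to a private address alone leaves `allowPrivateNetwork` false, so the fetch layer can still reject the request.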

Path: src/media-understanding/providers/moonshot/video.ts
Line: 25:27


greptile-apps bot commented Feb 8, 2026

Additional Comments (1)

src/media-understanding/runner.ts
Video provider config ignored

In runProviderEntry for video, the call passes baseUrl/headers from providerConfig only (providerConfig?.baseUrl, providerConfig?.headers). This drops entry.baseUrl, config.baseUrl, entry.headers, and config.headers that are supported for audio (and appear intended for providers generally). As a result, configuring Moonshot video to use https://api.moonshot.cn/v1 (or custom headers) via tool config won’t take effect.

Also appears at src/media-understanding/runner.ts:912-918 (audio does merge headers correctly, video should mirror that pattern).
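
A minimal sketch of the merge the comment asks for, assuming a precedence of entry over tool config over provider config. That ordering, the `EndpointOverrides` type, and the `resolveEndpoint` helper are all assumptions for illustration; the actual audio path in `runner.ts` may differ.

```typescript
interface EndpointOverrides {
  baseUrl?: string;
  headers?: Record<string, string>;
}

// Assumed precedence: entry > config > providerConfig.
function resolveEndpoint(
  entry?: EndpointOverrides,
  config?: EndpointOverrides,
  providerConfig?: EndpointOverrides,
): EndpointOverrides {
  // First non-blank baseUrl wins, checked from highest to lowest precedence.
  const baseUrl = [entry, config, providerConfig]
    .map((source) => source?.baseUrl?.trim())
    .find((value) => Boolean(value));
  // Later spreads override earlier ones, so the entry's headers win per key.
  const headers: Record<string, string> = {
    ...providerConfig?.headers,
    ...config?.headers,
    ...entry?.headers,
  };
  return { baseUrl, headers };
}
```

Under this sketch, an entry-level `baseUrl` of `https://api.moonshot.cn/v1` would take effect even when the provider config names a different endpoint.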

Path: src/media-understanding/runner.ts
Line: 971:982


yumesha added a commit to yumesha/openclaw that referenced this pull request Feb 16, 2026
## Summary
Add Moonshot AI (Kimi K2.5) as a native video understanding provider.
K2.5 uses MoonViT - a native multimodal vision encoder trained on video tokens.

## Changes
- New provider: src/media-understanding/providers/moonshot/ (index + video)
- Registration: Added to provider registry in providers/index.ts
- Auto-detection: Added moonshot to AUTO_VIDEO_KEY_PROVIDERS
- Fix: Video provider now uses entry.baseUrl/headers (not just providerConfig)

Based on PR openclaw#12063
@openclaw-barnacle
This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle bot added and then removed the stale label ("Marked as stale due to inactivity") Feb 21, 2026
@steipete
Thanks for the contribution. Closing as superseded by the landed main implementation.

Landed SHA:

  • 7837d23103da587937e52aa00d4bc3050553affd

What landed:

  • Added Moonshot media-understanding provider (src/media-understanding/providers/moonshot/*) including native video support.
  • Registered Moonshot in provider registry and auto-video key providers.
  • Refactored video execution path to honor entry/config/provider baseUrl+headers precedence in runner.entries.ts (matching audio behavior).
  • Added regression tests for provider wiring, video request shape, and runner behavior.

This covers the PR intent on main.

@steipete steipete closed this Feb 23, 2026