feat: add Moonshot (Kimi K2.5) native video understanding provider #12063
xiaoyaner0201 wants to merge 4 commits into openclaw:main from
- New moonshot provider in media-understanding/providers/moonshot/
- Uses OpenAI-compatible chat completions API with video_url content type
- Base64 data URI for video input (native multimodal, not frame extraction)
- Default model: kimi-k2.5, base URL: api.moonshot.cn/v1
- Register in AUTO_VIDEO_KEY_PROVIDERS for auto-detection
- API key resolved via MOONSHOT_API_KEY env var (already in model-auth.ts)
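The request shape described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the type names, `buildVideoRequest`, its parameters, and the ordering of the content parts are assumptions, and the only details taken from the commit message are the `video_url` content type, the base64 data URI, the default model, and the base URL.

```typescript
// Hypothetical shapes for illustration; the PR's actual types may differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "video_url"; video_url: { url: string } };

interface ChatCompletionRequest {
  model: string;
  messages: Array<{ role: "user"; content: ContentPart[] }>;
}

// Defaults as stated in the commit message (the PR description lists
// https://api.moonshot.ai/v1 as the base URL instead).
const DEFAULT_MODEL = "kimi-k2.5";
const DEFAULT_BASE_URL = "https://api.moonshot.cn/v1";

// Build the endpoint URL and JSON body for a native video request.
// `videoBase64` is assumed to be already base64-encoded by the caller.
function buildVideoRequest(params: {
  videoBase64: string;
  mime: string;
  prompt: string;
  model?: string;
  baseUrl?: string;
}): { url: string; body: ChatCompletionRequest } {
  // Strip trailing slashes so the path join does not produce "//".
  const baseUrl = (params.baseUrl?.trim() || DEFAULT_BASE_URL).replace(/\/+$/, "");
  const dataUri = `data:${params.mime};base64,${params.videoBase64}`;
  return {
    url: `${baseUrl}/chat/completions`,
    body: {
      model: params.model?.trim() || DEFAULT_MODEL,
      messages: [
        {
          role: "user",
          content: [
            // The whole video is sent as a data URI, not as extracted frames.
            { type: "video_url", video_url: { url: dataUri } },
            { type: "text", text: params.prompt },
          ],
        },
      ],
    },
  };
}
```

The body would then be sent as JSON with an `Authorization: Bearer` header built from `MOONSHOT_API_KEY`; that step is omitted here because the fetch wrapper used in the repository is not shown in this thread.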
    const allowPrivate = Boolean(params.baseUrl?.trim());
    const model = resolveModel(params.model);
    const url = `${baseUrl}/chat/completions`;
SSRF policy bypass
allowPrivate is set based on whether params.baseUrl is non-empty, and when true it enables { ssrfPolicy: { allowPrivateNetwork: true } }. Since baseUrl is user-configurable via models/tool config, this effectively allows private-network requests whenever a custom base URL is set, even though the URL target is also controlled by that base URL.
This should follow the existing SSRF policy used by other providers (i.e., don’t widen to private networks just because the caller supplied a base URL), otherwise a misconfiguration or untrusted config source can turn this into a private-network fetch primitive.
Path: src/media-understanding/providers/moonshot/video.ts
Line: 25:27
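One way to address the SSRF comment is to stop deriving the policy from `baseUrl` and require a separate, explicit opt-in. The sketch below contrasts the flagged pattern with that alternative. `FetchGuardOptions`, both function names, and the `allowPrivateNetwork` parameter are hypothetical; only the `{ ssrfPolicy: { allowPrivateNetwork: true } }` shape comes from the review comment, and the policy actually used by the other providers is not shown in this thread.

```typescript
// Hypothetical option shape mirroring the `ssrfPolicy` object quoted in the review.
interface FetchGuardOptions {
  ssrfPolicy?: { allowPrivateNetwork: boolean };
}

// Flagged pattern: any non-empty custom base URL widens the policy,
// even though that same base URL also chooses the request target.
function guardOptionsBefore(baseUrl?: string): FetchGuardOptions {
  const allowPrivate = Boolean(baseUrl?.trim());
  return allowPrivate ? { ssrfPolicy: { allowPrivateNetwork: true } } : {};
}

// Possible fix (an assumption, not the landed change): private-network
// access is its own explicit setting and `baseUrl` no longer influences it.
function guardOptionsAfter(params: {
  baseUrl?: string;
  allowPrivateNetwork?: boolean;
}): FetchGuardOptions {
  return params.allowPrivateNetwork === true
    ? { ssrfPolicy: { allowPrivateNetwork: true } }
    : {};
}
```

With this split, a custom base URL that points at a private address is still subject to the default policy unless the operator deliberately enables private-network access.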
Additional Comments (1)
Path: src/media-understanding/runner.ts
Line: 971:982
Comment:
**Video provider config ignored**
In `runProviderEntry` for video, the call passes `baseUrl`/`headers` from `providerConfig` only (`providerConfig?.baseUrl`, `providerConfig?.headers`). This drops `entry.baseUrl`, `config.baseUrl`, `entry.headers`, and `config.headers` that are supported for audio (and appear intended for providers generally). As a result, configuring Moonshot video to use `https://api.moonshot.cn/v1` (or custom headers) via tool config won’t take effect.
Also appears at `src/media-understanding/runner.ts:912-918` (audio does merge headers correctly, video should mirror that pattern).
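The merge that this comment asks for could look roughly like the sketch below. `EndpointConfig`, `resolveEndpoint`, and the precedence order (entry over config over providerConfig) are assumptions made for illustration; the comment only states that the audio path merges these sources and that video should mirror it.

```typescript
// Hypothetical shape for the three config layers named in the review comment.
interface EndpointConfig {
  baseUrl?: string;
  headers?: Record<string, string>;
}

// Resolve the effective base URL and headers for a provider call.
// Assumed precedence, most specific first: entry, then config, then providerConfig.
function resolveEndpoint(layers: {
  providerConfig?: EndpointConfig;
  config?: EndpointConfig;
  entry?: EndpointConfig;
}): { baseUrl: string | undefined; headers: Record<string, string> } {
  const { providerConfig, config, entry } = layers;
  const baseUrl = entry?.baseUrl ?? config?.baseUrl ?? providerConfig?.baseUrl;
  // Later spreads win, so entry headers override config and providerConfig headers.
  const headers: Record<string, string> = {
    ...providerConfig?.headers,
    ...config?.headers,
    ...entry?.headers,
  };
  return { baseUrl, headers };
}
```

For example, an entry that sets `baseUrl` to `https://api.moonshot.cn/v1` would take effect even when `providerConfig` supplies a different base URL, which is the behavior the comment reports as missing for video.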
bfc1ccb to f92900f
## Summary
Add Moonshot AI (Kimi K2.5) as a native video understanding provider. K2.5 uses MoonViT - a native multimodal vision encoder trained on video tokens.

## Changes
- New provider: src/media-understanding/providers/moonshot/ (index + video)
- Registration: Added to provider registry in providers/index.ts
- Auto-detection: Added moonshot to AUTO_VIDEO_KEY_PROVIDERS
- Fix: Video provider now uses entry.baseUrl/headers (not just providerConfig)

Based on PR openclaw#12063
This pull request has been automatically marked as stale due to inactivity.
Thanks for the contribution. Closing as superseded by the landed
Landed SHA:
What landed:
This covers the PR intent on
## Summary
Add Moonshot AI (Kimi K2.5) as a native video understanding provider for media-understanding.

K2.5 uses MoonViT, a native multimodal vision encoder trained on video tokens, providing true video understanding rather than frame extraction.

## Changes
- New provider: `src/media-understanding/providers/moonshot/` (index + video)
- Registration: `providers/index.ts`
- Auto-detection: `moonshot` added to `AUTO_VIDEO_KEY_PROVIDERS` in `runner.ts`

## How it works
- Base URL: `https://api.moonshot.ai/v1`
- `video_url` content type
- `reasoning_content` fallback
- `MOONSHOT_API_KEY` environment variable
- Default model: `kimi-k2.5` (1T params, 32B activated, 256K context)

## Testing
Greptile Overview
Greptile Summary
This PR wires in a new `moonshot` media-understanding provider with image+video capabilities. It registers the provider in the media provider registry, adds a Moonshot video implementation that calls an OpenAI-compatible `/chat/completions` endpoint with `video_url` as a base64 data URI, and extends auto-detection so video can be resolved via `MOONSHOT_API_KEY`.

Key integration points are `src/media-understanding/providers/index.ts` (registry) and `src/media-understanding/runner.ts` (auto video provider list + provider invocation).

Confidence Score: 3/5
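The key-based auto-detection mentioned in the summary can be pictured as a lookup over an ordered provider list. This is only a sketch under stated assumptions: `PROVIDER_ENV_KEYS`, `resolveAutoVideoProvider`, and the `example-provider` entry are hypothetical, and the real contents and ordering of `AUTO_VIDEO_KEY_PROVIDERS` in `runner.ts` are not shown in this thread. Only the `moonshot` id and the `MOONSHOT_API_KEY` variable come from the PR.

```typescript
// Hypothetical provider-to-env-var map; only the Moonshot entry comes from the PR.
const PROVIDER_ENV_KEYS: Record<string, string> = {
  "example-provider": "EXAMPLE_PROVIDER_API_KEY",
  moonshot: "MOONSHOT_API_KEY",
};

// Illustrative ordering; the actual list in runner.ts may contain other providers.
const AUTO_VIDEO_KEY_PROVIDERS: string[] = ["example-provider", "moonshot"];

// Return the first provider in the list whose API key is set and non-blank.
function resolveAutoVideoProvider(
  env: Record<string, string | undefined>,
): string | undefined {
  return AUTO_VIDEO_KEY_PROVIDERS.find((id) => {
    const keyName = PROVIDER_ENV_KEYS[id];
    return keyName !== undefined && Boolean(env[keyName]?.trim());
  });
}
```

Passing the environment in as a parameter keeps the lookup testable; a caller in Node would typically pass `process.env`.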