Commit cd5b165

feat: declare explicit media provider capabilities
1 parent 29df67c commit cd5b165

46 files changed

Lines changed: 1625 additions & 395 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -7,6 +7,7 @@ Docs: https://docs.openclaw.ai
77
### Changes
88

99
- Plugins/webhooks: add a bundled webhook ingress plugin so external automation can create and drive bound TaskFlows through per-route shared-secret endpoints. (#61892) Thanks @mbelinky.
10+
- Tools/media: document per-provider music and video generation capabilities, and add shared live video-to-video sweep coverage for providers that support local reference clips.
1011

1112
### Fixes
1213

docs/help/testing.md

Lines changed: 35 additions & 0 deletions
@@ -475,10 +475,45 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
475475
- Exercises the shared bundled music-generation provider path
476476
- Currently covers Google and MiniMax
477477
- Loads provider env vars from your login shell (`~/.profile`) before probing
478+
- Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
478479
- Skips providers with no usable auth/profile/model
480+
- Runs both declared runtime modes when available:
481+
- `generate` with prompt-only input
482+
- `edit` when the provider declares `capabilities.edit.enabled`
483+
- Current shared-lane coverage:
484+
- `google`: `generate`, `edit`
485+
- `minimax`: `generate`
486+
- `comfy`: separate Comfy live file, not this shared sweep
479487
- Optional narrowing:
480488
- `OPENCLAW_LIVE_MUSIC_GENERATION_PROVIDERS="google,minimax"`
481489
- `OPENCLAW_LIVE_MUSIC_GENERATION_MODELS="google/lyria-3-clip-preview,minimax/music-2.5+"`
490+
- Optional auth behavior:
491+
- `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
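The auth precedence described in the scope notes above can be sketched as follows. This is a minimal illustration only: `pickLiveAuth`, `LiveAuthInputs`, and `LiveAuthSource` are hypothetical names, not functions from this repository, and the sketch assumes that `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` simply makes the sweep ignore env keys.

```typescript
// Hypothetical sketch; none of these names come from the OpenClaw codebase.
type LiveAuthSource = "env" | "profile" | "none";

interface LiveAuthInputs {
  envKey?: string; // key found in the shell environment (e.g. via ~/.profile)
  profileKey?: string; // key found in auth-profiles.json
  requireProfileKeys: boolean; // assumed to mirror OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1
}

function pickLiveAuth(inputs: LiveAuthInputs): { source: LiveAuthSource; key?: string } {
  const { envKey, profileKey, requireProfileKeys } = inputs;
  if (requireProfileKeys) {
    // Profile-store auth is forced; env-only keys are ignored.
    return profileKey ? { source: "profile", key: profileKey } : { source: "none" };
  }
  // Default: live/env keys win so stale stored keys cannot mask them.
  if (envKey) return { source: "env", key: envKey };
  if (profileKey) return { source: "profile", key: profileKey };
  // No usable auth: the provider would be skipped.
  return { source: "none" };
}
```

Under these assumptions, a provider with both an env key and a stale profile key resolves to the env key by default, and to the profile key only when profile keys are required.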
492+
493+
## Video generation live
494+
495+
- Test: `extensions/video-generation-providers.live.test.ts`
496+
- Enable: `OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts`
497+
- Scope:
498+
- Exercises the shared bundled video-generation provider path
499+
- Loads provider env vars from your login shell (`~/.profile`) before probing
500+
- Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
501+
- Skips providers with no usable auth/profile/model
502+
- Runs both declared runtime modes when available:
503+
- `generate` with prompt-only input
504+
- `imageToVideo` when the provider declares `capabilities.imageToVideo.enabled`
505+
- `videoToVideo` when the provider declares `capabilities.videoToVideo.enabled` and the selected provider/model accepts buffer-backed local video input in the shared sweep
506+
- Current `videoToVideo` live coverage:
507+
- `google`
508+
- `openai`
509+
- `runway` only when the selected model is `runway/gen4_aleph`
510+
- Current declared-but-skipped `videoToVideo` providers in the shared sweep:
511+
- `alibaba`, `qwen`, `xai` because those paths currently require remote `http(s)` / MP4 reference URLs
512+
- Optional narrowing:
513+
- `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="google,openai,runway"`
514+
- `OPENCLAW_LIVE_VIDEO_GENERATION_MODELS="google/veo-3.1-fast-generate-preview,openai/sora-2,runway/gen4_aleph"`
515+
- Optional auth behavior:
516+
- `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
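The mode selection rules listed above for the video sweep can be sketched like this. It is an illustration under stated assumptions, not the code in `extensions/video-generation-providers.live.test.ts`: `selectLiveVideoModes` and `VideoCapabilitiesSketch` are hypothetical names, and whether a provider/model accepts buffer-backed local video is passed in as a plain boolean.

```typescript
// Hypothetical sketch of the shared-sweep mode selection described above.
type VideoMode = "generate" | "imageToVideo" | "videoToVideo";

interface VideoCapabilitiesSketch {
  generate?: object;
  imageToVideo?: { enabled: boolean };
  videoToVideo?: { enabled: boolean };
}

function selectLiveVideoModes(
  caps: VideoCapabilitiesSketch,
  acceptsLocalVideoBuffer: boolean, // assumption: decided per provider/model elsewhere
): VideoMode[] {
  // Prompt-only generation is always exercised.
  const modes: VideoMode[] = ["generate"];
  if (caps.imageToVideo?.enabled) modes.push("imageToVideo");
  // Declared videoToVideo is skipped when the provider needs remote URLs.
  if (caps.videoToVideo?.enabled && acceptsLocalVideoBuffer) modes.push("videoToVideo");
  return modes;
}
```

With this sketch, a provider that declares `videoToVideo` but only accepts remote `http(s)` URLs (the `alibaba`, `qwen`, and `xai` cases above) would get `generate` and `imageToVideo` only.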
482517

483518
## Docker runners (optional "works in Linux" checks)
484519

docs/plugins/sdk-provider-plugins.md

Lines changed: 9 additions & 4 deletions
@@ -643,10 +643,15 @@ API key auth, and dynamic model resolution.
643643
[Internals: Capability Ownership](/plugins/architecture#capability-ownership-model).
644644

645645
For video generation, prefer the mode-aware capability shape shown above:
646-
`generate`, `imageToVideo`, and `videoToVideo`. The older flat fields such
647-
as `maxInputImages`, `maxInputVideos`, and `maxDurationSeconds` still work
648-
as aggregate fallback caps, but they cannot describe per-mode limits or
649-
disabled transform modes as cleanly.
646+
`generate`, `imageToVideo`, and `videoToVideo`. Flat aggregate fields such
647+
as `maxInputImages`, `maxInputVideos`, and `maxDurationSeconds` are not
648+
enough to advertise transform-mode support or disabled modes cleanly.
649+
650+
Music-generation providers should follow the same pattern:
651+
`generate` for prompt-only generation and `edit` for reference-image-based
652+
generation. Flat aggregate fields such as `maxInputImages`,
653+
`supportsLyrics`, and `supportsFormat` are not enough to advertise edit
654+
support; explicit `generate` / `edit` blocks are the expected contract.
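To illustrate why explicit blocks help, here is a rough type sketch of the music capability shape. The field names (`maxTracks`, `maxInputImages`, `supportsLyrics`, `supportsFormat`, `enabled`) appear in this commit's docs; the type names, the `declaredMusicModes` helper, and the MiniMax-like values are hypothetical and are not the SDK's actual definitions.

```typescript
// Hypothetical types; only the field names are taken from the docs in this commit.
interface MusicModeCapabilitiesSketch {
  maxTracks?: number;
  maxInputImages?: number;
  supportsLyrics?: boolean;
  supportsFormat?: boolean;
}

interface MusicCapabilitiesSketch {
  generate?: MusicModeCapabilitiesSketch;
  edit?: MusicModeCapabilitiesSketch & { enabled: boolean };
}

// Lists the modes a provider explicitly advertises.
function declaredMusicModes(caps: MusicCapabilitiesSketch): Array<"generate" | "edit"> {
  const modes: Array<"generate" | "edit"> = [];
  if (caps.generate) modes.push("generate");
  if (caps.edit?.enabled) modes.push("edit");
  return modes;
}

// Shape mirrors the Comfy declaration in this commit (limit of 1 image).
const comfyLike: MusicCapabilitiesSketch = {
  generate: {},
  edit: { enabled: true, maxInputImages: 1 },
};

// Illustrative values for a provider without edit support.
const minimaxLike: MusicCapabilitiesSketch = {
  generate: { supportsLyrics: true, supportsFormat: true },
};
```

Here `declaredMusicModes(comfyLike)` lists both modes, while `declaredMusicModes(minimaxLike)` lists only `generate`, which a single flat `maxInputImages` value cannot express unambiguously.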
650655

651656
</Step>
652657

docs/tools/music-generation.md

Lines changed: 51 additions & 0 deletions
@@ -85,6 +85,17 @@ Example:
8585
| Google | `lyria-3-clip-preview` | Up to 10 images | `lyrics`, `instrumental`, `format` | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
8686
| MiniMax | `music-2.5+` | None | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` |
8787

88+
### Declared capability matrix
89+
90+
This is the explicit mode contract used by `music_generate`, contract tests,
91+
and the shared live sweep.
92+
93+
| Provider | `generate` | `edit` | Edit limit | Shared live lanes |
94+
| -------- | ---------- | ------ | ---------- | ------------------------------------------------------------------------- |
95+
| ComfyUI | Yes | Yes | 1 image | Not in the shared sweep; covered by `extensions/comfy/comfy.live.test.ts` |
96+
| Google | Yes | Yes | 10 images | `generate`, `edit` |
97+
| MiniMax | Yes | No | None | `generate` |
98+
8899
Use `action: "list"` to inspect available shared providers and models at
89100
runtime:
90101

@@ -174,6 +185,36 @@ error includes details from each attempt.
174185
- ComfyUI support is workflow-driven and depends on the configured graph plus
175186
node mapping for prompt/output fields.
176187

188+
## Provider capability modes
189+
190+
The shared music-generation contract now supports explicit mode declarations:
191+
192+
- `generate` for prompt-only generation
193+
- `edit` when the request includes one or more reference images
194+
195+
New provider implementations should prefer explicit mode blocks:
196+
197+
```typescript
198+
capabilities: {
199+
generate: {
200+
maxTracks: 1,
201+
supportsLyrics: true,
202+
supportsFormat: true,
203+
},
204+
edit: {
205+
enabled: true,
206+
maxTracks: 1,
207+
maxInputImages: 1,
208+
supportsFormat: true,
209+
},
210+
}
211+
```
212+
213+
Legacy flat fields such as `maxInputImages`, `supportsLyrics`, and
214+
`supportsFormat` are not enough to advertise edit support. Providers should
215+
declare `generate` and `edit` explicitly so live tests, contract tests, and
216+
the shared `music_generate` tool can validate mode support deterministically.
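As a rough sketch of what deterministic mode validation could look like, the helper below picks a mode from the number of reference images and rejects requests the provider has not declared. `resolveMusicMode` and `MusicCapsSketch` are hypothetical names, and the error messages are invented; the real `music_generate` tool may behave differently.

```typescript
// Hypothetical sketch; not the shared music_generate implementation.
interface MusicCapsSketch {
  generate?: { maxTracks?: number };
  edit?: { enabled: boolean; maxInputImages?: number };
}

function resolveMusicMode(caps: MusicCapsSketch, inputImageCount: number): "generate" | "edit" {
  // Prompt-only requests always map to generate.
  if (inputImageCount === 0) return "generate";

  // Reference images imply edit, which must be explicitly enabled.
  const edit = caps.edit;
  if (!edit || !edit.enabled) {
    throw new Error("provider does not declare edit mode");
  }
  if (edit.maxInputImages !== undefined && inputImageCount > edit.maxInputImages) {
    throw new Error(`edit mode accepts at most ${edit.maxInputImages} image(s)`);
  }
  return "edit";
}
```

With the example declaration above (`maxInputImages: 1`), one reference image resolves to `edit`, and two would be rejected before any provider call is made.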
217+
177218
## Choosing the right path
178219

179220
- Use the shared provider-backed path when you want model selection, provider failover, and the built-in async task/status flow.
@@ -188,6 +229,16 @@ Opt-in live coverage for the shared bundled providers:
188229
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
189230
```
190231

232+
This live file loads missing provider env vars from `~/.profile`, prefers
233+
live/env API keys ahead of stored auth profiles by default, and runs both
234+
`generate` and declared `edit` coverage when the provider enables edit mode.
235+
236+
Today that means:
237+
238+
- `google`: `generate` plus `edit`
239+
- `minimax`: `generate` only
240+
- `comfy`: separate Comfy live coverage, not the shared provider sweep
241+
191242
Opt-in live coverage for the bundled ComfyUI music path:
192243

193244
```bash

docs/tools/video-generation.md

Lines changed: 48 additions & 3 deletions
@@ -79,6 +79,26 @@ Some providers accept additional or alternate API key env vars. See individual [
7979
Run `video_generate action=list` to inspect available providers, models, and
8080
runtime modes at runtime.
8181

82+
### Declared capability matrix
83+
84+
This is the explicit mode contract used by `video_generate`, contract tests,
85+
and the shared live sweep.
86+
87+
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
88+
| -------- | ---------- | -------------- | -------------- | ---------------------------------------------------------------------------------------------------------- |
89+
| Alibaba | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
90+
| BytePlus | Yes | Yes | No | `generate`, `imageToVideo` |
91+
| ComfyUI | Yes | Yes | No | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
92+
| fal | Yes | Yes | No | `generate`, `imageToVideo` |
93+
| Google | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
94+
| MiniMax | Yes | Yes | No | `generate`, `imageToVideo` |
95+
| OpenAI | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
96+
| Qwen | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
97+
| Runway | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
98+
| Together | Yes | Yes | No | `generate`, `imageToVideo` |
99+
| Vydra | Yes | Yes | No | `generate`, `imageToVideo` |
100+
| xAI | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
101+
82102
## Tool parameters
83103

84104
### Required
@@ -201,9 +221,34 @@ capabilities: {
201221
}
202222
```
203223

204-
Legacy flat fields such as `maxInputImages` and `maxInputVideos` still work as
205-
backward-compatible aggregate caps, but they cannot express per-mode limits as
206-
precisely.
224+
Flat aggregate fields such as `maxInputImages` and `maxInputVideos` are not
225+
enough to advertise transform-mode support. Providers should declare
226+
`generate`, `imageToVideo`, and `videoToVideo` explicitly so live tests,
227+
contract tests, and the shared `video_generate` tool can validate mode support
228+
deterministically.
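A minimal sketch of how explicit blocks allow a request to be matched to a mode is shown below. The `inputImages` and `inputVideos` request fields appear in the provider code in this commit; `resolveVideoMode`, the sketch types, the error messages, and the rule that video inputs take precedence over image inputs are all assumptions made for illustration.

```typescript
// Hypothetical sketch; not the shared video_generate implementation.
type VideoMode = "generate" | "imageToVideo" | "videoToVideo";

interface VideoRequestSketch {
  inputImages?: unknown[];
  inputVideos?: unknown[];
}

interface VideoCapsSketch {
  generate?: object;
  imageToVideo?: { enabled: boolean; maxInputImages?: number };
  videoToVideo?: { enabled: boolean; maxInputVideos?: number };
}

function resolveVideoMode(caps: VideoCapsSketch, req: VideoRequestSketch): VideoMode {
  const videoCount = req.inputVideos?.length ?? 0;
  const imageCount = req.inputImages?.length ?? 0;

  // Assumption: reference videos take precedence over reference images.
  if (videoCount > 0) {
    const v2v = caps.videoToVideo;
    if (!v2v || !v2v.enabled) throw new Error("videoToVideo is not enabled for this provider");
    return "videoToVideo";
  }
  if (imageCount > 0) {
    const i2v = caps.imageToVideo;
    if (!i2v || !i2v.enabled) throw new Error("imageToVideo is not enabled for this provider");
    return "imageToVideo";
  }
  return "generate";
}
```

For a declaration like the BytePlus one in this commit (`videoToVideo: { enabled: false }`), a request carrying a reference video would be rejected up front instead of failing inside the provider.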
229+
230+
## Live tests
231+
232+
Opt-in live coverage for the shared bundled providers:
233+
234+
```bash
235+
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
236+
```
237+
238+
This live file loads missing provider env vars from `~/.profile`, prefers
239+
live/env API keys ahead of stored auth profiles by default, and runs the
240+
declared modes it can exercise safely with local media:
241+
242+
- `generate` for every provider in the sweep
243+
- `imageToVideo` when `capabilities.imageToVideo.enabled`
244+
- `videoToVideo` when `capabilities.videoToVideo.enabled` and the provider/model
245+
accepts buffer-backed local video input in the shared sweep
246+
247+
Today the shared `videoToVideo` live lane covers:
248+
249+
- `google`
250+
- `openai`
251+
- `runway` only when you select `runway/gen4_aleph`
207252

208253
## Configuration
209254

extensions/alibaba/video-generation-provider.ts

Lines changed: 31 additions & 9 deletions
@@ -198,15 +198,37 @@ export function buildAlibabaVideoGenerationProvider(): VideoGenerationProvider {
198198
agentDir,
199199
}),
200200
capabilities: {
201-
maxVideos: 1,
202-
maxInputImages: 1,
203-
maxInputVideos: 4,
204-
maxDurationSeconds: 10,
205-
supportsSize: true,
206-
supportsAspectRatio: true,
207-
supportsResolution: true,
208-
supportsAudio: true,
209-
supportsWatermark: true,
201+
generate: {
202+
maxVideos: 1,
203+
maxDurationSeconds: 10,
204+
supportsSize: true,
205+
supportsAspectRatio: true,
206+
supportsResolution: true,
207+
supportsAudio: true,
208+
supportsWatermark: true,
209+
},
210+
imageToVideo: {
211+
enabled: true,
212+
maxVideos: 1,
213+
maxInputImages: 1,
214+
maxDurationSeconds: 10,
215+
supportsSize: true,
216+
supportsAspectRatio: true,
217+
supportsResolution: true,
218+
supportsAudio: true,
219+
supportsWatermark: true,
220+
},
221+
videoToVideo: {
222+
enabled: true,
223+
maxVideos: 1,
224+
maxInputVideos: 4,
225+
maxDurationSeconds: 10,
226+
supportsSize: true,
227+
supportsAspectRatio: true,
228+
supportsResolution: true,
229+
supportsAudio: true,
230+
supportsWatermark: true,
231+
},
210232
},
211233
async generateVideo(req): Promise<VideoGenerationResult> {
212234
const fetchFn = fetch;

extensions/byteplus/video-generation-provider.ts

Lines changed: 21 additions & 8 deletions
@@ -135,14 +135,27 @@ export function buildBytePlusVideoGenerationProvider(): VideoGenerationProvider
135135
agentDir,
136136
}),
137137
capabilities: {
138-
maxVideos: 1,
139-
maxInputImages: 1,
140-
maxInputVideos: 0,
141-
maxDurationSeconds: 12,
142-
supportsAspectRatio: true,
143-
supportsResolution: true,
144-
supportsAudio: true,
145-
supportsWatermark: true,
138+
generate: {
139+
maxVideos: 1,
140+
maxDurationSeconds: 12,
141+
supportsAspectRatio: true,
142+
supportsResolution: true,
143+
supportsAudio: true,
144+
supportsWatermark: true,
145+
},
146+
imageToVideo: {
147+
enabled: true,
148+
maxVideos: 1,
149+
maxInputImages: 1,
150+
maxDurationSeconds: 12,
151+
supportsAspectRatio: true,
152+
supportsResolution: true,
153+
supportsAudio: true,
154+
supportsWatermark: true,
155+
},
156+
videoToVideo: {
157+
enabled: false,
158+
},
146159
},
147160
async generateVideo(req) {
148161
if ((req.inputVideos?.length ?? 0) > 0) {

extensions/comfy/music-generation-provider.test.ts

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ describe("comfy music-generation provider", () => {
1212

1313
expect(provider.defaultModel).toBe("workflow");
1414
expect(provider.models).toEqual(["workflow"]);
15-
expect(provider.capabilities.maxInputImages).toBe(1);
15+
expect(provider.capabilities.edit?.maxInputImages).toBe(1);
1616
});
1717

1818
it("runs a music workflow and returns audio outputs", async () => {

extensions/comfy/music-generation-provider.ts

Lines changed: 5 additions & 1 deletion
@@ -50,7 +50,11 @@ export function buildComfyMusicGenerationProvider(): MusicGenerationProvider {
5050
capability: "music",
5151
}),
5252
capabilities: {
53-
maxInputImages: COMFY_MAX_INPUT_IMAGES,
53+
generate: {},
54+
edit: {
55+
enabled: true,
56+
maxInputImages: COMFY_MAX_INPUT_IMAGES,
57+
},
5458
},
5559
async generateMusic(req) {
5660
if ((req.inputImages?.length ?? 0) > COMFY_MAX_INPUT_IMAGES) {

extensions/comfy/video-generation-provider.ts

Lines changed: 21 additions & 8 deletions
@@ -39,14 +39,27 @@ export function buildComfyVideoGenerationProvider(): VideoGenerationProvider {
3939
capability: "video",
4040
}),
4141
capabilities: {
42-
maxVideos: 1,
43-
maxInputImages: 1,
44-
maxInputVideos: 0,
45-
supportsSize: false,
46-
supportsAspectRatio: false,
47-
supportsResolution: false,
48-
supportsAudio: false,
49-
supportsWatermark: false,
42+
generate: {
43+
maxVideos: 1,
44+
supportsSize: false,
45+
supportsAspectRatio: false,
46+
supportsResolution: false,
47+
supportsAudio: false,
48+
supportsWatermark: false,
49+
},
50+
imageToVideo: {
51+
enabled: true,
52+
maxVideos: 1,
53+
maxInputImages: 1,
54+
supportsSize: false,
55+
supportsAspectRatio: false,
56+
supportsResolution: false,
57+
supportsAudio: false,
58+
supportsWatermark: false,
59+
},
60+
videoToVideo: {
61+
enabled: false,
62+
},
5063
},
5164
async generateVideo(req) {
5265
if ((req.inputImages?.length ?? 0) > 1) {
