Skip to content

fix(config): accept video and audio in models.input schema#39338

Merged
steipete merged 3 commits intoopenclaw:mainfrom
alvinttang:fix/20721-models-input-video-audio
Apr 26, 2026
Merged

fix(config): accept video and audio in models.input schema#39338
steipete merged 3 commits intoopenclaw:mainfrom
alvinttang:fix/20721-models-input-video-audio

Conversation

@alvinttang
Copy link
Copy Markdown
Contributor

Summary

  • Add "video" and "audio" to the models.providers[].models[].input zod schema and TypeScript type
  • Previously only "text" and "image" were accepted, blocking Gemini native multimodal capabilities
  • The runtime already supports these media types (MAX_VIDEO_BYTES, MAX_AUDIO_BYTES, mediaKindFromMime())
  • Aligns with MediaUnderstandingCapabilitiesSchema which already accepts all four types

Fixes #20721

Test plan

  • Verify config with "input": ["text", "image", "video"] passes validation
  • Verify existing "input": ["text", "image"] configs still work
  • Run openclaw doctor with video/audio model input declarations

🤖 Generated with Claude Code

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Mar 8, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 8, 2026

Greptile Summary

This PR has two distinct goals bundled together: (1) extend ModelDefinitionSchema and ModelDefinitionConfig to accept "video" and "audio" as valid input values, unblocking Gemini native multimodal configs; and (2) add a reverse-lookup in normalizeProviders that replaces a resolved plaintext API key back to its canonical env var name before writing models.json, preventing secret persistence.

Key changes:

  • src/config/zod-schema.core.ts / src/config/types.models.ts: straightforward union extension — the schema and type are now consistent with MediaUnderstandingCapabilitiesSchema and the runtime constants (MAX_VIDEO_BYTES, MAX_AUDIO_BYTES). ✅
  • src/agents/models-config.providers.ts: new ENV_VAR_NAME_RE regex and reverse-lookup block in normalizeProviders. The fix is correct for the canonical-env-var case but has two documented limitations worth noting: it won't recover secrets resolved from non-canonical env var names, and it can silently replace an explicitly hardcoded key that coincidentally equals the canonical env var value in the current environment.
  • Tests cover the API-key persistence fix but no automated test was added for the video/audio schema change itself (the PR test plan remains unchecked). Consider adding a small schema parse test (e.g. ModelDefinitionSchema.parse({ ..., input: ["text", "video", "audio"] })) to lock in the new accepted values.

Confidence Score: 4/5

  • Safe to merge — schema changes are correct and well-aligned; the API key fix is a reasonable best-effort improvement with minor edge cases.
  • The video/audio schema additions are trivial, backwards-compatible, and consistent with existing runtime support. The API key reverse-lookup is a sensible security improvement with passing tests. Two minor style-level concerns (partial coverage of non-canonical env vars, possible false-positive mutation of explicitly hardcoded keys) prevent a perfect score but do not block merging.
  • src/agents/models-config.providers.ts — review the two edge cases in the reverse-lookup logic noted in the inline comments.

Last reviewed commit: 7f2be15

Comment thread src/agents/models-config.providers.ts Outdated
Comment on lines +523 to +538
// Reverse-lookup: if apiKey looks like a resolved secret value (not an env
// var name), check whether it matches the canonical env var for this provider.
// This prevents resolveConfigEnvVars()-resolved secrets from being persisted
// to models.json as plaintext. (Fixes #38757)
const currentApiKey = normalizedProvider.apiKey;
if (
typeof currentApiKey === "string" &&
currentApiKey.trim() &&
!ENV_VAR_NAME_RE.test(currentApiKey.trim())
) {
const envVarName = resolveEnvApiKeyVarName(normalizedKey);
if (envVarName && process.env[envVarName] === currentApiKey) {
mutated = true;
normalizedProvider = { ...normalizedProvider, apiKey: envVarName };
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverse-lookup only covers canonical env vars

The fix correctly handles the case where resolveConfigEnvVars() has expanded a canonical env var reference (e.g. ${OPENAI_API_KEY} → the real key value). However, if the user referenced a non-canonical env var like ${MY_CUSTOM_OPENAI_KEY}, resolveEnvApiKeyVarName(normalizedKey) will return the canonical name (OPENAI_API_KEY), whose value won't match the resolved secret, so the plaintext will still be persisted.

This is worth documenting as a known limitation in the comment, so future maintainers understand the scope of the fix:

// Note: this reverse-lookup only fires for the *canonical* env var
// for the provider (e.g. OPENAI_API_KEY).  Secrets resolved from
// non-canonical env vars (e.g. ${MY_CUSTOM_KEY}) are not recovered
// here and will still be written as plaintext.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/models-config.providers.ts
Line: 523-538

Comment:
**Reverse-lookup only covers canonical env vars**

The fix correctly handles the case where `resolveConfigEnvVars()` has expanded a canonical env var reference (e.g. `${OPENAI_API_KEY}` → the real key value). However, if the user referenced a non-canonical env var like `${MY_CUSTOM_OPENAI_KEY}`, `resolveEnvApiKeyVarName(normalizedKey)` will return the canonical name (`OPENAI_API_KEY`), whose value won't match the resolved secret, so the plaintext will still be persisted.

This is worth documenting as a known limitation in the comment, so future maintainers understand the scope of the fix:

```
// Note: this reverse-lookup only fires for the *canonical* env var
// for the provider (e.g. OPENAI_API_KEY).  Secrets resolved from
// non-canonical env vars (e.g. ${MY_CUSTOM_KEY}) are not recovered
// here and will still be written as plaintext.
```

How can I resolve this? If you propose a fix, please make it concise.

Comment thread src/agents/models-config.providers.ts Outdated
Comment on lines +528 to +537
if (
typeof currentApiKey === "string" &&
currentApiKey.trim() &&
!ENV_VAR_NAME_RE.test(currentApiKey.trim())
) {
const envVarName = resolveEnvApiKeyVarName(normalizedKey);
if (envVarName && process.env[envVarName] === currentApiKey) {
mutated = true;
normalizedProvider = { ...normalizedProvider, apiKey: envVarName };
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible unintended mutation for explicitly-hardcoded keys

If a user has deliberately hardcoded apiKey: "sk-abc" in their config (i.e. it was never an env var reference), but OPENAI_API_KEY happens to be set to the same value in the environment, the reverse-lookup will silently rewrite the config entry to "OPENAI_API_KEY". This means that if the user later rotates the env var, the formerly-hardcoded key will start resolving to the new value — a potentially surprising behaviour change.

Adding a guard that skips the replacement when the config was loaded from a source that never contained ${…} syntax would eliminate this false-positive, but even a code comment noting this scenario would reduce surprise for future reviewers.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/models-config.providers.ts
Line: 528-537

Comment:
**Possible unintended mutation for explicitly-hardcoded keys**

If a user has deliberately hardcoded `apiKey: "sk-abc"` in their config (i.e. it was never an env var reference), but `OPENAI_API_KEY` happens to be set to the same value in the environment, the reverse-lookup will silently rewrite the config entry to `"OPENAI_API_KEY"`. This means that if the user later rotates the env var, the formerly-hardcoded key will start resolving to the new value — a potentially surprising behaviour change.

Adding a guard that skips the replacement when the config was loaded from a source that never contained `${…}` syntax would eliminate this false-positive, but even a code comment noting this scenario would reduce surprise for future reviewers.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@xkonjin xkonjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick review pass:

  • Main risk area here is auth/session state and stale credential handling.
  • Good to see test coverage move with the code; I’d still make sure it exercises the unhappy path around auth/session state and stale credential handling rather than only the happy path.
  • Before merge, I’d smoke-test the behavior touched by models-config.fills-missing-provider-apikey-from-env-var.test.ts, models-config.providers.normalize-keys.test.ts, models-config.providers.ts (+2 more) with malformed input / retry / rollback cases, since that’s where this class of change usually breaks.

@gambletan
Copy link
Copy Markdown
Contributor

Thanks for the review @xkonjin, really appreciate the thorough pass.

You raise a good point about unhappy-path coverage. The current tests focus on the happy path for both the schema change and the API key reverse-lookup, but we should definitely exercise edge cases like:

  • Malformed or unexpected values in models.input (e.g. typos, empty strings, mixed-case variants) to make sure the schema rejects them cleanly
  • Stale/rotated credentials and the interaction with normalizeProviders when env var state is inconsistent
  • Retry/rollback scenarios around models.json writes when the reverse-lookup fires

I'll add coverage for these edge cases before merge. The auth/session state concern is well-taken -- that's exactly the area where silent failures would be hardest to diagnose in production.

@alvinttang alvinttang force-pushed the fix/20721-models-input-video-audio branch from 7f2be15 to e245163 Compare March 8, 2026 09:05
@openclaw-barnacle openclaw-barnacle Bot added the channel: bluebubbles Channel integration: bluebubbles label Mar 8, 2026
@alvinttang alvinttang force-pushed the fix/20721-models-input-video-audio branch from e245163 to 3d2e325 Compare March 8, 2026 09:06
@openclaw-barnacle openclaw-barnacle Bot added size: XS and removed channel: bluebubbles Channel integration: bluebubbles agents Agent runtime and tooling size: S labels Mar 8, 2026
@gambletan
Copy link
Copy Markdown
Contributor

Thanks for the feedback @xkonjin — I'll add unhappy-path test coverage for malformed input and stale credential handling. Will push an update shortly.

@gambletan
Copy link
Copy Markdown
Contributor

Working on adding unhappy-path test coverage for malformed input, stale credentials, and retry/rollback scenarios. Will also address the two code comments from greptile. Update coming.

@altaywtf altaywtf self-assigned this Mar 13, 2026
Copy link
Copy Markdown
Member

@altaywtf altaywtf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the schema/type validation side of #20721. I found one remaining behavior gap before this is ready.

src/agents/model-catalog.ts:100 still filters configured model input down to text | image | document:

const normalized = input.filter(
  (item): item is ModelInputType => item === "text" || item === "image" || item === "document",
);

That means configured opt-in provider models still lose audio and video when they flow through loadModelCatalog(), even after this PR makes the config schema accept them.

I verified it with this repro:

node --import tsx --eval 'import { loadModelCatalog } from "./src/agents/model-catalog.ts"; const result = await loadModelCatalog({ useCache: false, config: { models: { providers: { kilocode: { baseUrl: "https://api.kilo.ai/api/gateway/", api: "openai-completions", models: [{ id: "google/gemini-3-pro-preview", name: "Gemini 3 Pro Preview", input: ["text", "image", "video", "audio"], reasoning: true, cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }, contextWindow: 1048576, maxTokens: 65536 }] } } } } }); console.log(JSON.stringify(result.filter((entry) => entry.provider === "kilocode"), null, 2));'

Observed result:

[
  {
    "id": "google/gemini-3-pro-preview",
    "name": "Gemini 3 Pro Preview",
    "provider": "kilocode",
    "contextWindow": 1048576,
    "reasoning": true,
    "input": ["text", "image"]
  }
]

Since models.list and other callers consume loadGatewayModelCatalog() / loadModelCatalog(), the new capabilities are still lost on that path.

I’d recommend widening ModelInputType and normalizeConfiguredModelInput() in src/agents/model-catalog.ts too, and adding a regression test in src/agents/model-catalog.test.ts that asserts configured provider models retain audio / video through loadModelCatalog().

@altaywtf altaywtf added the stale Marked as stale due to inactivity label Mar 23, 2026
@altaywtf altaywtf removed their assignment Mar 23, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Apr 23, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 26, 2026

Codex automated review: keeping this open.

Keep open: current main still rejects "audio" and "video" in models.providers[].models[].input, and the submitted two-file PR is now stale relative to newer narrow typings elsewhere, so this is not a strong auto-close candidate.

Best possible solution:

Continue tracking this item until the missing behavior is implemented or a maintainer decides the product direction.

What I checked:

  • Current config schema still only allows text/image: ModelDefinitionSchema on the current checkout only accepts "text" and "image" for models.providers[].models[].input, so the reported config limitation is still present on main. (src/config/zod-schema.core.ts:315, b8b270d5b89b)
  • Current config type is still narrow: ModelDefinitionConfig still types input as Array<"text" | "image">, matching the schema restriction rather than the PR's requested widening. (src/config/types.models.ts:83, b8b270d5b89b)
  • Adjacent config surface already accepts audio/video: MediaUnderstandingCapabilitiesSchema already accepts "image", "audio", and "video", which makes the PR's mismatch claim coherent rather than obsolete. (src/config/zod-schema.core.ts:720, b8b270d5b89b)
  • Current main has additional narrow input types beyond the PR diff: ListRowModel still constrains input to Array<"text" | "image">, so the PR's original two-file change no longer demonstrates a complete current-main fix and should be refreshed instead of auto-closed. (src/commands/models/list.model-row.ts:11, b8b270d5b89b)
  • Discussion pointed toward follow-up work, not obsolescence: Maintainer comments on 2026-03-08 and 2026-03-10 asked for unhappy-path coverage and follow-up updates; that suggests stale-but-salvageable work, not a resolved or incoherent PR.

Remaining risk / open question:

  • gh could not be used in this environment because GH_TOKEN is unavailable, so the review relied on the provided PR discussion/timeline plus the local checkout.
  • The local repository history is shallow/grafted, which limited commit-lineage tracing; the keep-open decision is based on current main behavior rather than full merge history.

Codex Review notes: reviewed against 017252e4f8e7.

@steipete steipete force-pushed the fix/20721-models-input-video-audio branch from 3d2e325 to 70d6616 Compare April 26, 2026 03:49
@steipete
Copy link
Copy Markdown
Contributor

Thanks for the review. I fixed the remaining catalog path: configured provider catalog entries now preserve audio/video inputs, ModelInputType includes them, and LM Studio dynamic discovery narrows back to text/image before constructing pi-ai runtime models.

Verified locally:

  • pnpm test src/plugin-sdk/provider-catalog-shared.test.ts src/agents/model-catalog.test.ts
  • pnpm tsgo:core
  • pnpm tsgo:extensions
  • pnpm config:schema:check && pnpm config:docs:check
  • pnpm check:changed

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling extensions: lmstudio size: S and removed size: XS labels Apr 26, 2026
@steipete steipete force-pushed the fix/20721-models-input-video-audio branch from ae23c25 to ee15ca1 Compare April 26, 2026 03:58
gambletan and others added 2 commits April 26, 2026 05:10
The models.providers[].models[].input zod schema only accepted
'text' and 'image', rejecting 'video' and 'audio' values. This
blocked users from declaring Gemini native multimodal capabilities,
even though the runtime already supports these media types
(MAX_VIDEO_BYTES, MAX_AUDIO_BYTES, mediaKindFromMime).

Align the config schema and TypeScript type with the existing
MediaUnderstandingCapabilitiesSchema which already accepts all
four media types.

Fixes openclaw#20721

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@steipete steipete force-pushed the fix/20721-models-input-video-audio branch from ee15ca1 to 5ed830a Compare April 26, 2026 04:11
@steipete steipete merged commit 4428661 into openclaw:main Apr 26, 2026
64 of 67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: models.input type union rejects "video" / "audio" — blocks Gemini native multimodal config

5 participants