fix(config): accept video and audio in models.input schema by alvinttang · Pull Request #39338 · openclaw/openclaw

alvinttang · 2026-03-08T01:20:27Z

Summary

Add "video" and "audio" to the models.providers[].models[].input zod schema and TypeScript type
Previously only "text" and "image" were accepted, blocking Gemini native multimodal capabilities
The runtime already supports these media types (MAX_VIDEO_BYTES, MAX_AUDIO_BYTES, mediaKindFromMime())
Aligns with MediaUnderstandingCapabilitiesSchema which already accepts all four types

Fixes #20721

Test plan

Verify config with "input": ["text", "image", "video"] passes validation
Verify existing "input": ["text", "image"] configs still work
Run openclaw doctor with video/audio model input declarations

🤖 Generated with Claude Code

greptile-apps · 2026-03-08T01:23:39Z

Greptile Summary

This PR has two distinct goals bundled together: (1) extend ModelDefinitionSchema and ModelDefinitionConfig to accept "video" and "audio" as valid input values, unblocking Gemini native multimodal configs; and (2) add a reverse-lookup in normalizeProviders that replaces a resolved plaintext API key back to its canonical env var name before writing models.json, preventing secret persistence.

Key changes:

src/config/zod-schema.core.ts / src/config/types.models.ts: straightforward union extension — the schema and type are now consistent with MediaUnderstandingCapabilitiesSchema and the runtime constants (MAX_VIDEO_BYTES, MAX_AUDIO_BYTES). ✅
src/agents/models-config.providers.ts: new ENV_VAR_NAME_RE regex and reverse-lookup block in normalizeProviders. The fix is correct for the canonical-env-var case but has two documented limitations worth noting: it won't recover secrets resolved from non-canonical env var names, and it can silently replace an explicitly hardcoded key that coincidentally equals the canonical env var value in the current environment.
Tests cover the API-key persistence fix but no automated test was added for the video/audio schema change itself (the PR test plan remains unchecked). Consider adding a small schema parse test (e.g. ModelDefinitionSchema.parse({ ..., input: ["text", "video", "audio"] })) to lock in the new accepted values.

Confidence Score: 4/5

Safe to merge — schema changes are correct and well-aligned; the API key fix is a reasonable best-effort improvement with minor edge cases.
The video/audio schema additions are trivial, backwards-compatible, and consistent with existing runtime support. The API key reverse-lookup is a sensible security improvement with passing tests. Two minor style-level concerns (partial coverage of non-canonical env vars, possible false-positive mutation of explicitly hardcoded keys) prevent a perfect score but do not block merging.
src/agents/models-config.providers.ts — review the two edge cases in the reverse-lookup logic noted in the inline comments.

_{Last reviewed commit: 7f2be15}

greptile-apps · 2026-03-08T01:23:43Z

+    // Reverse-lookup: if apiKey looks like a resolved secret value (not an env
+    // var name), check whether it matches the canonical env var for this provider.
+    // This prevents resolveConfigEnvVars()-resolved secrets from being persisted
+    // to models.json as plaintext. (Fixes #38757)
+    const currentApiKey = normalizedProvider.apiKey;
+    if (
+      typeof currentApiKey === "string" &&
+      currentApiKey.trim() &&
+      !ENV_VAR_NAME_RE.test(currentApiKey.trim())
+    ) {
+      const envVarName = resolveEnvApiKeyVarName(normalizedKey);
+      if (envVarName && process.env[envVarName] === currentApiKey) {
+        mutated = true;
+        normalizedProvider = { ...normalizedProvider, apiKey: envVarName };
+      }
+    }


Reverse-lookup only covers canonical env vars

The fix correctly handles the case where resolveConfigEnvVars() has expanded a canonical env var reference (e.g. ${OPENAI_API_KEY} → the real key value). However, if the user referenced a non-canonical env var like ${MY_CUSTOM_OPENAI_KEY}, resolveEnvApiKeyVarName(normalizedKey) will return the canonical name (OPENAI_API_KEY), whose value won't match the resolved secret, so the plaintext will still be persisted.

This is worth documenting as a known limitation in the comment, so future maintainers understand the scope of the fix:

// Note: this reverse-lookup only fires for the *canonical* env var // for the provider (e.g. OPENAI_API_KEY). Secrets resolved from // non-canonical env vars (e.g. ${MY_CUSTOM_KEY}) are not recovered // here and will still be written as plaintext.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/agents/models-config.providers.ts Line: 523-538 Comment: **Reverse-lookup only covers canonical env vars** The fix correctly handles the case where `resolveConfigEnvVars()` has expanded a canonical env var reference (e.g. `${OPENAI_API_KEY}` → the real key value). However, if the user referenced a non-canonical env var like `${MY_CUSTOM_OPENAI_KEY}`, `resolveEnvApiKeyVarName(normalizedKey)` will return the canonical name (`OPENAI_API_KEY`), whose value won't match the resolved secret, so the plaintext will still be persisted. This is worth documenting as a known limitation in the comment, so future maintainers understand the scope of the fix: ``` // Note: this reverse-lookup only fires for the *canonical* env var // for the provider (e.g. OPENAI_API_KEY). Secrets resolved from // non-canonical env vars (e.g. ${MY_CUSTOM_KEY}) are not recovered // here and will still be written as plaintext. ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-08T01:23:44Z

+    if (
+      typeof currentApiKey === "string" &&
+      currentApiKey.trim() &&
+      !ENV_VAR_NAME_RE.test(currentApiKey.trim())
+    ) {
+      const envVarName = resolveEnvApiKeyVarName(normalizedKey);
+      if (envVarName && process.env[envVarName] === currentApiKey) {
+        mutated = true;
+        normalizedProvider = { ...normalizedProvider, apiKey: envVarName };
+      }


Possible unintended mutation for explicitly-hardcoded keys

If a user has deliberately hardcoded apiKey: "sk-abc" in their config (i.e. it was never an env var reference), but OPENAI_API_KEY happens to be set to the same value in the environment, the reverse-lookup will silently rewrite the config entry to "OPENAI_API_KEY". This means that if the user later rotates the env var, the formerly-hardcoded key will start resolving to the new value — a potentially surprising behaviour change.

Adding a guard that skips the replacement when the config was loaded from a source that never contained ${…} syntax would eliminate this false-positive, but even a code comment noting this scenario would reduce surprise for future reviewers.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/agents/models-config.providers.ts Line: 528-537 Comment: **Possible unintended mutation for explicitly-hardcoded keys** If a user has deliberately hardcoded `apiKey: "sk-abc"` in their config (i.e. it was never an env var reference), but `OPENAI_API_KEY` happens to be set to the same value in the environment, the reverse-lookup will silently rewrite the config entry to `"OPENAI_API_KEY"`. This means that if the user later rotates the env var, the formerly-hardcoded key will start resolving to the new value — a potentially surprising behaviour change. Adding a guard that skips the replacement when the config was loaded from a source that never contained `${…}` syntax would eliminate this false-positive, but even a code comment noting this scenario would reduce surprise for future reviewers. How can I resolve this? If you propose a fix, please make it concise.

xkonjin

Quick review pass:

Main risk area here is auth/session state and stale credential handling.
Good to see test coverage move with the code; I’d still make sure it exercises the unhappy path around auth/session state and stale credential handling rather than only the happy path.
Before merge, I’d smoke-test the behavior touched by models-config.fills-missing-provider-apikey-from-env-var.test.ts, models-config.providers.normalize-keys.test.ts, models-config.providers.ts (+2 more) with malformed input / retry / rollback cases, since that’s where this class of change usually breaks.

gambletan · 2026-03-08T07:07:04Z

Thanks for the review @xkonjin, really appreciate the thorough pass.

You raise a good point about unhappy-path coverage. The current tests focus on the happy path for both the schema change and the API key reverse-lookup, but we should definitely exercise edge cases like:

Malformed or unexpected values in models.input (e.g. typos, empty strings, mixed-case variants) to make sure the schema rejects them cleanly
Stale/rotated credentials and the interaction with normalizeProviders when env var state is inconsistent
Retry/rollback scenarios around models.json writes when the reverse-lookup fires

I'll add coverage for these edge cases before merge. The auth/session state concern is well-taken -- that's exactly the area where silent failures would be hardest to diagnose in production.

gambletan · 2026-03-10T02:23:12Z

Thanks for the feedback @xkonjin — I'll add unhappy-path test coverage for malformed input and stale credential handling. Will push an update shortly.

gambletan · 2026-03-10T21:51:48Z

Working on adding unhappy-path test coverage for malformed input, stale credentials, and retry/rollback scenarios. Will also address the two code comments from greptile. Update coming.

altaywtf

Thanks for fixing the schema/type validation side of #20721. I found one remaining behavior gap before this is ready.

src/agents/model-catalog.ts:100 still filters configured model input down to text | image | document:

const normalized = input.filter(
  (item): item is ModelInputType => item === "text" || item === "image" || item === "document",
);

That means configured opt-in provider models still lose audio and video when they flow through loadModelCatalog(), even after this PR makes the config schema accept them.

I verified it with this repro:

node --import tsx --eval 'import { loadModelCatalog } from "./src/agents/model-catalog.ts"; const result = await loadModelCatalog({ useCache: false, config: { models: { providers: { kilocode: { baseUrl: "https://api.kilo.ai/api/gateway/", api: "openai-completions", models: [{ id: "google/gemini-3-pro-preview", name: "Gemini 3 Pro Preview", input: ["text", "image", "video", "audio"], reasoning: true, cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }, contextWindow: 1048576, maxTokens: 65536 }] } } } } }); console.log(JSON.stringify(result.filter((entry) => entry.provider === "kilocode"), null, 2));'

Observed result:

[
  {
    "id": "google/gemini-3-pro-preview",
    "name": "Gemini 3 Pro Preview",
    "provider": "kilocode",
    "contextWindow": 1048576,
    "reasoning": true,
    "input": ["text", "image"]
  }
]

Since models.list and other callers consume loadGatewayModelCatalog() / loadModelCatalog(), the new capabilities are still lost on that path.

I’d recommend widening ModelInputType and normalizeConfiguredModelInput() in src/agents/model-catalog.ts too, and adding a regression test in src/agents/model-catalog.test.ts that asserts configured provider models retain audio / video through loadModelCatalog().

clawsweeper · 2026-04-26T02:00:08Z

Codex automated review: keeping this open.

Keep open: current main still rejects "audio" and "video" in models.providers[].models[].input, and the submitted two-file PR is now stale relative to newer narrow typings elsewhere, so this is not a strong auto-close candidate.

Best possible solution:

Continue tracking this item until the missing behavior is implemented or a maintainer decides the product direction.

What I checked:

Current config schema still only allows text/image: ModelDefinitionSchema on the current checkout only accepts "text" and "image" for models.providers[].models[].input, so the reported config limitation is still present on main. (src/config/zod-schema.core.ts:315, b8b270d5b89b)
Current config type is still narrow: ModelDefinitionConfig still types input as Array<"text" | "image">, matching the schema restriction rather than the PR's requested widening. (src/config/types.models.ts:83, b8b270d5b89b)
Adjacent config surface already accepts audio/video: MediaUnderstandingCapabilitiesSchema already accepts "image", "audio", and "video", which makes the PR's mismatch claim coherent rather than obsolete. (src/config/zod-schema.core.ts:720, b8b270d5b89b)
Current main has additional narrow input types beyond the PR diff: ListRowModel still constrains input to Array<"text" | "image">, so the PR's original two-file change no longer demonstrates a complete current-main fix and should be refreshed instead of auto-closed. (src/commands/models/list.model-row.ts:11, b8b270d5b89b)
Discussion pointed toward follow-up work, not obsolescence: Maintainer comments on 2026-03-08 and 2026-03-10 asked for unhappy-path coverage and follow-up updates; that suggests stale-but-salvageable work, not a resolved or incoherent PR.

Remaining risk / open question:

gh could not be used in this environment because GH_TOKEN is unavailable, so the review relied on the provided PR discussion/timeline plus the local checkout.
The local repository history is shallow/grafted, which limited commit-lineage tracing; the keep-open decision is based on current main behavior rather than full merge history.

Codex Review notes: reviewed against 017252e4f8e7.

steipete · 2026-04-26T03:49:18Z

Thanks for the review. I fixed the remaining catalog path: configured provider catalog entries now preserve audio/video inputs, ModelInputType includes them, and LM Studio dynamic discovery narrows back to text/image before constructing pi-ai runtime models.

Verified locally:

pnpm test src/plugin-sdk/provider-catalog-shared.test.ts src/agents/model-catalog.test.ts
pnpm tsgo:core
pnpm tsgo:extensions
pnpm config:schema:check && pnpm config:docs:check
pnpm check:changed

The models.providers[].models[].input zod schema only accepted 'text' and 'image', rejecting 'video' and 'audio' values. This blocked users from declaring Gemini native multimodal capabilities, even though the runtime already supports these media types (MAX_VIDEO_BYTES, MAX_AUDIO_BYTES, mediaKindFromMime). Align the config schema and TypeScript type with the existing MediaUnderstandingCapabilitiesSchema which already accepts all four media types. Fixes openclaw#20721 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Mar 8, 2026

greptile-apps Bot reviewed Mar 8, 2026

View reviewed changes

xkonjin reviewed Mar 8, 2026

View reviewed changes

alvinttang force-pushed the fix/20721-models-input-video-audio branch from 7f2be15 to e245163 Compare March 8, 2026 09:05

openclaw-barnacle Bot added the channel: bluebubbles Channel integration: bluebubbles label Mar 8, 2026

alvinttang force-pushed the fix/20721-models-input-video-audio branch from e245163 to 3d2e325 Compare March 8, 2026 09:06

openclaw-barnacle Bot added size: XS and removed channel: bluebubbles Channel integration: bluebubbles agents Agent runtime and tooling size: S labels Mar 8, 2026

altaywtf self-assigned this Mar 13, 2026

altaywtf requested changes Mar 13, 2026

View reviewed changes

altaywtf added the stale Marked as stale due to inactivity label Mar 23, 2026

altaywtf removed their assignment Mar 23, 2026

openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Apr 23, 2026

steipete force-pushed the fix/20721-models-input-video-audio branch from 3d2e325 to 70d6616 Compare April 26, 2026 03:49

openclaw-barnacle Bot added agents Agent runtime and tooling extensions: lmstudio size: S and removed size: XS labels Apr 26, 2026

steipete force-pushed the fix/20721-models-input-video-audio branch from ae23c25 to ee15ca1 Compare April 26, 2026 03:58

gambletan and others added 2 commits April 26, 2026 05:10

fix(config): preserve audio video model inputs

67820cb

docs(changelog): credit model input config fix

5ed830a

steipete force-pushed the fix/20721-models-input-video-audio branch from ee15ca1 to 5ed830a Compare April 26, 2026 04:11

steipete merged commit 4428661 into openclaw:main Apr 26, 2026
64 of 67 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(config): accept video and audio in models.input schema#39338

fix(config): accept video and audio in models.input schema#39338
steipete merged 3 commits intoopenclaw:mainfrom
alvinttang:fix/20721-models-input-video-audio

alvinttang commented Mar 8, 2026

Uh oh!

greptile-apps Bot commented Mar 8, 2026

Uh oh!

greptile-apps Bot Mar 8, 2026

Uh oh!

greptile-apps Bot Mar 8, 2026

Uh oh!

xkonjin left a comment

Uh oh!

gambletan commented Mar 8, 2026

Uh oh!

gambletan commented Mar 10, 2026

Uh oh!

gambletan commented Mar 10, 2026

Uh oh!

altaywtf left a comment

Uh oh!

clawsweeper Bot commented Apr 26, 2026

Uh oh!

steipete commented Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

alvinttang commented Mar 8, 2026

Summary

Test plan

Uh oh!

greptile-apps Bot commented Mar 8, 2026

Greptile Summary

Confidence Score: 4/5

Uh oh!

greptile-apps Bot Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Uh oh!

gambletan commented Mar 8, 2026

Uh oh!

gambletan commented Mar 10, 2026

Uh oh!

gambletan commented Mar 10, 2026

Uh oh!

altaywtf left a comment

Choose a reason for hiding this comment

Uh oh!

clawsweeper Bot commented Apr 26, 2026

Uh oh!

steipete commented Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants