fix: add "audio" to openai provider capabilities#12717
steipete merged 2 commits into openclaw:main from
Conversation
The openai provider implements `transcribeAudio` via `transcribeOpenAiCompatibleAudio` (Whisper API), but its `capabilities` array only declared `["image"]`. This caused the media-understanding runner to skip the openai provider when processing inbound audio messages, so raw audio files were passed to agents instead of transcribed text. Fix: add `"audio"` to the `capabilities` array so the runner correctly selects the openai provider for audio transcription.

Co-authored-by: Cursor <cursoragent@cursor.com>
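As a rough sketch of what the fix amounts to (the types and the stub `transcribeAudio` body here are illustrative, not the actual openclaw source):

```typescript
// Hypothetical shapes; the real provider lives in
// src/media-understanding/providers/openai/index.ts and may differ.
type Capability = "image" | "audio";

interface MediaProvider {
  id: string;
  capabilities: Capability[];
  transcribeAudio?: (filePath: string) => Promise<string>;
}

const openaiProvider: MediaProvider = {
  id: "openai",
  // Before this PR the array was ["image"], so the runner never
  // considered this provider for audio even though transcribeAudio exists.
  capabilities: ["image", "audio"],
  // Stand-in for the Whisper-backed transcribeOpenAiCompatibleAudio call.
  transcribeAudio: async (filePath) => `transcribed:${filePath}`,
};
```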
Pull request overview
This PR fixes media-provider selection for inbound audio by correctly declaring the OpenAI media-understanding provider’s support for audio transcription, ensuring audio attachments are routed to Whisper transcription instead of being passed through as raw files.
Changes:
- Add `"audio"` to the OpenAI media-understanding provider's `capabilities` so the runner can select it for audio inputs.
This pull request has been automatically marked as stale due to inactivity.
Landed via temp rebase onto main.
✓ src/media-understanding/providers/index.test.ts (3 tests) 2ms
Test Files  1 passed (1)
Checking formatting... ELIFECYCLE Command failed with exit code 2.

Thanks @openjay!
Summary
The openai media-understanding provider implements `transcribeAudio` via `transcribeOpenAiCompatibleAudio` (Whisper API), but its `capabilities` array only declared `["image"]`. This caused the media-understanding runner to skip the openai provider when processing inbound audio messages (e.g., voice messages on Discord/WhatsApp), resulting in raw audio files being passed directly to agents instead of transcribed text.
Fix
Add `"audio"` to the openai provider's `capabilities` array so the runner correctly selects the openai provider for audio transcription when configured with `tools.media.audio`.
Test
Before fix: agent received the raw `.ogg` file path instead of transcribed text.
After fix: audio messages are transcribed via the Whisper API before reaching the agent.
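The selection behavior described above can be sketched as follows (the `selectProvider` helper and provider shape are illustrative assumptions, not the actual runner code):

```typescript
// Hypothetical capability-based provider selection, mirroring how a
// media-understanding runner might pick a provider for an audio input.
interface Provider {
  id: string;
  capabilities: string[];
}

function selectProvider(
  providers: Provider[],
  needed: string,
): Provider | undefined {
  return providers.find((p) => p.capabilities.includes(needed));
}

const providers: Provider[] = [
  { id: "openai", capabilities: ["image", "audio"] },
];

// With "audio" declared, a provider is found and audio gets transcribed;
// with only ["image"], selectProvider returns undefined and the raw
// audio file would fall through to the agent untouched.
console.log(selectProvider(providers, "audio")?.id); // "openai"
```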
Greptile Overview
Greptile Summary
This PR updates the OpenAI media-understanding provider (`src/media-understanding/providers/openai/index.ts`) to declare support for audio by adding `"audio"` to its `capabilities` list. This aligns the provider's declared capabilities with its existing `transcribeAudio` implementation, allowing the media-understanding runner to select the OpenAI provider for inbound audio messages when `tools.media.audio` is enabled, so audio is transcribed before being passed to agents.
Confidence Score: 5/5
The change is a small capability declaration that matches an already-implemented method (`transcribeAudio`). No behavioral change beyond provider selection logic for audio, and it should unblock the intended transcription flow.