
fix: add "audio" to openai provider capabilities #12717

Merged
steipete merged 2 commits into openclaw:main from openjay:fix/openai-audio-capability on Mar 2, 2026

Conversation

@openjay (Contributor) commented Feb 9, 2026

Summary

The openai media-understanding provider implements transcribeAudio via transcribeOpenAiCompatibleAudio (Whisper API), but its capabilities array only declared ["image"].

This caused the media-understanding runner to skip the openai provider when processing inbound audio messages (e.g., voice messages on Discord/WhatsApp), resulting in raw audio files being passed directly to agents instead of transcribed text.

Fix

Add "audio" to the openai provider's capabilities array so the runner correctly selects the openai provider for audio transcription when configured with tools.media.audio.
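
As a rough illustration of why the declaration matters, the sketch below models capability-based selection. The type names, the provider shape, and the selectProvider helper are assumptions made for this example and are not taken from the repository; only the "image" and "audio" capability strings and the transcribeAudio method name come from this PR.

```typescript
// Hypothetical types; the real definitions in the repository may differ.
type MediaCapability = "image" | "audio" | "video";

interface MediaProvider {
  id: string;
  capabilities: MediaCapability[];
  transcribeAudio?: (filePath: string) => Promise<string>;
}

// Stand-in for the openai provider after this fix: "audio" is now declared.
const openaiProvider: MediaProvider = {
  id: "openai",
  capabilities: ["image", "audio"],
  // Placeholder body; the real provider calls the Whisper API.
  transcribeAudio: async (filePath) => `transcript of ${filePath}`,
};

// Hypothetical selection helper: returns the first provider that declares
// the requested capability, or undefined when none does.
function selectProvider(
  providers: MediaProvider[],
  capability: MediaCapability,
): MediaProvider | undefined {
  return providers.find((p) => p.capabilities.includes(capability));
}
```

Under these assumptions, a provider that declares only ["image"] is never returned for an "audio" request, even if it implements transcribeAudio, which matches the behavior described in the summary.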

Test

Before fix:

[tools] image failed: Unsupported media type: audio

Agent received raw .ogg file path instead of transcribed text.

After fix:
Audio messages are transcribed via Whisper API before reaching the agent.
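
A possible guard against this kind of mismatch is a check that compares what a provider implements with what it declares. This is a minimal sketch, not code from the repository: ProviderLike, findUndeclaredCapabilities, and the describeImage method name are hypothetical, and only transcribeAudio and the capabilities array are named in this PR.

```typescript
// Hypothetical shapes for illustration only.
type MediaCapability = "image" | "audio" | "video";

interface ProviderLike {
  id: string;
  capabilities: MediaCapability[];
  transcribeAudio?: unknown; // named in this PR
  describeImage?: unknown; // assumed name for the image method
}

// Returns capabilities the provider implements but does not declare.
function findUndeclaredCapabilities(provider: ProviderLike): MediaCapability[] {
  const implemented: MediaCapability[] = [];
  if (typeof provider.transcribeAudio === "function") implemented.push("audio");
  if (typeof provider.describeImage === "function") implemented.push("image");
  return implemented.filter((c) => !provider.capabilities.includes(c));
}
```

Run against a provider shaped like the pre-fix openai one (transcribeAudio present, capabilities set to ["image"]), this helper would report "audio" as undeclared.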

Greptile Overview

Greptile Summary

This PR updates the OpenAI media-understanding provider (src/media-understanding/providers/openai/index.ts) to declare support for audio by adding "audio" to its capabilities list. This aligns the provider’s declared capabilities with its existing transcribeAudio implementation, allowing the media-understanding runner to select the OpenAI provider for inbound audio messages when tools.media.audio is enabled, so audio is transcribed before being passed to agents.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk.
  • Single-line change that makes declared capabilities match already-exported functionality (transcribeAudio). No behavioral change beyond provider selection logic for audio, and it should unblock intended transcription flow.
  • No files require special attention

The openai provider implements transcribeAudio via
transcribeOpenAiCompatibleAudio (Whisper API), but its capabilities
array only declared ["image"]. This caused the media-understanding
runner to skip the openai provider when processing inbound audio
messages, resulting in raw audio files being passed to agents
instead of transcribed text.

Fix: Add "audio" to the capabilities array so the runner correctly
selects the openai provider for audio transcription.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings February 9, 2026 14:51
Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes media-provider selection for inbound audio by correctly declaring the OpenAI media-understanding provider’s support for audio transcription, ensuring audio attachments are routed to Whisper transcription instead of being passed through as raw files.

Changes:

  • Add "audio" to the OpenAI media-understanding provider’s capabilities so the runner can select it for audio inputs.

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle (bot) added and removed the stale label ("Marked as stale due to inactivity") on Feb 21, 2026
@steipete steipete merged commit 76d6514 into openclaw:main Mar 2, 2026
24 of 26 checks passed
@steipete (Contributor) commented Mar 2, 2026

Landed via temp rebase onto main.

  • Gate: -
    RUN v4.0.18 /Users/steipete/Projects/clawdbot3

✓ src/media-understanding/providers/index.test.ts (3 tests) 2ms

Test Files 1 passed (1)
Tests 3 passed (3)
Start at 21:34:12
Duration 1.78s (transform 950ms, setup 110ms, import 1.59s, tests 2ms, environment 0ms) (pass) - \

openclaw@2026.3.2 check /Users/steipete/Projects/clawdbot3
pnpm format:check && pnpm tsgo && pnpm lint && pnpm lint:tmp:no-random-messaging && pnpm lint:tmp:channel-agnostic-boundaries && pnpm lint:tmp:no-raw-channel-fetch && pnpm lint:plugins:no-register-http-handler && pnpm lint:webhook:no-low-level-body-read && pnpm lint:auth:no-pairing-store-group && pnpm lint:auth:pairing-account-scope && pnpm check:host-env-policy:swift

openclaw@2026.3.2 format:check /Users/steipete/Projects/clawdbot3
oxfmt --check

Checking formatting...

 ELIFECYCLE  Command failed with exit code 2.
 ELIFECYCLE  Command failed with exit code 2. (blocked by existing syntax regression in \ on main from #28575)

Thanks @openjay!
