Add runtime.stt.transcribeAudioFile for plugin STT access #22402

Merged
steipete merged 1 commit into openclaw:main from benthecarman:add-runtime-stt on Mar 2, 2026
benthecarman commented Feb 21, 2026

Summary

  • Add runtime.stt.transcribeAudioFile() to PluginRuntime so external plugins can use openclaw's media-understanding provider framework for speech-to-text
  • New src/media-understanding/transcribe-audio.ts wraps runCapability({capability: "audio"}), the same pattern as the Discord VC implementation in "Discord: VC Support" (#18774)
  • Reads provider/model/apiKey from tools.media.audio in the config, with automatic provider fallback
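
The provider fallback mentioned in the last bullet can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code in src/media-understanding/transcribe-audio.ts: the `models` array under `tools.media.audio`, the `AudioModelEntry` and `ConfigSketch` types, and both helper functions are hypothetical names for illustration. In the PR itself the resolution is delegated to runCapability.

```typescript
// Hypothetical shapes: the real openclaw config types are not shown in this PR.
type AudioModelEntry = { provider: string; model?: string; apiKey?: string };
type ConfigSketch = { tools?: { media?: { audio?: { models?: AudioModelEntry[] } } } };

// Return the configured entries that carry an API key, in config order.
function pickAudioCandidates(cfg: ConfigSketch): AudioModelEntry[] {
  const entries = cfg.tools?.media?.audio?.models ?? [];
  return entries.filter((e) => typeof e.apiKey === "string" && e.apiKey.length > 0);
}

// Try each candidate in order; fall back to the next one when a provider throws.
async function transcribeWithFallback(
  candidates: AudioModelEntry[],
  call: (entry: AudioModelEntry) => Promise<string>,
): Promise<string | undefined> {
  for (const entry of candidates) {
    try {
      return await call(entry);
    } catch {
      // Provider failed; continue with the next configured entry.
    }
  }
  return undefined; // No provider produced a transcript.
}
```

Returning `undefined` instead of throwing when every provider fails is a choice made for this sketch; the real wrapper may report failures differently.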

Motivation

The marmot plugin needs to transcribe call audio chunks but can't import internal media-understanding modules (ERR_PACKAGE_PATH_NOT_EXPORTED). This mirrors how runtime.tts.textToSpeechTelephony already exposes TTS to plugins.

Usage (from a plugin)

const result = await runtime.stt.transcribeAudioFile({
  filePath: "/tmp/audio-chunk.wav",
  cfg: runtime.config.loadConfig(),
});
if (result.text) {
  // dispatch transcript to agent
}
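
For a plugin that processes many chunks, the call above could be wrapped in a small helper. The sketch below assumes only what the usage snippet shows (`filePath`, `cfg`, and a result with an optional `text`); the `SttRuntime` type and the `transcribeChunk` helper are hypothetical, and whether transcribeAudioFile throws on provider errors is not stated in this PR, so the `catch` is defensive.

```typescript
// Minimal structural types for this sketch; only `filePath`, `cfg`, and `text`
// appear in the PR description, everything else here is assumed.
type SttResult = { text?: string };
type SttRuntime = {
  stt: { transcribeAudioFile(params: { filePath: string; cfg: unknown }): Promise<SttResult> };
  config: { loadConfig(): unknown };
};

// Hypothetical helper: returns the trimmed transcript, or null when
// transcription fails or produces no text.
async function transcribeChunk(runtime: SttRuntime, filePath: string): Promise<string | null> {
  try {
    const result = await runtime.stt.transcribeAudioFile({
      filePath,
      cfg: runtime.config.loadConfig(),
    });
    const text = result.text?.trim();
    return text ? text : null;
  } catch {
    return null; // Treat provider or I/O errors as "no transcript" for this chunk.
  }
}
```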

Test plan

  • TypeScript compiles
  • Existing media-understanding tests still pass
  • Marmot plugin can call runtime.stt.transcribeAudioFile() after openclaw is rebuilt

🤖 Generated with Claude Code

Greptile Summary

Adds runtime.stt.transcribeAudioFile() to expose speech-to-text functionality to external plugins. The implementation follows the same pattern as the Discord voice manager's transcribeAudio() function, wrapping runCapability({capability: "audio"}) from the media-understanding framework.

Key changes:

  • New src/media-understanding/transcribe-audio.ts provides a standalone wrapper function
  • Function exported via PluginRuntime.stt.transcribeAudioFile
  • Uses same provider/model/apiKey resolution from tools.media.audio config
  • Properly handles cleanup via cache.cleanup() in finally block
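
The cleanup pattern in the last bullet can be illustrated in isolation. This is a generic sketch of try/finally cleanup, not the PR's code: `CleanupCache` and `withCacheCleanup` are hypothetical names, and the only detail taken from the review is that `cache.cleanup()` runs in a `finally` block.

```typescript
// Hypothetical stand-in for the media cache used by the real wrapper.
type CleanupCache = { cleanup(): Promise<void> };

// Run `work` and always call cache.cleanup(), whether `work` resolves or throws.
async function withCacheCleanup<T>(cache: CleanupCache, work: () => Promise<T>): Promise<T> {
  try {
    return await work();
  } finally {
    await cache.cleanup();
  }
}
```

Because the cleanup sits in `finally`, temporary media is released even when the provider call rejects, and the original error still propagates to the caller.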

The implementation is clean and matches established patterns in the codebase.

Confidence Score: 5/5

  • Safe to merge - straightforward implementation following existing patterns
  • The implementation directly mirrors the proven Discord voice manager pattern, properly handles resource cleanup, and uses the existing media-understanding provider framework without introducing new dependencies or risks. The only minor suggestion is around MIME type flexibility.
  • No files require special attention
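
The review does not spell out the MIME type suggestion. One way a caller could gain that flexibility is to derive the MIME type from the file extension, sketched below. The `guessAudioMime` helper, the extension table, and the `audio/wav` default are all assumptions for illustration; nothing here is confirmed to be part of the PR.

```typescript
// Common audio extensions mapped to commonly used MIME types.
const AUDIO_MIME_BY_EXT: Record<string, string> = {
  wav: "audio/wav",
  mp3: "audio/mpeg",
  ogg: "audio/ogg",
  m4a: "audio/mp4",
  flac: "audio/flac",
  webm: "audio/webm",
};

// Hypothetical helper: pick a MIME type from the extension, else use the fallback.
function guessAudioMime(filePath: string, fallback = "audio/wav"): string {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot + 1).toLowerCase() : "";
  return AUDIO_MIME_BY_EXT[ext] ?? fallback;
}
```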

Last reviewed commit: 70009ce

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment

@openclaw-barnacle openclaw-barnacle bot added the channel: bluebubbles Channel integration: bluebubbles label Feb 21, 2026
benthecarman (Contributor, Author) commented

CI check failure is pre-existing on main — the memory/manager.async-search.test.ts and memory/qmd-manager.test.ts type errors are not from this PR. All recent main CI runs show the same failures.

openclaw-barnacle commented

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle bot added the stale Marked as stale due to inactivity label Feb 28, 2026
Expose audio transcription through the PluginRuntime so external
plugins (e.g. marmot) can use openclaw's media-understanding provider
framework without importing unexported internal modules.

The new transcribeAudioFile() wraps runCapability({capability: "audio"})
and reads provider/model/apiKey from tools.media.audio in the config,
matching the pattern used by the Discord VC implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle bot removed the stale Marked as stale due to inactivity label Mar 1, 2026
@steipete steipete merged commit faa4ffe into openclaw:main Mar 2, 2026
25 of 26 checks passed
steipete commented Mar 2, 2026

Landed via temp rebase onto main.

  • Gate: pnpm vitest src/media-understanding/transcribe-audio.test.ts src/plugins/runtime/index.test.ts extensions/bluebubbles/src/monitor.test.ts
  • Land commit: 98e4cb342e92f3ddbcb6e3d2663ea53a179e2208
  • Merge commit: faa4ffe

Thanks @benthecarman!

@benthecarman benthecarman deleted the add-runtime-stt branch March 3, 2026 00:25
Labels

channel: bluebubbles (Channel integration: bluebubbles), size: S