refactor(media): extract STT from media-understanding into src/stt/ #424

@alexey-pelykh

Description

Context

Part of #415 Phase 3, item 9. Speech-to-text is a middleware concern, like TTS: it should work identically regardless of which CLI runtime is configured. Currently it is embedded in the media-understanding subsystem, which is being decomposed.

Rationale

  • STT is runtime-agnostic: Voice messages need transcription before ANY runtime can process them. Even Gemini (which supports audio natively) benefits from middleware STT for consistent behavior.
  • Image/video understanding is runtime-dependent: Some runtimes handle images natively (Gemini, Claude), others need middleware fallback. This is NOT the same concern as STT.
  • TTS precedent: TTS already exists as a standalone src/tts/ module. STT should mirror this structure.

Scope

  1. Create src/stt/ module:

  2. Wire into ChannelBridge media routing (feat(middleware): propagate multimodal media through ChannelBridge and auto-reply #387):

    • When audio media arrives and runtime doesn't accept audio natively → run STT → prepend transcript to prompt
    • When runtime accepts audio natively → pass through (skip STT)
  3. Preserve existing behavior:

    • Voice messages should continue working for all runtimes
    • Transcription quality and provider selection unchanged
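The routing rule in item 2 could be sketched roughly as follows. This is a minimal illustration, not the actual ChannelBridge code: `MediaItem`, `RuntimeCapabilities`, `routeMedia`, and `prependTranscript` are hypothetical names introduced here, and the real capability check may look different.

```typescript
// Hypothetical types: none of these names come from the existing codebase.
type MediaKind = "audio" | "image" | "video";

interface MediaItem {
  kind: MediaKind;
  path: string;
}

interface RuntimeCapabilities {
  // True when the configured runtime can consume audio directly.
  acceptsAudio: boolean;
}

type AudioRoute = "transcribe" | "passthrough" | "not-audio";

// Decide what the bridge should do with a single media item.
function routeMedia(item: MediaItem, runtime: RuntimeCapabilities): AudioRoute {
  if (item.kind !== "audio") return "not-audio";
  return runtime.acceptsAudio ? "passthrough" : "transcribe";
}

// Prepend a transcript to the prompt; a blank transcript leaves the prompt unchanged.
function prependTranscript(prompt: string, transcript: string): string {
  const text = transcript.trim();
  if (text === "") return prompt;
  return prompt === "" ? text : `${text}\n\n${prompt}`;
}
```

Keeping the decision as a pure function, separate from the STT call itself, would make the "skip STT for native-audio runtimes" branch easy to unit test without a provider.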

Files to extract from

  • src/media-understanding/runner.ts — audio handling paths
  • src/media-understanding/runner.entries.ts — audio entry processing
  • src/media-understanding/providers/ — audio provider implementations

Tests

  • STT produces transcript from audio file (unit, mocked provider)
  • STT provider selection follows config
  • STT credentials resolved from auth profiles
  • Integration: voice message → STT → text prompt → runtime
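As a rough illustration of the first two test cases, a mocked provider and config-driven selection might look like the sketch below. The `SttProvider` and `SttConfig` shapes, `selectProvider`, and `createMockProvider` are assumptions made for this example; the interfaces extracted into src/stt/ may differ, and credential resolution from auth profiles is not covered here.

```typescript
// Hypothetical provider interface; the real one in src/stt/ may differ.
interface SttProvider {
  id: string;
  transcribe(audioPath: string): Promise<string>;
}

// Hypothetical config shape: only the provider id matters for this sketch.
interface SttConfig {
  provider: string;
}

// Pick the provider named in config, failing loudly on an unknown id.
function selectProvider(config: SttConfig, providers: SttProvider[]): SttProvider {
  const match = providers.find((p) => p.id === config.provider);
  if (!match) throw new Error(`Unknown STT provider: ${config.provider}`);
  return match;
}

// Mock provider that records the paths it was asked to transcribe
// and always resolves with a canned transcript.
function createMockProvider(
  id: string,
  transcript: string,
): SttProvider & { calls: string[] } {
  const calls: string[] = [];
  return {
    id,
    calls,
    async transcribe(audioPath: string): Promise<string> {
      calls.push(audioPath);
      return transcript;
    },
  };
}
```

A unit test could then select a provider from a config object, call `transcribe` on a fixture path, and assert on both the returned transcript and the recorded `calls`, without touching a real transcription service.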

Depends on

Related

Metadata

Labels

enhancement: New feature or request