
feat(channels): require STT/TTS auth credentials for voice-only channels #471

@alexey-pelykh

Problem

Voice-only channels (e.g., the `voice-call` extension) are unusable without both STT (inbound voice → text) and TTS (outbound text → voice). Today:

  • No `voiceOnly` capability field exists on `ChannelCapabilities`
  • No validation during onboarding or channel setup that TTS/STT credentials are configured
  • Errors only surface lazily at runtime when an agent tries to use the channel
  • STT config is plugin-specific (voice-call only), not global
  • TTS has a free fallback (Edge), but STT has none, so voice channels silently break without STT credentials

After #424 (STT as middleware) and #402/#403 (TTS auth unification), both STT and TTS credentials live in the global auth profile store. Voice-only channels should validate that the required auth profiles exist.

Scope

1. Add `voiceOnly` to `ChannelCapabilities`

`src/channels/plugins/types.core.ts`:
```typescript
export type ChannelCapabilities = {
  chatTypes: Array<ChatType | "thread">;
  voiceOnly?: boolean; // channel only supports voice I/O (needs both STT and TTS)
  // ... existing fields
};
```

The `voice-call` extension sets `voiceOnly: true` in its capabilities.
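A minimal sketch of how the flag might be declared and read. It assumes a local stand-in for `ChatType` (its real members are not shown in this issue) and a hypothetical `isVoiceOnly` helper; treating an absent flag as `false` keeps existing channels unaffected.

```typescript
// Assumption: illustrative stand-in; the real ChatType union lives in the codebase.
type ChatType = "direct" | "group";

type ChannelCapabilities = {
  chatTypes: Array<ChatType | "thread">;
  voiceOnly?: boolean; // channel only supports voice I/O
};

// Hypothetical capabilities object for the voice-call extension.
const voiceCallCapabilities: ChannelCapabilities = {
  chatTypes: ["direct"],
  voiceOnly: true,
};

// Hypothetical helper: a missing flag means "not voice-only".
function isVoiceOnly(caps: ChannelCapabilities): boolean {
  return caps.voiceOnly === true;
}

console.log(isVoiceOnly(voiceCallCapabilities)); // true
console.log(isVoiceOnly({ chatTypes: ["direct", "group"] })); // false
```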

2. Validation at channel enable time

When a channel with `voiceOnly: true` is enabled (during onboarding or via `remoteclaw channels add`):

  • Check that a TTS provider is configured with valid auth credentials
  • Check that an STT provider is configured with valid auth credentials
  • If either is missing, warn the user, explain what is needed, and optionally block enabling the channel
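The enable-time decision could look roughly like the sketch below. It assumes TTS/STT readiness has already been computed by an auth-profile lookup; `evaluateVoiceOnlyEnable`, `VoiceReadiness`, and the `blockOnMissing` option (standing in for "optionally block enabling") are hypothetical names, not existing APIs.

```typescript
// Hypothetical readiness flags, assumed to come from an auth-profile lookup.
interface VoiceReadiness {
  ttsReady: boolean;
  sttReady: boolean;
}

interface EnableDecision {
  allow: boolean;
  warnings: string[];
}

// Decide whether enabling may proceed and collect user-facing explanations.
function evaluateVoiceOnlyEnable(
  voiceOnly: boolean,
  readiness: VoiceReadiness,
  blockOnMissing = false, // assumption: warn-only by default
): EnableDecision {
  const warnings: string[] = [];
  if (!voiceOnly) return { allow: true, warnings }; // nothing to validate

  if (!readiness.ttsReady) {
    warnings.push(
      "No TTS provider with valid credentials is configured; replies cannot be spoken.",
    );
  }
  if (!readiness.sttReady) {
    warnings.push(
      "No STT provider with valid credentials is configured; inbound voice cannot be transcribed.",
    );
  }
  // Block only when something is missing AND blocking was requested.
  return { allow: !(warnings.length > 0 && blockOnMissing), warnings };
}
```

In warn-only mode a missing STT provider yields one warning but still allows enabling; passing `blockOnMissing = true` turns the same situation into a refusal.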

3. Runtime validation

Before accepting messages for a voice-only channel:

  • Verify TTS is available (can resolve API key for configured TTS provider)
  • Verify STT is available (can resolve API key for configured STT provider)
  • If either is unavailable, return an actionable error to the channel ("Voice channel requires TTS/STT configuration")
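One way to express the runtime gate, as a sketch only: it assumes readiness flags are available before a message is accepted, and `gateVoiceOnlyInbound` and its result shape are hypothetical. The error keeps the wording proposed above and names which piece is missing so the message stays actionable.

```typescript
// Hypothetical readiness flags, assumed to be computed before accepting messages.
interface VoiceReadiness {
  ttsReady: boolean;
  sttReady: boolean;
}

type InboundGate = { accept: true } | { accept: false; error: string };

// Reject inbound messages on a voice-only channel when STT or TTS is unavailable.
function gateVoiceOnlyInbound(readiness: VoiceReadiness): InboundGate {
  const missing: string[] = [];
  if (!readiness.sttReady) missing.push("STT");
  if (!readiness.ttsReady) missing.push("TTS");
  if (missing.length === 0) return { accept: true };
  return {
    accept: false,
    error:
      `Voice channel requires TTS/STT configuration (missing: ${missing.join(", ")}). ` +
      `Run "remoteclaw doctor" for details.`,
  };
}
```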

4. Health check integration

`remoteclaw doctor` should flag:

  • Voice-only channels enabled without TTS credentials
  • Voice-only channels enabled without STT credentials
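A sketch of the doctor check, assuming a simple per-channel summary and a single global readiness result (credentials live in the global auth profile store, so readiness is shared across channels). `ChannelSummary` and `voiceOnlyDoctorFindings` are hypothetical names.

```typescript
// Hypothetical per-channel summary fed to the doctor check.
interface ChannelSummary {
  id: string;
  enabled: boolean;
  voiceOnly: boolean;
}

// Global readiness, assumed to be resolved once from the auth profile store.
interface VoiceReadiness {
  ttsReady: boolean;
  sttReady: boolean;
}

// One finding per missing credential type for each enabled voice-only channel.
function voiceOnlyDoctorFindings(
  channels: ChannelSummary[],
  readiness: VoiceReadiness,
): string[] {
  const findings: string[] = [];
  for (const ch of channels) {
    if (!ch.enabled || !ch.voiceOnly) continue; // only enabled voice-only channels matter
    if (!readiness.ttsReady) {
      findings.push(`${ch.id}: voice-only channel enabled without TTS credentials`);
    }
    if (!readiness.sttReady) {
      findings.push(`${ch.id}: voice-only channel enabled without STT credentials`);
    }
  }
  return findings;
}
```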

Design notes

  • Edge TTS is free (no API key), so it counts as valid TTS for this validation
  • STT has no free fallback and always requires credentials
  • Validation checks auth profiles via `resolveApiKeyForProvider` from `src/auth/`
  • This does NOT add TTS/STT prompting to onboarding; that is a separate concern, covered by the Phase 5/6 notes of #415 (unified auth + multimodal AgentRuntime implementation plan)
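The credential rule in these notes can be sketched as a single predicate. The real signature of `resolveApiKeyForProvider` is not given here, so the sketch injects a hypothetical `resolveKey` callback in its place; the provider id `"edge"` is also an assumption about how Edge TTS is named in config.

```typescript
type VoiceKind = "tts" | "stt";

// Hypothetical stand-in for resolveApiKeyForProvider (real signature not shown here).
type ResolveKey = (provider: string) => string | undefined;

// Providers usable without an API key: only Edge TTS; STT has no free fallback.
const KEYLESS_PROVIDERS: Record<VoiceKind, ReadonlySet<string>> = {
  tts: new Set<string>(["edge"]), // assumption: Edge's provider id is "edge"
  stt: new Set<string>(),
};

function hasVoiceCredential(
  kind: VoiceKind,
  provider: string | undefined,
  resolveKey: ResolveKey,
): boolean {
  if (provider === undefined) return false; // no provider configured at all
  if (KEYLESS_PROVIDERS[kind].has(provider)) return true; // free provider, no key needed
  const key = resolveKey(provider);
  return key !== undefined && key.length > 0; // otherwise a non-empty key is required
}
```

Keeping the keyless set per kind means `"edge"` satisfies TTS but never STT, matching the asymmetry described above.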

Labels: enhancement (New feature or request)