feat(plugin-sdk): allow extensions to register custom TTS providers #498

@alexey-pelykh

Problem

The TTS system (src/tts/tts.ts) dispatches to its three built-in providers (OpenAI, ElevenLabs, Edge) through a hardcoded conditional chain. There is no provider interface for TTS: each provider is implemented inline in a monolithic function, so extensions cannot register custom TTS providers.

The voice-call extension implements its own OpenAITTSProvider for telephony-optimized output (PCM/mu-law) but bypasses the core TTS system entirely. The existing TtsProvider type is only a closed union of provider IDs: "elevenlabs" | "openai" | "edge".

Current state

No TTS provider interface exists. Unlike STT (which has a clean SttProvider interface), TTS is a monolithic function with hardcoded provider logic:

// src/tts/tts.ts — textToSpeech()
if (provider === "openai") { /* ... */ }
else if (provider === "elevenlabs") { /* ... */ }
else if (provider === "edge") { /* ... */ }

Config schema is closed:

export type TtsProvider = "elevenlabs" | "openai" | "edge";
export const TtsProviderSchema = z.enum(["elevenlabs", "openai", "edge"]);

The plugin SDK re-exports TtsProviderSchema, but the schema is static, so plugins cannot add provider IDs to it.

Proposed changes

1. Define a TTS provider interface (TtsProviderImpl)

export type TtsProviderImpl = {
  id: string;
  synthesize: (req: TtsSynthesisRequest) => Promise<TtsSynthesisResult>;
  readonly requiresApiKey: boolean;
};

export type TtsSynthesisRequest = {
  text: string;
  voice?: string;
  model?: string;
  apiKey?: string;
  outputFormat: "mp3" | "opus" | "pcm";
  speed?: number;
  // provider-specific options via extras
  extras?: Record<string, unknown>;
};

export type TtsSynthesisResult = {
  audioBuffer: Buffer;
  format: string;
  sampleRate?: number;
};
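To illustrate how an extension might implement the proposed interface, here is a minimal sketch of a provider that returns silent PCM audio. The provider ID `silence-tts`, the `makeSilencePcm` helper, the `durationMs` extras key, and the 16 kHz sample rate are all hypothetical. The types are restated from the proposal, except that `Uint8Array` stands in for `Buffer` (Node's `Buffer` is a `Uint8Array` subclass) so the snippet has no Node-specific type dependency.

```typescript
// Types restated from the proposal so the sketch is self-contained.
// Assumption: Uint8Array stands in for Buffer.
type TtsSynthesisRequest = {
  text: string;
  voice?: string;
  model?: string;
  apiKey?: string;
  outputFormat: "mp3" | "opus" | "pcm";
  speed?: number;
  extras?: Record<string, unknown>;
};

type TtsSynthesisResult = {
  audioBuffer: Uint8Array;
  format: string;
  sampleRate?: number;
};

type TtsProviderImpl = {
  id: string;
  synthesize: (req: TtsSynthesisRequest) => Promise<TtsSynthesisResult>;
  readonly requiresApiKey: boolean;
};

// Hypothetical helper: 16-bit mono PCM silence of the given duration.
function makeSilencePcm(durationMs: number, sampleRate: number): Uint8Array {
  const samples = Math.round((durationMs / 1000) * sampleRate);
  return new Uint8Array(samples * 2); // 2 bytes per sample, zero-filled
}

// Hypothetical provider that an extension could register.
const silenceTtsProvider: TtsProviderImpl = {
  id: "silence-tts",
  requiresApiKey: false,
  async synthesize(req) {
    if (req.outputFormat !== "pcm") {
      throw new Error(`silence-tts only supports pcm, got ${req.outputFormat}`);
    }
    // Provider-specific option read from extras; defaults to 100 ms.
    const raw = req.extras?.["durationMs"];
    const durationMs = typeof raw === "number" ? raw : 100;
    const sampleRate = 16000;
    return {
      audioBuffer: makeSilencePcm(durationMs, sampleRate),
      format: "pcm",
      sampleRate,
    };
  },
};
```

Routing provider-specific options through `extras` keeps the shared request type stable, at the cost of each provider having to validate those values itself.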

2. Refactor built-in providers to implement the interface

Extract OpenAI, ElevenLabs, Edge implementations from the monolithic textToSpeech() into separate provider modules implementing TtsProviderImpl. This is a prerequisite for plugin registration.

3. Create TTS provider registry

Similar to STT's buildSttProviderRegistry:

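// Proposed signature; built-in providers are registered first, then plugin providers
declare function buildTtsProviderRegistry(
  builtIn: TtsProviderImpl[],
  pluginProviders?: TtsProviderImpl[]
): Map<string, TtsProviderImpl>;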
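One possible body for this function is sketched below. It is not taken from the STT implementation. Rejecting duplicate IDs (rather than letting a plugin override a built-in) is an assumption made for illustration, and `TtsProviderImpl` is reduced to the fields the registry needs.

```typescript
// Reduced shape for illustration; synthesize() is omitted for brevity.
type TtsProviderImpl = {
  id: string;
  readonly requiresApiKey: boolean;
};

function buildTtsProviderRegistry(
  builtIn: TtsProviderImpl[],
  pluginProviders: TtsProviderImpl[] = [],
): Map<string, TtsProviderImpl> {
  const registry = new Map<string, TtsProviderImpl>();
  // Built-ins are inserted first so iteration order lists them before plugins.
  for (const provider of [...builtIn, ...pluginProviders]) {
    if (provider.id.length === 0) {
      throw new Error("TTS provider id must not be empty");
    }
    // Assumption: an ID collision is an error, not an override.
    if (registry.has(provider.id)) {
      throw new Error(`Duplicate TTS provider id: ${provider.id}`);
    }
    registry.set(provider.id, provider);
  }
  return registry;
}
```

If plugins should be allowed to replace a built-in provider, the duplicate check would instead become a plain `registry.set`, with later entries winning.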

4. Export types from plugin SDK

src/plugin-sdk/index.ts should export TtsProviderImpl, TtsSynthesisRequest, TtsSynthesisResult.

5. Add tts field to plugin manifest

{
  "id": "my-extension",
  "tts": ["my-custom-tts"]
}
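Reading that field in the manifest registry could look roughly like the sketch below. The function name `parseManifestTtsProviders` is hypothetical, and silently ignoring malformed entries instead of reporting them is an assumption.

```typescript
// Hypothetical helper: extract declared TTS provider IDs from a parsed manifest.
function parseManifestTtsProviders(manifest: unknown): string[] {
  if (typeof manifest !== "object" || manifest === null) return [];
  const raw = (manifest as Record<string, unknown>)["tts"];
  if (!Array.isArray(raw)) return [];
  const ids = raw
    .filter((entry): entry is string => typeof entry === "string")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  // Drop duplicates while keeping first-seen order.
  return Array.from(new Set(ids));
}

const manifest: unknown = JSON.parse('{"id": "my-extension", "tts": ["my-custom-tts"]}');
const declaredTtsIds = parseManifestTtsProviders(manifest);
```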

6. Make TtsProviderSchema dynamic

Allow registered provider IDs in config validation, not just the hardcoded 3.
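The idea can be sketched without zod as a validator built from the set of registered IDs. In the real code this check would live in the zod schema in src/config/zod-schema.core.ts; the names `createTtsProviderValidator` and `ValidationResult` are hypothetical and only illustrate the open-set check.

```typescript
const BUILT_IN_TTS_PROVIDER_IDS: readonly string[] = ["elevenlabs", "openai", "edge"];

type ValidationResult =
  | { ok: true; value: string }
  | { ok: false; error: string };

// Hypothetical factory: the allowed set is the built-ins plus whatever plugins registered.
function createTtsProviderValidator(pluginProviderIds: readonly string[] = []) {
  const allowed = new Set<string>([...BUILT_IN_TTS_PROVIDER_IDS, ...pluginProviderIds]);
  return (input: unknown): ValidationResult => {
    if (typeof input === "string" && allowed.has(input)) {
      return { ok: true, value: input };
    }
    const known = Array.from(allowed).join(", ");
    return {
      ok: false,
      error: `Unknown TTS provider ${JSON.stringify(input)}; expected one of: ${known}`,
    };
  };
}

const validateProvider = createTtsProviderValidator(["my-custom-tts"]);
```

Because the allowed set is captured when the validator is created, config validation would need to run after plugins are loaded, or the validator would need to be rebuilt when the registry changes.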

Files to change

| File | Change |
| --- | --- |
| src/tts/tts.ts | Refactor to use provider registry instead of switch-case |
| src/tts/providers/ | New: extracted provider implementations (openai, elevenlabs, edge) |
| src/tts/types.ts | New: TtsProviderImpl, TtsSynthesisRequest, TtsSynthesisResult |
| src/plugin-sdk/index.ts | Export TTS provider types |
| src/plugins/manifest-registry.ts | Parse tts field from manifests |
| src/plugins/types.ts | Add PluginTtsProviderRegistration type |
| src/plugins/registry.ts | Add TTS provider list |
| src/config/zod-schema.core.ts | Make TtsProviderSchema dynamic |
| src/config/types.tts.ts | Open TtsProvider type to string |

Design notes

  • More work than STT because TTS lacks a provider interface entirely — refactoring the monolithic function is a prerequisite
  • Telephony output: the interface should support PCM output natively so voice-call can use the registry instead of bypassing it
  • TTS directives: [[tts:provider=X]] inline overrides should work with plugin-registered providers
  • Fallback chain: plugin providers participate alongside built-in (Edge remains always-available fallback)
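The directive and fallback notes above could combine as in the following sketch. The function names, the regex (which handles only the `[[tts:provider=X]]` form), and the ordering policy (directive override, then configured provider, then remaining providers, with Edge last) are assumptions for illustration, not the existing behavior.

```typescript
type ProviderInfo = { id: string; requiresApiKey: boolean };

// Hypothetical parser for the inline override; only the provider key is handled.
function parseTtsProviderDirective(text: string): string | undefined {
  const match = /\[\[tts:provider=([^\]\s]+)\]\]/.exec(text);
  return match ? match[1] : undefined;
}

// Hypothetical resolver: returns provider IDs in the order they should be tried.
function resolveTtsFallbackChain(
  registry: Map<string, ProviderInfo>,
  opts: {
    directiveOverride?: string;
    preferred?: string;
    apiKeys?: Record<string, string | undefined>;
  },
): string[] {
  const chain: string[] = [];
  const push = (id: string | undefined): void => {
    if (!id || chain.indexOf(id) !== -1) return;
    const provider = registry.get(id);
    if (!provider) return; // unknown IDs are skipped
    if (provider.requiresApiKey && !opts.apiKeys?.[id]) return; // unusable without a key
    chain.push(id);
  };
  push(opts.directiveOverride);
  push(opts.preferred);
  registry.forEach((_provider, id) => {
    if (id !== "edge") push(id);
  });
  push("edge"); // Edge stays the always-available last resort
  return chain;
}
```

A caller would then try each ID in order, moving to the next entry when `synthesize` rejects.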
