Problem
The TTS system (src/tts/tts.ts) uses a hardcoded switch-case to dispatch to 3 built-in providers (OpenAI, ElevenLabs, Edge). There is no TtsProvider interface — each provider is implemented inline in a monolithic function. Extensions cannot register custom TTS providers.
The voice-call extension implements its own OpenAITTSProvider for telephony-optimized output (PCM/mu-law) but bypasses the core TTS system entirely. The TtsProvider type is a closed union: "elevenlabs" | "openai" | "edge".
Current state
No TTS provider interface exists. Unlike STT (which has a clean SttProvider interface), TTS is a monolithic function with hardcoded provider logic:
// src/tts/tts.ts — textToSpeech()
if (provider === "openai") { /* ... */ }
else if (provider === "elevenlabs") { /* ... */ }
else if (provider === "edge") { /* ... */ }
Config schema is closed:
export type TtsProvider = "elevenlabs" | "openai" | "edge";
export const TtsProviderSchema = z.enum(["elevenlabs", "openai", "edge"]);
The plugin SDK re-exports TtsProviderSchema but it's static.
Proposed changes
1. Define a TTS provider interface (TtsProviderImpl)
export type TtsProviderImpl = {
  id: string;
  synthesize: (req: TtsSynthesisRequest) => Promise<TtsSynthesisResult>;
  readonly requiresApiKey: boolean;
};

export type TtsSynthesisRequest = {
  text: string;
  voice?: string;
  model?: string;
  apiKey?: string;
  outputFormat: "mp3" | "opus" | "pcm";
  speed?: number;
  // provider-specific options via extras
  extras?: Record<string, unknown>;
};

export type TtsSynthesisResult = {
  audioBuffer: Buffer;
  format: string;
  sampleRate?: number;
};
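As a hedged illustration of how an extension might satisfy this interface, the sketch below implements a hypothetical `silence-tts` provider that returns PCM silence sized to the input text. The provider ID, the `pcmSilence` helper, the 24 kHz sample rate, and the 60 ms-per-character pacing are all assumptions made for the example. `Uint8Array` stands in for `Buffer` so the snippet has no dependency on Node typings.

```typescript
// Types restated from the proposal; Uint8Array replaces Buffer (assumption).
type TtsSynthesisRequest = {
  text: string;
  voice?: string;
  model?: string;
  apiKey?: string;
  outputFormat: "mp3" | "opus" | "pcm";
  speed?: number;
  extras?: Record<string, unknown>;
};

type TtsSynthesisResult = {
  audioBuffer: Uint8Array;
  format: string;
  sampleRate?: number;
};

type TtsProviderImpl = {
  id: string;
  synthesize: (req: TtsSynthesisRequest) => Promise<TtsSynthesisResult>;
  readonly requiresApiKey: boolean;
};

const SAMPLE_RATE = 24000; // hypothetical default, not taken from the codebase

// 16-bit mono PCM silence: two zero bytes per sample.
function pcmSilence(durationMs: number, sampleRate: number): Uint8Array {
  const samples = Math.round((durationMs / 1000) * sampleRate);
  return new Uint8Array(samples * 2);
}

// Hypothetical plugin provider: emits silence instead of calling a real API.
const silenceProvider: TtsProviderImpl = {
  id: "silence-tts",
  requiresApiKey: false,
  async synthesize(req) {
    if (req.outputFormat !== "pcm") {
      throw new Error(`silence-tts only supports pcm, got ${req.outputFormat}`);
    }
    const speed = req.speed ?? 1;
    const durationMs = (req.text.length * 60) / speed; // assumed pacing
    return {
      audioBuffer: pcmSilence(durationMs, SAMPLE_RATE),
      format: "pcm",
      sampleRate: SAMPLE_RATE,
    };
  },
};
```

A real provider would replace the body of `synthesize` with its API call and map `voice`, `model`, and `extras` onto that API's options.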
2. Refactor built-in providers to implement the interface
Extract OpenAI, ElevenLabs, Edge implementations from the monolithic textToSpeech() into separate provider modules implementing TtsProviderImpl. This is a prerequisite for plugin registration.
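Once each provider lives in its own module, the hardcoded branching in textToSpeech() could reduce to a lookup. The following is a minimal sketch under that assumption. `resolveProvider`, `stubProvider`, and the module-level `registry` are illustrative names rather than existing code, the types are trimmed to what the example needs, and treating Edge as not requiring an API key is also an assumption.

```typescript
type TtsSynthesisRequest = { text: string; outputFormat: "mp3" | "opus" | "pcm" };
type TtsSynthesisResult = { audioBuffer: Uint8Array; format: string };
type TtsProviderImpl = {
  id: string;
  synthesize: (req: TtsSynthesisRequest) => Promise<TtsSynthesisResult>;
  readonly requiresApiKey: boolean;
};

// Stub standing in for a module extracted into src/tts/providers/.
function stubProvider(id: string, requiresApiKey: boolean): TtsProviderImpl {
  return {
    id,
    requiresApiKey,
    synthesize: async (req) => ({
      audioBuffer: new Uint8Array(0),
      format: req.outputFormat,
    }),
  };
}

const registry = new Map<string, TtsProviderImpl>();
for (const p of [
  stubProvider("openai", true),
  stubProvider("elevenlabs", true),
  stubProvider("edge", false),
]) {
  registry.set(p.id, p);
}

// Replaces the provider branching: unknown IDs fail loudly.
function resolveProvider(
  reg: Map<string, TtsProviderImpl>,
  id: string,
): TtsProviderImpl {
  const provider = reg.get(id);
  if (!provider) {
    throw new Error(`Unknown TTS provider: ${id}`);
  }
  return provider;
}

async function textToSpeech(
  providerId: string,
  req: TtsSynthesisRequest,
): Promise<TtsSynthesisResult> {
  return resolveProvider(registry, providerId).synthesize(req);
}
```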
3. Create TTS provider registry
Similar to STT's buildSttProviderRegistry:
declare function buildTtsProviderRegistry(
  builtIn: TtsProviderImpl[],
  pluginProviders?: TtsProviderImpl[],
): Map<string, TtsProviderImpl>;
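One possible body for this function is sketched below. The collision policy (rejecting a plugin provider whose ID duplicates an already registered one) is an assumption; the STT registry may handle this differently, and an override policy would be equally easy to express. The provider type is trimmed and the two sample providers are hypothetical.

```typescript
// Trimmed provider shape, enough to demonstrate registration.
type TtsProviderImpl = {
  id: string;
  synthesize: (req: { text: string }) => Promise<{ format: string }>;
  readonly requiresApiKey: boolean;
};

function buildTtsProviderRegistry(
  builtIn: TtsProviderImpl[],
  pluginProviders: TtsProviderImpl[] = [],
): Map<string, TtsProviderImpl> {
  const registry = new Map<string, TtsProviderImpl>();
  // Built-ins are registered first, so a colliding plugin ID is the one rejected.
  for (const provider of [...builtIn, ...pluginProviders]) {
    if (provider.id.length === 0) {
      throw new Error("TTS provider id must be non-empty");
    }
    if (registry.has(provider.id)) {
      throw new Error(`Duplicate TTS provider id: ${provider.id}`);
    }
    registry.set(provider.id, provider);
  }
  return registry;
}

// Hypothetical providers for illustration only.
const edge: TtsProviderImpl = {
  id: "edge",
  requiresApiKey: false,
  synthesize: async () => ({ format: "mp3" }),
};
const custom: TtsProviderImpl = {
  id: "my-custom-tts",
  requiresApiKey: true,
  synthesize: async () => ({ format: "pcm" }),
};

const ttsRegistry = buildTtsProviderRegistry([edge], [custom]);
```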
4. Export types from plugin SDK
src/plugin-sdk/index.ts should export TtsProviderImpl, TtsSynthesisRequest, TtsSynthesisResult.
5. Add tts field to plugin manifest
{
  "id": "my-extension",
  "tts": ["my-custom-tts"]
}
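A sketch of how the manifest parser might read this field follows. `parseManifestTts` and the `PluginManifest` shape are hypothetical and cover only `id` and `tts`; treating a missing `tts` field as an empty list and dropping repeated IDs are assumptions, not behavior confirmed by src/plugins/manifest-registry.ts.

```typescript
type PluginManifest = { id: string; tts: string[] };

function parseManifestTts(raw: unknown): PluginManifest {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("manifest must be an object");
  }
  const { id, tts } = raw as { id?: unknown; tts?: unknown };
  if (typeof id !== "string" || id.length === 0) {
    throw new Error("manifest.id must be a non-empty string");
  }
  // Assumption: plugins without TTS providers simply omit the field.
  if (tts === undefined) {
    return { id, tts: [] };
  }
  if (!Array.isArray(tts)) {
    throw new Error("manifest.tts must be an array of provider ids");
  }
  const ids: string[] = [];
  for (const entry of tts) {
    if (typeof entry !== "string" || entry.length === 0) {
      throw new Error("manifest.tts entries must be non-empty strings");
    }
    // Drop repeated ids so each provider is listed once.
    if (ids.indexOf(entry) === -1) {
      ids.push(entry);
    }
  }
  return { id, tts: ids };
}

const manifest = parseManifestTts({
  id: "my-extension",
  tts: ["my-custom-tts"],
});
```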
6. Make TtsProviderSchema dynamic
Allow registered provider IDs in config validation, not just the hardcoded 3.
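The core of a dynamic check can be expressed without any schema library, as in the hedged sketch below. `createTtsProviderValidator` is a hypothetical name. With zod, a predicate like this could be passed to `z.string().refine(...)` so that the allowed set is resolved at validation time instead of being frozen in `z.enum`.

```typescript
const BUILT_IN_TTS_PROVIDERS = ["elevenlabs", "openai", "edge"] as const;

// Returns a type guard that accepts built-in IDs plus the given plugin IDs.
function createTtsProviderValidator(pluginIds: readonly string[]) {
  const allowed: string[] = [...BUILT_IN_TTS_PROVIDERS, ...pluginIds];
  return (value: unknown): value is string =>
    typeof value === "string" && allowed.indexOf(value) !== -1;
}

// Plugin IDs would come from the plugin registry at startup (assumption).
const isValidTtsProvider = createTtsProviderValidator(["my-custom-tts"]);
```

One consequence of this design is ordering: config validation has to run after plugin manifests are loaded, otherwise plugin-registered IDs would be rejected.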
Files to change
| File | Change |
| --- | --- |
| src/tts/tts.ts | Refactor to use provider registry instead of switch-case |
| src/tts/providers/ | New: extracted provider implementations (openai, elevenlabs, edge) |
| src/tts/types.ts | New: TtsProviderImpl, TtsSynthesisRequest, TtsSynthesisResult |
| src/plugin-sdk/index.ts | Export TTS provider types |
| src/plugins/manifest-registry.ts | Parse tts field from manifests |
| src/plugins/types.ts | Add PluginTtsProviderRegistration type |
| src/plugins/registry.ts | Add TTS provider list |
| src/config/zod-schema.core.ts | Make TtsProviderSchema dynamic |
| src/config/types.tts.ts | Open TtsProvider type to string |
Design notes
- More work than STT because TTS lacks a provider interface entirely — refactoring the monolithic function is a prerequisite
- Telephony output: the interface should support PCM output natively so voice-call can use the registry instead of bypassing it
- TTS directives: [[tts:provider=X]] inline overrides should work with plugin-registered providers
- Fallback chain: plugin providers participate alongside the built-in ones (Edge remains the always-available fallback)
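The last two notes can be sketched together. This is an illustration only: it handles just the `[[tts:provider=X]]` form shown above, and both function names and the fallback ordering (requested provider first, other registered providers next, Edge last) are assumptions rather than the current behavior of src/tts/tts.ts.

```typescript
// Extract the first [[tts:provider=X]] directive and strip it from the text.
function parseProviderDirective(text: string): { provider?: string; text: string } {
  const match = /\[\[tts:provider=([A-Za-z0-9_-]+)\]\]/.exec(text);
  if (!match) {
    return { text };
  }
  return { provider: match[1], text: text.replace(match[0], "").trim() };
}

// Build the provider order to try. Unregistered IDs are ignored, so a
// directive naming an unknown provider falls through to the normal chain.
function buildFallbackChain(
  requested: string | undefined,
  registeredIds: string[],
): string[] {
  const chain: string[] = [];
  const push = (id: string): void => {
    if (registeredIds.indexOf(id) !== -1 && chain.indexOf(id) === -1) {
      chain.push(id);
    }
  };
  if (requested) {
    push(requested);
  }
  for (const id of registeredIds) {
    if (id !== "edge") {
      push(id);
    }
  }
  push("edge"); // Edge stays last as the always-available fallback.
  return chain;
}

const parsed = parseProviderDirective("[[tts:provider=my-custom-tts]] Hello");
const chain = buildFallbackChain(parsed.provider, [
  "openai",
  "elevenlabs",
  "edge",
  "my-custom-tts",
]);
```

Because the directive is resolved against registered IDs, not the closed union, a plugin provider needs no special casing here.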
Related
resolveApiKeyForProvider pattern)