Summary
Add a comprehensive `aprender::audio` module for audio I/O and signal processing in pure Rust. It is the foundation for ASR (whisper.apr), text-to-speech (TTS), voice cloning, and other audio ML applications.
Module Structure
aprender/src/audio/
├── mod.rs
├── capture.rs # Microphone input (ALSA/CoreAudio/WASAPI/WebAudio)
├── playback.rs # Speaker output
├── stream.rs # Chunked streaming primitives
├── codec.rs # Decode: wav, mp3, aac, flac, opus, ogg, mp4, webm, mkv
├── resample.rs # High-quality sample rate conversion
├── format.rs # Container parsing
└── mel.rs # Mel spectrogram (MOVED from whisper.apr)
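One possible shape for the chunked streaming primitives in `stream.rs` is a chunker that turns arbitrarily sized capture reads into fixed-size, overlapping chunks. This is a minimal sketch; `SampleChunker`, `push`, and `flush` are hypothetical names, not part of the proposed API.

```rust
/// Hypothetical streaming helper (not part of the proposed API): buffers
/// incoming samples and emits fixed-size chunks in which consecutive chunks
/// share `overlap` samples.
pub struct SampleChunker {
    chunk_len: usize,
    overlap: usize,
    buf: Vec<f32>,
}

impl SampleChunker {
    pub fn new(chunk_len: usize, overlap: usize) -> Self {
        assert!(overlap < chunk_len, "overlap must be smaller than chunk_len");
        Self { chunk_len, overlap, buf: Vec::with_capacity(chunk_len) }
    }

    /// Appends `samples` and returns every complete chunk now available.
    pub fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.buf.extend_from_slice(samples);
        let step = self.chunk_len - self.overlap;
        let mut chunks = Vec::new();
        while self.buf.len() >= self.chunk_len {
            chunks.push(self.buf[..self.chunk_len].to_vec());
            // Drop only `step` samples so the last `overlap` samples start the next chunk.
            self.buf.drain(..step);
        }
        chunks
    }

    /// Returns the buffered remainder (shorter than `chunk_len`), e.g. at end of stream.
    pub fn flush(&mut self) -> Vec<f32> {
        std::mem::take(&mut self.buf)
    }
}
```

Overlap gives a downstream model context across chunk boundaries; whether `stream.rs` should own that policy or leave it to callers is an open design choice.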
Blocking
This module blocks:
- `whisper-apr stream` command
- `whisper-apr record` command
- `whisper-apr command` command
- Future TTS/voice cloning features
API Surface
capture.rs
pub struct AudioDevice { pub id: String, pub name: String, pub sample_rates: Vec<u32>, pub channels: u8 }
pub struct CaptureConfig { pub sample_rate: u32, pub channels: u8, pub buffer_size_ms: u32 }
pub fn list_devices() -> Result<Vec<AudioDevice>, AudioError>;
pub fn open(device: Option<&str>, config: CaptureConfig) -> Result<AudioCapture, AudioError>;
pub struct AudioCapture { /* read(), close() */ }
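The `buffer_size_ms` field implies a conversion from milliseconds to frames when the backend buffer is allocated. A minimal sketch, assuming interleaved `f32` samples; `frames_per_buffer` and `samples_per_buffer` are hypothetical helpers, not part of the proposed API.

```rust
/// Mirrors the proposed CaptureConfig.
#[derive(Debug, Clone, Copy)]
pub struct CaptureConfig {
    pub sample_rate: u32,
    pub channels: u8,
    pub buffer_size_ms: u32,
}

impl CaptureConfig {
    /// Hypothetical helper: frames (one sample per channel) per buffer,
    /// rounded up so a buffer never covers less than `buffer_size_ms`.
    pub fn frames_per_buffer(&self) -> usize {
        let num = self.sample_rate as u64 * self.buffer_size_ms as u64;
        ((num + 999) / 1000) as usize
    }

    /// Hypothetical helper: interleaved samples per buffer (frames x channels).
    pub fn samples_per_buffer(&self) -> usize {
        self.frames_per_buffer() * self.channels as usize
    }
}
```

For example, 16 kHz mono with a 30 ms buffer gives 480 frames per read.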
codec.rs
pub enum AudioFormat { Wav, Mp3, Aac, Flac, Opus, Ogg, M4a }
pub fn decode(data: &[u8], format: AudioFormat) -> Result<DecodedAudio, CodecError>;
pub fn decode_container(data: &[u8]) -> Result<DecodedAudio, CodecError>; // mp4, webm, mkv
pub struct DecodedAudio { pub samples: Vec<f32>, pub sample_rate: u32, pub channels: u8, pub duration_ms: u64 }
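The codec table below lists WAV as native. A minimal sketch of what a native path could look like, restricted to 16-bit PCM WAV and filling the proposed `DecodedAudio`. `decode_wav_pcm16` is a hypothetical name, and plain `String` errors stand in for the proposed `CodecError`.

```rust
#[derive(Debug)]
pub struct DecodedAudio {
    pub samples: Vec<f32>,
    pub sample_rate: u32,
    pub channels: u8,
    pub duration_ms: u64,
}

fn u16_le(b: &[u8], at: usize) -> u16 { u16::from_le_bytes([b[at], b[at + 1]]) }
fn u32_le(b: &[u8], at: usize) -> u32 { u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]]) }

/// Hypothetical sketch: decodes 16-bit PCM WAV (format tag 1) only, returning
/// interleaved samples scaled to [-1.0, 1.0).
pub fn decode_wav_pcm16(data: &[u8]) -> Result<DecodedAudio, String> {
    if data.len() < 12 || &data[0..4] != b"RIFF" || &data[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".into());
    }
    let mut pos = 12;
    let mut fmt: Option<(u16, u32, u16)> = None; // (channels, sample_rate, bits)
    while pos + 8 <= data.len() {
        let id = &data[pos..pos + 4];
        let size = u32_le(data, pos + 4) as usize;
        let body = pos + 8;
        let end = body.checked_add(size).filter(|&e| e <= data.len()).ok_or("truncated chunk")?;
        match id {
            b"fmt " => {
                if size < 16 { return Err("fmt chunk too short".into()); }
                if u16_le(data, body) != 1 { return Err("only PCM (format 1) supported".into()); }
                fmt = Some((u16_le(data, body + 2), u32_le(data, body + 4), u16_le(data, body + 14)));
            }
            b"data" => {
                let (channels, sample_rate, bits) = fmt.ok_or("data chunk before fmt chunk")?;
                if bits != 16 { return Err("only 16-bit PCM supported".into()); }
                if channels == 0 || sample_rate == 0 { return Err("invalid fmt chunk".into()); }
                // Little-endian i16 -> f32 in [-1.0, 1.0).
                let samples: Vec<f32> = data[body..end]
                    .chunks_exact(2)
                    .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
                    .collect();
                let frames = samples.len() as u64 / channels as u64;
                return Ok(DecodedAudio {
                    duration_ms: frames * 1000 / sample_rate as u64,
                    samples,
                    sample_rate,
                    channels: channels as u8,
                });
            }
            _ => {} // skip LIST, fact, etc.
        }
        // RIFF chunks are padded to an even number of bytes.
        pos = end + (size & 1);
    }
    Err("no data chunk".into())
}
```

Supporting 24-bit, 32-bit float, and `WAVE_FORMAT_EXTENSIBLE` headers would extend the `fmt ` branch; they are left out here.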
mel.rs (move from whisper.apr)
pub struct MelConfig { pub sample_rate: u32, pub n_fft: usize, pub hop_length: usize, pub n_mels: usize, pub fmin: f32, pub fmax: f32 }
pub fn mel_spectrogram(samples: &[f32], config: &MelConfig) -> Vec<Vec<f32>>;
pub struct MelFilterbank { /* new(), apply() */ }
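`MelFilterbank::new()` presumably builds triangular filters from the `MelConfig` fields. A minimal sketch of that construction and of `apply()` on one power-spectrum frame, using the HTK mel formula `2595 * log10(1 + f / 700)` without area normalization. That choice is an assumption for brevity: Whisper's reference filters come from librosa's default (Slaney-style) mel scale and normalization, so a port meant to match whisper.apr numerically should reuse the existing code. Function names are hypothetical.

```rust
/// HTK-style mel scale (assumed here; not necessarily what whisper.apr uses).
fn hz_to_mel(hz: f32) -> f32 { 2595.0 * (1.0 + hz / 700.0).log10() }
fn mel_to_hz(mel: f32) -> f32 { 700.0 * (10f32.powf(mel / 2595.0) - 1.0) }

/// Hypothetical: `n_mels` triangular filters over the `n_fft / 2 + 1` FFT bins,
/// returned as rows of weights in [0, 1].
pub fn mel_filterbank(sample_rate: u32, n_fft: usize, n_mels: usize, fmin: f32, fmax: f32) -> Vec<Vec<f32>> {
    let n_bins = n_fft / 2 + 1;
    let (mel_lo, mel_hi) = (hz_to_mel(fmin), hz_to_mel(fmax));
    // n_mels + 2 points equally spaced on the mel scale, converted back to Hz.
    let edges: Vec<f32> = (0..n_mels + 2)
        .map(|i| mel_to_hz(mel_lo + (mel_hi - mel_lo) * i as f32 / (n_mels + 1) as f32))
        .collect();
    // Frequency of FFT bin k.
    let bin_hz = |k: usize| k as f32 * sample_rate as f32 / n_fft as f32;
    (0..n_mels)
        .map(|m| {
            let (left, centre, right) = (edges[m], edges[m + 1], edges[m + 2]);
            (0..n_bins)
                .map(|k| {
                    let f = bin_hz(k);
                    if f <= left || f >= right { 0.0 }
                    else if f <= centre { (f - left) / (centre - left) } // rising edge
                    else { (right - f) / (right - centre) } // falling edge
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// Hypothetical: mel energies for one power-spectrum frame (length n_fft / 2 + 1).
pub fn apply_filterbank(filters: &[Vec<f32>], power: &[f32]) -> Vec<f32> {
    filters
        .iter()
        .map(|row| row.iter().zip(power).map(|(w, p)| w * p).sum::<f32>())
        .collect()
}
```

With Whisper-like settings (16 kHz, `n_fft = 400`, 80 mels, 0 to 8000 Hz) this yields an 80 x 201 weight matrix; `mel_spectrogram` would then apply it to each STFT frame and take the log.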
Platform Support
| Platform | Capture | Playback |
|---|---|---|
| Linux | ALSA | ALSA |
| macOS | CoreAudio | CoreAudio |
| Windows | WASAPI | WASAPI |
| WASM | Web Audio API | Web Audio API |
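The platform table maps naturally onto compile-time `cfg` gating, so only one backend is built per target. A minimal sketch with placeholder modules; the module name `backend`, the constant `NAME`, and `backend_name` are hypothetical, and real modules would hold the ALSA/CoreAudio/WASAPI/Web Audio bindings. WASM is matched on `target_arch` because `wasm32` targets report different `target_os` values.

```rust
#[cfg(target_arch = "wasm32")]
mod backend { pub const NAME: &str = "Web Audio API"; }

#[cfg(all(not(target_arch = "wasm32"), target_os = "linux"))]
mod backend { pub const NAME: &str = "ALSA"; }

#[cfg(target_os = "macos")]
mod backend { pub const NAME: &str = "CoreAudio"; }

#[cfg(target_os = "windows")]
mod backend { pub const NAME: &str = "WASAPI"; }

// Fallback so other targets still compile (capture/playback would return an error).
#[cfg(not(any(target_arch = "wasm32", target_os = "linux", target_os = "macos", target_os = "windows")))]
mod backend { pub const NAME: &str = "unsupported"; }

/// Hypothetical: name of the backend selected for the current target.
pub fn backend_name() -> &'static str {
    backend::NAME
}
```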
Codec Support (Pure Rust)
| Format | Decode | Encode | Implementation |
|---|---|---|---|
| WAV | ✅ | ✅ | Native |
| MP3 | ✅ | ❌ | symphonia |
| AAC | ✅ | ❌ | symphonia |
| FLAC | ✅ | ✅ | symphonia (decode only) |
| Opus | ✅ | ✅ | Pure Rust |
| OGG (Vorbis) | ✅ | ❌ | lewton |
| MP4/WebM/MKV | ✅ | ❌ | Container extraction |
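`decode_container` and callers that do not know the format up front need some form of detection. A minimal sketch that sniffs well-known magic bytes; `Sniffed` and `sniff` are hypothetical names. It assumes an `ID3` tag means MP3 (AAC files can also carry one), and it cannot tell Opus from Vorbis inside Ogg, or WebM from MKV inside EBML, without parsing further.

```rust
/// Hypothetical detection result (the proposed AudioFormat has no container variants).
#[derive(Debug, PartialEq, Eq)]
pub enum Sniffed { Wav, Flac, Ogg, Mp3, Aac, Mp4, Matroska, Unknown }

/// Guesses the format from the first bytes of a file.
pub fn sniff(data: &[u8]) -> Sniffed {
    if data.len() >= 12 && data.starts_with(b"RIFF") && &data[8..12] == b"WAVE" {
        Sniffed::Wav
    } else if data.starts_with(b"fLaC") {
        Sniffed::Flac
    } else if data.starts_with(b"OggS") {
        Sniffed::Ogg // Opus or Vorbis: inspect the first packet to decide
    } else if data.starts_with(&[0x1A, 0x45, 0xDF, 0xA3]) {
        Sniffed::Matroska // EBML header: WebM or MKV, decided by the DocType
    } else if data.len() >= 8 && &data[4..8] == b"ftyp" {
        Sniffed::Mp4 // ISO BMFF: MP4 / M4A
    } else if data.starts_with(b"ID3") {
        Sniffed::Mp3
    } else if data.len() >= 2 && data[0] == 0xFF && (data[1] & 0xF6) == 0xF0 {
        Sniffed::Aac // ADTS sync word with layer bits 00
    } else if data.len() >= 2 && data[0] == 0xFF && (data[1] & 0xE0) == 0xE0 && (data[1] & 0x06) != 0 {
        Sniffed::Mp3 // MPEG audio frame sync with a non-reserved layer
    } else {
        Sniffed::Unknown
    }
}
```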
Implementation Priority
- High: `mel.rs` (move from whisper.apr) - blocks transcription
- High: `capture.rs` - blocks streaming commands
- Medium: `codec.rs` - blocks mp3/mp4 input support
- Medium: `resample.rs` - needed for non-16 kHz audio
- Low: `playback.rs` - needed for TTS output
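Until `resample.rs` lands, non-16 kHz input has to be converted somehow. The simplest possible converter is linear interpolation, sketched below for mono audio; `resample_linear` is a hypothetical name. It is not the "high-quality" conversion the module description calls for: it applies no anti-aliasing low-pass filter, so downsampling (e.g. 48 kHz to 16 kHz) can alias. A windowed-sinc or polyphase design would be the usual choice for the real module.

```rust
/// Linear-interpolation resampler for mono f32 audio. Illustrative only: it
/// applies no anti-aliasing low-pass filter, so downsampling can alias.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    assert!(from_hz > 0 && to_hz > 0, "sample rates must be non-zero");
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    // Output length scales with the rate ratio (rounded down).
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance, in input samples, between consecutive output samples.
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos.floor() as usize).min(input.len() - 1);
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Past the last input sample, hold the final value.
            let b = input.get(idx + 1).copied().unwrap_or(a);
            a + (b - a) * frac
        })
        .collect()
}
```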
Labels
enhancement, audio, whisper-apr-dependency