feat(audio): Comprehensive audio module - capture, playback, codec, mel, resample #131

@noahgift

Summary

Add a comprehensive `aprender::audio` module for audio I/O and signal processing in pure Rust. It is the foundation for ASR (whisper.apr), TTS, voice cloning, and other audio ML applications.

Module Structure

```
aprender/src/audio/
├── mod.rs
├── capture.rs      # Microphone input (ALSA/CoreAudio/WASAPI/WebAudio)
├── playback.rs     # Speaker output
├── stream.rs       # Chunked streaming primitives
├── codec.rs        # Decode: wav, mp3, aac, flac, opus, ogg, mp4, webm, mkv
├── resample.rs     # High-quality sample rate conversion
├── format.rs       # Container parsing
└── mel.rs          # Mel spectrogram (MOVED from whisper.apr)
```
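stream.rs is described only as "chunked streaming primitives". A minimal sketch of one such primitive, assuming fixed-length windows that overlap by `len - hop` samples; `Chunker` and its methods are hypothetical names, not part of the proposal:

```rust
/// Hypothetical chunker for stream.rs: buffers incoming samples and emits
/// fixed-length windows that advance by `hop` samples (overlap = len - hop).
pub struct Chunker {
    len: usize,
    hop: usize,
    buf: Vec<f32>,
}

impl Chunker {
    pub fn new(len: usize, hop: usize) -> Self {
        assert!(len > 0 && hop > 0 && hop <= len, "need 0 < hop <= len");
        Chunker { len, hop, buf: Vec::new() }
    }

    /// Append new samples and return every complete chunk now available.
    pub fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.buf.extend_from_slice(samples);
        let mut out = Vec::new();
        while self.buf.len() >= self.len {
            out.push(self.buf[..self.len].to_vec());
            // Drop only `hop` samples so consecutive chunks overlap.
            self.buf.drain(..self.hop);
        }
        out
    }
}
```

For Whisper-style input one might use `Chunker::new(16_000 * 30, 16_000 * 25)`, i.e. 30 s windows with 5 s of overlap at 16 kHz; these values are illustrative only.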

Blocking

This module blocks:

  • `whisper-apr stream` command
  • `whisper-apr record` command
  • `whisper-apr command` command
  • Future TTS/voice cloning features

API Surface

capture.rs

```rust
// Proposed API (signatures only; bodies and AudioError still to be defined)
pub struct AudioDevice { pub id: String, pub name: String, pub sample_rates: Vec<u32>, pub channels: u8 }
pub struct CaptureConfig { pub sample_rate: u32, pub channels: u8, pub buffer_size_ms: u32 }
pub fn list_devices() -> Result<Vec<AudioDevice>, AudioError>;
pub fn open(device: Option<&str>, config: CaptureConfig) -> Result<AudioCapture, AudioError>;
pub struct AudioCapture { /* read(), close() */ }
```
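`CaptureConfig` expresses the buffer size in milliseconds, while audio backends usually work in frames. A sketch of the conversion, assuming `buffer_size_ms` is the duration of one buffer and that multi-channel samples are interleaved; `buffer_frames` and `buffer_samples` are hypothetical helpers:

```rust
/// Mirrors the proposed `CaptureConfig` from capture.rs.
#[derive(Debug, Clone, Copy)]
pub struct CaptureConfig {
    pub sample_rate: u32,
    pub channels: u8,
    pub buffer_size_ms: u32,
}

impl CaptureConfig {
    /// Hypothetical helper: frames (one sample per channel) per buffer.
    pub fn buffer_frames(&self) -> usize {
        // Widen to u64 so sample_rate * ms cannot overflow u32.
        (self.sample_rate as u64 * self.buffer_size_ms as u64 / 1000) as usize
    }

    /// Hypothetical helper: interleaved f32 samples per buffer (frames * channels).
    pub fn buffer_samples(&self) -> usize {
        self.buffer_frames() * self.channels as usize
    }
}
```

At 16 kHz mono, a 100 ms buffer is 1,600 frames; at 48 kHz stereo, 20 ms is 960 frames or 1,920 interleaved samples.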

codec.rs

```rust
// Proposed API (signatures only; CodecError still to be defined)
pub enum AudioFormat { Wav, Mp3, Aac, Flac, Opus, Ogg, M4a }
pub fn decode(data: &[u8], format: AudioFormat) -> Result<DecodedAudio, CodecError>;
pub fn decode_container(data: &[u8]) -> Result<DecodedAudio, CodecError>; // mp4, webm, mkv
pub struct DecodedAudio { pub samples: Vec<f32>, pub sample_rate: u32, pub channels: u8, pub duration_ms: u64 }
```
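The Codec Support section lists WAV as a native decoder. A minimal sketch of that path for 16-bit integer PCM only, assuming `DecodedAudio.samples` holds interleaved frames and using `String` errors in place of the not-yet-defined `CodecError`; `decode_wav_pcm16` is a hypothetical name:

```rust
/// Mirrors the proposed `DecodedAudio` from codec.rs (interleaved samples assumed).
#[derive(Debug)]
pub struct DecodedAudio {
    pub samples: Vec<f32>,
    pub sample_rate: u32,
    pub channels: u8,
    pub duration_ms: u64,
}

fn u16_le(b: &[u8], i: usize) -> u16 {
    u16::from_le_bytes([b[i], b[i + 1]])
}

fn u32_le(b: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]])
}

/// Sketch of a native WAV decoder: 16-bit integer PCM only.
pub fn decode_wav_pcm16(data: &[u8]) -> Result<DecodedAudio, String> {
    if data.len() < 12 || &data[0..4] != b"RIFF" || &data[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".into());
    }
    let (mut channels, mut sample_rate, mut bits) = (0u16, 0u32, 0u16);
    let mut pos = 12;
    // Walk the chunk list: 4-byte id, little-endian u32 size, then the payload.
    while pos + 8 <= data.len() {
        let id = &data[pos..pos + 4];
        let size = u32_le(data, pos + 4) as usize;
        let body = pos + 8;
        let end = body
            .checked_add(size)
            .filter(|&e| e <= data.len())
            .ok_or("truncated chunk")?;
        if id == b"fmt " {
            if size < 16 {
                return Err("fmt chunk too short".into());
            }
            if u16_le(data, body) != 1 {
                return Err("only integer PCM (format tag 1) supported".into());
            }
            channels = u16_le(data, body + 2);
            sample_rate = u32_le(data, body + 4);
            bits = u16_le(data, body + 14);
        } else if id == b"data" {
            if channels == 0 || sample_rate == 0 || bits != 16 {
                return Err("missing fmt chunk or not 16-bit".into());
            }
            // Scale signed 16-bit samples into [-1.0, 1.0).
            let samples: Vec<f32> = data[body..end]
                .chunks_exact(2)
                .map(|s| i16::from_le_bytes([s[0], s[1]]) as f32 / 32768.0)
                .collect();
            let frames = samples.len() as u64 / channels as u64;
            return Ok(DecodedAudio {
                duration_ms: frames * 1000 / sample_rate as u64,
                samples,
                sample_rate,
                channels: channels as u8,
            });
        }
        // Chunks are word-aligned: an odd size is followed by one pad byte.
        pos = end + (size & 1);
    }
    Err("no data chunk".into())
}
```

Other sample formats (24-bit, 32-bit float, `WAVE_FORMAT_EXTENSIBLE`) would need additional branches.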

mel.rs (move from whisper.apr)

```rust
// Proposed API (signatures only)
pub struct MelConfig { pub sample_rate: u32, pub n_fft: usize, pub hop_length: usize, pub n_mels: usize, pub fmin: f32, pub fmax: f32 }
pub fn mel_spectrogram(samples: &[f32], config: &MelConfig) -> Vec<Vec<f32>>;
pub struct MelFilterbank { /* new(), apply() */ }
```
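mel.rs is being moved rather than written from scratch, so the existing whisper.apr code remains authoritative. For orientation, a sketch of a triangular mel filterbank built from `MelConfig`, assuming the HTK mel formula mel = 2595 · log10(1 + f/700) and no area normalization (Whisper's reference filters come from librosa's default Slaney-style filters, which differ); `mel_filterbank` and `apply_filterbank` are hypothetical names:

```rust
/// Mirrors the proposed `MelConfig` from mel.rs.
pub struct MelConfig {
    pub sample_rate: u32,
    pub n_fft: usize,
    pub hop_length: usize,
    pub n_mels: usize,
    pub fmin: f32,
    pub fmax: f32,
}

/// HTK-style mel scale (an assumption; Slaney-style is the other common choice).
pub fn hz_to_mel(hz: f32) -> f32 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

pub fn mel_to_hz(mel: f32) -> f32 {
    700.0 * (10f32.powf(mel / 2595.0) - 1.0)
}

/// Triangular filters: `n_mels` rows, each with `n_fft / 2 + 1` FFT-bin weights.
pub fn mel_filterbank(cfg: &MelConfig) -> Vec<Vec<f32>> {
    let n_bins = cfg.n_fft / 2 + 1;
    let (lo, hi) = (hz_to_mel(cfg.fmin), hz_to_mel(cfg.fmax));
    // n_mels + 2 edge frequencies, equally spaced on the mel scale.
    let edges: Vec<f32> = (0..cfg.n_mels + 2)
        .map(|i| mel_to_hz(lo + (hi - lo) * i as f32 / (cfg.n_mels + 1) as f32))
        .collect();
    (0..cfg.n_mels)
        .map(|m| {
            let (l, c, u) = (edges[m], edges[m + 1], edges[m + 2]);
            (0..n_bins)
                .map(|k| {
                    // Centre frequency of FFT bin k.
                    let f = k as f32 * cfg.sample_rate as f32 / cfg.n_fft as f32;
                    let rising = (f - l) / (c - l);
                    let falling = (u - f) / (u - c);
                    rising.min(falling).max(0.0)
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// Apply the filterbank to one power-spectrum frame (length n_fft / 2 + 1).
pub fn apply_filterbank(filters: &[Vec<f32>], power: &[f32]) -> Vec<f32> {
    filters
        .iter()
        .map(|w| w.iter().zip(power).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}
```

With the common Whisper settings (16 kHz, `n_fft` = 400, 80 mels) this yields an 80 × 201 weight matrix.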

Platform Support

| Platform | Capture | Playback |
|---|---|---|
| Linux | ALSA | ALSA |
| macOS | CoreAudio | CoreAudio |
| Windows | WASAPI | WASAPI |
| WASM | Web Audio API | Web Audio API |
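Backend choice in the table is per compile target. A sketch of selecting it with `cfg!`; `native_backend` is a hypothetical helper, and a real crate would more likely gate whole backend modules with `#[cfg(...)]` so only one platform's dependencies are compiled:

```rust
/// Hypothetical helper: the native audio backend the platform table assigns
/// to the current compile target, resolved at compile time via `cfg!`.
pub fn native_backend() -> &'static str {
    if cfg!(target_arch = "wasm32") {
        "Web Audio API"
    } else if cfg!(target_os = "linux") {
        "ALSA"
    } else if cfg!(target_os = "macos") {
        "CoreAudio"
    } else if cfg!(target_os = "windows") {
        "WASAPI"
    } else {
        "unsupported"
    }
}
```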

Codec Support (Pure Rust)

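| Format | Decode | Encode | Implementation |
|---|---|---|---|
| WAV | | | Native |
| MP3 | | | symphonia |
| AAC | | | symphonia |
| FLAC | | | symphonia |
| Opus | | | Pure Rust |
| OGG (Vorbis) | | | lewton |
| MP4/WebM/MKV | | | Container extraction |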
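`decode_container` takes raw bytes with no format hint, so some format sniffing is implied. A heuristic sketch based on well-known magic bytes (RIFF/WAVE, `fLaC`, `OggS` plus the `OpusHead` packet, the MP4 `ftyp` box, the EBML header shared by MKV and WebM, ID3 tags, MPEG and ADTS frame sync); `Detected` and `sniff` are hypothetical names:

```rust
#[derive(Debug, PartialEq)]
pub enum Detected { Wav, Mp3, Aac, Flac, Opus, Ogg, Mp4, Matroska }

/// True if `magic` appears at byte offset `off`.
fn has(data: &[u8], off: usize, magic: &[u8]) -> bool {
    data.get(off..off + magic.len()) == Some(magic)
}

/// Heuristic format detection from leading bytes.
pub fn sniff(data: &[u8]) -> Option<Detected> {
    if has(data, 0, b"RIFF") && has(data, 8, b"WAVE") {
        Some(Detected::Wav)
    } else if has(data, 0, b"fLaC") {
        Some(Detected::Flac)
    } else if has(data, 0, b"OggS") {
        // First packet starts after the 27-byte page header and its segment table.
        let first_packet = 27 + *data.get(26)? as usize;
        if has(data, first_packet, b"OpusHead") {
            Some(Detected::Opus)
        } else {
            Some(Detected::Ogg)
        }
    } else if has(data, 4, b"ftyp") {
        // ISO base media file format (MP4, M4A)
        Some(Detected::Mp4)
    } else if has(data, 0, &[0x1A, 0x45, 0xDF, 0xA3]) {
        // EBML header (MKV, WebM)
        Some(Detected::Matroska)
    } else if has(data, 0, b"ID3") {
        Some(Detected::Mp3)
    } else if data.len() >= 2 && data[0] == 0xFF {
        match data[1] {
            // ADTS: 12-bit sync 0xFFF followed by layer bits 00
            b if (b & 0xF6) == 0xF0 => Some(Detected::Aac),
            // MPEG audio: 11-bit sync with a non-reserved layer
            b if (b & 0xE0) == 0xE0 && (b & 0x06) != 0 => Some(Detected::Mp3),
            _ => None,
        }
    } else {
        None
    }
}
```

Frame-sync checks are weak on their own; a production parser would confirm a second frame header before trusting them.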

Implementation Priority

  1. High: mel.rs (move from whisper.apr) - blocks transcription
  2. High: capture.rs - blocks streaming commands
  3. Medium: codec.rs - blocks mp3/mp4 input support
  4. Medium: resample.rs - needed for non-16kHz audio
  5. Low: playback.rs - needed for TTS output
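resample.rs is meant to be high-quality, which typically means a windowed-sinc or polyphase filter. As a reference point only, a naive linear-interpolation resampler for mono audio (`resample_linear` is a hypothetical name); it has no anti-aliasing filter, so it is not a substitute for the proposed implementation when downsampling:

```rust
/// Naive linear-interpolation resampler for mono f32 audio.
/// Baseline only: no anti-aliasing low-pass, so downsampling can alias.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Input samples advanced per output sample.
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Hold the last sample when interpolating past the end.
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}
```

For example, 44.1 kHz or 48 kHz input would pass through such a step before reaching the 16 kHz mel pipeline.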

Labels

enhancement, audio, whisper-apr-dependency
