
fix(parakeet-cpp): convert audio before the non-batched transcribe path #10161

Merged
mudler merged 1 commit into master from worktree-fix+parakeet-cpp-audio-convert
Jun 3, 2026
@localai-bot

Collaborator

Problem

Transcribing a non-WAV file (e.g. MP3) through the parakeet-cpp backend fails:

transcription failed: ... parakeet-cpp: transcribe_path_json failed:
parakeet: failed to load audio: /staging/ephemeral/inputs/.../R20260603-175425.MP3

Root cause

AudioTranscription in backend/go/parakeet-cpp/goparakeetcpp.go has two paths:

  • Batched (p.bat != nil): decodes via decodeWavMono16k -> utils.AudioToWav (ffmpeg), so any format works.
  • Direct / fallback (p.bat == nil, used when the batched C-API symbol isn't present in libparakeet.so): handed the original upload path straight to parakeet_capi_transcribe_path_json. That C loader only understands 16 kHz mono WAV/PCM, so MP3 (and any other non-WAV input) fails with "failed to load audio".

LocalAI does no conversion at the HTTP/service layer — it delegates format handling to each backend. Every other audio backend converts unconditionally with utils.AudioToWav before calling its engine (whisper gowhisper.go:189/299, crispasr gocrispasr.go:186/291). The parakeet-cpp direct path was the lone exception.

Fix

  • Extract convertToWavMono16k(path) -> (wavPath, cleanup, err), which produces a 16 kHz mono WAV in a temp dir. WAV inputs already in the target format are passed through without ffmpeg.
  • Run the non-batched path through it and pass the converted path to CppTranscribePathJSON.
  • Refactor the existing decodeWavMono16k to reuse the helper (no duplication).
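
For readers who want the shape of the helper without opening the diff, here is a minimal sketch of the idea, not the merged code. It assumes a converter with a (src, dst string) error signature standing in for utils.AudioToWav; the converter type, the copyConverter placeholder, and the temp-dir and file names are hypothetical and exist only to make the example self-contained.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// converter writes a 16 kHz mono WAV rendition of src to dst.
// Hypothetical stand-in for utils.AudioToWav (assumed signature).
type converter func(src, dst string) error

// convertToWavMono16k converts path into a WAV inside a fresh temp dir and
// returns the new path plus a cleanup func that removes that temp dir.
// On failure it removes the temp dir itself and returns a no-op cleanup.
func convertToWavMono16k(path string, convert converter) (string, func(), error) {
	noop := func() {}

	dir, err := os.MkdirTemp("", "parakeet-audio-")
	if err != nil {
		return "", noop, fmt.Errorf("create temp dir: %w", err)
	}
	cleanup := func() { _ = os.RemoveAll(dir) }

	wavPath := filepath.Join(dir, "converted.wav")
	if err := convert(path, wavPath); err != nil {
		cleanup()
		return "", noop, fmt.Errorf("convert %q to wav: %w", path, err)
	}
	return wavPath, cleanup, nil
}

// copyConverter is a placeholder that copies bytes unchanged. It mimics only
// the pass-through case (input already in the target format); a real
// converter would invoke ffmpeg for anything else.
func copyConverter(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}
```

A caller on the direct path would, under these assumptions, call convertToWavMono16k(uploadPath, copyConverter), defer cleanup(), and pass the returned wavPath to the C loader instead of the original upload path. Returning the cleanup func alongside the path keeps the temp-dir lifetime in the caller's hands, which is what lets both the batched and direct paths share one helper.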

Tests

Added specs that need neither the model, the C library, nor ffmpeg (they use an already-target-format WAV that AudioToWav passes through):

  • convertToWavMono16k returns a decodable temp-WAV copy (not the original path) and cleanup() removes it.
  • It errors on a missing input rather than silently passing the path through.
SUCCESS! -- 6 Passed | 0 Failed | 3 Skipped

golangci-lint run ./backend/go/parakeet-cpp/ -> 0 issues.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

🤖 Generated with Claude Code

The direct (non-batched) transcription path handed the original upload
path straight to the C library via parakeet_capi_transcribe_path_json.
That loader only understands 16 kHz mono WAV/PCM, so any other format
(MP3, etc.) failed with "parakeet: failed to load audio: <file>".

Only the batched path converted the input (via decodeWavMono16k ->
utils.AudioToWav). Every other audio backend (whisper, crispasr)
converts unconditionally with utils.AudioToWav before handing the file
to its engine; the parakeet-cpp fallback was the lone exception.

Extract a convertToWavMono16k helper (reused by decodeWavMono16k) that
produces a 16 kHz mono WAV in a temp dir, and run the non-batched path
through it before calling the C loader. WAV inputs already in the target
format are passed through without ffmpeg.

Add specs covering the helper (decodable copy + cleanup, and an error on
a missing input) that need neither the model, the C library, nor ffmpeg.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
@mudler mudler merged commit 9d10418 into master Jun 3, 2026
66 checks passed
@mudler mudler deleted the worktree-fix+parakeet-cpp-audio-convert branch June 3, 2026 13:07
@localai-bot localai-bot added the bug Something isn't working label Jun 10, 2026