
feat(parakeet-cpp): nemotron-3.5-asr multilingual streaming model + request language support#10199

Merged
mudler merged 2 commits into
masterfrom
feat/parakeet-nemotron-multilingual
Jun 6, 2026

@localai-bot

Collaborator

Summary

Adds NVIDIA's multilingual, prompt-conditioned streaming ASR model
nemotron-3.5-asr-streaming-0.6b to the gallery and makes the parakeet-cpp
backend language-aware, so the transcription language can be selected per request.

Changes

  • Gallery entry: parakeet-cpp-nemotron-3.5-asr-streaming-0.6b (q8_0 by default,
    OpenMDW-1.1 license). A single 0.6B checkpoint covers 40+ locales and supports both
    offline and cache-aware streaming transcription. Its output is byte-identical to NeMo's
    (WER 0 against the NeMo reference), and it runs about 2.5x faster than NeMo on CPU.
    The GGUF is hosted at mudler/parakeet-cpp-gguf (sha256 verified against the live file).
  • Backend pin bump: PARAKEET_VERSION now points to the parakeet.cpp master commit
    that ships nemotron support, batched causal subsampling, and the batched
    target_lang C-API, so the request-coalescing batcher can serve the causal
    model without aborting.
  • Language wiring in the parakeet-cpp backend: the request language field
    is now honored on the batched and streaming paths via the new
    parakeet_capi_transcribe_pcm_batch_json_lang and parakeet_capi_stream_begin_lang
    entry points. Both are probed with Dlsym and used only when present, so the
    backend still loads against an older libparakeet.so (it falls back to the
    default language). The batcher only coalesces requests that share a language
    (one target_lang per batch), holding a differing request over to the next
    batch so it is never dropped.
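The Dlsym probing in the last bullet boils down to "use the language-aware entry point if the loaded library exports it, otherwise fall back". The following is a minimal pure-Go sketch of that selection logic only. It assumes a hypothetical `Lookup` function in place of the real Dlsym call, a simplified hypothetical `BatchFn` signature (the actual C-API signatures are not shown in this PR), and an assumed name, `parakeet_capi_transcribe_pcm_batch_json`, for the legacy language-unaware entry point. Only `parakeet_capi_transcribe_pcm_batch_json_lang` comes from the PR text.

```go
package main

import (
	"errors"
	"fmt"
)

// BatchFn is a hypothetical, simplified stand-in for the C batch entry points;
// the real C-API signatures differ and are not reproduced here.
type BatchFn func(pcm [][]float32, lang string) []string

// Lookup mimics a Dlsym probe: it returns the symbol and whether it exists.
type Lookup func(name string) (BatchFn, bool)

// resolveBatch prefers the language-aware entry point and falls back to the
// legacy one when the loaded library predates it.
func resolveBatch(lookup Lookup) (fn BatchFn, langAware bool, err error) {
	if f, ok := lookup("parakeet_capi_transcribe_pcm_batch_json_lang"); ok {
		return f, true, nil
	}
	// Assumed name of the pre-existing, language-unaware entry point.
	legacy, ok := lookup("parakeet_capi_transcribe_pcm_batch_json")
	if !ok {
		return nil, false, errors.New("no batch transcription entry point")
	}
	// Drop the requested language: the legacy path always uses the model default.
	return func(pcm [][]float32, _ string) []string { return legacy(pcm, "") }, false, nil
}

// fakeLib builds a Lookup over an in-memory symbol set (demonstration only).
func fakeLib(names ...string) Lookup {
	echo := func(pcm [][]float32, lang string) []string {
		out := make([]string, len(pcm))
		for i := range out {
			out[i] = "lang=" + lang
		}
		return out
	}
	set := map[string]bool{}
	for _, n := range names {
		set[n] = true
	}
	return func(name string) (BatchFn, bool) {
		if set[name] {
			return echo, true
		}
		return nil, false
	}
}

func main() {
	oldLib := fakeLib("parakeet_capi_transcribe_pcm_batch_json")
	newLib := fakeLib("parakeet_capi_transcribe_pcm_batch_json",
		"parakeet_capi_transcribe_pcm_batch_json_lang")
	for _, lib := range []Lookup{oldLib, newLib} {
		fn, aware, err := resolveBatch(lib)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(aware, fn([][]float32{{0}}, "it"))
	}
}
```

Against the old library the requested language is dropped and the model default applies. Against the new one it is passed through. Wrapping the legacy function keeps call sites uniform, so the rest of the backend never has to branch on the library version.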

Notes

  • An empty language selects the model default ("auto", which detects the spoken
    language itself). Non-prompt parakeet models ignore the field, so their behavior is unchanged.
  • New batcher_test.go case asserts no batch ever mixes languages and every
    request still gets a reply.
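The two invariants the new batcher_test.go case asserts can be expressed as a small checker over the batches a dispatcher emits. This is a hypothetical sketch, not the test from the PR: the `Request` type and `checkBatches` function are illustrative, and "dispatched exactly once" stands in for "received a reply".

```go
package main

import "fmt"

// Request is a hypothetical stand-in for a queued transcription request.
type Request struct {
	ID   int
	Lang string
}

// checkBatches reports an error if any batch mixes languages or if any
// submitted request was not dispatched exactly once.
func checkBatches(submitted []Request, batches [][]Request) error {
	seen := make(map[int]int)
	for i, b := range batches {
		for _, r := range b {
			// Every member must share the language of the batch's first request.
			if r.Lang != b[0].Lang {
				return fmt.Errorf("batch %d mixes %q and %q", i, b[0].Lang, r.Lang)
			}
			seen[r.ID]++
		}
	}
	for _, r := range submitted {
		if seen[r.ID] != 1 {
			return fmt.Errorf("request %d dispatched %d times", r.ID, seen[r.ID])
		}
	}
	return nil
}

func main() {
	submitted := []Request{{1, "en"}, {2, "it"}, {3, "en"}}
	good := [][]Request{{{1, "en"}, {3, "en"}}, {{2, "it"}}}
	mixed := [][]Request{{{1, "en"}, {2, "it"}}, {{3, "en"}}}
	fmt.Println(checkBatches(submitted, good)) // <nil>
	fmt.Println(checkBatches(submitted, mixed))
}
```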

Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mudler added 2 commits June 6, 2026 09:03
… batched + streaming paths

Reads opts.GetLanguage() and threads it through to the new
parakeet_capi_transcribe_pcm_batch_json_lang and parakeet_capi_stream_begin_lang
C-API entry points, both probed with Dlsym so the backend still loads against an
older libparakeet.so (falling back to the non-lang paths, i.e. model default).

parakeet.cpp's batched C-API takes a single target_lang for the whole batch, so
the dispatcher only coalesces same-language requests: a request whose language
differs from the batch leader is held as a single carry-over and becomes the
leader of the next batch, never dropped and never left waiting (including on
shutdown). A new batcher test asserts no dispatched batch is ever mixed-language
and that every submitted request still receives a reply.

Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
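The carry-over rule in the commit message above can be sketched as a channel-driven dispatcher. This is an illustrative Go sketch, not the backend's actual batcher. The `request` type, `dispatch`, `runDemo`, and the `maxBatch` parameter are hypothetical, and batches are filled opportunistically with a non-blocking `select` rather than with whatever wait or timeout policy the real batcher uses. It keeps at most one carry-over request, which leads the next batch, and still flushes that request when the input channel has been closed (the shutdown case).

```go
package main

import "fmt"

// request is a hypothetical stand-in for a queued transcription request.
type request struct {
	id   int
	lang string
}

// dispatch groups up to maxBatch same-language requests per call to run.
// A request whose language differs from the batch leader is held as the
// single carry-over and becomes the leader of the next batch.
func dispatch(in <-chan request, maxBatch int, run func(lang string, batch []request)) {
	var carry *request
	for {
		var leader request
		if carry != nil {
			// The held-over request leads this batch, so it is never dropped.
			leader, carry = *carry, nil
		} else {
			r, ok := <-in
			if !ok {
				return // input closed and nothing carried over: done
			}
			leader = r
		}
		batch := []request{leader}
	fill:
		for len(batch) < maxBatch {
			select {
			case r, ok := <-in:
				if !ok {
					break fill // closed: flush what we have
				}
				if r.lang != leader.lang {
					carry = &r // different language: park it for the next batch
					break fill
				}
				batch = append(batch, r)
			default:
				break fill // nothing queued right now: dispatch immediately
			}
		}
		run(leader.lang, batch)
	}
}

// runDemo feeds the given languages through dispatch and collects the batches.
func runDemo(langs []string, maxBatch int) [][]request {
	in := make(chan request, len(langs))
	for i, l := range langs {
		in <- request{id: i, lang: l}
	}
	close(in)
	var batches [][]request
	dispatch(in, maxBatch, func(_ string, b []request) {
		batches = append(batches, b)
	})
	return batches
}

func main() {
	for _, b := range runDemo([]string{"en", "en", "it", "it", "en"}, 4) {
		ids := make([]int, len(b))
		for i, r := range b {
			ids[i] = r.id
		}
		fmt.Println(b[0].lang, ids)
	}
}
```

Because the input is fully buffered and closed before dispatching starts, this run is deterministic and prints `en [0 1]`, `it [2 3]`, `en [4]`. The second "en" group is never merged into the "it" batch, and the carried-over final request is still dispatched even though the input channel is already closed. Holding only one carry-over keeps ordering simple: at most one request ever waits outside the queue.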
… parakeet.cpp pin

Adds the multilingual prompt-conditioned streaming model to the gallery (q8_0
default, OpenMDW-1.1) and bumps the parakeet-cpp backend pin to the parakeet.cpp
commit that ships nemotron support plus batched causal subsampling and the
batched target_lang C-API.

Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mudler merged commit 03c84cf into master on Jun 6, 2026
68 checks passed
mudler deleted the feat/parakeet-nemotron-multilingual branch on June 6, 2026 at 11:53
localai-bot added the enhancement label (New feature or request) on Jun 10, 2026

Labels

enhancement New feature or request


2 participants