feat(parakeet-cpp): nemotron-3.5-asr multilingual streaming model + request language support #10199
Merged
… batched + streaming paths Reads opts.GetLanguage() and threads it through to the new parakeet_capi_transcribe_pcm_batch_json_lang and parakeet_capi_stream_begin_lang C-API entry points, both probed with Dlsym so the backend still loads against an older libparakeet.so (falling back to the non-lang paths, i.e. model default). parakeet.cpp's batched C-API takes a single target_lang for the whole batch, so the dispatcher only coalesces same-language requests: a request whose language differs from the batch leader is held as a single carry-over and becomes the leader of the next batch, never dropped and never left waiting (including on shutdown). A new batcher test asserts no dispatched batch is ever mixed-language and that every submitted request still receives a reply. Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… parakeet.cpp pin Adds the multilingual prompt-conditioned streaming model to the gallery (q8_0 default, OpenMDW-1.1) and bumps the parakeet-cpp backend pin to the parakeet.cpp commit that ships nemotron support plus batched causal subsampling and the batched target_lang C-API. Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds NVIDIA's multilingual, prompt-conditioned streaming ASR model
`nemotron-3.5-asr-streaming-0.6b` to the gallery and makes the `parakeet-cpp` backend language-aware, so the language can be selected at request time.
Changes
- Adds `parakeet-cpp-nemotron-3.5-asr-streaming-0.6b` to the gallery (q8_0 default, OpenMDW-1.1). 40+ locales in one 0.6B checkpoint, offline and cache-aware streaming, byte-identical to NeMo at WER 0, about 2.5x faster than NeMo on CPU. GGUF hosted at `mudler/parakeet-cpp-gguf` (sha256 verified against the live file).
- Bumps `PARAKEET_VERSION` to the parakeet.cpp master commit that ships nemotron support plus batched causal subsampling and the batched `target_lang` C-API (so the request-coalescing batcher can serve the causal model without aborting).
- The request `language` field is now honored on the batched and streaming paths via the new `parakeet_capi_transcribe_pcm_batch_json_lang` and `parakeet_capi_stream_begin_lang` entry points. Both are probed with `Dlsym` and used only when present, so the backend still loads against an older libparakeet.so (it falls back to the default language). The batcher only coalesces requests that share a language (one `target_lang` per batch), holding a differing request over to the next batch so it is never dropped.
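The same-language coalescing rule can be sketched roughly as follows. This is a minimal illustration, not the backend's actual dispatcher: the `request` type, `nextBatch`, `drain`, and the `max` batch limit are hypothetical names introduced here, and the real batcher's timing and reply handling are not modeled.

```go
package main

import "fmt"

// request is a hypothetical stand-in for a queued transcription request.
type request struct {
	ID   int
	Lang string // "" stands for the model default
}

// nextBatch collects up to max requests that share the leader's language.
// The leader is the held carry-over if there is one, otherwise the next
// queued request. The first request with a different language is not
// dropped: it is returned as the new carry-over and leads the next batch.
func nextBatch(queue <-chan request, carry *request, max int) ([]request, *request) {
	var batch []request
	if carry != nil {
		batch = append(batch, *carry)
	} else {
		r, ok := <-queue
		if !ok {
			return nil, nil // queue closed and nothing held
		}
		batch = append(batch, r)
	}
	for len(batch) < max {
		select {
		case r, ok := <-queue:
			if !ok {
				return batch, nil // shutdown: dispatch what we have
			}
			if r.Lang != batch[0].Lang {
				return batch, &r // hold the differing request over
			}
			batch = append(batch, r)
		default:
			return batch, nil // nothing else waiting right now
		}
	}
	return batch, nil
}

// drain keeps dispatching until the queue is closed and no carry-over is
// left, so a held request is still served on shutdown.
func drain(queue <-chan request, max int) [][]request {
	var out [][]request
	var carry *request
	for {
		batch, c := nextBatch(queue, carry, max)
		if batch == nil {
			return out
		}
		out = append(out, batch)
		carry = c
	}
}

func main() {
	q := make(chan request, 4)
	for i, lang := range []string{"en", "en", "de", "en"} {
		q <- request{ID: i, Lang: lang}
	}
	close(q)
	for _, b := range drain(q, 8) {
		fmt.Println(len(b), "request(s), target_lang:", b[0].Lang)
	}
}
```

Holding at most one carry-over keeps the logic simple: a batch closes as soon as a differing language is seen, and that request is guaranteed a slot as the next leader.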
Notes
- An empty `language` means the model default ("auto", which self-detects). Non-prompt parakeet models ignore the field, so behavior is unchanged for them.
- A new `batcher_test.go` case asserts no batch ever mixes languages and every request still gets a reply.
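The fallback behavior (language-aware entry points used only when present, otherwise the model default) might look roughly like this. It is a sketch under stated assumptions: `symbolProbe` is a hypothetical stand-in for the `Dlsym` lookup, and `langSupport`, `probeLangSupport`, and `effectiveLanguage` are illustrative names, not the backend's code. Only the two symbol names come from this PR.

```go
package main

import "fmt"

// symbolProbe is a hypothetical stand-in for a Dlsym-style lookup: it
// reports whether the loaded library exports the named symbol.
type symbolProbe func(name string) bool

// langSupport records which language-aware entry points were found.
type langSupport struct {
	Batch  bool
	Stream bool
}

// probeLangSupport checks for the two entry points named in this PR.
func probeLangSupport(has symbolProbe) langSupport {
	return langSupport{
		Batch:  has("parakeet_capi_transcribe_pcm_batch_json_lang"),
		Stream: has("parakeet_capi_stream_begin_lang"),
	}
}

// effectiveLanguage returns the language that can actually be passed down.
// An empty result stands for the model default. Against an older library
// the requested language is ignored rather than treated as an error.
func effectiveLanguage(supported bool, requested string) string {
	if !supported {
		return ""
	}
	return requested
}

func main() {
	// Simulate an older libparakeet.so that exports neither symbol.
	old := probeLangSupport(func(string) bool { return false })
	fmt.Printf("old library: %q\n", effectiveLanguage(old.Stream, "de"))

	// Simulate a library that exports both.
	cur := probeLangSupport(func(string) bool { return true })
	fmt.Printf("new library: %q\n", effectiveLanguage(cur.Stream, "de"))
}
```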
Assisted-by: Claude Opus 4.8 (1M context) noreply@anthropic.com