feat: Use NVIDIA NIM ASR for audio transcription #53

Merged
Alishahryar1 merged 20 commits into Alishahryar1:main from MauroDruwel:useNvidiaNimASR
Feb 28, 2026
Conversation

@MauroDruwel MauroDruwel commented Feb 24, 2026

Contributor

Summary

Added NVIDIA NIM as a second transcription option (alongside local Whisper). This lets you transcribe voice notes using NVIDIA's cloud API instead of running Whisper locally.

What changed

  • Transcription: Now supports two backends

    • Local Whisper: Free, runs on your GPU/CPU (existing)
    • NVIDIA NIM: Cloud API via Riva gRPC (new)
  • Supported models: 8 NVIDIA NIM models added (Parakeet variants for different languages, Whisper Large V3)
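
To make the two-backend design concrete, here is a minimal sketch of how a transcription call might be routed to either backend. It is not the code from this PR: the backend keys `"whisper"` and `"nim"`, the function names, and the placeholder return values are all assumptions made for illustration, and the real Whisper and Riva gRPC calls are stubbed out.

```python
from typing import Callable, Dict

# A transcriber takes raw audio bytes and returns text.
Transcriber = Callable[[bytes], str]


def _transcribe_local_whisper(audio: bytes) -> str:
    # Stub for the existing local Whisper path (runs on your GPU/CPU).
    return f"[whisper] {len(audio)} bytes"


def _transcribe_nvidia_nim(audio: bytes) -> str:
    # Stub for the new cloud path (NVIDIA NIM via Riva gRPC).
    return f"[nim] {len(audio)} bytes"


# Hypothetical backend keys; the PR's actual setting values may differ.
BACKENDS: Dict[str, Transcriber] = {
    "whisper": _transcribe_local_whisper,
    "nim": _transcribe_nvidia_nim,
}


def transcribe(audio: bytes, backend: str = "whisper") -> str:
    """Route the audio to the selected backend, failing loudly on unknown names."""
    try:
        handler = BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown transcription backend {backend!r}; "
            f"expected one of {sorted(BACKENDS)}"
        ) from None
    return handler(audio)
```

A lookup table keeps the existing Whisper path as the default and lets the cloud path be added without touching callers.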

@Alishahryar1

Owner

I decided to not add this because it adds an additional dependency and server outside of uv which is a hassle for users.

@MauroDruwel

Contributor Author

> I decided to not add this because it adds an additional dependency and server outside of uv which is a hassle for users.

This is an optional dependency, not a new required one. I still need to update the .toml so that it sits under something like voice_nim instead of voice.
It actually has fewer dependencies than local Whisper (no PyTorch/CUDA/model downloads). And since the project already depends on NVIDIA NIM and requires an API key and NVIDIA servers, this does not introduce a new server or hassle for the users. Let me know why you think it might be a hassle please :)

@Alishahryar1 Alishahryar1 reopened this Feb 25, 2026
@Alishahryar1

Owner

Ok sounds good as long as it can be with uv sync --voice and doesn't require launching an extra server. You can do it.

@MauroDruwel

Contributor Author

> Ok sounds good as long as it can be with uv sync --voice and doesn't require launching an extra server. You can do it.

Isn't it better to use something like uv sync --voice_nim? Some dependencies in --voice are over 1 GB and aren't needed for my part, they’re only required for local Whisper, so downloading them would be pointless.

@Alishahryar1

Owner

Yes, that's better
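
The optional-extra approach agreed on above can be paired with a small import guard, so that users who have not installed the NIM extra get an actionable message instead of a bare import failure. The sketch below is an illustration only: `require_optional` is a hypothetical helper, the extra name `voice_nim` is taken from the discussion and may differ in the final pyproject.toml, and the install hint uses uv's `--extra` option.

```python
import importlib.util


def require_optional(module_name: str, extra: str) -> None:
    """Raise a helpful ImportError if an optional dependency is missing."""
    # Check only the top-level package: find_spec returns None for a missing
    # top-level name, whereas a dotted name would first import its parents.
    top_level = module_name.split(".")[0]
    if importlib.util.find_spec(top_level) is None:
        raise ImportError(
            f"{module_name!r} is not installed. "
            f"Install the optional extra with: uv sync --extra {extra}"
        )
```

Assuming nvidia-riva-client is importable as `riva.client`, the NIM code path could call `require_optional("riva.client", "voice_nim")` just before its import, so users of local Whisper never need the package.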

MauroDruwel commented Feb 26, 2026

Contributor Author

Something to note: the project uses a very recent Python version (3.14), and wheels for grpcio-tools are not yet available. As a result, on a fresh system you need to install the required build dependencies:
sudo apt install build-essential cmake pkg-config

Fixed in b3d815c by using newer versions of grpcio than nvidia-riva-client initially offered.

Comment thread .env.example
Alishahryar1 commented Feb 27, 2026

Owner

Is it ready to be merged?

@MauroDruwel

Contributor Author

> Is it ready to be merged?

Not yet, I still need to fix something in transcription. I will mark it as ready when it's ready :)

@MauroDruwel MauroDruwel marked this pull request as ready for review February 28, 2026 13:39
@MauroDruwel MauroDruwel changed the title from "Use nvidia nim asr" to "feat: Use NVIDIA NIM ASR for audio transcription" Feb 28, 2026
@MauroDruwel

Contributor Author

@Alishahryar1 It's ready for review 😉
Feel free to test it and play with it first. I haven't done much testing, but it seems to work and to be quite fast as well.

Alishahryar1 commented Feb 28, 2026

Owner

Some CI checks are failing; you can run those locally as well. It's best practice to run all checks locally before pushing.

Comment thread messaging/transcription.py Outdated
Comment thread messaging/transcription.py
Comment thread messaging/transcription.py Outdated
Comment thread pyproject.toml
Comment thread messaging/transcription.py Outdated
Comment thread .env.example
Comment thread config/settings.py Outdated
@Alishahryar1

Owner

LGTM. Great work!

@Alishahryar1 Alishahryar1 merged commit de70700 into Alishahryar1:main Feb 28, 2026
1 check passed
Jeeltilva pushed a commit to Jeeltilva/free-claude-code that referenced this pull request Mar 12, 2026
## Summary
Added NVIDIA NIM as a second transcription option (alongside local
Whisper). This lets you transcribe voice notes using NVIDIA's cloud API
instead of running Whisper locally.

## What changed

- **Transcription**: Now supports two backends

  - Local Whisper: Free, runs on your GPU/CPU (existing)
  - NVIDIA NIM: Cloud API via Riva gRPC (new)

- **Supported models**: 8 NVIDIA NIM models added (Parakeet variants for
different languages, Whisper Large V3)

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
diyism pushed a commit to diyism/cc-nim that referenced this pull request Apr 28, 2026
SwayamDash pushed a commit to SwayamDash/quench that referenced this pull request May 2, 2026

2 participants