feat(voice): implement real-time voice mode with cloud and local backends by Abhijit-2592 · Pull Request #24174 · google-gemini/gemini-cli

Abhijit-2592 · 2026-03-30T00:33:09Z

Summary

This PR implements a real-time Voice Mode for Gemini CLI, allowing users to dictate prompts directly into the terminal. It supports both cloud-based transcription via the Gemini Live API and local-first transcription via Whisper (using whisper.cpp).

Fixes #24175

Details

Transcription Backends:
- Gemini Live API (Cloud): High-accuracy, real-time transcription using Google's Live API. Requires an API key.
- Whisper (Local): Privacy-focused, local-first transcription. Supports multiple model sizes (Tiny, Base, Large) and automatically manages model downloads to ~/.gemini/whisper_models/.
UI Integration:
- New /voice slash command to toggle voice mode and switch backends.
- Use /voice model to manage local Whisper models.
- Push-To-Talk (PTT): Hold space to record, release to stop and submit.
- Continuous Mode: Dictate naturally with real-time text updates in the input buffer.
- Dedicated Voice settings in the configuration dialog.
Audio Infrastructure:
- Uses sox (rec) for cross-platform audio capture.
- Robust handling of audio streams, including automatic VAD (Voice Activity Detection) support where available.
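
As a rough illustration of how the two backends could sit behind one interface, here is a minimal TypeScript sketch. The names `VoiceBackend`, `VoiceConfig`, `TranscriptionProvider`, and `createProvider` are hypothetical and are not taken from this PR. The only behavior borrowed from the description is that the cloud backend needs an API key while the local backend does not.

```typescript
// Hypothetical names throughout; this is not the PR's actual API.
type VoiceBackend = 'gemini-live' | 'whisper';

interface TranscriptionProvider {
  readonly backend: VoiceBackend;
  // True when audio leaves the machine for transcription.
  readonly usesNetwork: boolean;
}

interface VoiceConfig {
  backend: VoiceBackend;
  apiKey?: string; // only meaningful for the cloud backend
}

function createProvider(config: VoiceConfig): TranscriptionProvider {
  switch (config.backend) {
    case 'gemini-live':
      // The cloud backend is the only one that needs credentials.
      if (!config.apiKey) {
        throw new Error('Gemini Live transcription requires an API key.');
      }
      return { backend: 'gemini-live', usesNetwork: true };
    case 'whisper':
      // Local transcription should work without any API key.
      return { backend: 'whisper', usesNetwork: false };
  }
}
```

Keeping the key check inside the cloud branch means selecting the local backend never fails for lack of credentials.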

Installation Requirements

To use Voice Mode, you must install the following dependencies:

1. SoX (Sound eXchange)

Required for capturing audio from your microphone.

macOS: brew install sox
Linux: sudo apt install sox libsox-fmt-all
Windows: Download and install from SoX SourceForge. Ensure sox.exe is in your PATH.
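
For readers curious what capturing audio through SoX can look like from Node, here is a minimal sketch that spawns `rec` and streams raw PCM from its stdout. The capture format (16 kHz, mono, 16-bit signed PCM) and the helper names `buildRecArgs` and `startRecording` are assumptions for illustration; the PR's actual recorder may use different options.

```typescript
import { spawn, type ChildProcess } from 'node:child_process';

// Assumed capture format: 16 kHz, mono, 16-bit signed PCM (not confirmed by the PR).
function buildRecArgs(sampleRate = 16000): string[] {
  return [
    '-q',                    // suppress progress output
    '-t', 'raw',             // headerless PCM
    '-b', '16',              // 16 bits per sample
    '-e', 'signed-integer',  // sample encoding
    '-c', '1',               // mono
    '-r', String(sampleRate),
    '-',                     // write audio to stdout
  ];
}

function startRecording(onChunk: (pcm: Buffer) => void): ChildProcess {
  // Arguments are passed as an array, so no shell is involved.
  const rec = spawn('rec', buildRecArgs(), { stdio: ['ignore', 'pipe', 'ignore'] });
  rec.stdout?.on('data', onChunk);
  rec.on('error', (err) => {
    console.error(`Could not start rec (is SoX installed and on PATH?): ${err.message}`);
  });
  return rec;
}
```

The returned child process can be stopped with `rec.kill()` when recording ends.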

2. whisper-stream (for Local Transcription)

Required only if using the Whisper (Local) backend.

macOS: brew install whisper-cpp (The package provides the whisper-stream binary).
Other Platforms:
1. Clone the whisper.cpp repository.
2. Build the stream example: make stream.
3. Rename the resulting stream binary to whisper-stream and move it to a directory in your PATH.

Testing

Enable Voice Mode: Run /voice on in the CLI or toggle it in /settings.
Push-To-Talk: Hold space, speak a prompt, and release. The text should appear in the input and be submitted.
Switch Backends:
- For Cloud: /settings -> Voice -> Transcription Backend -> Gemini Live.
- For Local: /settings -> Voice -> Transcription Backend -> Whisper.
Validation:
- Run unit tests: npm test packages/core/src/voice/liveTranscriptionService.test.ts
- Run integration tests: npm test integration-tests/voice-mode.test.ts

Checklist

I have read the CONTRIBUTING.md document.
I have added/updated tests to cover my changes.
I have updated the documentation (if applicable).
I have run npm run preflight and all checks passed.

gemini-cli · 2026-03-30T00:33:22Z

Hi @Abhijit-2592, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this.

We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines.

Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed.

Thank you for your understanding and for being a part of our community!

gemini-code-assist · 2026-03-30T00:35:17Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive voice interaction system for the Gemini CLI. By supporting both cloud-based and local transcription backends, it provides users with flexible options for dictating prompts. The changes include significant updates to the UI to support voice recording states, new configuration options for managing voice settings, and the necessary backend infrastructure to handle audio streaming and transcription processing.

Highlights

Voice Mode Implementation: Introduced a real-time voice mode allowing users to dictate prompts using either the Gemini Live API (cloud) or Whisper (local).
UI/UX Enhancements: Added a new /voice slash command, push-to-talk functionality via the spacebar, and a dedicated voice settings dialog.
Audio Infrastructure: Integrated SoX for cross-platform audio capture and implemented a robust transcription service factory to manage different backends.
Local Model Management: Added a Whisper model manager to handle automatic downloads and configuration of local transcription models.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions · 2026-03-30T00:37:30Z

Size Change: +40 kB (+0.12%)

Total Size: 33.8 MB

Filename	Size	Change
`./bundle/chunk-CJQF4Z4P.js`	0 B	-49.2 kB (removed)	🏆
`./bundle/chunk-HCNND337.js`	0 B	-669 kB (removed)	🏆
`./bundle/chunk-JTXZMAH6.js`	0 B	-2.73 MB (removed)	🏆
`./bundle/chunk-KXQ7D3UP.js`	0 B	-3.43 kB (removed)	🏆
`./bundle/chunk-OJPMPJ6A.js`	0 B	-14.6 MB (removed)	🏆
`./bundle/chunk-Z3HQYKEX.js`	0 B	-3.8 kB (removed)	🏆
`./bundle/core-EMXUA7GA.js`	0 B	-47.7 kB (removed)	🏆
`./bundle/devtoolsService-XE3OICRT.js`	0 B	-27.8 kB (removed)	🏆
`./bundle/gemini-ESVRSJ42.js`	0 B	-578 kB (removed)	🏆
`./bundle/interactiveCli-4VLJILXH.js`	0 B	-1.29 MB (removed)	🏆
`./bundle/liteRtServerManager-4HSYFP3G.js`	0 B	-2.08 kB (removed)	🏆
`./bundle/oauth2-provider-PWQ4XMU3.js`	0 B	-9.16 kB (removed)	🏆
`./bundle/chunk-DN3EMB6X.js`	3.43 kB	+3.43 kB (new file)	🆕
`./bundle/chunk-FD2S3KYG.js`	14.6 MB	+14.6 MB (new file)	🆕
`./bundle/chunk-JUIFMB2M.js`	3.8 kB	+3.8 kB (new file)	🆕
`./bundle/chunk-K3ZYGA7J.js`	672 kB	+672 kB (new file)	🆕
`./bundle/chunk-MT623GTR.js`	49.2 kB	+49.2 kB (new file)	🆕
`./bundle/chunk-X4MR4PEB.js`	2.73 MB	+2.73 MB (new file)	🆕
`./bundle/core-IULKZ22Q.js`	48 kB	+48 kB (new file)	🆕
`./bundle/devtoolsService-KBE5Y6NF.js`	27.8 kB	+27.8 kB (new file)	🆕
`./bundle/gemini-553UOQFL.js`	573 kB	+573 kB (new file)	🆕
`./bundle/interactiveCli-GTAJV72M.js`	1.31 MB	+1.31 MB (new file)	🆕
`./bundle/liteRtServerManager-D3NUXXIV.js`	2.08 kB	+2.08 kB (new file)	🆕
`./bundle/oauth2-provider-OLJEB2AH.js`	9.16 kB	+9.16 kB (new file)	🆕

ℹ️ View Unchanged

Filename	Size	Change
`./bundle/bundled/third_party/index.js`	8 MB	0 B
`./bundle/chunk-34MYV7JD.js`	2.45 kB	0 B
`./bundle/chunk-5AUYMPVF.js`	858 B	0 B
`./bundle/chunk-5PS3AYFU.js`	1.18 kB	0 B
`./bundle/chunk-664ZODQF.js`	124 kB	0 B
`./bundle/chunk-DAHVX5MI.js`	206 kB	0 B
`./bundle/chunk-IUUIT4SU.js`	56.5 kB	0 B
`./bundle/chunk-MTD736U4.js`	1.97 MB	0 B
`./bundle/chunk-RJTRUG2J.js`	39.8 kB	0 B
`./bundle/cleanup-3RKECZLL.js`	0 B	-932 B (removed)	🏆
`./bundle/devtools-36NN55EP.js`	696 kB	0 B
`./bundle/dist-T73EYRDX.js`	356 B	0 B
`./bundle/events-XB7DADIJ.js`	418 B	0 B
`./bundle/examples/hooks/scripts/on-start.js`	188 B	0 B
`./bundle/examples/mcp-server/example.js`	1.43 kB	0 B
`./bundle/gemini.js`	4.97 kB	0 B
`./bundle/getMachineId-bsd-TXG52NKR.js`	1.55 kB	0 B
`./bundle/getMachineId-darwin-7OE4DDZ6.js`	1.55 kB	0 B
`./bundle/getMachineId-linux-SHIFKOOX.js`	1.34 kB	0 B
`./bundle/getMachineId-unsupported-5U5DOEYY.js`	1.06 kB	0 B
`./bundle/getMachineId-win-6KLLGOI4.js`	1.72 kB	0 B
`./bundle/memoryDiscovery-NSOLCG4U.js`	980 B	0 B
`./bundle/multipart-parser-KPBZEGQU.js`	11.7 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js`	222 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js`	229 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js`	13.4 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js`	132 B	0 B
`./bundle/sandbox-macos-permissive-open.sb`	890 B	0 B
`./bundle/sandbox-macos-permissive-proxied.sb`	1.31 kB	0 B
`./bundle/sandbox-macos-restrictive-open.sb`	3.36 kB	0 B
`./bundle/sandbox-macos-restrictive-proxied.sb`	3.56 kB	0 B
`./bundle/sandbox-macos-strict-open.sb`	4.82 kB	0 B
`./bundle/sandbox-macos-strict-proxied.sb`	5.02 kB	0 B
`./bundle/src-QVCVGIUX.js`	47 kB	0 B
`./bundle/start-YKG77TL6.js`	0 B	-622 B (removed)	🏆
`./bundle/tree-sitter-7U6MW5PS.js`	274 kB	0 B
`./bundle/tree-sitter-bash-34ZGLXVX.js`	1.84 MB	0 B
`./bundle/cleanup-XNHBMPY3.js`	932 B	+932 B (new file)	🆕
`./bundle/start-7BUEMFYN.js`	622 B	+622 B (new file)	🆕

compressed-size-action

gemini-code-assist

Code Review

This pull request implements a real-time Voice Mode for the Gemini CLI, supporting both cloud-based transcription via the Gemini Live API and local-first transcription using Whisper. Key features include push-to-talk functionality, new slash commands for voice control, and a management system for downloading Whisper models. The review feedback correctly identifies that the API key requirement should be conditional on the selected backend to allow local mode to function independently. Additionally, a critical security vulnerability was found in the binary check utility, which is susceptible to command injection.
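
To make the command-injection concern concrete: a binary check that interpolates a name into a shell command can be tricked by input containing shell metacharacters. One shell-free approach, sketched below under assumptions, is to validate the name and probe each `PATH` entry directly. The helper `findOnPath` is hypothetical and is not the fix applied in this PR.

```typescript
import { accessSync, constants } from 'node:fs';
import { delimiter, join } from 'node:path';

// Returns the full path of `name` if an executable with that name is on PATH, else null.
// No shell is spawned, so the name is never interpreted as a command.
function findOnPath(name: string, pathEnv: string = process.env.PATH ?? ''): string | null {
  // Allow only plain binary names; reject separators and shell metacharacters.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(name)) return null;
  for (const dir of pathEnv.split(delimiter)) {
    if (!dir) continue;
    const candidate = join(dir, name);
    try {
      accessSync(candidate, constants.X_OK);
      return candidate;
    } catch {
      // Not present or not executable here; try the next PATH entry.
    }
  }
  return null;
}
```

This sketch does not handle Windows executable extensions such as `.exe`; a real check would need to account for them.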

gemini-cli · 2026-04-13T03:07:09Z

Hi there! Thank you for your interest in contributing to Gemini CLI.

To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383).

We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help-wanted'.

This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community!

gemini-code-assist

Code Review

This pull request introduces an experimental voice mode to the Gemini CLI, allowing for both cloud-based transcription via the Gemini Live API and local transcription using Whisper. Key additions include a new voice configuration schema, a VoiceModelDialog for managing backends and models, and an AudioRecorder service. The feedback highlights critical issues regarding resource management, specifically a microphone access conflict when using the Whisper backend and a memory leak in the model download logic. Additionally, the current transcription handling in the UI is identified as a source of potential data loss during interleaved manual typing.

- Modified InputPrompt to allow toggling recording even when the buffer is non-empty, enabling seamless voice mode resumption.
- Implemented a "moving baseline" strategy that updates the transcription baseline on every 'turnComplete' event. This ensures that new sentences append to previous ones rather than overwriting them, supporting both incremental (Gemini Live) and cumulative (Whisper) providers.
- Fixed a race condition by capturing the current buffer text as the baseline immediately upon the user pressing the spacebar to start recording.
- Ensured voice mode hints remain visible when the mode is enabled, regardless of whether the buffer contains text.
- Added a comprehensive suite of 6 unit tests in InputPrompt.test.tsx covering basic toggle, multi-turn transcription, and session resumption scenarios.
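
The "moving baseline" idea can be modeled in a few lines. The sketch below is hypothetical: the class `TranscriptBuffer`, its method names, and the assumption that incremental providers send deltas while cumulative providers resend the whole current turn are illustrative guesses, not the code in InputPrompt.

```typescript
// Hypothetical model of the "moving baseline" idea; names are illustrative.
type ProviderKind = 'incremental' | 'cumulative';

class TranscriptBuffer {
  private baseline: string; // text committed before the current turn
  private turn = '';        // text recognized so far in the current turn
  private readonly kind: ProviderKind;

  constructor(initialText: string, kind: ProviderKind) {
    // Capture whatever the user already typed when recording starts.
    this.baseline = initialText;
    this.kind = kind;
  }

  // Called for every partial transcription event.
  onTranscript(text: string): string {
    // Assumption: incremental providers send deltas, cumulative ones resend the turn.
    this.turn = this.kind === 'incremental' ? this.turn + text : text;
    return this.value;
  }

  // Called on 'turnComplete': fold the finished turn into the baseline.
  onTurnComplete(): string {
    this.baseline = this.value;
    this.turn = '';
    return this.baseline;
  }

  get value(): string {
    if (!this.turn) return this.baseline;
    const needsSpace = this.baseline !== '' && !/\s$/.test(this.baseline);
    return this.baseline + (needsSpace ? ' ' : '') + this.turn.trimStart();
  }
}
```

Because the baseline only moves forward on `onTurnComplete`, a later turn appends to earlier text instead of replacing it, whichever kind of provider is in use.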

This commit addresses two critical issues preventing the `sandbox:docker` E2E integration test image from building successfully in the CI environment (specifically after the GitHub Actions runner update to Ubuntu 24.04):

1. **Permission Denied Error (EACCES)**: The `Dockerfile` was copying the `cli` and `core` .tgz packages as `root:root` but executing the subsequent `npm install` as the less privileged `node` user. This resulted in an EACCES permission denied error. The fix updates the `COPY` commands to use `--chown=node:node` to explicitly set ownership during the copy.
2. **Missing Dependencies**: The `gemini --version` sanity check at the end of the Docker build was failing with `ERR_MODULE_NOT_FOUND` because the newly added code for voice mode and file utilities was missing its required dependencies (`command-exists` and `isbinaryfile`) in the `@google/gemini-cli-core` package's `package.json`. These have been added.

- Refactor AudioRecorder for re-entrancy and better error reporting
- Improve GeminiLiveTranscriptionProvider WebSocket handling and safety
- Add type-safe events and path-traversal protection to WhisperModelManager
- Extract voice logic from InputPrompt into useVoiceMode hook
- Relocate and gate voice settings under experimental group
- Refactor voice commands: move /voice-model to /voice model subcommand
- Replace custom CircularProgress with standard CliSpinner
- Update documentation and keyboard shortcuts reference
- Fix InputPrompt unit tests and settings mocks
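
Path-traversal protection for a model directory usually comes down to resolving the requested file name against the base directory and refusing anything that lands outside it. The following is a minimal sketch of that check; `resolveModelPath` is a hypothetical helper, not the code in WhisperModelManager.

```typescript
import { resolve, sep } from 'node:path';

// Resolves a model file name inside `baseDir`, refusing anything that escapes it.
function resolveModelPath(baseDir: string, fileName: string): string {
  const root = resolve(baseDir);
  const target = resolve(root, fileName);
  // The target must sit strictly below the root directory.
  if (!target.startsWith(root + sep)) {
    throw new Error(`Refusing model path outside ${root}: ${fileName}`);
  }
  return target;
}
```

With this check, a name such as `../../etc/passwd` or an absolute path is rejected before any file is read or written.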

Abhijit-2592 requested a review from a team as a code owner March 30, 2026 00:33

github-advanced-security AI found potential problems Mar 30, 2026

Comment thread packages/core/src/utils/binaryCheck.ts Fixed

Comment thread packages/core/src/utils/binaryCheck.ts Fixed

Abhijit-2592 marked this pull request as draft March 30, 2026 00:38

Abhijit-2592 mentioned this pull request Mar 30, 2026

Epic: [Voice Mode] Refinement and Polish #24175

Closed

7 tasks

gemini-code-assist Bot reviewed Mar 30, 2026

Comment thread packages/cli/src/ui/components/InputPrompt.tsx Outdated

Comment thread packages/core/src/utils/binaryCheck.ts

gemini-cli Bot added the area/core label (Issues related to User Interface, OS Support, Core Functionality) Mar 30, 2026

Abhijit-2592 force-pushed the abhijit-2592/voice-mode-2 branch from 0da31ca to c6c038f March 31, 2026 02:29

gemini-cli Bot closed this Apr 13, 2026

scidomino reopened this Apr 13, 2026

github-actions Bot mentioned this pull request Apr 14, 2026

📊 AI CLI Tools Community Daily Report 2026-04-14 gsscsd/big_model_radar#183

Open

gemini-cli Bot closed this Apr 14, 2026

Abhijit-2592 reopened this Apr 15, 2026

Abhijit-2592 self-assigned this Apr 15, 2026

Abhijit-2592 force-pushed the abhijit-2592/voice-mode-2 branch from 7561876 to aab45dc April 15, 2026 21:48

Abhijit-2592 marked this pull request as ready for review April 15, 2026 21:49

gemini-code-assist Bot reviewed Apr 15, 2026

Comment thread packages/cli/src/ui/components/InputPrompt.tsx Outdated

Comment thread packages/cli/src/ui/components/InputPrompt.tsx Outdated

Comment thread packages/cli/src/ui/components/VoiceModelDialog.tsx Outdated

Abhijit-2592 force-pushed the abhijit-2592/voice-mode-2 branch 3 times, most recently from 9460f47 to aee45c1 April 16, 2026 00:02

Abhijit-2592 requested a review from a team as a code owner April 16, 2026 00:02

Abhijit-2592 requested a review from a team as a code owner April 16, 2026 00:39

github-actions Bot mentioned this pull request Apr 16, 2026

📊 AI CLI Tools Community Daily Report 2026-04-16 gsscsd/big_model_radar#193

Open

gemini-cli Bot closed this Apr 16, 2026

Abhijit-2592 added 22 commits April 24, 2026 13:53

- 27999d6 feat(voice): implement real-time voice mode with cloud and local backends
- de1a895 feat(voice): implement Push-To-Talk (PTT) with Optimistic Space strategy
- 3105d8a fix(voice): graceful stop for PTT transcription
- 474a8dc feat(voice): place voice mode behind experimental flag
- 5be1ef2 chore(settings): enable voiceMode in local settings for testing
- 082ec70 fix(voice): add friendly availability check for whisper-stream
- 081d393 chore(voice): fix test build errors and enable voiceMode in settings
- b1175f6 fix(voice): allow local voice mode without API key and remove obsolete files
- 0d184a1 fix(voice): address PR feedback on security and reliability
- 080358c fix(voice): enforce strict typing for websocket payloads
- 0158746 test(voice): fix flakiness in PTT release test by avoiding artificial timer racing
- 0842cd9 ci: install whisper-cpp on all platforms for voice mode tests
- f61727c Revert "ci: install whisper-cpp on all platforms for voice mode tests" (This reverts commit 9833d36.)
- 00a77b1 test: skip whisper integration tests if whisper-stream is missing
- 0e30321 test(voice): re-add UIState import after rebase
- c38fc16 docs: regenerate schemas and keyboard-shortcuts.md after rebase
- 8921589 fix(voice): fix stop logic race conditions and prevent transcription leak after stop
- cc44b28 fix(voice): auto-close voice model dialog on selection and restore transcription drain
- 5098f69 fix(voice): prevent orphaned draining service from killing new recording sessions

Abhijit-2592 force-pushed the abhijit-2592/voice-mode-2 branch from 07bbe96 to 5098f69 April 24, 2026 21:00

Abhijit-2592 added this pull request to the merge queue Apr 24, 2026

Merged via the queue into main with commit 2e0641c Apr 24, 2026
26 of 27 checks passed

Abhijit-2592 deleted the abhijit-2592/voice-mode-2 branch April 24, 2026 21:41

This was referenced Apr 25, 2026

📊 AI CLI Tools Community Daily Report 2026-04-25 borq168/big_model_radar#13

Open

📊 AI CLI Tools Digest 2026-04-25 borq168/big_model_radar#16

Open

kimjune01 pushed a commit to kimjune01/gemini-cli-claude that referenced this pull request May 6, 2026

30d76dc feat(voice): implement real-time voice mode with cloud and local backends (google-gemini#24174)

Abhijit-2592 merged 27 commits into main from abhijit-2592/voice-mode-2

6 participants
