feat(#179 voice): mobile voice recording + transcribe pipeline #254
Merged
Pull request overview
Adds a new mobile “Voice” surface that records audio with Expo Audio, uploads it to the paired desktop’s /api/voice/transcribe, and renders the transcript; includes supporting API client plumbing, permissions, and unit tests for the non-React helpers.
Changes:
- Introduce `VoiceScreen` tab with a simple recording → upload → transcript state machine.
- Add `voice-service.ts` helpers for URI→base64 conversion and transcribe orchestration with stable UI-facing error codes.
- Extend the mobile API client with `transcribeVoice(...)` using a 60s timeout override; wire permissions/deps/config for microphone + file access.
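The PR description names six screen states (idle, denied, recording, processing, result, error). A minimal sketch of how such a state machine could be written as a pure reducer follows; the event names, payload fields, and the `reduceVoice` function are hypothetical and are not taken from `VoiceScreen.tsx`.

```typescript
// Hypothetical shapes: only the six state names come from the PR description.
type VoiceState =
  | { kind: 'idle' }
  | { kind: 'denied' }
  | { kind: 'recording'; startedAt: number }
  | { kind: 'processing' }
  | { kind: 'result'; transcript: string }
  | { kind: 'error'; message: string };

// Hypothetical events the screen might dispatch.
type VoiceEvent =
  | { type: 'PERMISSION_DENIED' }
  | { type: 'START'; now: number }
  | { type: 'STOP' }
  | { type: 'TRANSCRIBED'; transcript: string }
  | { type: 'FAILED'; message: string }
  | { type: 'RESET' };

function reduceVoice(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case 'PERMISSION_DENIED':
      return { kind: 'denied' };
    case 'START':
      // Recording may start from idle, or as a retry from result/error.
      return state.kind === 'idle' || state.kind === 'result' || state.kind === 'error'
        ? { kind: 'recording', startedAt: event.now }
        : state;
    case 'STOP':
      // Stopping only makes sense while recording; the upload happens in "processing".
      return state.kind === 'recording' ? { kind: 'processing' } : state;
    case 'TRANSCRIBED':
      return state.kind === 'processing'
        ? { kind: 'result', transcript: event.transcript }
        : state;
    case 'FAILED':
      return state.kind === 'recording' || state.kind === 'processing'
        ? { kind: 'error', message: event.message }
        : state;
    case 'RESET':
      return { kind: 'idle' };
  }
}
```

Keeping the transitions in a pure function like this would make them testable without React, while the recorder hook itself stays in the screen.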
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pnpm-lock.yaml | Locks new Expo dependencies used by the voice feature. |
| CHANGELOG.md | Documents the new mobile voice recording/transcribe flow and test plan. |
| apps/mobile/src/services/voice-service.ts | Adds testable helpers for base64 conversion and transcription error mapping. |
| apps/mobile/src/services/api-client.ts | Adds transcribeVoice(...) and request timeout override support. |
| apps/mobile/src/screens/VoiceScreen.tsx | New UI tab for recording, uploading, and displaying transcripts. |
| apps/mobile/src/App.tsx | Adds “Voice” tab to the main tab switch + tab bar. |
| apps/mobile/src/tests/voice-service.test.ts | Adds unit tests for voice helpers with mocked file system + fetch. |
| apps/mobile/package.json | Declares new Expo deps (expo-audio, expo-file-system). |
| apps/mobile/app.json | Adds microphone permissions + expo-audio plugin config. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
     */

    import { File } from 'expo-file-system';
    import { SkyTwinApiClient } from './api-client';

    export interface VoiceTranscriptError {
      ok: false;
      /** Stable code so callers can branch on `permission_denied` vs. generic. */
Comment on lines +261 to +264

     * The API tolerates audio up to 25MB base64 (~18MB decoded). The
     * recorder hook bounds clip length in the UI, so we don't pre-check
     * size here — let the server return the 413 and the screen renders
     * the message.
Comment on lines +162 to +167

      const openSystemSettings = useCallback(() => {
        Alert.alert(
          'Microphone access needed',
          'Open Settings → SkyTwin and enable Microphone to use voice. We never store recordings anywhere besides your paired desktop.',
        );
      }, []);
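
The follow-up commit on this PR says `openSystemSettings` now opens the OS settings page via `Linking.openSettings()`, with the explanatory alert as a fallback. A minimal sketch of that open-then-fall-back pattern is below. The `SettingsDeps` interface, the return values, and the shortened alert copy are hypothetical; the dependencies are injected so the logic can run outside React Native.

```typescript
// Hypothetical dependency shape. In the app these would presumably be
// Linking.openSettings and Alert.alert from 'react-native'.
interface SettingsDeps {
  openSettings: () => Promise<void>;
  showAlert: (title: string, message: string) => void;
}

// Try the one-tap path first; only show the explainer if it fails.
async function openSystemSettings(deps: SettingsDeps): Promise<'opened' | 'alerted'> {
  try {
    await deps.openSettings();
    return 'opened';
  } catch {
    // Placeholder copy: the PR's final wording is not shown on this page.
    deps.showAlert(
      'Microphone access needed',
      'Open Settings → SkyTwin and enable Microphone to use voice.',
    );
    return 'alerted';
  }
}
```

In the screen this could be wired as `openSystemSettings({ openSettings: () => Linking.openSettings(), showAlert: Alert.alert })` inside the existing `useCallback`.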
    <View style={styles.resultCard}>
      <Text style={styles.resultText}>{state.transcript.trim() || '(silence)'}</Text>
    </View>
    <Text style={styles.resultMeta}>{formatBytes(state.durationBytes)} of audio</Text>
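
The snippet above formats a byte count, and the follow-up commit relabels it as "Audio size: {X}" so it is not read as a duration. The implementation of `formatBytes` is not shown on this page, so the version below is only a hypothetical sketch (1024-based units, one decimal place above bytes), and `audioSizeLabel` is an illustrative name.

```typescript
// Hypothetical formatter: 1024-based units, one decimal for KB and above.
function formatBytes(bytes: number): string {
  if (!Number.isFinite(bytes) || bytes < 0) return '0 B';
  const units = ['B', 'KB', 'MB', 'GB'];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  const digits = unit === 0 ? 0 : 1;
  return `${value.toFixed(digits)} ${units[unit]}`;
}

// Label that names the quantity explicitly, per the review finding.
function audioSizeLabel(bytes: number): string {
  return `Audio size: ${formatBytes(bytes)}`;
}

console.log(audioSizeLabel(1536)); // prints "Audio size: 1.5 KB"
```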
jayzalowitz added a commit that referenced this pull request May 11, 2026

…tings, accurate copy + size label

Six Copilot findings on PR #254 addressed:

1. voice-service.ts now uses `import type { SkyTwinApiClient }`; the client is only a type reference here, runtime import was unnecessary coupling.
2. VoiceTranscriptError docstring no longer references a permission_denied code that doesn't exist in the union; mic permission is handled inside VoiceScreen before this layer.
3. transcribeVoice docstring corrected: 25MB DECODED (~33MB base64), not "25MB base64."
4. openSystemSettings now actually opens the OS settings page via Linking.openSettings() with the explanatory alert as fallback. One-tap recovery instead of just an explainer.
5. Permission-denial copy acknowledges the temporary on-device audio file. The earlier "never stored anywhere besides paired desktop" was technically inaccurate (file URI is read back as base64).
6. Result-state UI changed from "{X} of audio" to "Audio size: {X}" so the byte count isn't mislabeled as a duration.

Test plan: mobile 163/163 passing (+2 skipped, unrelated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
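
The corrected figure in finding 3 follows from base64's expansion of every 3 bytes into 4 characters. A minimal sketch of the arithmetic, assuming padded base64 with no line breaks and reading "MB" as MiB; the function names are illustrative and not from the PR:

```typescript
const MIB = 1024 * 1024;

// Length of the padded base64 string for a payload of `decodedBytes` bytes.
function base64EncodedLength(decodedBytes: number): number {
  return 4 * Math.ceil(decodedBytes / 3);
}

// Largest decoded payload (ignoring padding) that fits in `encodedChars` characters.
function base64DecodedCapacity(encodedChars: number): number {
  return Math.floor(encodedChars / 4) * 3;
}

const encoded = base64EncodedLength(25 * MIB); // 34952536 characters
const decoded = base64DecodedCapacity(25 * MIB); // 19660800 bytes

console.log((encoded / MIB).toFixed(1)); // prints "33.3"
console.log((decoded / MIB).toFixed(2)); // prints "18.75"
```

Under these assumptions a 25 MiB decoded clip encodes to roughly 33.3 MiB, while a 25 MiB base64 payload would hold only 18.75 MiB of audio, which is where the earlier "~18MB decoded" wording came from.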
Round-2 review reply: all 4 findings were already addressed in round 1 (commit 05e0449), but Copilot's round-2 read appears to be against the pre-fix state. Current state on this branch:
Closes the code-bound half of #179. The mobile app can now capture audio and ship it to the paired desktop's `/api/voice/transcribe` (the route landed in PR #244). The remaining work is QA on physical devices, which has always been the actual blocker.

Components:
- New `VoiceScreen` tab with six-state machine (idle, denied, recording, processing, result, error). Recording driven by `useAudioRecorder` from `expo-audio`. Pulse animation + tabular timer while recording; permission-denied state has a "How to fix" affordance.
- New `voice-service.ts` pure helpers: `audioFileToBase64()` reads the recorder's output via `expo-file-system`'s `File.base64()` API; `transcribeRecording()` orchestrates base64 → upload → result mapping with stable error codes the UI branches on.
- New `transcribeVoice(userId, audioBase64, language?)` method on the API client; 60s timeout because whisper's first-run model load can take several seconds on cold start.
- Permissions added to app.json: NSMicrophoneUsageDescription (iOS) + RECORD_AUDIO (Android) + expo-audio plugin entry.
- Deps: expo-audio ~55.0.14, expo-file-system ~55.0.19 (the latter was already transitively installed; declared explicitly so the dep is auditable).

Out of scope for this PR (deliberate follow-ups): TTS playback (pairs with #187 AC#4 desktop Piper), "send to twin" hand-off (waits on mobile assistant surface), physical-device QA on real iOS/Android hardware.

Test plan: 11 new vitest cases mocking `File.base64()` + fetch. Mobile suite: 163 passing + 2 skipped (discovery tests, unrelated). Workspace: 70/70 turbo tasks green; build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
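
The commit message describes `transcribeVoice(userId, audioBase64, language?)` as a client method with a 60s timeout. A minimal sketch of a per-request timeout override built on `AbortController` follows. The `FetchLike` type, `requestWithTimeout`, the request body shape, and the `{ transcript }` response shape are all assumptions for illustration; the real client's `request<T>()` layer and `TranscribeResponse` fields are not shown on this page.

```typescript
const DEFAULT_TIMEOUT_MS = 10_000; // default mentioned in the PR description
const TRANSCRIBE_TIMEOUT_MS = 60_000; // override for cold-start model loads

// Hypothetical minimal fetch signature so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function requestWithTimeout<T>(
  fetchImpl: FetchLike,
  url: string,
  body: unknown,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  // Abort the in-flight request once the deadline passes.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return (await res.json()) as T;
  } finally {
    clearTimeout(timer); // always release the timer, success or failure
  }
}

// Assumed request/response shapes; only the route and the 60s figure come from the PR.
function transcribeVoice(
  fetchImpl: FetchLike,
  baseUrl: string,
  userId: string,
  audioBase64: string,
  language?: string,
): Promise<{ transcript: string }> {
  return requestWithTimeout<{ transcript: string }>(
    fetchImpl,
    `${baseUrl}/api/voice/transcribe`,
    { userId, audioBase64, language },
    TRANSCRIBE_TIMEOUT_MS,
  );
}
```

Passing the timeout as an optional argument keeps every other call on the 10s default while letting the one slow endpoint opt into a longer deadline.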
05e0449 to 348463a
Summary
Closes the code-bound half of #179. The mobile app can now capture audio and ship it to the paired desktop's `/api/voice/transcribe` (the route landed in PR #244). The remaining work on #179 is QA on physical iOS/Android devices, which has always been the actual blocker; see the 2026-05-11 status comment on #179 for the corrected scoping.

What changed
- New `VoiceScreen` tab (apps/mobile/src/screens/VoiceScreen.tsx). Tap-to-record → tap-to-stop → upload → transcript. Six-state machine (idle/denied/recording/processing/result/error) with the recording lifecycle driven by `useAudioRecorder` from `expo-audio`. Pulse animation + tabular-numeric timer while recording. Permission-denied state has a "How to fix" affordance pointing the user at system settings.
- New `voice-service.ts` pure helpers:
  - `audioFileToBase64(uri)` reads the recorder's output via `expo-file-system`'s `File.base64()` API (the SDK 55 idiomatic path).
  - `transcribeRecording(client, userId, uri, language?)` orchestrates base64 → upload → result mapping with stable error codes (no_audio/read_failed/whisper_unavailable/network/unknown) so the UI can branch on cause without parsing free-form error strings.
- New `transcribeVoice(...)` on the API client. Uses the existing `request<T>()` layer with a 60s timeout override because whisper's first-run model load can take several seconds on cold start; the default 10s would abort mid-transcribe. New `TranscribeResponse` interface added to the response-type block.
- Permissions in `app.json`: `NSMicrophoneUsageDescription` (iOS), `RECORD_AUDIO` (Android), `expo-audio` plugin entry. Permission copy emphasizes "sent to your paired SkyTwin desktop for on-device transcription" so the install prompt matches the privacy story.
- Tab nav: `voice` added to the `MainTab` enum in `App.tsx`, wired into the `renderContent` switch, and a "Voice" `TabButton` placed between Capabilities and Dashboard.
- Deps: `expo-audio: ~55.0.14`, `expo-file-system: ~55.0.19` (versions match Expo SDK 55's `bundledNativeModules.json`). The latter was already transitively installed via `expo-asset`; declared explicitly so the dep is auditable.

What this PR deliberately does NOT do
- … `piper` binary is on PATH.
- … `expo-audio` API works in the simulator but real-device behavior (silent-mode switch, AirPods routing, background record interruption) needs hardware to verify.

Test plan
- … `voice-service.test.ts` covering:
  - `audioFileToBase64`: null/empty URI → `no_audio`, happy path, empty result → `no_audio`, `File.base64()` rejection → `read_failed`.
  - `transcribeRecording`: 200 happy path, language code forwarding, 503 → `whisper_unavailable`, 413 → `unknown` (with detail in message), missing URI short-circuits before `fetch` is called, network failure → `network`.
- … `apps/mobile/src/__tests__/api-client.test.ts:23` (a `TestApiClient` stub instead of importing the real one) to keep React Native imports out of Node's test runner. Same convention already established for this app.
- … `integration-live.test.ts`: discovery tests, unrelated).
- … `pnpm test`).
- `pnpm build --concurrency=1` clean.
- `npx tsc --noEmit` clean for `VoiceScreen.tsx`, `voice-service.ts`, `api-client.ts`, `App.tsx` (pre-existing TS errors in `discovery.ts`, `notifications.ts`, and `api-client.test.ts` are unrelated to this PR).

Notes for reviewers
- … `VoiceScreen.tsx` rather than in the service module because `useAudioRecorder` is a React hook; pulling it into a plain module would lose the hook's internal teardown and recording-state mirroring. The pieces that are easy to unit-test outside React (base64 conversion + transcribe orchestration with error mapping) live in `voice-service.ts` and are exercised in the test file.
- `expo-audio` is the SDK 55 path (`expo-av` was removed in SDK 53+). The hook API works cleanly here because the screen owns the recorder.

🤖 Generated with Claude Code
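
The test plan lists how failures map to the stable error codes (503 → `whisper_unavailable`, 413 → `unknown` with detail in the message, network failure → `network`). A minimal sketch of that mapping is below, assuming the error object carries `code` and `message` fields alongside `ok: false`. Those field names, the message strings, and the `mapHttpFailure`, `mapNetworkFailure`, and `describeError` helpers are hypothetical and not taken from `voice-service.ts`.

```typescript
// The five codes come from the PR description; the field names are assumptions.
type VoiceErrorCode = 'no_audio' | 'read_failed' | 'whisper_unavailable' | 'network' | 'unknown';

interface VoiceTranscriptError {
  ok: false;
  code: VoiceErrorCode;
  message: string;
}

// Map a non-2xx HTTP response to a stable code the UI can branch on.
function mapHttpFailure(status: number, detail: string): VoiceTranscriptError {
  if (status === 503) {
    return { ok: false, code: 'whisper_unavailable', message: 'Transcription service is unavailable.' };
  }
  // Everything else (including 413) stays "unknown" but keeps the server detail.
  return { ok: false, code: 'unknown', message: `Transcription failed (HTTP ${status}): ${detail}` };
}

// Map a thrown fetch error (no response at all) to the network code.
function mapNetworkFailure(err: unknown): VoiceTranscriptError {
  const reason = err instanceof Error ? err.message : String(err);
  return { ok: false, code: 'network', message: `Could not reach the paired desktop: ${reason}` };
}

// Example of UI-side branching on the code instead of parsing free-form strings.
function describeError(error: VoiceTranscriptError): string {
  switch (error.code) {
    case 'no_audio':
      return 'Nothing was recorded.';
    case 'read_failed':
      return 'The recording could not be read.';
    case 'whisper_unavailable':
      return 'Transcription is not available on the desktop right now.';
    case 'network':
      return 'Check the connection to your paired desktop.';
    case 'unknown':
      return error.message;
  }
}
```

Because the switch is exhaustive over the union, adding a new code later would surface as a compile error wherever the UI branches on it.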