feat(#179 voice): mobile voice recording + transcribe pipeline by jayzalowitz · Pull Request #254 · jayzalowitz/skytwin

jayzalowitz · 2026-05-11T15:14:50Z

Summary

Closes the code-bound half of #179. The mobile app can now capture audio and ship it to the paired desktop's /api/voice/transcribe (the route landed in PR #244). The remaining work on #179 is QA on physical iOS/Android devices, which has always been the actual blocker — see the 2026-05-11 status comment on #179 for the corrected scoping.

What changed

- New VoiceScreen tab (apps/mobile/src/screens/VoiceScreen.tsx). Tap-to-record → tap-to-stop → upload → transcript. Six-state machine (idle / denied / recording / processing / result / error) with the recording lifecycle driven by useAudioRecorder from expo-audio. Pulse animation + tabular-numeric timer while recording. Permission-denied state has a "How to fix" affordance pointing the user at system settings.
- New voice-service.ts pure helpers:
  - audioFileToBase64(uri) reads the recorder's output via expo-file-system's File.base64() API (the SDK 55 idiomatic path).
  - transcribeRecording(client, userId, uri, language?) orchestrates base64 → upload → result mapping with stable error codes (no_audio / read_failed / whisper_unavailable / network / unknown) so the UI can branch on cause without parsing free-form error strings.
- New transcribeVoice(...) on the API client. Uses the existing request<T>() layer with a 60s timeout override because whisper's first-run model load can take several seconds on cold start; the default 10s would abort mid-transcribe. New TranscribeResponse interface added to the response-type block.
- Permissions in app.json: NSMicrophoneUsageDescription (iOS), RECORD_AUDIO (Android), expo-audio plugin entry. Permission copy emphasizes "sent to your paired SkyTwin desktop for on-device transcription" so the install prompt matches the privacy story.
- Tab nav: voice added to the MainTab enum in App.tsx, wired into the renderContent switch, and a "Voice" TabButton placed between Capabilities and Dashboard.
- Deps: expo-audio: ~55.0.14, expo-file-system: ~55.0.19 (versions match Expo SDK 55's bundledNativeModules.json). The latter was already transitively installed via expo-asset; declared explicitly so the dep is auditable.
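The six-state machine described above can be sketched as a discriminated union with a pure transition function. This is an illustration only, not the code in VoiceScreen.tsx: the six state names come from the description, and `transcript` and `durationBytes` appear in the review excerpts, but `VoiceState`, `VoiceEvent`, `nextState`, `startedAtMs`, and every event name are assumptions made for this sketch.

```typescript
// Hypothetical shapes: only the six state names, `transcript`, and
// `durationBytes` appear in the PR; everything else is illustrative.
type VoiceState =
  | { kind: 'idle' }
  | { kind: 'denied' }
  | { kind: 'recording'; startedAtMs: number }
  | { kind: 'processing' }
  | { kind: 'result'; transcript: string; durationBytes: number }
  | { kind: 'error'; code: string; message: string };

type VoiceEvent =
  | { type: 'permissionDenied' }
  | { type: 'recordStarted'; atMs: number }
  | { type: 'recordStopped' }
  | { type: 'transcribed'; transcript: string; durationBytes: number }
  | { type: 'failed'; code: string; message: string }
  | { type: 'reset' };

// Pure transition function: given the current state and an event,
// return the next state. Events that make no sense in the current
// state leave it unchanged.
function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case 'permissionDenied':
      return { kind: 'denied' };
    case 'recordStarted': {
      // Recording may only begin from a resting state with mic access.
      const resting =
        state.kind === 'idle' || state.kind === 'result' || state.kind === 'error';
      return resting ? { kind: 'recording', startedAtMs: event.atMs } : state;
    }
    case 'recordStopped':
      // Stopping hands the clip to the upload step.
      return state.kind === 'recording' ? { kind: 'processing' } : state;
    case 'transcribed':
      return state.kind === 'processing'
        ? { kind: 'result', transcript: event.transcript, durationBytes: event.durationBytes }
        : state;
    case 'failed':
      return { kind: 'error', code: event.code, message: event.message };
    case 'reset':
      return { kind: 'idle' };
  }
}
```

Keeping the transitions in a pure function like this would let them be unit-tested without React, while the hook-driven recorder calls stay in the screen.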

What this PR deliberately does NOT do

- TTS playback. The screen displays the transcript but does not speak responses back. That's a separate flow that pairs with #187 ("Capability loop #N: Embedded LLM with auto-upgrade path (Phi/Llama/Qwen via llama.cpp; Whisper-tiny STT; Piper TTS)") AC#4 (desktop Piper TTS) once a piper binary is on PATH.
- "Send to twin" hand-off. The transcript is shown but not yet pipelined to the assistant or the decision route. A follow-up will route the transcribed text through the existing assistant route once that mobile surface lands.
- Physical-device QA. The Expo SDK 55 expo-audio API works in the simulator, but real-device behavior (silent-mode switch, AirPods routing, background record interruption) needs hardware to verify.

Test plan

- 11 new vitest cases in voice-service.test.ts covering:
  - audioFileToBase64: null/empty URI → no_audio, happy path, empty result → no_audio, File.base64() rejection → read_failed.
  - transcribeRecording: 200 happy path, language code forwarding, 503 → whisper_unavailable, 413 → unknown (with detail in message), missing URI short-circuits before fetch is called, network failure → network.
- The test file mirrors the inlined-class pattern at apps/mobile/src/__tests__/api-client.test.ts:23 (a TestApiClient stub instead of importing the real one) to keep React Native imports out of Node's test runner. Same convention already established for this app.
- Full mobile suite: 165 tests, 163 passing + 2 skipped (the 2 skips are pre-existing discovery tests in integration-live.test.ts, unrelated).
- Full workspace: 70/70 turbo tasks green (pnpm test).
- pnpm build --concurrency=1 clean.
- npx tsc --noEmit clean for VoiceScreen.tsx, voice-service.ts, api-client.ts, App.tsx (pre-existing TS errors in discovery.ts, notifications.ts, and api-client.test.ts are unrelated to this PR).

Notes for reviewers

- The recording lifecycle lives inside VoiceScreen.tsx rather than in the service module because useAudioRecorder is a React hook; pulling it into a plain module would lose the hook's internal teardown and recording-state mirroring. The pieces that are easy to unit-test outside React (base64 conversion + transcribe orchestration with error mapping) live in voice-service.ts and are exercised in the test file.
- expo-audio is the SDK 55 path (expo-av has been deprecated since SDK 53). The hook API works cleanly here because the screen owns the recorder.
- Error codes are deliberately stable strings rather than free-form messages so a future analytics layer or audit-log surface can attribute voice failures without regex-parsing strings.
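To illustrate the stable-code idea, here is a minimal sketch of how failures might be mapped to codes and then branched on in the UI. The five codes and the 503/413/network mappings are taken from the description and test plan above, and `ok: false` comes from the review excerpt of VoiceTranscriptError. The `code` and `message` field names and the helpers `mapHttpFailure`, `mapThrown`, and `describeFailure` are assumptions for this sketch, not the contents of voice-service.ts.

```typescript
type VoiceErrorCode =
  | 'no_audio'
  | 'read_failed'
  | 'whisper_unavailable'
  | 'network'
  | 'unknown';

// `ok: false` appears in the PR; `code` and `message` are assumed names.
interface VoiceTranscriptError {
  ok: false;
  code: VoiceErrorCode;
  message: string;
}

// Hypothetical helper: map a non-2xx response to a stable code.
// Per the test plan, 503 means whisper is unavailable and anything
// else (e.g. 413) falls through to `unknown` with the detail kept.
function mapHttpFailure(status: number, detail: string): VoiceTranscriptError {
  if (status === 503) {
    return { ok: false, code: 'whisper_unavailable', message: detail };
  }
  return { ok: false, code: 'unknown', message: `HTTP ${status}: ${detail}` };
}

// Hypothetical helper: a thrown fetch error is treated as a network failure.
function mapThrown(err: unknown): VoiceTranscriptError {
  const message = err instanceof Error ? err.message : String(err);
  return { ok: false, code: 'network', message };
}

// The UI (or an analytics layer) can switch on `code` exhaustively
// instead of pattern-matching on message text.
function describeFailure(error: VoiceTranscriptError): string {
  switch (error.code) {
    case 'no_audio':
      return 'Nothing was recorded. Try again.';
    case 'read_failed':
      return 'The recording could not be read from the device.';
    case 'whisper_unavailable':
      return 'Transcription is not available on the paired desktop.';
    case 'network':
      return 'Could not reach the paired desktop.';
    case 'unknown':
      return error.message;
  }
}
```

Because the switch is exhaustive over the union, adding a sixth code later would surface as a compile error wherever a branch is missing.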

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds a new mobile “Voice” surface that records audio with Expo Audio, uploads it to the paired desktop’s /api/voice/transcribe, and renders the transcript; includes supporting API client plumbing, permissions, and unit tests for the non-React helpers.

Changes:

- Introduce VoiceScreen tab with a simple recording → upload → transcript state machine.
- Add voice-service.ts helpers for URI→base64 conversion and transcribe orchestration with stable UI-facing error codes.
- Extend the mobile API client with transcribeVoice(...) using a 60s timeout override; wire permissions/deps/config for microphone + file access.
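A per-call timeout override like the one mentioned for transcribeVoice(...) is commonly built on AbortController. The sketch below shows that general technique under stated assumptions; it is not the project's request<T>() layer. The 10s default and 60s override are from the PR description, and the route is /api/voice/transcribe as stated, but `requestJson`, `FetchLike`, the request body field names, and the `transcript` response field are hypothetical.

```typescript
const DEFAULT_TIMEOUT_MS = 10_000;    // default mentioned in the PR description
const TRANSCRIBE_TIMEOUT_MS = 60_000; // override mentioned in the PR description

// Minimal structural types so the sketch runs without DOM typings.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init: { method: string; body?: string; signal: AbortSignal },
) => Promise<MinimalResponse>;

// Hypothetical request helper: aborts the call once `timeoutMs` elapses.
async function requestJson<T>(
  fetchImpl: FetchLike,
  url: string,
  body: unknown,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(url, {
      method: 'POST',
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return (await response.json()) as T;
  } finally {
    // Always clear the timer so a finished request cannot abort later.
    clearTimeout(timer);
  }
}

// Assumed response and body shapes, for illustration only.
interface TranscribeResponse {
  transcript: string;
}

function transcribeVoice(
  fetchImpl: FetchLike,
  baseUrl: string,
  userId: string,
  audioBase64: string,
  language?: string,
): Promise<TranscribeResponse> {
  return requestJson<TranscribeResponse>(
    fetchImpl,
    `${baseUrl}/api/voice/transcribe`,
    { userId, audioBase64, language },
    TRANSCRIBE_TIMEOUT_MS, // longer budget for whisper's cold-start model load
  );
}
```

Injecting `fetchImpl` keeps the helper testable under Node; in the app, the platform `fetch` would typically be passed in its place.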

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| pnpm-lock.yaml | Locks new Expo dependencies used by the voice feature. |
| CHANGELOG.md | Documents the new mobile voice recording/transcribe flow and test plan. |
| apps/mobile/src/services/voice-service.ts | Adds testable helpers for base64 conversion and transcription error mapping. |
| apps/mobile/src/services/api-client.ts | Adds `transcribeVoice(...)` and request timeout override support. |
| apps/mobile/src/screens/VoiceScreen.tsx | New UI tab for recording, uploading, and displaying transcripts. |
| apps/mobile/src/App.tsx | Adds “Voice” tab to the main tab switch + tab bar. |
| apps/mobile/src/__tests__/voice-service.test.ts | Adds unit tests for voice helpers with mocked file system + fetch. |
| apps/mobile/package.json | Declares new Expo deps (`expo-audio`, `expo-file-system`). |
| apps/mobile/app.json | Adds microphone permissions + `expo-audio` plugin config. |

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

+ */
+
+import { File } from 'expo-file-system';
+import { SkyTwinApiClient } from './api-client';


+
+export interface VoiceTranscriptError {
+  ok: false;
+  /** Stable code so callers can branch on `permission_denied` vs. generic. */


+   * The API tolerates audio up to 25MB base64 (~18MB decoded). The
+   * recorder hook bounds clip length in the UI, so we don't pre-check
+   * size here — let the server return the 413 and the screen renders
+   * the message.


+  const openSystemSettings = useCallback(() => {
+    Alert.alert(
+      'Microphone access needed',
+      'Open Settings → SkyTwin and enable Microphone to use voice. We never store recordings anywhere besides your paired desktop.',
+    );
+  }, []);


+            <View style={styles.resultCard}>
+              <Text style={styles.resultText}>{state.transcript.trim() || '(silence)'}</Text>
+            </View>
+            <Text style={styles.resultMeta}>{formatBytes(state.durationBytes)} of audio</Text>


Copilot

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

+
+export interface VoiceTranscriptError {
+  ok: false;
+  /** Stable code so callers can branch on `permission_denied` vs. generic. */


+ */
+
+import { File } from 'expo-file-system';
+import { SkyTwinApiClient } from './api-client';


+  const openSystemSettings = useCallback(() => {
+    Alert.alert(
+      'Microphone access needed',
+      'Open Settings → SkyTwin and enable Microphone to use voice. We never store recordings anywhere besides your paired desktop.',
+    );
+  }, []);


+   * The API tolerates audio up to 25MB base64 (~18MB decoded). The
+   * recorder hook bounds clip length in the UI, so we don't pre-check
+   * size here — let the server return the 413 and the screen renders
+   * the message.


…tings, accurate copy + size label

Six Copilot findings on PR #254 addressed:

1. voice-service.ts now uses `import type { SkyTwinApiClient }`; the client is only a type reference here, so the runtime import was unnecessary coupling.
2. VoiceTranscriptError docstring no longer references a permission_denied code that doesn't exist in the union; mic permission is handled inside VoiceScreen before this layer.
3. transcribeVoice docstring corrected: 25MB DECODED (~33MB base64), not "25MB base64."
4. openSystemSettings now actually opens the OS settings page via Linking.openSettings() with the explanatory alert as fallback. One-tap recovery instead of just an explainer.
5. Permission-denial copy acknowledges the temporary on-device audio file. The earlier "never stored anywhere besides paired desktop" was technically inaccurate (file URI is read back as base64).
6. Result-state UI changed from "{X} of audio" to "Audio size: {X}" so the byte count isn't mislabeled as a duration.

Test plan: mobile 163/163 passing (+2 skipped, unrelated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
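The correction in finding 3 follows from base64's 4-for-3 expansion: every 3 input bytes become 4 output characters, with padding to a multiple of 4. A small worked check, assuming decimal megabytes (the PR does not say whether the server limit is counted in MB or MiB) and with helper names invented for this sketch:

```typescript
// Length in characters of the padded base64 encoding of `decodedBytes` bytes.
function base64EncodedLength(decodedBytes: number): number {
  return 4 * Math.ceil(decodedBytes / 3);
}

// Largest decoded payload (in bytes) that fits in `encodedChars` base64 characters.
function maxDecodedBytes(encodedChars: number): number {
  return 3 * Math.floor(encodedChars / 4);
}

const MB = 1_000_000; // assumption: decimal megabytes

// A 25MB decoded limit corresponds to roughly 33MB of base64 text.
console.log(base64EncodedLength(25 * MB)); // 33333336

// Reading "25MB" as the base64 size instead would cap decoded audio near 18MB,
// which matches the "~18MB decoded" figure in the original docstring.
console.log(maxDecodedBytes(25 * MB)); // 18750000
```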

jayzalowitz · 2026-05-12T02:54:04Z

Round-2 review reply: all 4 findings were already addressed in round 1 (commit 05e0449), but Copilot's round-2 read appears to be against the pre-fix state.

Current state on this branch:

- type-only SkyTwinApiClient import: already import type at line 19 of voice-service.ts.
- permission_denied docstring: already removed; the comment now lists the actual codes in the union (no_audio, read_failed, whisper_unavailable, network, unknown).
- 25MB base64 vs decoded: docstring corrected to "25MB decoded (~33MB base64)" (line 261).
- openSystemSettings Alert vs deep-link: already wired to Linking.openSettings() with the alert as a fallback (commit 05e0449).
- "bounds clip length" comment: already rewritten to "The mobile recorder has no explicit cap; the screen surfaces the 413 if the user records past the limit", so it no longer implies a non-existent guardrail.

Closes the code-bound half of #179. The mobile app can now capture audio and ship it to the paired desktop's `/api/voice/transcribe` (the route landed in PR #244). The remaining work is QA on physical devices, which has always been the actual blocker.

Components:

- New `VoiceScreen` tab with six-state machine (idle, denied, recording, processing, result, error). Recording driven by `useAudioRecorder` from `expo-audio`. Pulse animation + tabular timer while recording; permission-denied state has a "How to fix" affordance.
- New `voice-service.ts` pure helpers: `audioFileToBase64()` reads the recorder's output via `expo-file-system`'s `File.base64()` API; `transcribeRecording()` orchestrates base64 → upload → result mapping with stable error codes the UI branches on.
- New `transcribeVoice(userId, audioBase64, language?)` method on the API client; 60s timeout because whisper's first-run model load can take several seconds on cold start.
- Permissions added to app.json: NSMicrophoneUsageDescription (iOS) + RECORD_AUDIO (Android) + expo-audio plugin entry.
- Deps: expo-audio ~55.0.14, expo-file-system ~55.0.19 (the latter was already transitively installed; declared explicitly so the dep is auditable).

Out of scope for this PR (deliberate follow-ups): TTS playback (pairs with #187 AC#4 desktop Piper), "send to twin" hand-off (waits on mobile assistant surface), physical-device QA on real iOS/Android hardware.

Test plan: 11 new vitest cases mocking `File.base64()` + fetch. Mobile suite: 163 passing + 2 skipped (discovery tests, unrelated). Workspace: 70/70 turbo tasks green; build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 11, 2026 15:14

Copilot started reviewing on behalf of jayzalowitz May 11, 2026 15:15

Copilot AI reviewed May 11, 2026

jayzalowitz requested a review from Copilot May 11, 2026 23:11

Copilot started reviewing on behalf of jayzalowitz May 11, 2026 23:12

Copilot AI reviewed May 11, 2026

jayzalowitz and others added 2 commits May 12, 2026 00:44

jayzalowitz force-pushed the jayzalowitz/issue-179-mobile-voice branch from 05e0449 to 348463a May 12, 2026 04:45

jayzalowitz merged commit f317cb2 into main May 12, 2026
7 checks passed

jayzalowitz mentioned this pull request May 13, 2026

docs: sync README + CLAUDE.md with v0.6.18-0.6.21 merge sweep #273

Merged

3 tasks

jayzalowitz merged 2 commits into main from jayzalowitz/issue-179-mobile-voice
