feat(#179 voice): mobile voice recording + transcribe pipeline #254

Merged
jayzalowitz merged 2 commits into main from jayzalowitz/issue-179-mobile-voice
May 12, 2026

@jayzalowitz

Owner

Summary

Closes the code-bound half of #179. The mobile app can now capture audio and ship it to the paired desktop's /api/voice/transcribe (the route landed in PR #244). The remaining work on #179 is QA on physical iOS/Android devices, which has always been the actual blocker — see the 2026-05-11 status comment on #179 for the corrected scoping.

What changed

  • New VoiceScreen tab (apps/mobile/src/screens/VoiceScreen.tsx). Tap-to-record → tap-to-stop → upload → transcript. Six-state machine (idle / denied / recording / processing / result / error) with the recording lifecycle driven by useAudioRecorder from expo-audio. Pulse animation + tabular-numeric timer while recording. Permission-denied state has a "How to fix" affordance pointing the user at system settings.

  • New voice-service.ts pure helpers:

    • audioFileToBase64(uri) reads the recorder's output via expo-file-system's File.base64() API (the SDK 55 idiomatic path).
    • transcribeRecording(client, userId, uri, language?) orchestrates base64 → upload → result mapping with stable error codes (no_audio / read_failed / whisper_unavailable / network / unknown) so the UI can branch on cause without parsing free-form error strings.
  • New transcribeVoice(...) on the API client. Uses the existing request<T>() layer with a 60s timeout override because whisper's first-run model load can take several seconds on cold start — the default 10s would abort mid-transcribe. New TranscribeResponse interface added to the response-type block.

  • Permissions in app.json: NSMicrophoneUsageDescription (iOS), RECORD_AUDIO (Android), expo-audio plugin entry. Permission copy emphasizes "sent to your paired SkyTwin desktop for on-device transcription" so the install prompt matches the privacy story.

  • Tab nav: voice added to the MainTab enum in App.tsx, wired into the renderContent switch, and a "Voice" TabButton placed between Capabilities and Dashboard.

  • Deps: expo-audio: ~55.0.14, expo-file-system: ~55.0.19 (versions match Expo SDK 55's bundledNativeModules.json). The latter was already transitively installed via expo-asset; declared explicitly so the dep is auditable.
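
As a rough illustration of the six-state machine described above, the screen's state could be modeled as a discriminated union with a pure transition function. Only the six state names come from this PR; the event names, payload fields, and `nextState` helper are hypothetical and not taken from VoiceScreen.tsx.

```typescript
// Sketch only: state names are from the PR description, everything else is assumed.
type VoiceState =
  | { kind: 'idle' }
  | { kind: 'denied' }
  | { kind: 'recording'; startedAtMs: number }
  | { kind: 'processing' }
  | { kind: 'result'; transcript: string }
  | { kind: 'error'; code: string };

// Hypothetical events the screen might dispatch.
type VoiceEvent =
  | { type: 'permission_denied' }
  | { type: 'record_started'; atMs: number }
  | { type: 'record_stopped' }
  | { type: 'transcribed'; transcript: string }
  | { type: 'failed'; code: string }
  | { type: 'reset' };

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case 'permission_denied':
      return { kind: 'denied' };
    case 'record_started':
      // Recording may only begin from a resting state.
      return state.kind === 'idle' || state.kind === 'result' || state.kind === 'error'
        ? { kind: 'recording', startedAtMs: event.atMs }
        : state;
    case 'record_stopped':
      // Tap-to-stop moves to the upload/transcribe phase.
      return state.kind === 'recording' ? { kind: 'processing' } : state;
    case 'transcribed':
      return state.kind === 'processing'
        ? { kind: 'result', transcript: event.transcript }
        : state;
    case 'failed':
      return { kind: 'error', code: event.code };
    case 'reset':
      return { kind: 'idle' };
  }
}
```

Keeping the transition function pure means out-of-order events (for example, a stop tap while idle) are simply ignored, and the transitions can be unit-tested without rendering the screen.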

What this PR deliberately does NOT do

  • TTS playback. The screen displays the transcript but does not speak responses back. That's a separate flow that pairs with #187 ("Capability loop #N: Embedded LLM with auto-upgrade path (Phi/Llama/Qwen via llama.cpp; Whisper-tiny STT; Piper TTS)") AC#4 (desktop Piper TTS) once a piper binary is on PATH.
  • "Send to twin" hand-off. The transcript is shown but not yet pipelined to the assistant or the decision route. A follow-up will route the transcribed text through the existing assistant route once that mobile surface lands.
  • Physical-device QA. The Expo SDK 55 expo-audio API works in the simulator but real-device behavior (silent-mode switch, AirPods routing, background record interruption) needs hardware to verify.

Test plan

  • 11 new vitest cases in voice-service.test.ts covering:
    • audioFileToBase64: null/empty URI → no_audio, happy path, empty result → no_audio, File.base64() rejection → read_failed.
    • transcribeRecording: 200 happy path, language code forwarding, 503 → whisper_unavailable, 413 → unknown (with detail in message), missing URI short-circuits before fetch is called, network failure → network.
  • The test file mirrors the inlined-class pattern at apps/mobile/src/__tests__/api-client.test.ts:23 (a TestApiClient stub instead of importing the real one) to keep React Native imports out of Node's test runner. Same convention already established for this app.
  • Full mobile suite: 165 tests, 163 passing + 2 skipped (the 2 skips are pre-existing in integration-live.test.ts — discovery tests, unrelated).
  • Full workspace: 70/70 turbo tasks green (pnpm test).
  • pnpm build --concurrency=1 clean.
  • npx tsc --noEmit clean for VoiceScreen.tsx, voice-service.ts, api-client.ts, App.tsx (pre-existing TS errors in discovery.ts, notifications.ts, and api-client.test.ts are unrelated to this PR).
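
The inlined-stub convention mentioned in the test plan might look roughly like the sketch below. The `transcribeVoice(userId, audioBase64, language?)` signature follows this PR; the `TranscribeClient` interface, the `text` response field, and the `transcribeIfPresent` helper are hypothetical stand-ins rather than the real test file's contents.

```typescript
// Structural interface so the code under test never imports the real client
// (and therefore never pulls React Native modules into the Node test runner).
interface TranscribeClient {
  transcribeVoice(userId: string, audioBase64: string, language?: string): Promise<{ text: string }>;
}

// Inlined stub: records calls and returns a canned response.
class TestApiClient implements TranscribeClient {
  calls: string[] = [];

  async transcribeVoice(userId: string, audioBase64: string, language?: string): Promise<{ text: string }> {
    this.calls.push(`${userId}:${language ?? 'auto'}:${audioBase64.length}`);
    return { text: 'hello world' };
  }
}

type Outcome = { ok: true; transcript: string } | { ok: false; code: 'no_audio' };

// Hypothetical helper under test: short-circuits before touching the client
// when there is no audio payload.
async function transcribeIfPresent(
  client: TranscribeClient,
  userId: string,
  audioBase64: string | null,
): Promise<Outcome> {
  if (!audioBase64) return { ok: false, code: 'no_audio' };
  const res = await client.transcribeVoice(userId, audioBase64);
  return { ok: true, transcript: res.text };
}
```

A test can then assert on `client.calls` to confirm that the missing-audio path never reaches the network layer.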

Notes for reviewers

  • The recording lifecycle lives inside VoiceScreen.tsx rather than in the service module because useAudioRecorder is a React hook — pulling it into a plain module would lose the hook's internal teardown and recording-state mirroring. The pieces that are easy to unit-test outside React (base64 conversion + transcribe orchestration with error mapping) live in voice-service.ts and are exercised in the test file.
  • expo-audio is the SDK 55 path (expo-av was deprecated starting around SDK 53). The hook API works cleanly here because the screen owns the recorder.
  • Error codes are deliberately stable strings rather than free-form messages so a future analytics layer or audit-log surface can attribute voice failures without regex-parsing strings.
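
To make the stable-code idea concrete, here is a minimal sketch of mapping HTTP failures onto the five codes listed in this PR. The status-to-code behavior (503 → whisper_unavailable, 413 → unknown with the detail kept in the message) follows the test plan; the `failureFromStatus` and `hintFor` helpers and all of the hint copy are hypothetical.

```typescript
// The five codes come from the PR description; the helpers below are illustrative.
type VoiceErrorCode = 'no_audio' | 'read_failed' | 'whisper_unavailable' | 'network' | 'unknown';

interface VoiceFailure {
  ok: false;
  code: VoiceErrorCode;
  message: string;
}

// Hypothetical helper: translate a non-2xx HTTP status into a stable code.
function failureFromStatus(status: number, detail: string): VoiceFailure {
  if (status === 503) {
    return { ok: false, code: 'whisper_unavailable', message: detail };
  }
  // Other statuses (e.g. 413) fall back to `unknown`, with the detail preserved.
  return { ok: false, code: 'unknown', message: `HTTP ${status}: ${detail}` };
}

// Hypothetical UI copy: the screen branches on the code, never on the message text.
function hintFor(code: VoiceErrorCode): string {
  switch (code) {
    case 'no_audio':
      return 'Nothing was recorded. Try again.';
    case 'read_failed':
      return 'The recording could not be read from disk.';
    case 'whisper_unavailable':
      return 'Transcription is not available on the paired desktop.';
    case 'network':
      return 'Could not reach the paired desktop.';
    case 'unknown':
      return 'Something went wrong.';
  }
}
```

Because `hintFor` switches over a closed union, adding a sixth code later would surface as a compile error rather than a silently unhandled string.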

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 11, 2026 15:14

Copilot AI left a comment

Pull request overview

Adds a new mobile “Voice” surface that records audio with Expo Audio, uploads it to the paired desktop’s /api/voice/transcribe, and renders the transcript; includes supporting API client plumbing, permissions, and unit tests for the non-React helpers.

Changes:

  • Introduce VoiceScreen tab with a simple recording → upload → transcript state machine.
  • Add voice-service.ts helpers for URI→base64 conversion and transcribe orchestration with stable UI-facing error codes.
  • Extend the mobile API client with transcribeVoice(...) using a 60s timeout override; wire permissions/deps/config for microphone + file access.
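
One common way to implement a per-request timeout override is an AbortController paired with a timer. The sketch below assumes that approach; the PR does not show how `request<T>()` is actually built, so `withTimeout`, `RequestOptions`, and the constant names are hypothetical. Only the 10s default and the 60s override come from the PR text.

```typescript
const DEFAULT_TIMEOUT_MS = 10_000;    // default mentioned in the PR
const TRANSCRIBE_TIMEOUT_MS = 60_000; // override used for transcription

// Hypothetical options bag for a request layer.
interface RequestOptions {
  timeoutMs?: number;
}

// Runs `work` with an AbortSignal that fires once the timeout elapses.
// The timer is always cleared, whether `work` resolves or rejects.
function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  options: RequestOptions = {},
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(),
    options.timeoutMs ?? DEFAULT_TIMEOUT_MS,
  );
  return work(controller.signal).finally(() => clearTimeout(timer));
}
```

A caller would pass the signal through to its transport, for example `withTimeout((signal) => fetch(url, { signal }), { timeoutMs: TRANSCRIBE_TIMEOUT_MS })`, so a slow first-run model load is not cut off at the default.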

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| pnpm-lock.yaml | Locks new Expo dependencies used by the voice feature. |
| CHANGELOG.md | Documents the new mobile voice recording/transcribe flow and test plan. |
| apps/mobile/src/services/voice-service.ts | Adds testable helpers for base64 conversion and transcription error mapping. |
| apps/mobile/src/services/api-client.ts | Adds transcribeVoice(...) and request timeout override support. |
| apps/mobile/src/screens/VoiceScreen.tsx | New UI tab for recording, uploading, and displaying transcripts. |
| apps/mobile/src/App.tsx | Adds “Voice” tab to the main tab switch + tab bar. |
| apps/mobile/src/__tests__/voice-service.test.ts | Adds unit tests for voice helpers with mocked file system + fetch. |
| apps/mobile/package.json | Declares new Expo deps (expo-audio, expo-file-system). |
| apps/mobile/app.json | Adds microphone permissions + expo-audio plugin config. |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

*/

import { File } from 'expo-file-system';
import { SkyTwinApiClient } from './api-client';

export interface VoiceTranscriptError {
ok: false;
/** Stable code so callers can branch on `permission_denied` vs. generic. */
Comment thread apps/mobile/src/services/api-client.ts Outdated
Comment on lines +261 to +264
* The API tolerates audio up to 25MB base64 (~18MB decoded). The
* recorder hook bounds clip length in the UI, so we don't pre-check
* size here — let the server return the 413 and the screen renders
* the message.
Comment thread apps/mobile/src/screens/VoiceScreen.tsx Outdated
Comment on lines +162 to +167
const openSystemSettings = useCallback(() => {
Alert.alert(
'Microphone access needed',
'Open Settings → SkyTwin and enable Microphone to use voice. We never store recordings anywhere besides your paired desktop.',
);
}, []);
Comment thread apps/mobile/src/screens/VoiceScreen.tsx Outdated
const openSystemSettings = useCallback(() => {
Alert.alert(
'Microphone access needed',
'Open Settings → SkyTwin and enable Microphone to use voice. We never store recordings anywhere besides your paired desktop.',
Comment thread apps/mobile/src/screens/VoiceScreen.tsx Outdated
<View style={styles.resultCard}>
<Text style={styles.resultText}>{state.transcript.trim() || '(silence)'}</Text>
</View>
<Text style={styles.resultMeta}>{formatBytes(state.durationBytes)} of audio</Text>

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported


export interface VoiceTranscriptError {
ok: false;
/** Stable code so callers can branch on `permission_denied` vs. generic. */
*/

import { File } from 'expo-file-system';
import { SkyTwinApiClient } from './api-client';
Comment thread apps/mobile/src/screens/VoiceScreen.tsx Outdated
Comment on lines +162 to +167
const openSystemSettings = useCallback(() => {
Alert.alert(
'Microphone access needed',
'Open Settings → SkyTwin and enable Microphone to use voice. We never store recordings anywhere besides your paired desktop.',
);
}, []);
Comment thread apps/mobile/src/services/api-client.ts Outdated
Comment on lines +261 to +264
* The API tolerates audio up to 25MB base64 (~18MB decoded). The
* recorder hook bounds clip length in the UI, so we don't pre-check
* size here — let the server return the 413 and the screen renders
* the message.
jayzalowitz added a commit that referenced this pull request May 11, 2026
…tings, accurate copy + size label

Six Copilot findings on PR #254 addressed:

1. voice-service.ts now uses `import type { SkyTwinApiClient }` —
   the client is only a type reference here, runtime import was
   unnecessary coupling.

2. VoiceTranscriptError docstring no longer references a
   permission_denied code that doesn't exist in the union; mic
   permission is handled inside VoiceScreen before this layer.

3. transcribeVoice docstring corrected: 25MB DECODED (~33MB base64),
   not "25MB base64."

4. openSystemSettings now actually opens the OS settings page via
   Linking.openSettings() with the explanatory alert as fallback.
   One-tap recovery instead of just an explainer.

5. Permission-denial copy acknowledges the temporary on-device audio
   file. The earlier "never stored anywhere besides paired desktop"
   was technically inaccurate (file URI is read back as base64).

6. Result-state UI changed from "{X} of audio" to "Audio size: {X}"
   so the byte count isn't mislabeled as a duration.

Test plan: mobile 163/163 passing (+2 skipped, unrelated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz

Owner Author

Round-2 review reply: all 4 findings were already addressed in round 1 (commit 05e0449), but Copilot's round-2 read appears to be against the pre-fix state.

Current state on this branch:

  • type-only SkyTwinApiClient import: already import type at line 19 of voice-service.ts.
  • permission_denied docstring: already removed; the comment now lists the actual codes in the union (no_audio, read_failed, whisper_unavailable, network, unknown).
  • 25MB base64 vs decoded: docstring corrected to "25MB decoded (~33MB base64)" (line 261).
  • openSystemSettings Alert vs deep-link: already wired to Linking.openSettings() with the alert as a fallback (commit 05e0449).
  • "bounds clip length" comment: already rewritten to "The mobile recorder has no explicit cap; the screen surfaces the 413 if the user records past the limit" — no longer implies a non-existent guardrail.
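
The size correction above follows from base64 expanding every 3 input bytes into 4 output characters. A small worked sketch (helper names are hypothetical, and MB is taken as 1,000,000 bytes for the arithmetic):

```typescript
// Length of the padded base64 encoding of `decodedBytes` bytes.
function base64EncodedLength(decodedBytes: number): number {
  return 4 * Math.ceil(decodedBytes / 3);
}

// Largest payload (in bytes) whose encoding fits in `base64Chars` characters.
function maxDecodedBytesFor(base64Chars: number): number {
  return Math.floor(base64Chars / 4) * 3;
}

const MB = 1_000_000;

// A 25MB decoded limit corresponds to roughly 33.3MB of base64 text.
const encodedForLimit = base64EncodedLength(25 * MB); // 33_333_336

// Reading the limit as "25MB of base64" would instead allow only 18.75MB of audio.
const decodedIfBase64Limit = maxDecodedBytesFor(25 * MB); // 18_750_000
```

The two readings differ by a factor of about 1.78, which is why the docstring wording matters when reasoning about when the server returns a 413.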

jayzalowitz and others added 2 commits May 12, 2026 00:44
Closes the code-bound half of #179. The mobile app can now capture
audio and ship it to the paired desktop's `/api/voice/transcribe` (the
route landed in PR #244). The remaining work is QA on physical devices,
which has always been the actual blocker.

Components:

- New `VoiceScreen` tab with six-state machine (idle, denied, recording,
  processing, result, error). Recording driven by `useAudioRecorder`
  from `expo-audio`. Pulse animation + tabular timer while recording;
  permission-denied state has a "How to fix" affordance.

- New `voice-service.ts` pure helpers: `audioFileToBase64()` reads the
  recorder's output via `expo-file-system`'s `File.base64()` API;
  `transcribeRecording()` orchestrates base64 → upload → result mapping
  with stable error codes the UI branches on.

- New `transcribeVoice(userId, audioBase64, language?)` method on the
  API client; 60s timeout because whisper's first-run model load can
  take several seconds on cold start.

- Permissions added to app.json: NSMicrophoneUsageDescription (iOS) +
  RECORD_AUDIO (Android) + expo-audio plugin entry.

- Deps: expo-audio ~55.0.14, expo-file-system ~55.0.19 (the latter
  was already transitively installed; declared explicitly so the dep
  is auditable).

Out of scope for this PR (deliberate follow-ups): TTS playback
(pairs with #187 AC#4 desktop Piper), "send to twin" hand-off (waits
on mobile assistant surface), physical-device QA on real iOS/Android
hardware.

Test plan: 11 new vitest cases mocking `File.base64()` + fetch.
Mobile suite: 163 passing + 2 skipped (discovery tests, unrelated).
Workspace: 70/70 turbo tasks green; build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz jayzalowitz force-pushed the jayzalowitz/issue-179-mobile-voice branch from 05e0449 to 348463a Compare May 12, 2026 04:45
@jayzalowitz jayzalowitz merged commit f317cb2 into main May 12, 2026
7 checks passed