feat(#194 Child 4): accessibility — high-contrast + text-scale + voice STT route#244
Merged
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds end-to-end accessibility preferences in the web UI (high-contrast theme, text scaling, reduced motion, voice-first affordances) and introduces a new API surface for server-side voice transcription using the embedded Whisper STT backend.
Changes:
- Web: new high-contrast theme, global focus/skip-link, text-scale + reduced-motion + voice-first preferences persisted in localStorage
- API: new `/api/voice/capabilities/:userId` and `/api/voice/transcribe` routes backed by `@skytwin/embedded-llm`, with tests
- Docs/meta: changelog entry and API package dependency update
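As a rough illustration of the input validation the transcribe route performs, here is a minimal sketch written as a pure function so it can run without Express. The field names `userId`, `audioBase64`, and `language` and the `'userId required'` message come from the diff excerpts in this review; the `Validation` type, the function name, and the other error strings are hypothetical and may not match `apps/api/src/routes/voice.ts`.

```typescript
// Hypothetical result shape; the real route may report errors differently.
type TranscribeRequest = { userId: string; audioBase64: string; language?: string };
type Validation =
  | { ok: true; value: TranscribeRequest }
  | { ok: false; status: 400; error: string };

// Validates an untrusted JSON body before any audio is decoded.
function validateTranscribeBody(body: unknown): Validation {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, status: 400, error: 'JSON body required' }; // assumed message
  }
  const record = body as Record<string, unknown>;
  const userId = record['userId'];
  const audioBase64 = record['audioBase64'];
  const language = record['language'];

  if (typeof userId !== 'string' || userId.length === 0) {
    return { ok: false, status: 400, error: 'userId required' };
  }
  if (typeof audioBase64 !== 'string' || audioBase64.length === 0) {
    return { ok: false, status: 400, error: 'audioBase64 required' }; // assumed message
  }
  // `language` is optional, but must be a string when present.
  if (language !== undefined && typeof language !== 'string') {
    return { ok: false, status: 400, error: 'language must be a string' }; // assumed message
  }
  return language === undefined
    ? { ok: true, value: { userId, audioBase64 } }
    : { ok: true, value: { userId, audioBase64, language } };
}
```

A route handler could call this first and respond with `res.status(result.status).json({ error: result.error })` when `result.ok` is false.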
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| apps/web/public/js/theme-switcher.js | Adds “high-contrast” theme variant metadata for the UI theme selector |
| apps/web/public/js/pages/settings.js | Adds Accessibility settings card and wires preference updates into the settings event delegator |
| apps/web/public/js/app.js | Boots initA11y() early and injects a skip-link on DOMContentLoaded |
| apps/web/public/js/a11y.js | New module to persist/apply text scale, reduced motion override, and voice-first mode |
| apps/web/public/css/themes.css | Adds high-contrast theme variables + global focus rings + text-scale + reduced-motion + skip-link styling |
| apps/api/src/routes/voice.ts | New voice capabilities/transcribe routes with caching and input validation |
| apps/api/src/index.ts | Mounts the new /api/voice router |
| apps/api/src/tests/voice-routes.test.ts | Adds Vitest coverage for voice capabilities + transcribe behaviors |
| apps/api/package.json | Adds @skytwin/embedded-llm dependency needed by the new voice routes |
| CHANGELOG.md | Documents the new a11y features and voice endpoints (adds a new unreleased section) |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
apps/web/public/js/pages/settings.js:1
- The `change` delegator handles the two `<select>` controls before the AI-card guard, but does not handle the voice-first checkbox action. If the click handler path changes or if the checkbox is toggled in a way that only emits `change`, this risks not persisting/applying voice-first. Consider handling `a11y-toggle-voice-first` here (before the ai-provider-card guard) alongside the select handlers.
import { fetchUser, updateTrustTier, fetchOAuthStatus, getGoogleAuthUrl, disconnectProvider, escapeHtml, fetchSettings, updateAutonomySettings, updateIronClawChannel, upsertDomainPolicy, deleteDomainPolicy, createEscalationTrigger, deleteEscalationTrigger, createSession, fetchSessions, revokeSession, saveAIProviders, testAIProvider, fetchRoutines, deleteRoutine } from '../api-client.js';
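One way to read the suggestion above is sketched below, assuming the delegator can hand the event target to a small pure function before it reaches the AI-card guard. The action names `a11y-set-text-scale` and `a11y-toggle-voice-first` appear in this review; the `ChangeTarget` and `A11yPrefs` shapes and the function name are hypothetical stand-ins, and the second `<select>` (animations) is left out because its action name is not shown here.

```typescript
// Minimal stand-ins for the DOM pieces the handler needs (assumed shapes).
type ChangeTarget = { dataset: { action?: string }; value?: string; checked?: boolean };
type A11yPrefs = { textScale: string; voiceFirst: boolean };

// Returns updated prefs when the event came from an a11y control, or null so
// the caller can fall through to the AI-provider-card guard and later handlers.
function handleA11yChange(target: ChangeTarget, prefs: A11yPrefs): A11yPrefs | null {
  switch (target.dataset.action) {
    case 'a11y-set-text-scale':
      return { ...prefs, textScale: target.value ?? prefs.textScale };
    case 'a11y-toggle-voice-first':
      // Checkboxes report state through `checked`, not `value`.
      return { ...prefs, voiceFirst: target.checked === true };
    default:
      return null;
  }
}
```

Handling the checkbox in the `change` path means the preference is still applied if the toggle is driven by keyboard or assistive technology in a way that never reaches the click handler.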
Comment on lines +60 to +65

    const userId = body['userId'];
    const audioBase64 = body['audioBase64'];
    const language = body['language'];

    if (typeof userId !== 'string' || userId.length === 0) {
      res.status(400).json({ error: 'userId required' });

    let cachedPort: Promise<EmbeddedSttPort> | null = null;
    function getPort(): Promise<EmbeddedSttPort> {
      if (cachedPort === null) cachedPort = createEmbeddedSttPort();

        bytes: audio.length,
        chars: transcript.length,
      });
      res.json({ transcript, durationBytes: audio.length });
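The `cachedPort` excerpt caches the promise itself, so concurrent callers share one initialization. A point worth checking with that pattern is that a rejected promise would also stay cached. The sketch below shows a generic variant that clears the cache on failure; `lazyAsync` is a hypothetical helper, not part of `@skytwin/embedded-llm`, and whether the real route needs retry behaviour is an assumption.

```typescript
// Generic lazy async singleton: the factory runs once while it succeeds, and
// a rejected attempt clears the cache so a later caller can retry.
function lazyAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached !== null) return cached;
    const attempt = factory();
    cached = attempt;
    attempt.catch(() => {
      // Only clear if no newer attempt has replaced this one.
      if (cached === attempt) cached = null;
    });
    // Callers receive the original promise, so they still observe a rejection.
    return attempt;
  };
}
```

Under this sketch, the route's accessor could be written as `const getPort = lazyAsync(createEmbeddedSttPort)`, assuming `createEmbeddedSttPort` takes no required arguments.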
Comment on lines +632 to +634

    html[data-text-scale="125"] { font-size: 17.5px; } /* 1.25× of 14px */
    html[data-text-scale="150"] { font-size: 21px; }   /* 1.5× of 14px */
    html[data-text-scale="200"] { font-size: 28px; }   /* 2× of 14px */
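The pixel values in these rules follow directly from the 14px base named in the CSS comments. A small sketch of the arithmetic, plus defensive parsing of a stored value, is shown below; the constant and function names are hypothetical and are not taken from `a11y.js`.

```typescript
const BASE_FONT_PX = 14; // base size stated in the CSS comments
const TEXT_SCALES = [100, 125, 150, 200] as const;
type TextScale = (typeof TEXT_SCALES)[number];

// Type guard: true only for the four scales offered in the settings dropdown.
function isTextScale(n: number): n is TextScale {
  return (TEXT_SCALES as readonly number[]).indexOf(n) !== -1;
}

// Root font size in px for a given scale, e.g. 125 -> 14 * 1.25.
function rootFontSizePx(scale: TextScale): number {
  return (BASE_FONT_PX * scale) / 100;
}

// Parses a stored or user-supplied value, falling back to 100 when unknown.
function parseTextScale(raw: string | null): TextScale {
  const n = Number(raw);
  return isTextScale(n) ? n : 100;
}
```

Falling back to 100 for unrecognized values keeps a corrupted or outdated stored preference from producing a `data-text-scale` attribute that no CSS rule matches.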
    @@ -1,5 +1,49 @@
    All notable changes to SkyTwin will be documented in this file.

    ## [unreleased] — Accessibility: high-contrast + text-scale + voice STT route (#194 Child 4)
    - Multi-language i18n (separate slice; voice route already accepts
      `language`, so the audio side is ready).

    ## [unreleased] — Embedded LLM as a first-class llm-client provider (#187 AC#7)
Comment on lines +126 to +134

    <label for="a11y-text-scale" style="display: block; font-weight: 500; margin-bottom: 0.25rem;">Text size</label>
    <div style="display: flex; gap: 0.5rem; align-items: center;">
      <select class="form-input" id="a11y-text-scale" style="flex: 1;" data-action="a11y-set-text-scale">
        <option value="100">Default</option>
        <option value="125">Larger (125%)</option>
        <option value="150">Much larger (150%)</option>
        <option value="200">Maximum (200%)</option>
      </select>
    </div>
…+ voice STT route

Five real implementations:
- High-contrast theme variant (4th in the dropdown): pure b/w with bold yellow/blue accents, 2px borders, 3px focus rings. WCAG AAA-aimed.
- Text-scale slider: 100/125/150/200% via data-text-scale on <html>; every rem-based size reflows.
- Reduced-motion override: respects prefers-reduced-motion by default, user can force on/off via Settings.
- Voice-first toggle: persists to localStorage, sets body class so CSS surfaces mic affordances.
- /api/voice/transcribe + /api/voice/capabilities/:userId backed by createEmbeddedSttPort() from #238. Real Whisper STT exposed as a reusable HTTP endpoint (mobile #179 will consume it once recording lands; desktop voice-first does today).

Sweep:
- Skip-link injected as first body child (CSS hides until focused).
- Global focus-visible ring across buttons, links, inputs.
- prefers-reduced-motion: reduce honored across all themes.
- New Accessibility card in Settings with text-size, animations, voice.

8 new voice route tests; api total 423 passing (24 skipped).

Out of scope: mobile recording UI (#179 hardware-blocked), axe-core CI gate, multi-language i18n (separate slice).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
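The reduced-motion override described in this commit combines a stored user choice with the system preference. A minimal sketch of that resolution follows, assuming a three-valued override; the storage key, the `'system' | 'on' | 'off'` values, and all function names are hypothetical, and the narrow `KeyValueStore` interface stands in for `localStorage` so the example runs outside a browser.

```typescript
type MotionOverride = 'system' | 'on' | 'off';

// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const MOTION_KEY = 'a11y.reducedMotion'; // hypothetical key name

// Reads the stored override, treating missing or unknown values as 'system'.
function loadMotionOverride(store: KeyValueStore): MotionOverride {
  const raw = store.getItem(MOTION_KEY);
  return raw === 'on' || raw === 'off' ? raw : 'system';
}

function saveMotionOverride(store: KeyValueStore, value: MotionOverride): void {
  store.setItem(MOTION_KEY, value);
}

// 'on' forces reduced motion, 'off' forces animations, 'system' defers to the OS.
function shouldReduceMotion(override: MotionOverride, systemPrefersReduced: boolean): boolean {
  if (override === 'on') return true;
  if (override === 'off') return false;
  return systemPrefersReduced;
}
```

In a browser, `systemPrefersReduced` would typically come from `window.matchMedia('(prefers-reduced-motion: reduce)').matches`.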
daee7e4 to e6db0e0
jayzalowitz added a commit that referenced this pull request May 12, 2026

Closes the code-bound half of #179. The mobile app can now capture audio and ship it to the paired desktop's `/api/voice/transcribe` (the route landed in PR #244). The remaining work is QA on physical devices, which has always been the actual blocker.

Components:
- New `VoiceScreen` tab with six-state machine (idle, denied, recording, processing, result, error). Recording driven by `useAudioRecorder` from `expo-audio`. Pulse animation + tabular timer while recording; permission-denied state has a "How to fix" affordance.
- New `voice-service.ts` pure helpers: `audioFileToBase64()` reads the recorder's output via `expo-file-system`'s `File.base64()` API; `transcribeRecording()` orchestrates base64 → upload → result mapping with stable error codes the UI branches on.
- New `transcribeVoice(userId, audioBase64, language?)` method on the API client; 60s timeout because whisper's first-run model load can take several seconds on cold start.
- Permissions added to app.json: NSMicrophoneUsageDescription (iOS) + RECORD_AUDIO (Android) + expo-audio plugin entry.
- Deps: expo-audio ~55.0.14, expo-file-system ~55.0.19 (the latter was already transitively installed; declared explicitly so the dep is auditable).

Out of scope for this PR (deliberate follow-ups): TTS playback (pairs with #187 AC#4 desktop Piper), "send to twin" hand-off (waits on mobile assistant surface), physical-device QA on real iOS/Android hardware.

Test plan: 11 new vitest cases mocking `File.base64()` + fetch. Mobile suite: 163 passing + 2 skipped (discovery tests, unrelated). Workspace: 70/70 turbo tasks green; build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
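The six states named for `VoiceScreen` can be modelled as a small transition table. The sketch below is only an illustration: the state names come from the commit message, while the event names and the specific transitions are assumptions and may differ from the actual screen.

```typescript
type VoiceState = 'idle' | 'denied' | 'recording' | 'processing' | 'result' | 'error';
// Hypothetical event names; the real component may drive state differently.
type VoiceEvent = 'PERMISSION_DENIED' | 'START' | 'STOP' | 'TRANSCRIBED' | 'FAILED' | 'RESET';

// Allowed transitions per state; any event not listed leaves the state unchanged.
const TRANSITIONS: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { START: 'recording', PERMISSION_DENIED: 'denied' },
  denied: { RESET: 'idle' },
  recording: { STOP: 'processing', FAILED: 'error' },
  processing: { TRANSCRIBED: 'result', FAILED: 'error' },
  result: { RESET: 'idle', START: 'recording' },
  error: { RESET: 'idle' },
};

function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  return TRANSITIONS[state][event] ?? state;
}
```

Keeping the table separate from the UI makes it possible to unit test the flow (for example, that `STOP` is ignored while idle) without mocking the recorder.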
jayzalowitz added a commit that referenced this pull request May 12, 2026

* feat(#179 voice): mobile voice recording + transcribe pipeline

* fix(#254 post-/review): type-only client import, real Linking.openSettings, accurate copy + size label

Six Copilot findings on PR #254 addressed:
1. voice-service.ts now uses `import type { SkyTwinApiClient }`; the client is only a type reference here, so the runtime import was unnecessary coupling.
2. VoiceTranscriptError docstring no longer references a permission_denied code that doesn't exist in the union; mic permission is handled inside VoiceScreen before this layer.
3. transcribeVoice docstring corrected: 25MB DECODED (~33MB base64), not "25MB base64."
4. openSystemSettings now actually opens the OS settings page via Linking.openSettings() with the explanatory alert as fallback. One-tap recovery instead of just an explainer.
5. Permission-denial copy acknowledges the temporary on-device audio file. The earlier "never stored anywhere besides paired desktop" was technically inaccurate (file URI is read back as base64).
6. Result-state UI changed from "{X} of audio" to "Audio size: {X}" so the byte count isn't mislabeled as a duration.

Test plan: mobile 163/163 passing (+2 skipped, unrelated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
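The "25MB decoded (~33MB base64)" correction in finding 3 follows from base64 encoding every 3 bytes as 4 characters. A short worked sketch is below; it assumes standard padded base64 with no line breaks and reads "25MB" as 25 × 1024 × 1024 bytes, which is an assumption about how the limit is defined. The function and constant names are hypothetical.

```typescript
// Length of the padded base64 string for a payload of `decodedBytes` bytes.
function base64Length(decodedBytes: number): number {
  return 4 * Math.ceil(decodedBytes / 3);
}

// Decoded size of a padded base64 string, without actually decoding it.
function decodedLength(base64: string): number {
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  return (base64.length * 3) / 4 - padding;
}

const LIMIT_DECODED_BYTES = 25 * 1024 * 1024; // assumed meaning of "25MB"

// 26,214,400 decoded bytes encode to 34,952,536 characters, about 33.3 MiB.
const limitAsBase64Chars = base64Length(LIMIT_DECODED_BYTES);
```

Checking `decodedLength(audioBase64)` against the limit before decoding lets a client or server reject oversized uploads without allocating the decoded buffer.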
Summary
Closes #194 Child 4 a11y commitments + ships a real `/api/voice/transcribe` route backed by the WhisperCppSttBackend from #238. Five concrete additions:
- High-contrast theme variant: pure b/w with bold yellow/blue accents, 2px borders, 3px focus rings.
- Text scale: `[data-text-scale]` on `<html>`; every rem reflows.
- Reduced-motion override: respects `prefers-reduced-motion` by default, user can force on/off.
- Voice-first toggle: persists to localStorage and sets a body class so CSS surfaces mic affordances.
- `/api/voice/transcribe` + `/capabilities/:userId`: real Whisper STT exposed as HTTP endpoint. Mobile #179 will consume it once recording lands.

Plus baseline sweep: skip-link, focus-visible rings, prefers-reduced-motion media query.
Why this fits the theme
"For everyone" is in the architectural identity. A twin that needs 20/20 vision and steady hands isn't a personal AI for everyone. The voice route is the natural consumer of the embedded Whisper backend in #238 — same prompts, same memory, reachable without a keyboard.
Test plan
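- `pnpm --filter @skytwin/api test`: 423 passing, 24 skipped (8 new voice tests)
- `pnpm build`: all 34 packages clean

Out of scope
- Mobile recording UI (#179 hardware-blocked)
- axe-core CI gate
- Multi-language i18n (separate slice; voice route already accepts `language`)

🤖 Generated with Claude Code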