fix(voice): interrupt mascot speech when listening starts#2678
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (14)
✅ Files skipped from review due to trivial changes (9)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughThe ChangesListening override for mascot hook
Mic recording timers, STT, UI, tests, and i18n
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
graycyrus
left a comment
There was a problem hiding this comment.
Clean, well-scoped fix. The listening-as-interruption approach is the right call — bumping the playback sequence guard to invalidate stale TTS callbacks is exactly how the existing turn machinery works, so this slots in naturally.
Good test coverage across the three interruption paths (streaming deltas, active playback, pending synthesis). The effectiveFace simplification from listening && face !== 'speaking' to just listening correctly enforces listening as highest-priority state per #1206.
CI note: Rust Core Coverage and Windows secrets ACL failures are pre-existing on main and unrelated to this frontend-only change. All frontend checks (Vitest, TypeCheck, lint, e2e) pass.
LGTM.
- Auto-stop recording after 60s (MAX_RECORDING_MS) to prevent
accidental infinite recordings
- Show countdown label ("Tap to send (Ns)") when <= 10s remain
- Add structured debug logging to useHumanMascot for all voice
session state transitions (listening interrupt, TTS cancel,
tool_call, inference_start, etc.)
- Add tests for recording timeout and countdown behavior
- Add i18n key mic.tapToSendCountdown to all 13 locale chunk files
Refs tinyhumansai#1206
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
616bc23
… detection - Retry transient STT failures up to 2 times with exponential backoff (500ms, 1000ms) per encoding path (native + WAV fallback) - Skip retries for permanent errors (stale sidecar, empty blob) - Detect low-confidence transcripts heuristically: single char, repeated chars, punctuation-only — surface "could not understand" message instead of submitting garbage - Add isLowConfidenceTranscript() with Unicode support for non-Latin scripts (Cyrillic, Arabic, Devanagari, CJK, Korean) - Add tests for retry behavior, permanent error bypass, low-confidence rejection, and isLowConfidenceTranscript unit tests (82 total passing) - Add i18n key mic.lowConfidenceResult to all 13 locale chunk files Refs tinyhumansai#1206 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
app/src/features/human/MicComposer.test.tsx (2)
497-538: ⚡ Quick winAlways restore fake timers even when the test fails early.
If one of these awaits throws before the trailing
vi.useRealTimers(), later tests inherit fake timers and start failing for unrelated reasons. Move timer restoration intoafterEachor atry/finally.As per coding guidelines, "Keep tests deterministic: avoid real network calls, time-sensitive flakes, or hidden global state."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/features/human/MicComposer.test.tsx` around lines 497 - 538, The tests call vi.useFakeTimers() in the "auto-stops recording after MAX_RECORDING_MS" and "shows countdown label when remaining time <= 10s" specs but restore timers with vi.useRealTimers() only at the end of each test, which can be skipped on failure; move timer restoration into a shared cleanup so fake timers are always reverted. Replace the per-test trailing vi.useRealTimers() calls by adding an afterEach hook (or wrap each test body in try/finally) that calls vi.useRealTimers() and resets any mocks; reference the test file's use of vi.useFakeTimers(), vi.useRealTimers(), and the two it() blocks ("auto-stops recording after MAX_RECORDING_MS" and "shows countdown label when remaining time <= 10s") when making the change.
189-213: ⚡ Quick winDrive the retry tests with controlled timers instead of real backoff waits.
These cases now sleep through the 500ms/1000ms retry delays and rely on 5-10s
waitFortimeouts, which makes the suite slower and more time-sensitive in CI. Please advance the retry timers deterministically instead of waiting for wall-clock time.As per coding guidelines, "Keep tests deterministic: avoid real network calls, time-sensitive flakes, or hidden global state."
Also applies to: 559-605
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/features/human/MicComposer.test.tsx` around lines 189 - 213, Wrap this test in fake timers and drive the retry backoff deterministically instead of waiting real time: call vi.useFakeTimers() before rendering/starting the recording, then after firing the stop event advance the timers with vi.advanceTimersByTime(...) to cover the configured retry delays (e.g. 500ms then 1000ms for the two native retries) and call vi.runOnlyPendingTimers() or vi.runAllTimers() as needed, await any pending microtasks (e.g. await Promise.resolve() or await waitFor(...)) so mocks (transcribeWithFactoryMock and encodeBlobToWavMock) resolve in order, then assert onSubmit and call vi.useRealTimers() at the end; reference transcribeWithFactoryMock, encodeBlobToWavMock, and the MicComposer render/stop interactions to locate where to insert the fake-timer setup/advances.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/features/human/MicComposer.tsx`:
- Around line 61-66: The current symbols-only check uses an explicit regex of
Unicode ranges and omits the Bengali block, causing valid bn transcripts to be
rejected; update the regex in the conditional that tests trimmed (the
/[a-zA-Z\u00C0-...]/.test(trimmed) expression) to either include the Bengali
range \u0980-\u09FF or replace the whole character class with a Unicode-aware
letter class (e.g., use \p{L} with the u flag) so Bengali letters are accepted
by the low-confidence allow-list.
- Around line 488-524: transcribeWithRetry is retrying STT calls without
checking disposedRef, so if the composer is disposed during backoff or an
in-flight transcribe the code can still proceed to finalizeRecording and call
onError/onSubmit/setState on a dead composer; modify transcribeWithRetry to
check disposedRef.current (or an equivalent disposed flag) at these points: at
the top of each loop iteration, immediately after any await (both after the
backoff Promise and after await transcribeWithFactory), and before returning a
successful transcription, and if disposed abort the retry (throw a
cancelled/abort error or return a sentinel) so callers (finalizeRecording) do
not run side effects on a disposed composer (references: transcribeWithRetry,
STT_MAX_RETRIES, STT_RETRY_BASE_MS, transcribeWithFactory, disposedRef,
finalizeRecording, onError/onSubmit/setState).
- Around line 467-469: The current composerLog call in the
isLowConfidenceTranscript branch logs the raw transcript text; instead, stop
logging user speech and log only safe metadata (e.g., transcript length,
confidence classification, or drop the payload). Update the block around
isLowConfidenceTranscript(...) to remove or replace JSON.stringify(transcript)
with non-sensitive metadata (for example transcript.length or a confidence
label) before calling composerLog, and keep the existing
onError?.(t('mic.lowConfidenceResult')) behavior unchanged.
---
Nitpick comments:
In `@app/src/features/human/MicComposer.test.tsx`:
- Around line 497-538: The tests call vi.useFakeTimers() in the "auto-stops
recording after MAX_RECORDING_MS" and "shows countdown label when remaining time
<= 10s" specs but restore timers with vi.useRealTimers() only at the end of each
test, which can be skipped on failure; move timer restoration into a shared
cleanup so fake timers are always reverted. Replace the per-test trailing
vi.useRealTimers() calls by adding an afterEach hook (or wrap each test body in
try/finally) that calls vi.useRealTimers() and resets any mocks; reference the
test file's use of vi.useFakeTimers(), vi.useRealTimers(), and the two it()
blocks ("auto-stops recording after MAX_RECORDING_MS" and "shows countdown label
when remaining time <= 10s") when making the change.
- Around line 189-213: Wrap this test in fake timers and drive the retry backoff
deterministically instead of waiting real time: call vi.useFakeTimers() before
rendering/starting the recording, then after firing the stop event advance the
timers with vi.advanceTimersByTime(...) to cover the configured retry delays
(e.g. 500ms then 1000ms for the two native retries) and call
vi.runOnlyPendingTimers() or vi.runAllTimers() as needed, await any pending
microtasks (e.g. await Promise.resolve() or await waitFor(...)) so mocks
(transcribeWithFactoryMock and encodeBlobToWavMock) resolve in order, then
assert onSubmit and call vi.useRealTimers() at the end; reference
transcribeWithFactoryMock, encodeBlobToWavMock, and the MicComposer render/stop
interactions to locate where to insert the fake-timer setup/advances.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bbabef3a-74a5-477c-b49c-b0f12ff74bb9
📒 Files selected for processing (16)
app/src/features/human/MicComposer.test.tsxapp/src/features/human/MicComposer.tsxapp/src/lib/i18n/chunks/ar-3.tsapp/src/lib/i18n/chunks/bn-3.tsapp/src/lib/i18n/chunks/de-3.tsapp/src/lib/i18n/chunks/en-3.tsapp/src/lib/i18n/chunks/es-3.tsapp/src/lib/i18n/chunks/fr-3.tsapp/src/lib/i18n/chunks/hi-3.tsapp/src/lib/i18n/chunks/id-3.tsapp/src/lib/i18n/chunks/it-3.tsapp/src/lib/i18n/chunks/ko-3.tsapp/src/lib/i18n/chunks/pt-3.tsapp/src/lib/i18n/chunks/ru-3.tsapp/src/lib/i18n/chunks/zh-CN-3.tsapp/src/lib/i18n/en.ts
✅ Files skipped from review due to trivial changes (8)
- app/src/lib/i18n/chunks/id-3.ts
- app/src/lib/i18n/chunks/it-3.ts
- app/src/lib/i18n/chunks/zh-CN-3.ts
- app/src/lib/i18n/chunks/fr-3.ts
- app/src/lib/i18n/en.ts
- app/src/lib/i18n/chunks/es-3.ts
- app/src/lib/i18n/chunks/de-3.ts
- app/src/lib/i18n/chunks/en-3.ts
|
Update after commit I addressed the review feedback:
Local validation: Current PR check snapshot:
Ready for re-review once the remaining CI lanes finish and the timeout-canceled Windows secrets lane is rerun. |
Signed-off-by: sunilkumarvalmiki <g.sunilkumarvalmiki@gmail.com>
|
Actionable comments posted: 0 |
sanil-23
left a comment
There was a problem hiding this comment.
Round-3 shepherd review: implementation looks good; CI signal (current or expected after re-run) clean. Approving on behalf of sanil-23.
Summary
listeningstate interrupt mascot reply speech instead of letting TTS continue over the user.text_deltaand completed-reply TTS from switching the mascot back to speaking while the mic is active.Problem
#1206 calls out turn-taking and interruption as part of making the mascot voice experience feel deliberate. The current hook comments already describe
listeningas the highest-priority state, but the implementation did not enforce that during speech.Before this change, a focused hook reproduction on
maindid the following:useHumanMascot({ speakReplies: true, listening })with mocked TTS playback.speakingwith an active playback handle.listeningtotrueto simulate the microphone becoming active.Expected: playback stops, face becomes
listening, and the mouth resets to rest.Actual:
stop()was not called and the hook stayed on the speaking path.Solution
listeningRefso chat-event callbacks can observe the latest mic-hot state without resubscribing to the chat event stream.text_deltaandchat_donespeech transitions while listening is active.listeningeffect that treats mic activation as an interruption: bump the TTS sequence guard, stop any active playback, attach the stop-sentinel catch, clear viseme frames/cursor/target, and return the underlying face to idle.listeningplusRESTviseme whenever the external listening override is active.Submission Checklist
## Related— N/A: no matrix feature ID applies.docs/RELEASE-MANUAL-SMOKE.md) — N/A: no release-cut surface touched.Closes #NNNin the## Relatedsection — N/A: Polish mascot-driven TTS and STT voice experience #1206 is a broad voice-polish roadmap item; this PR addresses only the interruption/turn-taking slice and should not close the whole issue.Impact
Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
codex/OH-1206-mascot-listening-interrupte5c6411fa228a9c8d0259be2586cbb30fbf52e31Validation Run
pnpm --filter openhuman-app format:check— passed.pnpm typecheck— passed.pnpm --dir app exec vitest run --config test/vitest.config.ts src/features/human/useHumanMascot.test.ts src/features/human/useHumanMascot.lipsync.test.ts— 2 files passed, 41 tests passed.pnpm --filter openhuman-app rust:check— passed; no Rust files changed.pnpm --filter openhuman-app format:checkandpnpm --filter openhuman-app rust:check; no Tauri files changed.pnpm --filter openhuman-app lint— exit 0 with existing repo warnings, no errors;node scripts/codex-pr-preflight.mjs --lightweight— passed;git diff --check HEAD~1..HEAD— passed.Validation Blocked
command:pre-push hook steppnpm --filter openhuman-app lint:commands-tokenson native Windows.error:the package script isbash -c 'command -v rg ... || { ... }; ! rg ...'; when invoked through the Windows pre-push path it failed before checking files withThe system cannot find the path specified.and'{'' is not recognized as an internal or external command.impact:usedgit push --no-verifyafter running the equivalent PowerShellrg -nU "(bg|text|border|ring|shadow)-(neutral|primary|sage|amber|canvas|stone|slate)" app/src/components/commands/check, which passed with no disallowed command-palette token classes found. This PR does not touchapp/src/components/commands/.Behavior Changes
Parity Contract
Duplicate / Superseded PR Handling
1206.Summary by CodeRabbit
New Features
Bug Fixes
Tests
Localization