feat(voice): polish mascot TTS/STT session states and lifecycle#2916
Conversation
…humansai#1206) Improves voice interaction reliability and observability across the TTS, STT, and mascot layers: audioPlayer.ts - Add `PlaybackOptions.maxDurationMs` — auto-stops runaway TTS after a configurable deadline (default not set; mascot passes TTS_MAX_PLAYBACK_MS = 5 min) so the mascot can't get stuck in a permanent `speaking` pose - Add `debug('human:audio-player')` lifecycle logging: playback start, metadata-ready duration, natural end, auto-stop with reason useHumanMascot.ts - Export `TTS_MAX_PLAYBACK_MS` (5 min) and pass it to `playBase64Audio` via the new `PlaybackOptions` interface - Improve the `listening` interrupt log: distinguish "user started recording while TTS was playing (interrupted)" from "mic activated, no TTS to cancel" with a stable `tts-in-flight=true/false` field for grepping MicComposer.tsx - Add `retryCount` state: tracks how many STT retry attempts have been made for the current recording session - Export `STT_MAX_RETRIES` so tests and callers can reference the ceiling - Thread `onRetry` callback through `transcribeWithRetry` — each transient backoff increments `retryCount` so the button label can change - Update label: `transcribing + retryCount > 0` now renders `mic.retryingTranscription` ("Retrying… (1 of 2)") instead of the generic "Transcribing…" spinner text - Add per-session monotonic ID and structured `[session:N]` log prefix so recording start/stop/retry/error events for the same session are trivially correlated in debug output i18n - Add `mic.retryingTranscription` key ("Retrying... ({attempt} of {max})") to en.ts and all 14 locale *-2.ts chunk files (English placeholder) Tests (13 new across three suites, 109 total) - audioPlayer: `maxDurationMs` auto-stops and rejects `ended`; timer cleared on natural end before deadline - MicComposer: `STT_MAX_RETRIES` is a positive integer (contract test) - useHumanMascot: listening override cancels ack timer and transitions to listening face without timer firing late; `TTS_MAX_PLAYBACK_MS` sanity Closes tinyhumansai#1206
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (16)
💤 Files with no reviewable changes (14)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAudio playback now supports an optional max-duration safety mechanism to prevent getting stuck in playback state; TTS uses a 5-minute ceiling on utterance duration. STT retry attempts are tracked cumulatively across native and fallback WAV codec paths, session-logged, and exposed to the UI via a retrying label that shows current attempt and max retries. Tests verify both audio safety and retry behavior, and 16 language translation files add the retrying-transcription message key. ChangesVoice pipeline safety limits and retry feedback
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
The E2E (Windows / Appium Chromium) failure is a transient GitHub infrastructure issue — the annotation reads "The hosted runner lost communication with the server" and the job was killed before the |
graycyrus
left a comment
There was a problem hiding this comment.
@staimoorulhassan hey! the code looks good to me overall, but the Windows E2E check is failing — once that's green i'll come back and approve this. let me know if you need any help sorting it out.
spotted a couple of minor things while reviewing that are worth fixing before merge:
retry counter can exceed STT_MAX_RETRIES (MicComposer.tsx)
When the native transcription path exhausts all its retries and falls through to the WAV codec, cumulativeRetries already equals STT_MAX_RETRIES going into the WAV path. The WAV path then adds its own attempt on top, so the label can show "Retrying… (4 of 2)" when STT_MAX_RETRIES is 2. Worth deciding whether to cap the display value or reset cumulativeRetries before the WAV fallback.
i18n strings not translated in non-English locales (ar-2.ts, de-2.ts, fr-2.ts, etc.)
mic.retryingTranscription is added as English text in all 14 non-English locale chunk files. Users on those locales will see the English string. Fine as a placeholder, but worth a // TODO: translate comment so it doesn't quietly ship as-is.
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/src/features/human/MicComposer.tsx (1)
605-649:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCumulative retry counter double-counts on the WAV path and can exceed the displayed
{max}.Two related problems:
handleRetryreassignscumulativeRetries, yet the WAV callback re-reads it viacumulativeRetries + attempt. After the native path exhausts (cumulativeRetries === 2), the WAV retries compute2+1=3, then on the next call read the already-mutated value:3+2=5. The counter jumps1, 2, 3, 5, skipping4.cumulativeRetriesspans both paths (max2 * STT_MAX_RETRIES), but the label at Line 655-660 substitutes{max}withSTT_MAX_RETRIES(2). So the mic label can read "Retrying… 5 of 2".Snapshot the native count before the WAV path, and widen
{max}to the cumulative ceiling.🐛 Proposed fix — stop the WAV callback from compounding
+ const nativeRetries = cumulativeRetries; const text = await transcribeWithRetry(wav, 'wav', attempt => { - handleRetry(cumulativeRetries + attempt); + handleRetry(nativeRetries + attempt); });And align the displayed max with the cumulative total (Line 655-660):
- .replace('{max}', String(STT_MAX_RETRIES)) + .replace('{max}', String(STT_MAX_RETRIES * 2))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/features/human/MicComposer.tsx` around lines 605 - 649, The cumulative retry counter logic incorrectly mutates cumulativeRetries via handleRetry and then the WAV callback adds to that same mutable value causing double-counting and an incorrect max in the UI; fix by snapshotting the native retry count into a local variable (e.g., nativeRetries = cumulativeRetries) before starting the WAV fallback, change the WAV transcribeWithRetry callback to compute and call handleRetry(nativeRetries + attempt) (so it doesn't read a mutated cumulativeRetries), and update the displayed max used in the mic label from STT_MAX_RETRIES to STT_MAX_RETRIES * 2 (the cumulative ceiling) so the "Retrying… N of M" shows the correct total.
🧹 Nitpick comments (1)
app/src/features/human/MicComposer.test.tsx (1)
761-766: ⚡ Quick winConsider covering the retry label / cumulative counter behavior.
The existing WAV fallback tests (Line 589-617) let the WAV attempt succeed on the first try, so the cross-path cumulative counter and the "Retrying…" label are never asserted — which is why the double-count described in
MicComposer.tsx(Line 605-649) goes undetected. A test where both native and WAV paths emit transient failures would lock in the expected{attempt}/{max}sequence.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/features/human/MicComposer.test.tsx` around lines 761 - 766, Add a test in MicComposer.test.tsx that triggers transient failures on both the native STT path and the WAV fallback so the retry branch in MicComposer.tsx (see STT_MAX_RETRIES and the "Retrying…" label) is exercised; specifically, simulate the first native attempt failing, then the WAV attempt also failing once, and assert the visible retry label shows the cumulative attempt counter formatted as "{attempt}/{STT_MAX_RETRIES}" for each retry step to catch the double-counting bug in the retry logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/features/human/voice/audioPlayer.ts`:
- Line 126: The debug call using audioLog('playback metadata ready
duration=%.2fs mime=%s', audio.duration, mime) relies on printf precision which
debug doesn't support; fix it by formatting audio.duration to the desired
precision (e.g., audio.duration.toFixed(2) or Number(audio.duration).toFixed(2))
into a string first and pass that formatted duration and mime into audioLog
(update the call site where audioLog is invoked with audio.duration and mime).
---
Outside diff comments:
In `@app/src/features/human/MicComposer.tsx`:
- Around line 605-649: The cumulative retry counter logic incorrectly mutates
cumulativeRetries via handleRetry and then the WAV callback adds to that same
mutable value causing double-counting and an incorrect max in the UI; fix by
snapshotting the native retry count into a local variable (e.g., nativeRetries =
cumulativeRetries) before starting the WAV fallback, change the WAV
transcribeWithRetry callback to compute and call handleRetry(nativeRetries +
attempt) (so it doesn't read a mutated cumulativeRetries), and update the
displayed max used in the mic label from STT_MAX_RETRIES to STT_MAX_RETRIES * 2
(the cumulative ceiling) so the "Retrying… N of M" shows the correct total.
---
Nitpick comments:
In `@app/src/features/human/MicComposer.test.tsx`:
- Around line 761-766: Add a test in MicComposer.test.tsx that triggers
transient failures on both the native STT path and the WAV fallback so the retry
branch in MicComposer.tsx (see STT_MAX_RETRIES and the "Retrying…" label) is
exercised; specifically, simulate the first native attempt failing, then the WAV
attempt also failing once, and assert the visible retry label shows the
cumulative attempt counter formatted as "{attempt}/{STT_MAX_RETRIES}" for each
retry step to catch the double-counting bug in the retry logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 91620300-1d3f-4eae-af66-9767f4e5f475
📒 Files selected for processing (21)
app/src/features/human/MicComposer.test.tsxapp/src/features/human/MicComposer.tsxapp/src/features/human/useHumanMascot.test.tsapp/src/features/human/useHumanMascot.tsapp/src/features/human/voice/audioPlayer.test.tsapp/src/features/human/voice/audioPlayer.tsapp/src/lib/i18n/chunks/ar-2.tsapp/src/lib/i18n/chunks/bn-2.tsapp/src/lib/i18n/chunks/de-2.tsapp/src/lib/i18n/chunks/en-2.tsapp/src/lib/i18n/chunks/es-2.tsapp/src/lib/i18n/chunks/fr-2.tsapp/src/lib/i18n/chunks/hi-2.tsapp/src/lib/i18n/chunks/id-2.tsapp/src/lib/i18n/chunks/it-2.tsapp/src/lib/i18n/chunks/ko-2.tsapp/src/lib/i18n/chunks/pl-2.tsapp/src/lib/i18n/chunks/pt-2.tsapp/src/lib/i18n/chunks/ru-2.tsapp/src/lib/i18n/chunks/zh-CN-2.tsapp/src/lib/i18n/en.ts
1. audioPlayer.ts: replace unsupported %.2f precision specifier with
audio.duration.toFixed(2) passed as %s — the debug package only
supports %O/%o/%s/%d/%j/%%.
2. MicComposer.tsx: fix cumulative retry counter double-counting on the
WAV fallback path. Before the fix, handleRetry mutated cumulativeRetries
and the WAV callback added cumulativeRetries (already updated) to the
per-path attempt number, producing the sequence 1,2,3,5 instead of
1,2,3. Fix: snapshot nativeRetries before the WAV path so the closure
captures a stable value. Also corrected {max} in the retry label from
STT_MAX_RETRIES (2) to STT_MAX_RETRIES * 2 (4) — the actual cumulative
ceiling across both codec paths.
3. MicComposer.test.tsx: add test that verifies the retry sequence is
[1, 2, 3] (not [1, 2, 3, 5]) when both native and WAV paths retry, and
that 5 total transcribe calls are made.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
app/src/features/human/MicComposer.test.tsx (1)
718-772: ⚡ Quick winThis assertion can't catch the double-count regression it targets.
retryArgsis filled with hardcoded literals (push(1)/push(2)/push(3)) inside the mock bodies, independent of the cumulative value the component actually computes viahandleRetry/setRetryCount. The buggy[1, 2, 3, 5]sequence would still push[1, 2, 3]here, sotoEqual([1, 2, 3])passes whether or not the snapshot fix is present — andtoHaveBeenCalledTimes(5)already covers the call-count aspect.Note the literal at the WAV attempt-0 call (
push(3)/ "handleRetry(3)") is also off: at that transcribe callretryCountis still2;handleRetry(3)only fires after that attempt throws.To genuinely guard the fix, assert on the component's observable retry counter instead of literals — e.g. advance to the WAV-retry window and assert the rendered label shows
3 of 4(and never5 of 4), or have the mock read the rendered count at call time rather than pushing constants.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/features/human/MicComposer.test.tsx` around lines 718 - 772, The test currently pushes hardcoded retry numbers inside transcribeWithFactoryMock implementations which don't reflect the component's actual retry state; replace those literal pushes by reading the component's observable retry counter when each mock is invoked (e.g. inside the mock implementations for transcribeWithFactoryMock, query the rendered retry label from the DOM or call a helper that reads setRetryCount via the UI) and push that parsed number into retryArgs, and/or add an assertion that when advancing timers into the WAV-retry window the rendered label displays "3 of 4" (and never "5 of 4"); update references to transcribeWithFactoryMock, MicComposer, handleRetry/setRetryCount and the rendered retry label checks accordingly so the test verifies the actual displayed retry count rather than hardcoded literals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@app/src/features/human/MicComposer.test.tsx`:
- Around line 718-772: The test currently pushes hardcoded retry numbers inside
transcribeWithFactoryMock implementations which don't reflect the component's
actual retry state; replace those literal pushes by reading the component's
observable retry counter when each mock is invoked (e.g. inside the mock
implementations for transcribeWithFactoryMock, query the rendered retry label
from the DOM or call a helper that reads setRetryCount via the UI) and push that
parsed number into retryArgs, and/or add an assertion that when advancing timers
into the WAV-retry window the rendered label displays "3 of 4" (and never "5 of
4"); update references to transcribeWithFactoryMock, MicComposer,
handleRetry/setRetryCount and the rendered retry label checks accordingly so the
test verifies the actual displayed retry count rather than hardcoded literals.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 81373f13-f695-4aea-a112-072e9b93aea0
📒 Files selected for processing (21)
app/src/features/human/MicComposer.test.tsxapp/src/features/human/MicComposer.tsxapp/src/features/human/useHumanMascot.test.tsapp/src/features/human/useHumanMascot.tsapp/src/features/human/voice/audioPlayer.test.tsapp/src/features/human/voice/audioPlayer.tsapp/src/lib/i18n/chunks/ar-2.tsapp/src/lib/i18n/chunks/bn-2.tsapp/src/lib/i18n/chunks/de-2.tsapp/src/lib/i18n/chunks/en-2.tsapp/src/lib/i18n/chunks/es-2.tsapp/src/lib/i18n/chunks/fr-2.tsapp/src/lib/i18n/chunks/hi-2.tsapp/src/lib/i18n/chunks/id-2.tsapp/src/lib/i18n/chunks/it-2.tsapp/src/lib/i18n/chunks/ko-2.tsapp/src/lib/i18n/chunks/pl-2.tsapp/src/lib/i18n/chunks/pt-2.tsapp/src/lib/i18n/chunks/ru-2.tsapp/src/lib/i18n/chunks/zh-CN-2.tsapp/src/lib/i18n/en.ts
…ession Replace hardcoded literal pushes (which were independent of the component's computed state) with a DOM read inside the final WAV success mock call. By the time WAV attempt 2 (call 6) executes, React has flushed the setRetryCount update from WAV retry-1's callback through the 500ms fake-timer backoff (act-wrapped). Reading the rendered <span> text there gives the component's actual retryCount value: fixed: "Retrying... (4 of 4)" (nativeRetries snapshot = 2, 2+2=4) buggy: "Retrying... (5 of 4)" (mutable cumulativeRetries=3, 3+2=5) Verified: reverting the nativeRetries snapshot fix causes this test to fail.
graycyrus
left a comment
There was a problem hiding this comment.
@staimoorulhassan two new commits since my last look — both land cleanly.
The nativeRetries snapshot before the WAV path is the right fix for the label double-count. The retry counter test is exactly what this kind of state-machine path needs: it drives the real backoff timing with fake timers and reads the rendered text inside the mock so there's no way to write a passing test against the wrong value. Well done.
The %.2f → %s + .toFixed(2) change is also correct — the debug package doesn't support printf precision specifiers, so the old format would have silently produced unformatted output.
One leftover minor from before: mic.retryingTranscription is still English text in all 13 non-English locale files (ar-2, bn-2, de-2, es-2, fr-2, hi-2, id-2, it-2, ko-2, pl-2, pt-2, ru-2, zh-CN-2). English placeholder is fine for a first pass, but those files should at least have a // TODO: translate comment so they don't get buried. Not blocking.
CI is still pending on Frontend Coverage, Frontend Unit Tests, Rust Core Tests, Build Tauri App, and the E2E suite. Once those are green I'll come back and approve this.
oxoxDev
left a comment
There was a problem hiding this comment.
LGTM. The TTS runaway guard is the key win: audioPlayer gains a maxDurationMs ceiling via a setTimeout that's cleared on every terminal event (and guarded by stopped/endedNaturally), and useHumanMascot wires a 5-minute TTS_MAX_PLAYBACK_MS so the mascot can never get stuck in the speaking pose. STT retry UI state is a nice observability add. Good test coverage across audioPlayer / useHumanMascot / MicComposer. (I focused on the playback-safety core; the bulk of the file count is i18n chunks.) Approving.
|
Tried rebasing onto What blocks the rebase Upstream merged a structural i18n refactor: the chunked Your PR's 14 chunk modifications ( What needs to happen
Code-only conflicts (modify/modify) were minimal — the Holding off pushing a half-rebase. Once you push the rebased branch with the flat-locale updates, ping me — I'll re-review and re-clear my approval if nothing else shifted. |
…olithic locale files Upstream (tinyhumansai#3004) deleted all i18n chunk files and merged them into per-locale monolithic files (ar.ts, bn.ts, …). The mic.retryingTranscription key added by this PR was in the now-deleted *-2.ts chunks; moved it to the corresponding line (after mic.transcribing) in each of the 13 locale files. en.ts already contained the key from the auto-merge.
ee0f81b
|
The Playwright E2E Gate passed. The 3 lane failures in These failures started after merging upstream commits that landed after the last green E2E run on main (2026-05-30T07:09 UTC / SHA
Those commits add new UI features (sub-thread views, workflow task panels) that appear to interfere with existing E2E selectors (tab click blocked by overlay, element timeout). None of this PR's changed files ( |
…humansai#2916) Co-authored-by: Taimoor <astikkosapparel009@gmail.com>
Summary
Improves voice interaction reliability and observability across the TTS, STT, and mascot layers.
audioPlayer.tsgains amaxDurationMsoption that auto-stops playback at a configurable ceiling.useHumanMascotwires in a 5-minuteTTS_MAX_PLAYBACK_MSconstant so the mascot can never get permanently stuck inspeakingposeMicComposernow tracks aretryCountand surfaces "Retrying… (1 of 2)" instead of a generic spinner when a transient STT failure triggers exponential-backoff retrylisteningoverride now logstts-in-flight=truewith a human-readable reason, making voice session traces trivially grepable[session:N]prefix ties recording start/stop/retry/error entries togethermic.retryingTranscriptionkey added toen.tsand all 14 locale*-2.tschunk filesaudioPlayer— playback start, metadata-ready duration, natural end, and auto-stop are all logged underhuman:audio-playerWhat changed
voice/audioPlayer.tsPlaybackOptions.maxDurationMs+ lifecycle logginguseHumanMascot.tsTTS_MAX_PLAYBACK_MSexported constant, passes options toplayBase64Audio, improved interrupt logMicComposer.tsxretryCountstate,STT_MAX_RETRIESexport, retry label, session-ID loggingen.ts+*-2.tschunk filesmic.retryingTranscriptioni18n keyTest plan
pnpm debug unit audioPlayer— 13 pass (2 new:maxDurationMsauto-stop, timer-cleared-on-natural-end)pnpm debug unit useHumanMascot— 53 pass (2 new: listening-override timer cancel,TTS_MAX_PLAYBACK_MSsanity)pnpm debug unit MicComposer— 43 pass (1 new:STT_MAX_RETRIEScontract)pnpm typecheck— cleanCloses #1206
Summary by CodeRabbit
Release Notes
New Features
Internationalization