feat(#134): configurable voice prompts on assistant pause #135

Merged
atlas-apex merged 1 commit into me2resh:dev from atlas-apex:feature/#134-voice-prompts
May 1, 2026

@atlas-apex

Collaborator

Closes #134

Summary

Configurable Stop hook that speaks the assistant's question aloud (Jarvis-from-Iron-Man style) when it pauses for user input. Initial phase is macOS-only via the bundled say command, no voice input — adopters reply via keyboard.

Default OFF. Pure-additive change for upstream — no existing fork sees behaviour change until they explicitly flip voice_prompts.enabled to true in their .claude/project-config.json.

Decision rationale in AgDR-0009-voice-prompts-on-pause.

What ships

| Path | Change |
| --- | --- |
| .claude/hooks/voice-prompt-on-pause.sh | New: Stop hook with config gate (sub-millisecond fast-path when disabled), trigger heuristic (questions-only by default: last paragraph ends with ? or matches recognised "Approved?" / "Reply with X" / "(a)/(b)/(c)" / "which path" patterns), markdown stripping (backticks, bold/italic, link syntax, bullets, table pipes), sentence-boundary truncation to max_chars (default 200), fire-and-forget say invocation |
| .claude/hooks/tests/test_voice_prompt_on_pause.sh | New: 9 cases (file follows the snake_case tests/test_<name>.sh convention from test_warn_stale_review_markers.sh). All passing locally. |
| .claude/project-config.defaults.json | Added voice_prompts block (enabled, voice, max_chars, rate_wpm, trigger). All default-OFF / safe values. |
| .claude/settings.json | New Stop hook entry wired with the standard ops-root resolver wrapper that the rest of the hooks use |
| docs/agdr/AgDR-0009-voice-prompts-on-pause.md | New: full options matrix (status quo / macOS say / cloud TTS / ML detection / notification daemon), why say wins for the initial phase, future-phase backlog |
| docs/project-config.md | New "Voice prompts" section with the schema, example overrides ("turn it on", "different voice", "always-speak debugging"), privacy notes |
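
A minimal sketch of the markdown-stripping step listed in the table, assuming a plain sed pipeline. The name strip_markdown and the exact expressions are illustrative and not taken from the hook itself. Underscore emphasis is deliberately left alone here so snake_case identifiers survive.

```shell
# Hypothetical helper: reads text on stdin, writes speech-friendly text on stdout.
strip_markdown() {
  # Order matters: bullets go first so a leading "* " is not mistaken
  # for the start of an *italic* span.
  sed -E \
    -e 's/^[[:space:]]*[-*+][[:space:]]+//' \
    -e 's/`//g' \
    -e 's/\[([^]]*)\]\([^)]*\)/\1/g' \
    -e 's/\*\*([^*]*)\*\*/\1/g' \
    -e 's/\*([^*]*)\*/\1/g' \
    -e 's/[|]/ /g' \
    -e 's/[[:space:]]+/ /g' \
    -e 's/^ //' \
    -e 's/ $//'
}
```

Words are preserved and only the markup is dropped, which is the property test case 9 checks for the real hook.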

Six files, +665 / -1 LOC.

Why a Stop hook, not something more elaborate

The ApexYard interaction pattern in long sessions has lots of discrete pause points where the assistant cannot proceed without explicit input — per-PR merge approvals, design-review (a/b/c) choices, "which path do you want", tool-result confirmations. These pauses are silent text in the terminal. If the user has stepped away from the keyboard, the conversation stalls with zero attentional signal.

A Stop hook fires exactly at those pause points. The trigger heuristic only speaks when the message looks like a request for input — so tool-result reports, summary messages, and progress updates stay silent.
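
The questions-only rule can be sketched roughly as follows. looks_like_request is a hypothetical name, and the pattern list is a guess assembled from the phrases this PR mentions; the hook's real regex may differ in case handling and coverage.

```shell
# Hypothetical helper: succeeds (exit 0) when the given last paragraph
# looks like a request for user input.
looks_like_request() {
  local text="$1"
  # Drop trailing whitespace and emphasis markers so "**Approved?**" still ends with "?".
  text="$(printf '%s' "$text" | sed -E 's/[[:space:]*_]+$//')"
  case "$text" in
    *\?) return 0 ;;
  esac
  # Assumed phrase list; "\([a-c]\)" stands in for the (a)/(b)/(c) menu pattern.
  printf '%s' "$text" | grep -Eq 'Approved\?|Reply with|Confirm|\([a-c]\)|which path|proceed\?'
}
```

For example, looks_like_request 'Reply with approve to ship.' succeeds, while a plain status line such as 'Tests passed and the branch is pushed.' does not.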

Real example from the session that produced this PR — the assistant said "Reply approve 354 (or merge 354) to ship — then I'll roll on to #243". Hook trigger fired (last paragraph contains Reply with + ends with em-dash but the heuristic also matches the apostrophe-paragraph). With enabled: true, the user would have heard "Reply approve 354 or merge 354 to ship — then I'll roll on to two four three" in Daniel's voice.

Why default OFF

Upstream-friendly default. Adopters who pull this commit see no behaviour change until they opt in. The hook is a sub-millisecond fast-path no-op when disabled (single config_get_or call before exit 0).
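
A sketch of what such a gate could look like. config_get_or is the helper name used in this PR, but the signature and jq-based body below are assumptions for illustration, and this version reads a single file rather than merging defaults with overrides.

```shell
# Hypothetical stand-in for the repo's config_get_or helper: print the value
# at a jq path, or the default when the file, jq, or the key is missing.
config_get_or() {
  local file="$1" key="$2" default="$3" value=""
  if [ -f "$file" ] && command -v jq >/dev/null 2>&1; then
    value="$(jq -r "$key | if . == null then empty else . end" "$file" 2>/dev/null)" || value=""
  fi
  printf '%s\n' "${value:-$default}"
}

# Gate: succeeds only when voice_prompts.enabled is literally true.
voice_prompts_enabled() {
  [ "$(config_get_or "$1" '.voice_prompts.enabled' false)" = "true" ]
}

# Intended use at the very top of a hook script:
#   voice_prompts_enabled .claude/project-config.json || exit 0
```

Opting in then amounts to adding {"voice_prompts": {"enabled": true}} to .claude/project-config.json; anything else, including a missing file, keeps the hook silent.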

Why macOS-only initial phase

The user's framing ("Jarvis from Iron Man") implies a high-quality British male voice. macOS bundles Daniel (Premium) which is the closest free voice. ElevenLabs / OpenAI TTS would produce closer fidelity, but at recurring per-character cost AND require API-key management AND send assistant text off the local machine — that's an AgDR-worthy decision in its own right (Phase 3, separately).

Linux/Windows fall through silently when say isn't on PATH. Phase 2 adds OS-detection and espeak / Add-Type SpeechSynthesizer paths. Same trigger model, same config schema, just a platform-layer addition.
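
The silent fall-through plus fire-and-forget call might look like the sketch below. speak_async and its argument order are hypothetical; say, its -v and -r flags, and the VOICE_PROMPTS_SYNC variable are the only names taken from macOS and this PR.

```shell
# Hypothetical helper: speak_async TEXT [VOICE] [RATE_WPM]
speak_async() {
  local text="$1" voice="${2:-Daniel}" rate="${3:-}"
  # No say binary (Linux, Windows, stripped PATH): behave like the disabled state.
  command -v say >/dev/null 2>&1 || return 0
  set -- -v "$voice"
  if [ -n "$rate" ]; then
    set -- "$@" -r "$rate"   # say -r takes a rate in words per minute
  fi
  if [ "${VOICE_PROMPTS_SYNC:-0}" = "1" ]; then
    say "$@" "$text" || true                # test mode: wait for say to finish
  else
    say "$@" "$text" >/dev/null 2>&1 &      # production: do not block the hook
  fi
  return 0
}
```

A Phase 2 platform layer would presumably replace the command -v check with a dispatch on the detected OS while keeping the same call site.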

Testing

  • bash .claude/hooks/tests/test_voice_prompt_on_pause.sh: 9/9 pass
    • case 1: disabled-default → no say
    • case 2: enabled + question → say invoked with full text
    • case 3: enabled + statement → no say (questions-only heuristic)
    • case 4: enabled + Approved? pattern → say invoked
    • case 5: enabled + (a)/(b)/(c) menu → say invoked
    • case 6: malformed transcript JSON → no crash, exit 0
    • case 7: enabled but say not on PATH → exit 0, no crash
    • case 8: trigger=always → say invoked even on a statement
    • case 9: markdown stripping — backticks, bold, links removed; words preserved
  • bash -n syntax-check on both the hook and the test file
  • jq . validates project-config.defaults.json and settings.json
  • Manual smoke on macOS — invoked the hook with a synthetic transcript ("Quick voice test — does this work?") and confirmed Daniel's voice spoke through the speakers
  • Live integration verified — the user enabled it locally via .claude/project-config.json override and confirmed "heard it" on the in-session smoke test
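
One way to assert "say invoked" versus "no say" without producing sound is a PATH shim, sketched below. with_fake_say and SAY_LOG are hypothetical names; the actual test file may use a different mechanism.

```shell
# Hypothetical test helper: run a command with a fake say first on PATH,
# then print the arguments each say call received (one call per line).
with_fake_say() {
  local shim_dir log status=0
  shim_dir="$(mktemp -d)"
  log="$shim_dir/say.log"
  cat > "$shim_dir/say" <<'STUB'
#!/bin/sh
# Record the arguments instead of speaking.
printf '%s\n' "$*" >> "$SAY_LOG"
STUB
  chmod +x "$shim_dir/say"
  : > "$log"
  # Subshell keeps the PATH and SAY_LOG changes from leaking into the caller.
  ( export SAY_LOG="$log" PATH="$shim_dir:$PATH"; "$@" ) || status=$?
  cat "$log"
  rm -rf "$shim_dir"
  return "$status"
}
```

An empty result corresponds to the "no say" cases. Because the log is read as soon as the command returns, the command under test needs to call say synchronously, which is what VOICE_PROMPTS_SYNC=1 is for.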

Risks

  • macOS-only. Adopters on Linux / Windows see no benefit until Phase 2 (separate ticket). Mitigation: hook silently exits 0 when say isn't on PATH — cross-platform users see exactly the disabled-state behaviour. No errors, no spam.
  • Trigger heuristic false-positives. A tool-result message that happens to end with ? would get read aloud. The regex is deliberately conservative and prefers false-negatives. Adopters who find it noisy can disable the hook entirely; trigger: "always" is available for debugging and speaks every message.
  • TTS sound-output collision. If the user is on a call when the hook fires, say will speak through the active output device. Acceptable side-effect for v1; an env-var override could disable per-session in Phase 2.
  • Privacy. Today the hook reads from the local transcript file and pipes to a local OS binary — nothing leaves the machine. When/if Phase 3 adds cloud TTS, that becomes a separate AgDR — explicitly NOT in scope here.

Glossary

| Term | Definition |
| --- | --- |
| Stop hook | A Claude Code hook that fires when the assistant ends a turn. Receives { session_id, transcript_path, ... } JSON on stdin. Used here to detect pause-for-input moments and speak the question aloud. |
| Trigger heuristic (questions-only) | The default rule for "is this message asking for input?": the last paragraph ends with ? (after stripping trailing whitespace + markdown emphasis), OR matches one of Approved? / Reply with / Confirm / (a)/(b)/(c) / which path / proceed?. Conservative; prefers false-negatives. |
| Daniel (Premium) | macOS's bundled British-accented male voice, the closest free voice to "Jarvis-from-Iron-Man". Listable via say -v "?". |
| VOICE_PROMPTS_SYNC=1 | Test-mode env var that makes the hook run say synchronously instead of fire-and-forget. Tests need this because the orphaned-bg-process reparenting interacts badly with the test runner's subshell wrapper. Production invocations always run async. |
| Sentence-boundary truncation | Walks the message forward, keeping whole sentences (delimited by . ! ?) until the next sentence would push past max_chars. Avoids cutting mid-word; reads cleanly. |
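
The sentence-boundary truncation entry can be illustrated with a small pure-shell sketch. truncate_sentences is a hypothetical name, and two behaviours here are assumptions rather than facts about the hook: abbreviations and decimals are treated as sentence ends, and a first sentence longer than max_chars yields empty output.

```shell
# Hypothetical helper: truncate_sentences TEXT [MAX_CHARS]
truncate_sentences() {
  local rest="$1" max="${2:-200}" out="" sentence remainder
  while [ -n "$rest" ]; do
    case "$rest" in
      *[.!?]*)
        remainder="${rest#*[.!?]}"        # text after the first terminator
        sentence="${rest%"$remainder"}"   # text up to and including it
        ;;
      *)
        sentence="$rest"                  # no terminator left: take it all
        remainder=""
        ;;
    esac
    # Stop before the sentence that would push the total past the limit.
    if [ $(( ${#out} + ${#sentence} )) -gt "$max" ]; then
      break
    fi
    out="$out$sentence"
    rest="$remainder"
  done
  printf '%s\n' "$out"
}
```

With a limit of 20, 'One two. Three four! Five six?' comes back as 'One two. Three four!', since adding the third sentence would exceed the limit.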

Refs #134

