feat(#134): configurable voice prompts on assistant pause by atlas-apex · Pull Request #135 · me2resh/apexyard

atlas-apex · 2026-04-26T08:30:11Z

Closes #134

Summary

Adds a configurable Stop hook that speaks the assistant's question aloud (Jarvis-from-Iron-Man style) when it pauses for user input. The initial phase is macOS-only, using the bundled `say` command. There is no voice input: adopters still reply via keyboard.

Default OFF. This is a purely additive change for upstream: no existing fork sees a behaviour change until it explicitly sets `voice_prompts.enabled` to `true` in its `.claude/project-config.json`.

Decision rationale in AgDR-0009-voice-prompts-on-pause.

What ships

| Path | Change |
| --- | --- |
| `.claude/hooks/voice-prompt-on-pause.sh` | New: Stop hook with config gate (sub-millisecond fast-path when disabled), trigger heuristic (`questions-only` by default: last paragraph ends with `?` or matches recognised "Approved?" / "Reply with X" / "(a)/(b)/(c)" / "which path" patterns), markdown stripping (backticks, bold/italic, link syntax, bullets, table pipes), sentence-boundary truncation to `max_chars` (default 200), fire-and-forget `say` invocation |
| `.claude/hooks/tests/test_voice_prompt_on_pause.sh` | New: 9 cases (file follows the snake_case `tests/test_<name>.sh` convention from `test_warn_stale_review_markers.sh`). All passing locally. |
| `.claude/project-config.defaults.json` | Added `voice_prompts` block (`enabled`, `voice`, `max_chars`, `rate_wpm`, `trigger`). All default-OFF / safe values. |
| `.claude/settings.json` | New `Stop` hook entry wired with the standard ops-root resolver wrapper that the rest of the hooks use |
| `docs/agdr/AgDR-0009-voice-prompts-on-pause.md` | New: full options matrix (status quo / macOS `say` / cloud TTS / ML detection / notification daemon), why `say` wins for the initial phase, future-phase backlog |
| `docs/project-config.md` | New "Voice prompts" section with the schema, example overrides ("turn it on", "different voice", "always-speak debugging"), privacy notes |
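
For reviewers who want a feel for the markdown-stripping step listed for the hook, here is a minimal sketch. It is not the code in `voice-prompt-on-pause.sh`: the function name `strip_markdown` and the exact `sed` rules are assumptions, chosen only to cover the categories named above (backticks, bold/italic, link syntax, bullets, table pipes).

```shell
# strip_markdown: read markdown on stdin, write speakable plain text on stdout.
# Hypothetical helper for illustration; the real hook's rules may differ.
strip_markdown() {
  sed -E \
    -e 's/\[([^]]*)\]\([^)]*\)/\1/g' \
    -e 's/`//g' \
    -e 's/^[[:space:]]*[-*+][[:space:]]+//' \
    -e 's/\*\*([^*]*)\*\*/\1/g' \
    -e 's/\*([^*]*)\*/\1/g' \
    -e 's/[|]/ /g' \
    -e 's/[[:space:]]+/ /g' \
    -e 's/^ //' \
    -e 's/ $//'
}

# Usage (illustrative):
#   printf '%s\n' '- Run **now**?' | strip_markdown
```

The bullet rule runs before the emphasis rules so that a leading `* ` list marker is not mistaken for the start of an italic span. Underscore emphasis is deliberately left alone in this sketch, because stripping it would also damage snake_case identifiers.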

Six files, +665 / -1 LOC.

Why a Stop hook, not something more elaborate

The ApexYard interaction pattern in long sessions has lots of discrete pause points where the assistant cannot proceed without explicit input — per-PR merge approvals, design-review (a/b/c) choices, "which path do you want", tool-result confirmations. These pauses are silent text in the terminal. If the user has stepped away from the keyboard, the conversation stalls with zero attentional signal.

A Stop hook fires exactly at those pause points. The trigger heuristic only speaks when the message looks like a request for input — so tool-result reports, summary messages, and progress updates stay silent.
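
The `questions-only` decision described above can be sketched roughly as follows. This is an illustration under assumptions, not the hook's code: the function names `last_paragraph` and `wants_input` are hypothetical, and the regular expression covers only a subset of the phrases listed in this PR.

```shell
# last_paragraph: print the final blank-line-separated paragraph of stdin.
last_paragraph() {
  awk 'BEGIN { RS = ""; ORS = "" } { last = $0 } END { print last }'
}

# wants_input: exit 0 when the last paragraph looks like a request for input.
# Hypothetical helper; the phrase list is an assumed subset of the real one.
wants_input() {
  # Join the paragraph into one line and drop trailing whitespace / emphasis.
  _wi_para=$(last_paragraph | tr '\n' ' ' | sed -E 's/[[:space:]*_]+$//')
  case $_wi_para in
    *\?) return 0 ;;   # ends with a question mark
  esac
  printf '%s' "$_wi_para" |
    grep -Eiq 'approved\?|reply with|\(a\)|which path|proceed\?'
}

# Usage (illustrative):
#   if wants_input < message.txt; then echo "speak it"; fi
```

Only the last paragraph is inspected, so a question earlier in a long progress report does not trigger speech.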

Real example from the session that produced this PR: the assistant said "Reply approve 354 (or merge 354) to ship — then I'll roll on to #243". The trigger fired even though the paragraph does not end with `?`, because the heuristic also matches reply-style prompts. With `enabled: true`, the user would have heard "Reply approve 354 or merge 354 to ship — then I'll roll on to two four three" in Daniel's voice.

Why default OFF

Upstream-friendly default. Adopters who pull this commit see no behaviour change until they opt in. The hook is a sub-millisecond fast-path no-op when disabled (a single `config_get_or` call before `exit 0`).
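
To make the disabled fast-path concrete, here is a rough stand-in for the gate. The real hook uses the repository's `config_get_or` helper, whose signature is not shown in this PR, so this sketch reads the flag directly with `jq` instead. The function name `voice_prompts_enabled` and the default file path argument are assumptions.

```shell
# voice_prompts_enabled [CONFIG_FILE]: exit 0 only when the config file exists,
# parses as JSON, and has voice_prompts.enabled set to true.
# Hypothetical stand-in for the hook's config_get_or-based gate.
voice_prompts_enabled() {
  _vp_cfg=${1:-.claude/project-config.json}
  [ -f "$_vp_cfg" ] || return 1                  # no override file: disabled
  command -v jq >/dev/null 2>&1 || return 1      # no jq: treat as disabled
  [ "$(jq -r '.voice_prompts.enabled // false' "$_vp_cfg" 2>/dev/null)" = "true" ]
}

# Usage at the top of a hook script (illustrative):
#   voice_prompts_enabled || exit 0
```

Any failure (missing file, missing `jq`, malformed JSON) is treated as "disabled", which keeps the default-OFF guarantee intact.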

Why macOS-only initial phase

The user's framing ("Jarvis from Iron Man") implies a high-quality British male voice. macOS bundles Daniel (Premium), which is the closest free voice. ElevenLabs or OpenAI TTS would give closer fidelity, but they carry a recurring per-character cost, require API-key management, and send assistant text off the local machine. That is an AgDR-worthy decision in its own right (Phase 3, separately).

Linux and Windows fall through silently when `say` isn't on PATH. Phase 2 adds OS detection and `espeak` / `Add-Type` `SpeechSynthesizer` paths. Same trigger model, same config schema, just a platform-layer addition.
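
The silent fall-through and the fire-and-forget call can be sketched as below. This is not the hook's code. The function name `speak` and the variables `SAY_BIN`, `VOICE` and `RATE_WPM` (with their defaults) are assumptions for illustration; `VOICE_PROMPTS_SYNC` is the test-mode variable described in the glossary. The text is piped on stdin, which `say` reads when no message argument is given.

```shell
# speak TEXT: speak TEXT with the local TTS binary, or do nothing if it is absent.
# Hypothetical wrapper; SAY_BIN, VOICE and RATE_WPM are illustrative names.
speak() {
  _sp_bin=${SAY_BIN:-say}
  # No TTS binary on PATH (Linux, Windows): behave exactly like disabled.
  command -v "$_sp_bin" >/dev/null 2>&1 || return 0
  if [ "${VOICE_PROMPTS_SYNC:-0}" = "1" ]; then
    # Test mode: wait for the speech command to finish.
    printf '%s' "$1" | "$_sp_bin" -v "${VOICE:-Daniel}" -r "${RATE_WPM:-180}"
  else
    # Normal mode: fire-and-forget so the hook returns immediately.
    printf '%s' "$1" | "$_sp_bin" -v "${VOICE:-Daniel}" -r "${RATE_WPM:-180}" \
      >/dev/null 2>&1 &
  fi
  return 0
}

# Usage (illustrative):
#   speak "Approve the merge?"
```

Returning 0 in every branch means a missing or failing speech command never turns into a hook error.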

Testing

- `bash .claude/hooks/tests/test_voice_prompt_on_pause.sh`: 9/9 pass
  - case 1: disabled-default → no `say`
  - case 2: enabled + question → `say` invoked with full text
  - case 3: enabled + statement → no `say` (questions-only heuristic)
  - case 4: enabled + `Approved?` pattern → `say` invoked
  - case 5: enabled + (a)/(b)/(c) menu → `say` invoked
  - case 6: malformed transcript JSON → no crash, exit 0
  - case 7: enabled but `say` not on PATH → exit 0, no crash
  - case 8: `trigger=always` → `say` invoked even on a statement
  - case 9: markdown stripping: backticks, bold, links removed; words preserved
- `bash -n` syntax-check on both the hook and the test file
- `jq .` validates `project-config.defaults.json` and `settings.json`
- Manual smoke on macOS: invoked the hook with a synthetic transcript ("Quick voice test — does this work?") and confirmed Daniel's voice spoke through the speakers
- Live integration verified: the user enabled it locally via a `.claude/project-config.json` override and confirmed "heard it" on the in-session smoke test
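
One way cases like "say invoked" and "no say" can be asserted without producing audio is to put a fake `say` first on PATH and count its invocations. The sketch below shows that idea only. It is not taken from `test_voice_prompt_on_pause.sh`, and the names `make_say_stub`, `say_call_count` and `SAY_LOG` are hypothetical.

```shell
# make_say_stub DIR: create a fake `say` in DIR that logs its arguments
# to the file named by $SAY_LOG instead of speaking.
make_say_stub() {
  mkdir -p "$1"
  cat > "$1/say" <<'STUB'
#!/bin/sh
# Fake say: record one line per invocation.
printf 'say %s\n' "$*" >> "$SAY_LOG"
STUB
  chmod +x "$1/say"
}

# say_call_count: number of recorded invocations (0 when no log exists yet).
say_call_count() {
  if [ -f "$SAY_LOG" ]; then
    wc -l < "$SAY_LOG" | tr -d ' '
  else
    echo 0
  fi
}

# Usage (illustrative):
#   export SAY_LOG=/tmp/say.log
#   make_say_stub /tmp/stub-bin
#   PATH="/tmp/stub-bin:$PATH"; export PATH
#   ...run the hook..., then: [ "$(say_call_count)" = "1" ]
```

Because the stub relies on the hook finishing its `say` call before the assertion runs, a harness like this would pair naturally with the synchronous test mode (`VOICE_PROMPTS_SYNC=1`).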

Risks

- macOS-only. Adopters on Linux / Windows see no benefit until Phase 2 (separate ticket). Mitigation: the hook silently exits 0 when `say` isn't on PATH, so cross-platform users see exactly the disabled-state behaviour. No errors, no spam.
- Trigger heuristic false-positives. A tool-result message that happens to end with `?` would get read aloud. The conservative regex prefers false-negatives. Adopters can disable the hook entirely, or switch to `trigger: "always"` for debugging.
- TTS sound-output collision. If the user is on a call when the hook fires, `say` will speak through the active output device. Acceptable side-effect for v1; an env-var override could disable it per-session in Phase 2.
- Privacy. Today the hook reads from the local transcript file and pipes to a local OS binary, so nothing leaves the machine. If Phase 3 adds cloud TTS, that becomes a separate AgDR and is explicitly NOT in scope here.

Glossary

| Term | Definition |
| --- | --- |
| Stop hook | A Claude Code hook that fires when the assistant ends a turn. Receives `{ session_id, transcript_path, ... }` JSON on stdin. Used here to detect pause-for-input moments and speak the question aloud. |
| Trigger heuristic (`questions-only`) | The default rule for "is this message asking for input?": the last paragraph ends with `?` (after stripping trailing whitespace + markdown emphasis), OR matches one of `Approved?` / `Reply with` / `Confirm` / `(a)/(b)/(c)` / `which path` / `proceed?`. Conservative; prefers false-negatives. |
| `Daniel (Premium)` | macOS's bundled British-accented male voice, the closest free voice to "Jarvis-from-Iron-Man". Listable via `say -v "?"`. |
| `VOICE_PROMPTS_SYNC=1` | Test-mode env var that makes the hook run `say` synchronously instead of fire-and-forget. Tests need this because the orphaned-bg-process reparenting interacts badly with the test runner's subshell wrapper. Production invocations always run async. |
| Sentence-boundary truncation | Walks the message forward, keeping whole sentences (delimited by `. ! ?`) until the next sentence would push past `max_chars`. Avoids cutting mid-word; reads cleanly. |
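
The sentence-boundary truncation defined above can be sketched with `awk`. This is an illustration, not the hook's implementation: the function name `truncate_sentences` is hypothetical, and lengths are counted in whatever unit the local `awk` uses (bytes in some implementations).

```shell
# truncate_sentences MAX_CHARS: read text on stdin and print the longest prefix
# made of whole sentences (ending in . ! or ?) that fits in MAX_CHARS.
# Hypothetical helper for illustration only.
truncate_sentences() {
  tr '\n\t' '  ' | awk -v max="$1" '
  {
    rest = $0
    out = ""
    while (length(rest) > 0) {
      # Next sentence: text up to and including its terminator(s) and spaces.
      if (match(rest, /^[^.!?]*[.!?]+ */)) {
        n = RLENGTH
      } else {
        n = length(rest)        # trailing text with no terminator
      }
      candidate = out substr(rest, 1, n)
      trimmed = candidate
      sub(/ +$/, "", trimmed)
      if (length(trimmed) > max) break   # next sentence would overflow
      out = candidate
      rest = substr(rest, n + 1)
    }
    sub(/ +$/, "", out)
    print out
  }'
}

# Usage (illustrative):
#   truncate_sentences 200 < message.txt
```

Two simplifications are worth noting. If the very first sentence is already longer than the limit, this sketch prints an empty line, and the real hook may handle that case differently. Periods inside abbreviations or version numbers are also treated as sentence ends here.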

Refs #134

atlas-apex merged 1 commit into me2resh:dev from atlas-apex:feature/#134-voice-prompts
