[Chore] Configurable Jarvis-style voice prompts when waiting for user input #134

## Driver

When the assistant pauses for user input (design choices, per-PR merge approvals, an ambiguous "which path do you want", etc.), the input request is only a text message in the terminal. If the user has stepped away from the keyboard, the conversation stalls silently: the assistant is waiting but gives no signal that it needs attention.

A configurable text-to-speech layer that **speaks the question out loud** when the assistant ends a turn with a question (Jarvis-from-Iron-Man style) would surface those moments without requiring the user to babysit the terminal. The user still replies via keyboard — no voice input in the initial phase.

This is a quality-of-life improvement for adopters running long sessions (especially the multi-PR launch flows where the assistant fires 10+ "approved?" / "merge X?" / "design call: a/b/c?" prompts over an afternoon).

## Scope

**Initial phase: single platform (macOS), single TTS engine, conservative trigger heuristic.** Cross-platform and higher-quality TTS providers are explicitly out of scope here — they're follow-ups (see Future phases below) once the trigger model is proven.

### Mechanism

A new Stop hook at `.claude/hooks/voice-prompt-on-pause.sh`:

1. Fires on assistant turn end (Stop event)
2. Reads the last assistant message from the transcript file whose path Claude Code passes to the hook (per the Claude Code hooks spec, the hook receives `{ transcript_path, ... }` as JSON on stdin)
3. If the message ends with a question or matches a "waiting for input" heuristic (see below), runs `say -v <voice> -r <rate_wpm> "<message-or-summary>"`
4. Always exits 0 — TTS failure must never block the conversation
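
The transcript-reading step above can be sketched as follows. This is a minimal illustration in Python rather than the shell script the ticket calls for. It assumes the transcript is a JSONL file whose assistant entries look like `{"type": "assistant", "message": {"content": [{"type": "text", "text": "..."}]}}`. That layout and the helper names `last_assistant_text` and `message_from_hook_input` are assumptions for illustration, not something this ticket specifies.

```python
import json


def last_assistant_text(transcript_path):
    """Return the text of the last non-empty assistant entry, or "" on any problem."""
    last = ""
    try:
        with open(transcript_path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate malformed lines instead of failing
                if not isinstance(entry, dict) or entry.get("type") != "assistant":
                    continue
                message = entry.get("message")
                content = message.get("content") if isinstance(message, dict) else None
                if isinstance(content, str):
                    text = content
                elif isinstance(content, list):
                    # Assumed layout: a list of blocks, text blocks carry a "text" field.
                    text = "\n".join(
                        str(block.get("text", ""))
                        for block in content
                        if isinstance(block, dict) and block.get("type") == "text"
                    )
                else:
                    text = ""
                if text.strip():
                    last = text
    except (OSError, UnicodeDecodeError):
        return ""  # unreadable transcript: nothing to speak
    return last


def message_from_hook_input(stdin_text):
    """Parse the hook's stdin payload and return the last assistant message, or ""."""
    try:
        payload = json.loads(stdin_text)
    except json.JSONDecodeError:
        return ""
    path = payload.get("transcript_path") if isinstance(payload, dict) else None
    return last_assistant_text(path) if isinstance(path, str) else ""
```

Every failure path returns an empty string, so a caller can treat "nothing to say" and "could not read the transcript" the same way and still exit 0.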

### Trigger heuristic (initial phase, deliberately conservative)

The hook only speaks if the last assistant message:

- **Ends with `?`** (after stripping trailing whitespace + closing markdown formatting), OR
- Contains a recognised "Reply with X" / "Approved?" / "Confirm Y" / "a/b/c" / "Reply `<token>`" pattern in its last paragraph

This skips informational summary messages (which often end with "."), tool-result reports, and progress updates. The initial heuristic prefers false negatives over false positives: a missed prompt is annoying, but TTS reading out a 200-line tool output is unbearable.
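
One possible shape for this heuristic, sketched in Python for readability (the shipped hook would be shell). The regex list and the set of trailing characters treated as "closing markdown formatting" are illustrative assumptions, as is the function name `should_speak`; the real pattern list would be settled during implementation.

```python
import re

# Characters stripped from the end before checking for "?" (an assumed set).
_TRAILING_MARKUP = "*_`~)]\"' \t\r\n"

# Illustrative "waiting for input" patterns; not an agreed list.
_PROMPT_PATTERNS = [
    re.compile(r"\breply\s+(with\s+)?\S+", re.IGNORECASE),   # Reply with X / Reply `token`
    re.compile(r"\bapproved\?", re.IGNORECASE),              # Approved?
    re.compile(r"(?m)^\W*confirm\b", re.IGNORECASE),         # Confirm Y (at line start)
    re.compile(r"\ba\s*/\s*b\s*/\s*c\b", re.IGNORECASE),     # a/b/c
]


def last_paragraph(message):
    """Return the final blank-line-separated paragraph of the message."""
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    return paragraphs[-1] if paragraphs else ""


def should_speak(message, trigger="questions-only"):
    """Decide whether a turn-end message should be read aloud."""
    if not message.strip():
        return False
    if trigger == "always":
        return True
    if message.rstrip(_TRAILING_MARKUP).endswith("?"):
        return True
    tail = last_paragraph(message)
    return any(pattern.search(tail) for pattern in _PROMPT_PATTERNS)
```

Only the last paragraph is searched for prompt phrases, so a "Reply with ..." buried earlier in a long summary does not trigger speech.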

### What gets spoken

Not the full message — that would read aloud entire markdown tables, code blocks, and glossaries. The hook extracts:

- The first sentence of the trailing paragraph that triggered the heuristic, OR
- The trailing paragraph up to a configurable maximum (`max_chars`, default: 200 chars), truncated at a sentence boundary

Markdown is stripped before TTS (backticks, asterisks, links, bullets) — `say` doesn't interpret them well.
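
A rough sketch of the stripping and truncation described above, in Python for illustration only. It reads "truncated at sentence boundary" as "keep whole sentences while they fit within `max_chars`", which is one interpretation of this section. The function names are hypothetical, and the naive sentence splitter will mis-split on abbreviations or version numbers such as `v1.2`.

```python
import re


def strip_markdown(text):
    """Remove the markdown that TTS would otherwise stumble over."""
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)   # links/images -> label only
    text = re.sub(r"(?m)^\s*(?:[-*+]|\d+\.)\s+", "", text)   # bullet / numbered-list markers
    text = re.sub(r"[`*]", "", text)                         # backticks and asterisks
    return re.sub(r"\s+", " ", text).strip()                 # collapse whitespace


def first_sentences(text, max_chars=200):
    """Keep whole sentences while they fit; fall back to a word-boundary cut."""
    sentences = re.findall(r"[^.!?]+[.!?]+|[^.!?]+$", text)
    spoken = ""
    for sentence in sentences:
        candidate = spoken + sentence
        if len(candidate.strip()) > max_chars:
            break
        spoken = candidate
    if spoken:
        return spoken.strip()
    # The first sentence alone exceeds the cap: cut at the last word boundary.
    clipped = text[:max_chars]
    if len(text) > max_chars and " " in clipped:
        clipped = clipped.rsplit(" ", 1)[0]
    return clipped.strip()
```

For example, `first_sentences(strip_markdown(paragraph), 200)` would yield the text handed to `say`.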

### Configuration

Add a `voice_prompts` section to `.claude/project-config.defaults.json`:

```json
{
  "voice_prompts": {
    "enabled": false,
    "voice": "Daniel",
    "max_chars": 200,
    "rate_wpm": 180,
    "trigger": "questions-only"
  }
}
```

Adopters override per-fork in `.claude/project-config.json`. The shipped default is **off** — no behaviour change for existing forks until they opt in.

| Field | Purpose |
|---|---|
| `enabled` | Master switch. `false` makes the hook a no-op. |
| `voice` | macOS `say` voice name. Default `Daniel` (British, premium-quality on modern macOS — closest free Jarvis-alike). |
| `max_chars` | Caps the spoken length so a long question doesn't read for 30 seconds. Truncates at sentence boundary. |
| `rate_wpm` | `say -r <wpm>` rate. Default 180 (slightly faster than `say` default; closer to spoken English pace). |
| `trigger` | `questions-only` (default — heuristic above) or `always` (speaks every turn-end). |
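
To make the field semantics concrete, here is a hedged Python sketch of how the defaults and a fork's override might combine and map onto a `say` invocation. The shallow-merge behaviour and the names `effective_config`, `say_command`, and `speak` are assumptions; the real hook would read config through `_lib-read-config.sh`, whose merge rules are not described here.

```python
import shutil
import subprocess

DEFAULTS = {
    "enabled": False,
    "voice": "Daniel",
    "max_chars": 200,
    "rate_wpm": 180,
    "trigger": "questions-only",
}


def effective_config(defaults_doc, override_doc):
    """Shallow-merge the voice_prompts sections; later documents win (assumed rule)."""
    merged = dict(DEFAULTS)
    for doc in (defaults_doc, override_doc):
        section = doc.get("voice_prompts") if isinstance(doc, dict) else None
        if isinstance(section, dict):
            merged.update(section)
    return merged


def say_command(config, text):
    """Build the argv for macOS `say` using its -v (voice) and -r (rate) flags."""
    return ["say", "-v", str(config["voice"]), "-r", str(config["rate_wpm"]), text]


def speak(config, text):
    """Best-effort TTS: returns False instead of raising so the hook can still exit 0."""
    if not config.get("enabled") or not text or shutil.which("say") is None:
        return False
    try:
        subprocess.run(say_command(config, text), check=False, timeout=60)
    except (OSError, subprocess.SubprocessError):
        return False
    return True
```

Passing the arguments as a list avoids shell quoting problems when the spoken text contains quotes or other special characters.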

### Wiring

Settings hook entry in `.claude/settings.json`:

```json
{
  "hooks": {
    "Stop": [
      { "matcher": ".*", "hooks": [{ "type": "command", "command": "${ops_root}/.claude/hooks/voice-prompt-on-pause.sh" }] }
    ]
  }
}
```

The hook script reads config via the existing `_lib-read-config.sh` library.

## Acceptance Criteria

- [ ] `.claude/hooks/voice-prompt-on-pause.sh` exists, exits 0 in all branches, never blocks the conversation
- [ ] When `voice_prompts.enabled = true` and the last assistant message ends with `?`, the hook runs `say -v <voice> -r <rate_wpm> "<truncated message>"`
- [ ] When `enabled = false` (the shipped default), the hook is a no-op (silent, no `say` invoked)
- [ ] When `enabled = true` and the message does NOT match the trigger heuristic, no TTS — verified by feeding a few non-question turn-ends as test fixtures
- [ ] Markdown is stripped before TTS — `say` doesn't read backticks / asterisks / link syntax aloud
- [ ] Truncation at sentence boundary, not mid-word, respecting `max_chars`
- [ ] `.claude/project-config.defaults.json` updated with the new schema
- [ ] `docs/project-config.md` documents the new section
- [ ] `.claude/settings.json` Stop-hook entry shipped (the actual hook stays a no-op until enabled, so this is safe)
- [ ] Test fixtures at `.claude/hooks/_tests/voice-prompt-on-pause/` covering: enabled+question, enabled+statement, disabled, malformed transcript JSON
- [ ] AgDR documenting: TTS provider choice (macOS `say` vs OpenAI TTS vs ElevenLabs), trigger heuristic vs ML-based question detection, why we're shipping macOS-only first

## Future phases (NOT in this ticket)

- **Phase 2** — cross-platform TTS (Linux `espeak` / `spd-say`, Windows `Add-Type ... SpeechSynthesizer`). Single config, OS-detection in the hook.
- **Phase 3** — high-quality TTS providers (OpenAI TTS, ElevenLabs). New config field `provider`. AgDR-worthy on cost vs quality.
- **Phase 4** — voice input. Whisper-based STT, voice-trigger phrases ("approve PR", "merge"). Significant scope; needs its own design.
- **Phase 5** — per-message overrides (a way for the assistant to mark a message as "speak this" or "skip TTS" inline).

## Risks / Dependencies

- **macOS-only initial phase** — adopters on Linux/Windows see no benefit until Phase 2. Acknowledged trade-off; the trigger model + config schema can be designed once and the platform layer added.
- **Trigger heuristic false positives** — a tool result or summary that happens to end with `?` would get read aloud. Mitigation: the heuristic is deliberately conservative, configurable, and easy to disable.
- **Hook overhead on every turn end** — Stop hook fires every turn, even when disabled. The disabled path is `[ "$(config_get '.voice_prompts.enabled')" = "true" ] || exit 0` — sub-millisecond. Negligible.
- **TTS sound-output collision with system audio** — if the user is on a call when the hook fires, `say` will speak through the active output. Acceptable side-effect; adopter can disable per-session by exporting an env var the hook respects (Phase 2 enhancement).
- **Privacy** — the hook reads the assistant's text from the local transcript file and pipes it to `say`, which is a local OS binary — nothing leaves the machine. When/if Phase 3 adds cloud TTS providers, that becomes a meaningful privacy decision worth its own AgDR.

Filed at the user's request — initial phase only (TTS-on-pause, macOS, keyboard-reply). Voice input is explicitly deferred to Phase 4.

