fix(meetings): never drop live transcript segments on a coverage-window miss #3918

Merged: louis030195 merged 1 commit into main from claude/meeting-transcript-surfacing on Jun 8, 2026
@louis030195

Collaborator

The bug

A meeting can capture and transcribe both sides of a call, yet the transcript stops surfacing partway through (it looks as if recording stopped while detection stays green). The data is in the DB on the live path, but the post-call meeting view reads `audio_transcriptions`, and a live segment only lands there if `mirror_live_meeting_to_audio_transcriptions` copies it in.

That mirror silently dropped any live segment whose nearest same-device audio chunk fell outside a fixed `±coverage_window` (15 s):

live final captured_at drifts past the chunk timestamp
  (provider finalizes a turn seconds late / long chunks / a capture gap)
        │
        ▼
  no same-device chunk within ±15s  ──►  segment SKIPPED (`continue;`)
        │
        ▼
  never written to audio_transcriptions
        │
        ▼
  gone from meeting notes / timeline / search after the call

A second, related inconsistency: `mark_chunks_covered_by_live` matched the device name case-sensitively (`instr(file_path, device_name)`), while its sibling mirror matched case-insensitively (and the mirror's comment claimed the two agreed). A casing difference between the chunk file path and the stored device name left meeting chunks pending, so they were re-transcribed by the batch reconciler and ended up inconsistent with the mirror.
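To make the casing mismatch concrete, here is a minimal Rust sketch of the two matching behaviours. It is not the code from this PR (the real match lives in SQL); the function names and the sample path are hypothetical, and lowercasing both sides is just one assumed way to get a case-insensitive substring test.

```rust
/// Hypothetical stand-in for the old behaviour: a case-sensitive substring
/// test, analogous to `instr(file_path, device_name) > 0`.
fn path_mentions_device_case_sensitive(file_path: &str, device_name: &str) -> bool {
    !device_name.is_empty() && file_path.contains(device_name)
}

/// Hypothetical stand-in for the aligned behaviour: lowercase both sides
/// before the substring test so casing differences no longer matter.
/// An empty device name is treated as "no match" in this sketch.
fn path_mentions_device(file_path: &str, device_name: &str) -> bool {
    if device_name.is_empty() {
        return false;
    }
    file_path
        .to_lowercase()
        .contains(&device_name.to_lowercase())
}
```

With a chunk path containing `MacBook Pro Microphone` and a stored device name of `macbook pro microphone`, the first function returns `false` (the chunk would stay pending) and the second returns `true`.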

The fix

  • The mirror now falls back to the nearest same-device chunk regardless of the window instead of dropping the segment. Losing the transcript text is worse than a small playback offset, and the row keeps the segment's real timestamp, so search/timeline stay correct. Device attribution stays strict: it never matches a different device's chunk.
  • Aligned `mark_chunks_covered_by_live`'s device-name match to be case-insensitive, matching the mirror.
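
The fallback rule in the first bullet can be sketched roughly as follows. This is an illustration under assumptions, not the mirror's actual implementation: the `Chunk` struct, the `ChunkMatch` enum, the function name, and the use of whole-second timestamps are all hypothetical, and the real logic runs against the database.

```rust
/// Hypothetical in-memory view of an audio chunk row.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    id: i64,
    device: String,
    timestamp_s: i64,
}

/// How the chosen chunk relates to the coverage window.
#[derive(Debug, PartialEq)]
enum ChunkMatch {
    /// Nearest same-device chunk lies within the window.
    InWindow(i64),
    /// Nearest same-device chunk lies outside the window; used anyway
    /// so the segment is mirrored instead of dropped.
    Fallback(i64),
}

/// Pick the nearest chunk from the SAME device (case-insensitive name
/// comparison). Returns `None` only when that device has no chunk at all;
/// a different device's chunk is never used.
fn nearest_same_device_chunk(
    chunks: &[Chunk],
    device: &str,
    captured_at_s: i64,
    window_s: i64,
) -> Option<ChunkMatch> {
    let device_lc = device.to_lowercase();
    let nearest = chunks
        .iter()
        .filter(|c| c.device.to_lowercase() == device_lc)
        .min_by_key(|c| (c.timestamp_s - captured_at_s).abs())?;

    let distance = (nearest.timestamp_s - captured_at_s).abs();
    if distance <= window_s {
        Some(ChunkMatch::InWindow(nearest.id))
    } else {
        Some(ChunkMatch::Fallback(nearest.id))
    }
}
```

The old behaviour corresponds to skipping the segment in the `Fallback` case. Keeping the two cases distinct in the return type is a choice made for this sketch so a caller could log how often the fallback fires.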

Tests

Two regression tests were added, and the full coverage/mirror/dedup suite passes (screenpipe-db, no hardware needed):

  • a live segment whose only same-device chunk is outside the window is mirrored (not dropped);
  • a case-different device name still matches coverage.
cargo test -p screenpipe-db --test timeline_live_meeting_test \
  --test live_coverage_marker_test --test meeting_transcript_dedup_test
# 9 + 7 + 1 passed

Scope / honest notes

  • This is the durable-surfacing half. The live-during-the-call view builds only from push events and never reconciles against the persisted segments, so a stalled event stream can leave a gap until reload — worth a small frontend follow-up, not in this PR.
  • No behavior change to capture or to the read query; only the live→durable mirror and the coverage marker.

🤖 Generated with Claude Code

…ow miss

From a report that a meeting "recorded both sides for ~5 min then stopped
surfacing the transcript." The audio was captured and transcribed the whole call
(live finals for both devices, batch transcriptions, and the live->durable mirror
all ran in the logs). The gap is on the surfacing side: how live meeting finals
get copied into audio_transcriptions, which the post-call meeting view reads.

Two robustness bugs in the live-coverage path:

1. mirror_live_meeting_to_audio_transcriptions silently DROPPED any live segment
   whose nearest same-device chunk fell outside the +/-coverage_window. The live
   provider finalizes a turn seconds after the audio (drifting captured_at past
   the chunk timestamp); long chunks or capture gaps do the same. Dropped
   segments never reached audio_transcriptions, so they vanished from every
   post-call surface (meeting notes, timeline, search). Now fall back to the
   nearest SAME-device chunk regardless of window rather than dropping: losing
   the text is worse than a small playback offset, and the row keeps the
   segment's real timestamp so search/timeline stay correct. Device attribution
   stays strict (never a different device's chunk).

2. mark_chunks_covered_by_live matched the device name case-SENSITIVELY while its
   sibling mirror matched case-INSENSITIVELY (and the mirror comment claimed they
   agreed). A casing difference left meeting chunks pending, re-transcribed by
   batch and inconsistent with the mirror. Aligned to case-insensitive.

Regression tests: a far same-device chunk is used instead of dropping the
segment; case-different device names still match coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@louis030195 louis030195 merged commit df61b29 into main Jun 8, 2026
22 of 23 checks passed
@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

Diarization eval results

Source: crates/screenpipe-audio-eval/evals/ · VoxConverse dev (CC-BY-4.0) + composed workday templates + screenpipe-shaped LibriSpeech fixtures

| fixture | DER | VAD FA | VAD FN | boundary err (s) | continuity | predicted / true spk |
| --- | --- | --- | --- | --- | --- | --- |
| interrupted_meeting | 0.186 | 0.01 | 0.063 | 20.286 | 0.833 | 9 / 5 |
| long_silence_day | 0.437 | 0.011 | 0.145 | 11.46 | 0.7 | 14 / 10 |
| screenpipe_meeting_rapid_handoffs | 0.241 | 0.196 | 0.099 | 2.305 | 1 | 5 / 3 |
| screenpipe_background_24_7_day | 0.315 | 0.025 | 0.159 | 2.203 | 1 | 4 / 3 |
| screenpipe_short_backchannels | 0.561 | 0.915 | 0.064 | 0.488 | n/a | 3 / 3 |
| screenpipe_mic_system_echo_leakage | 0.275 | 0.198 | 0.084 | 3.045 | 0.667 | 5 / 3 |
| screenpipe_overlap_crosstalk | 0.254 | 0.84 | 0.042 | 0.667 | n/a | 3 / 3 |
| abjxc | 0.016 | 0.098 | 0.002 | 1.151 | n/a | 2 / 1 |
| bxpwa | 0.111 | 0.453 | 0.029 | 20.793 | 0.714 | 8 / 5 |
| dhorc | 0.143 | 0.461 | 0.034 | 3.681 | 1 | 5 / 4 |

DER, VAD FA, VAD FN, boundary err: lower is better. Continuity: higher is better, 1.0 = same hyp cluster across all silence gaps. Composed workday rows and screenpipe_* rows exercise screenpipe-shaped usage: meetings, background gaps, backchannels, echo leakage, and crosstalk. Raw VoxConverse rows score broadcast-quality stems for comparison. See crates/screenpipe-audio-eval/evals/README.md for methodology.
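
For readers unfamiliar with the headline metric, the conventional definition of diarization error rate is given below. This is the standard textbook form, stated here on the assumption that the eval follows it; the README referenced above remains the authority on how the numbers in this table were actually computed.

$$
\mathrm{DER} = \frac{T_{\text{false alarm}} + T_{\text{missed speech}} + T_{\text{speaker confusion}}}{T_{\text{reference speech}}}
$$

Each $T$ is a duration: non-speech labelled as speech, speech that was not detected, speech attributed to the wrong speaker, and total reference speech time respectively.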

Pipeline replay matrix

Source: generated screenpipe_* fixtures materialized into temp screenpipe SQLite DBs, then read back through search_audio. This catches storage/search regressions that pure DER scoring misses.

| scenarios | passed | failed | skipped | avg background DER | avg background speaker err | Deepgram |
| --- | --- | --- | --- | --- | --- | --- |
| 41 | 40 | 0 | 1 | 0.329 | 0.183 | skip |

The no-secret CI matrix runs local diarization under Parakeet/Whisper engine labels across live/background and mic/system device profiles. Real Deepgram/screenpipe-cloud smoke can be run locally with --deepgram required when credentials are present.

Transcription quality

Source: LibriSpeech test-clean (CC-BY-4.0) · per-model utterance cap · normalized lowercased word-level Levenshtein

| model | utterances | WER | CER | throughput (samples/s) |
| --- | --- | --- | --- | --- |
| tiny | 50 | 0.085 | 0.033 | 69157 |
| whisper-large-v3-turbo-quantized | 20 | 0.042 | 0.009 | 1911 |
| parakeet | 50 | 0.04 | 0.026 | 107210 |

WER + CER on read-aloud speech. Per-model utterance caps keep wall time bounded — tiny/parakeet at 50, the heavier large-v3-turbo-quantized at 20. See README for normalization rules.
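
As a rough illustration of "normalized lowercased word-level Levenshtein", the Rust sketch below computes WER and CER from a reference and a hypothesis string. It is not the eval crate's code: the function names are hypothetical, and the normalization shown (lowercase, strip everything except letters, digits, and apostrophes) is an assumption; the README's rules take precedence.

```rust
/// Assumed normalization: lowercase, split on whitespace, and keep only
/// alphanumeric characters and apostrophes inside each word.
fn normalize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split_whitespace()
        .map(|w| {
            w.chars()
                .filter(|c| c.is_alphanumeric() || *c == '\'')
                .collect::<String>()
        })
        .filter(|w| !w.is_empty())
        .collect()
}

/// Levenshtein distance over any comparable items (words or characters),
/// using a rolling row so memory stays proportional to `b.len()`.
fn edit_distance<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        curr.push(i + 1);
        for (j, y) in b.iter().enumerate() {
            let substitute = prev[j] + if x == y { 0 } else { 1 };
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr.push(substitute.min(delete).min(insert));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Word error rate: word-level edits divided by reference word count.
/// Returns `None` when the reference has no words.
fn wer(reference: &str, hypothesis: &str) -> Option<f64> {
    let r = normalize(reference);
    let h = normalize(hypothesis);
    if r.is_empty() {
        return None;
    }
    Some(edit_distance(&r, &h) as f64 / r.len() as f64)
}

/// Character error rate: the same computation over characters of the
/// normalized text, with words joined by single spaces.
fn cer(reference: &str, hypothesis: &str) -> Option<f64> {
    let r: Vec<char> = normalize(reference).join(" ").chars().collect();
    let h: Vec<char> = normalize(hypothesis).join(" ").chars().collect();
    if r.is_empty() {
        return None;
    }
    Some(edit_distance(&r, &h) as f64 / r.len() as f64)
}
```

For example, scoring the hypothesis "the cat sat on a mat" against the reference "The cat sat on the mat." gives one substituted word out of six, so a WER of 1/6.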
