fix(meetings): never drop live transcript segments on a coverage-window miss by louis030195 · Pull Request #3918 · screenpipe/screenpipe

louis030195 · 2026-06-08T23:28:49Z

The bug

A meeting can capture and transcribe both sides of a call, yet the transcript stops surfacing partway through (it looks like "recording stopped" while detection stays green). The data is in the DB on the live path, but the post-call meeting view reads audio_transcriptions, and a live segment only lands there if mirror_live_meeting_to_audio_transcriptions copies it in.

That mirror silently dropped any live segment whose nearest same-device audio chunk fell outside a fixed ±coverage_window (15s):

live final captured_at drifts past the chunk timestamp
  (provider finalizes a turn seconds late / long chunks / a capture gap)
        │
        ▼
  no same-device chunk within ±15s  ──►  segment SKIPPED (`continue;`)
        │
        ▼
  never written to audio_transcriptions
        │
        ▼
  gone from meeting notes / timeline / search after the call

A second, related inconsistency: mark_chunks_covered_by_live matched the device name case-sensitively (instr(file_path, device_name)), while its sibling, the mirror, matched case-insensitively (and the mirror's comment claimed the two agreed). A casing difference between the chunk file path and the stored device name therefore left meeting chunks pending, so the batch reconciler re-transcribed them, inconsistently with the mirror.
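To illustrate the mismatch, here is a hypothetical Rust analogue of the two match rules (the function names and the file path in the usage are made up). SQLite's instr() returns the 1-based position of an exact byte match, so `instr(...) > 0` behaves like str::contains. For the case-insensitive side, this sketch lowercases both strings with ASCII-only folding, which is what SQLite's built-in LOWER() does; the PR does not say exactly how the real query folds case.

```rust
/// Hypothetical analogue of `instr(file_path, device_name) > 0`:
/// SQLite's instr() matches exact bytes, like str::contains.
fn matches_case_sensitive(file_path: &str, device_name: &str) -> bool {
    file_path.contains(device_name)
}

/// Hypothetical case-insensitive analogue. SQLite's built-in LOWER() folds
/// ASCII letters only, so this sketch also uses ASCII lowercasing.
fn matches_case_insensitive(file_path: &str, device_name: &str) -> bool {
    let needle = device_name.to_ascii_lowercase();
    file_path.to_ascii_lowercase().contains(needle.as_str())
}
```

With a path like "/recordings/MacBook Pro Microphone (input)_2026-06-08.mp4" and a stored name "macbook pro microphone (input)", the first rule reports no match (chunk stays pending) while the second matches, which is the disagreement the fix removes.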

The fix

- The mirror now falls back to the nearest same-device chunk, regardless of the window, instead of dropping the segment. Losing the transcript text is worse than a small playback offset, and the row keeps the segment's real timestamp, so search and timeline stay correct. Device attribution stays strict: it never matches a different device's chunk.
- mark_chunks_covered_by_live now matches the device name case-insensitively, consistent with the mirror.
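A minimal sketch of the old and new chunk-selection rules, assuming a simplified, hypothetical Chunk type with millisecond timestamps. The real code works on DB rows and attributes devices through the chunk file path, which this does not model; nearest_same_device, pick_chunk_strict, and pick_chunk_with_fallback are illustrative names, not functions from screenpipe-db.

```rust
/// Hypothetical, simplified chunk row: which device recorded it and when.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    id: i64,
    device: String,
    ts_ms: i64, // chunk timestamp in milliseconds
}

/// Nearest chunk from the same device (ASCII case-insensitive name match).
/// Chunks from any other device are never considered.
fn nearest_same_device<'a>(chunks: &'a [Chunk], device: &str, captured_at_ms: i64) -> Option<&'a Chunk> {
    chunks
        .iter()
        .filter(|c| c.device.eq_ignore_ascii_case(device))
        .min_by_key(|c| (c.ts_ms - captured_at_ms).abs())
}

/// Old rule (sketch): use the nearest same-device chunk only if it lies
/// within ±window_ms; otherwise return None, and the segment was skipped.
fn pick_chunk_strict<'a>(chunks: &'a [Chunk], device: &str, captured_at_ms: i64, window_ms: i64) -> Option<&'a Chunk> {
    nearest_same_device(chunks, device, captured_at_ms)
        .filter(|c| (c.ts_ms - captured_at_ms).abs() <= window_ms)
}

/// New rule (sketch): prefer an in-window chunk, else fall back to the
/// nearest same-device chunk at any distance. Returns None only when the
/// device has no chunks at all.
fn pick_chunk_with_fallback<'a>(chunks: &'a [Chunk], device: &str, captured_at_ms: i64, window_ms: i64) -> Option<&'a Chunk> {
    pick_chunk_strict(chunks, device, captured_at_ms, window_ms)
        .or_else(|| nearest_same_device(chunks, device, captured_at_ms))
}
```

Whenever any same-device chunk is in the window, the nearest one is too, so the fallback never changes which chunk an in-window segment gets; it only rescues the segments the old rule returned None for. A closer chunk from another device (say, system audio at the segment's exact timestamp) is still ignored.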

Tests

Two regression tests were added, and the full coverage/mirror/dedup suite passes (screenpipe-db, no hardware required):

- a live segment whose only same-device chunk is outside the window is mirrored, not dropped;
- a device name that differs from the chunk path only in case still matches for coverage marking.

cargo test -p screenpipe-db --test timeline_live_meeting_test \
  --test live_coverage_marker_test --test meeting_transcript_dedup_test
# 9 + 7 + 1 passed

Scope / honest notes

This PR covers the durable-surfacing half. The live view shown during the call is built only from push events and never reconciles against the persisted segments, so a stalled event stream can leave a gap until reload. That is worth a small frontend follow-up, but it is not in this PR.
No behavior change to capture or to the read query; only the live→durable mirror and the coverage marker change.

🤖 Generated with Claude Code

…ow miss

From a report that a meeting "recorded both sides for ~5 min then stopped surfacing the transcript." The audio was captured and transcribed the whole call (live finals for both devices, batch transcriptions, and the live->durable mirror all ran in the logs). The gap is on the surfacing side: how live meeting finals get copied into audio_transcriptions, which the post-call meeting view reads.

Two robustness bugs in the live-coverage path:

1. mirror_live_meeting_to_audio_transcriptions silently DROPPED any live segment whose nearest same-device chunk fell outside the +/-coverage_window. The live provider finalizes a turn seconds after the audio (drifting captured_at past the chunk timestamp); long chunks or capture gaps do the same. Dropped segments never reached audio_transcriptions, so they vanished from every post-call surface (meeting notes, timeline, search). Now fall back to the nearest SAME-device chunk regardless of window rather than dropping: losing the text is worse than a small playback offset, and the row keeps the segment's real timestamp so search/timeline stay correct. Device attribution stays strict (never a different device's chunk).
2. mark_chunks_covered_by_live matched the device name case-SENSITIVELY while its sibling mirror matched case-INSENSITIVELY (and the mirror comment claimed they agreed). A casing difference left meeting chunks pending, re-transcribed by batch and inconsistent with the mirror. Aligned to case-insensitive.

Regression tests: a far same-device chunk is used instead of dropping the segment; case-different device names still match coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-09T00:03:57Z

Diarization eval results

Source: crates/screenpipe-audio-eval/evals/ · VoxConverse dev (CC-BY-4.0) + composed workday templates + screenpipe-shaped LibriSpeech fixtures

fixture	DER	VAD FA	VAD FN	boundary err (s)	continuity	predicted / true spk
interrupted_meeting	0.186	0.01	0.063	20.286	0.833	9 / 5
long_silence_day	0.437	0.011	0.145	11.46	0.7	14 / 10
screenpipe_meeting_rapid_handoffs	0.241	0.196	0.099	2.305	1	5 / 3
screenpipe_background_24_7_day	0.315	0.025	0.159	2.203	1	4 / 3
screenpipe_short_backchannels	0.561	0.915	0.064	0.488	n/a	3 / 3
screenpipe_mic_system_echo_leakage	0.275	0.198	0.084	3.045	0.667	5 / 3
screenpipe_overlap_crosstalk	0.254	0.84	0.042	0.667	n/a	3 / 3
abjxc	0.016	0.098	0.002	1.151	n/a	2 / 1
bxpwa	0.111	0.453	0.029	20.793	0.714	8 / 5
dhorc	0.143	0.461	0.034	3.681	1	5 / 4

DER, VAD FA, VAD FN, boundary err: lower is better. Continuity: higher is better, 1.0 = same hyp cluster across all silence gaps. Composed workday rows and screenpipe_* rows exercise screenpipe-shaped usage: meetings, background gaps, backchannels, echo leakage, and crosstalk. Raw VoxConverse rows score broadcast-quality stems for comparison. See crates/screenpipe-audio-eval/evals/README.md for methodology.
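For reference, the standard diarization error rate is the fraction of reference speech time that is mis-scored, summing false-alarm speech, missed speech, and speaker confusion. Whether this eval applies a forgiveness collar or scores overlapped speech is a methodology detail covered in its README; the formula below is the general definition, not a statement of the eval's exact scoring.

```latex
\mathrm{DER} = \frac{T_{\text{false alarm}} + T_{\text{missed}} + T_{\text{speaker confusion}}}{T_{\text{reference speech}}}
```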

Pipeline replay matrix

Source: generated screenpipe_* fixtures materialized into temp screenpipe SQLite DBs, then read back through search_audio. This catches storage/search regressions that pure DER scoring misses.

scenarios	passed	failed	skipped	avg background DER	avg background speaker err	Deepgram
41	40	0	1	0.329	0.183	skip

The no-secret CI matrix runs local diarization under Parakeet/Whisper engine labels across live/background and mic/system device profiles. Real Deepgram/screenpipe-cloud smoke can be run locally with --deepgram required when credentials are present.

Transcription quality

Source: LibriSpeech test-clean (CC-BY-4.0) · per-model utterance cap · normalized lowercased word-level Levenshtein

model	utterances	WER	CER	throughput (samples/s)
tiny	50	0.085	0.033	69157
whisper-large-v3-turbo-quantized	20	0.042	0.009	1911
parakeet	50	0.04	0.026	107210

WER + CER on read-aloud speech. Per-model utterance caps keep wall time bounded: tiny/parakeet at 50, the heavier large-v3-turbo-quantized at 20. See README for normalization rules.
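WER here is a word-level Levenshtein distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal Rust sketch, assuming a simplified normalization (lowercase, replace everything except letters, digits, and apostrophes with spaces); the eval's actual normalization rules are in its README and may differ, and the function names are illustrative.

```rust
/// Simplified normalization (assumption): lowercase, turn punctuation into
/// spaces (keeping apostrophes), then split into words.
fn normalize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() || c == '\'' { c } else { ' ' })
        .collect::<String>()
        .split_whitespace()
        .map(String::from)
        .collect()
}

/// Word-level Levenshtein distance using a rolling two-row DP table.
fn edit_distance(reference: &[String], hypothesis: &[String]) -> usize {
    // prev[j] = distance between the first i reference words and hypothesis[..j]
    let mut prev: Vec<usize> = (0..=hypothesis.len()).collect();
    for (i, r) in reference.iter().enumerate() {
        // curr[0] = i + 1: delete all reference words seen so far
        let mut curr = vec![i + 1; hypothesis.len() + 1];
        for (j, h) in hypothesis.iter().enumerate() {
            let substitution = prev[j] + usize::from(r != h);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        prev = curr;
    }
    prev[hypothesis.len()]
}

/// WER = edits / reference word count; None when the reference is empty.
fn wer(reference: &str, hypothesis: &str) -> Option<f64> {
    let r = normalize(reference);
    let h = normalize(hypothesis);
    if r.is_empty() {
        return None;
    }
    Some(edit_distance(&r, &h) as f64 / r.len() as f64)
}
```

CER is the same computation over characters instead of words. The table does not say whether the eval averages per-utterance scores or pools edits over all utterances (sum of edits / sum of reference words); pooling is the more common convention for reporting a corpus-level figure.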

louis030195 merged commit df61b29 into main Jun 8, 2026
22 of 23 checks passed

louis030195 mentioned this pull request Jun 8, 2026

fix(audio): capture other call participants when meeting audio routes to a Bluetooth headset (tap-only aggregate) #3919

Open

louis030195 merged 1 commit into main from claude/meeting-transcript-surfacing
