Summary
When the user is in a voice/video meeting (Google Meet, Zoom, etc.) and using speakers instead of headphones, every utterance gets transcribed twice — once from the mic, once from the System Audio loopback — and shows up as two near-duplicate lines in the transcript UI, often with different speaker labels.
Repro
- Join a Google Meet (or any voice call) with the laptop's built-in speakers active (no headphones).
- Speak.
- Open the meeting transcript view.
You'll see each utterance rendered twice at the same timestamp, e.g.:
| time | label | text |
| --- | --- | --- |
| 07:35:07 AM | louis@screenpi.pe (🔊) | "the chat I had when we were going through so current screen pipe is healthy this morning was healthy it did have a couple of…" |
| 07:35:07 AM | You (🎤) | "So recurrent screen pipe is healthy. This morning was healthy. It did have a couple of drops, one of which I put in there…" |
Same speech, two transcriptions, slight wording differences from two separate Whisper/Parakeet passes.
Root cause — chained
Three things happen simultaneously:
- Mic (MacBook Pro Microphone (input)) captures the user's voice directly → its own audio_chunks row + transcription.
- The conferencing app plays the meeting audio mix (which includes the user's own voice fed back through the call) through the speakers.
- System Audio loopback (System Audio (output)) captures that → a separate audio_chunks row + transcription of the same speech, encoded once through the Meet codec + room acoustics.
Two distinct audio_chunks.file_path files, two distinct audio_transcriptions rows.
Verified in DB (live install, 2026-05-04 14:35:49Z)
chunk_id=24796 is_input=1 device=MacBook Pro Microphone (input) speaker_id=2 name=louis@screenpi.pe
chunk_id=24795 is_input=0 device=System Audio (output) speaker_id=12 name=louis@screenpi.pe
Secondary effect — duplicate speaker entities with the same name
The calendar-speaker-id rules in crates/screenpipe-engine/src/calendar_speaker_id.rs then name both unnamed speakers louis@screenpi.pe:
- Rule 1 ("input device → user") names the mic-side speaker louis@screenpi.pe → ends up on speaker_id=2.
- Rule 2 ("1:1 meeting + 1 unnamed output → other attendee") fires on the loopback-side speaker; if both calendar attendees resolve to the user (self-meeting or both accounts being Louis), "other attendee" comes back as Louis again → speaker_id=12.
The embedding manager keeps them as separate clusters (mic timbre vs Meet-encoded reflected timbre cluster differently across the 0.70 cosine threshold), and the naming layer doesn't check for name collision before writing. Net: two named speaker rows for the same person.
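To make the collision concrete, here is a minimal sketch of the two rules as described above. It is not the code in calendar_speaker_id.rs: the `UnnamedSpeaker` type, the `apply_naming_rules` function, and the fallback to the user when no other attendee exists are all assumptions made for illustration.

```rust
/// Hypothetical model of one unnamed speaker cluster seen during a meeting.
#[derive(Debug, Clone, PartialEq)]
struct UnnamedSpeaker {
    speaker_id: i64,
    /// true = mic side, false = System Audio loopback side
    is_input: bool,
}

/// Returns the (speaker_id, name) writes the two rules would produce.
/// Note that nothing here checks whether `name` is already taken.
fn apply_naming_rules(
    user_email: &str,
    attendees: &[&str],
    speakers: &[UnnamedSpeaker],
) -> Vec<(i64, String)> {
    let mut writes = Vec::new();

    // Rule 1: a speaker heard on an input device is the user.
    for s in speakers.iter().filter(|s| s.is_input) {
        writes.push((s.speaker_id, user_email.to_string()));
    }

    // Rule 2: in a 1:1 meeting with exactly one unnamed output speaker,
    // that speaker is the other attendee.
    let outputs: Vec<&UnnamedSpeaker> = speakers.iter().filter(|s| !s.is_input).collect();
    if attendees.len() == 2 && outputs.len() == 1 {
        // Assumption: when every attendee resolves to the user (self-meeting),
        // "other attendee" falls back to the user again.
        let other = attendees
            .iter()
            .find(|a| **a != user_email)
            .copied()
            .unwrap_or(user_email);
        writes.push((outputs[0].speaker_id, other.to_string()));
    }

    writes
}
```

With a mic-side speaker 2, a loopback-side speaker 12, and both attendees equal to louis@screenpi.pe, this model writes the same name to both speaker ids, which matches the DB state shown above.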
Tertiary effect — UI shows mismatched icons
apps/screenpipe-app-tauri/components/rewind/timeline/audio-transcript.tsx:
const name = speakerName || (item.audio.is_input ? "me" : "speaker");
The meeting popover overrides the displayed name on is_input=true rows, so the mic-side row shows "You" even though the DB has name=louis@screenpi.pe on it. The output-side row shows the literal name. Two icons (🔊 / 🎤) for the same speaker = user confusion.
Impact
- User confusion: looks like there's a phantom second "Louis" in the meeting.
- Doubled transcription cost on every speakers-mode meeting.
- Polluted speaker DB: each loopback session creates new duplicate-name speaker entities (current count: 2 entities both named louis@screenpi.pe plus 1 unnamed cluster, 3000+ embeddings between them, all the same person).
- Per-chunk health/metrics noise: doubled DB writes, doubled audio file output, doubled disk usage during long meetings.
Fix plan (in priority order)
(1) Capture-time fix — skip System Audio loopback during meetings-on-speakers
When meeting_detector reports an active meeting AND macOS audio output route ≠ headphones, suspend System Audio (output) capture for the duration of the meeting. The loopback only adds noise (it just echoes the user + the remote audio that the mic also picks up via the speakers).
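Detection hook: Core Audio's kAudioHardwarePropertyDefaultOutputDevice to find the active output device, then kAudioDevicePropertyTransportType and kAudioDevicePropertyDataSource to identify built-in speakers vs headphones/AirPods/USB. (AVAudioSessionRouteDescription.outputs is an iOS / Mac Catalyst API and is not available to a native macOS process.)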
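The gating decision itself is small once the route is known. A minimal sketch, assuming a simplified two-state `OutputRoute` and a hypothetical `user_forced` override for users who explicitly enable system audio capture; neither name exists in the codebase as far as this issue shows, and the Core Audio query that produces the route is left out:

```rust
/// Simplified view of where macOS is sending output audio (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputRoute {
    /// Built-in or external loudspeakers: the mic can hear them.
    Speakers,
    /// Wired, Bluetooth, or USB headset: the mic cannot hear them.
    Headphones,
}

/// Decide whether System Audio (output) capture should run right now.
fn should_capture_system_audio(
    meeting_active: bool,
    route: OutputRoute,
    user_forced: bool,
) -> bool {
    // Assumption: an explicit user setting always wins over the heuristic.
    if user_forced {
        return true;
    }
    // Loopback duplicates the mic only while a meeting plays through speakers.
    let echo_risk = meeting_active && route != OutputRoute::Headphones;
    !echo_risk
}
```

The route can change mid-meeting (for example when AirPods connect), so this check would need to be re-evaluated on route-change notifications rather than once at meeting start.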
(2) Naming-time fix — speaker-name collision check
In calendar_speaker_id.rs, before writing name = X to a speaker, query for any existing speaker with that exact name. If found:
- If the embeddings are close → merge into the existing one.
- If the embeddings are far → still merge, but flag for manual review (the user might have multiple voices/devices that legitimately diverged).
The DB should never contain two speakers with exactly the same name.
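A minimal sketch of that check, operating on in-memory data instead of the real speakers table. The `Speaker` and `NameDecision` types, the single centroid per speaker, and reusing 0.70 as the "close" threshold are assumptions for illustration:

```rust
/// Hypothetical in-memory view of a speaker row plus its embedding centroid.
#[derive(Debug, Clone, PartialEq)]
struct Speaker {
    id: i64,
    name: Option<String>,
    centroid: Vec<f32>,
}

#[derive(Debug, PartialEq)]
enum NameDecision {
    /// No other speaker has this name: write it.
    Assign,
    /// Same name, close embeddings: merge into the existing speaker.
    Merge { into: i64 },
    /// Same name, distant embeddings: merge, but flag for manual review.
    MergeAndFlag { into: i64 },
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Decide what to do before writing `name` onto `target`.
fn decide_name_write(
    target: &Speaker,
    name: &str,
    existing: &[Speaker],
    close_threshold: f32,
) -> NameDecision {
    let collision = existing
        .iter()
        .find(|s| s.id != target.id && s.name.as_deref() == Some(name));
    match collision {
        None => NameDecision::Assign,
        Some(s) if cosine_similarity(&target.centroid, &s.centroid) >= close_threshold => {
            NameDecision::Merge { into: s.id }
        }
        Some(s) => NameDecision::MergeAndFlag { into: s.id },
    }
}
```

In the real DB the same invariant could additionally be enforced with a unique index on the name column, so that a code path that skips this check fails loudly instead of silently creating a duplicate.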
(3) Cross-device transcription dedup (band-aid)
Cheap last-mile guard at replace_audio_transcription time: if a row exists with is_input != self.is_input, same speaker name, within ±5s, and text Levenshtein similarity > 0.8 → drop the new one. Treats symptom, not cause, but defends the UI even when (1) misses (e.g. user explicitly enables system audio capture).
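The guard can be sketched as a pure predicate over two rows. The `Row` struct, the `normalize` step (lowercasing and stripping punctuation before comparing), and defining similarity as 1 minus edit distance over the longer length are assumptions; the issue only specifies the ±5s window and the 0.8 threshold:

```rust
/// Hypothetical subset of an audio_transcriptions row.
struct Row {
    is_input: bool,
    speaker_name: String,
    timestamp_secs: i64,
    text: String,
}

/// Lowercase, drop punctuation, collapse whitespace (assumed preprocessing).
fn normalize(text: &str) -> String {
    let cleaned: String = text
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .flat_map(|c| c.to_lowercase())
        .collect();
    cleaned.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Classic two-row dynamic-programming Levenshtein distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + if ca == cb { 0 } else { 1 };
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            let best = substitute.min(delete).min(insert);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]: 1.0 means identical strings.
fn similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

/// True when `new` looks like the other device's copy of `existing`.
fn is_cross_device_duplicate(new: &Row, existing: &Row) -> bool {
    new.is_input != existing.is_input
        && new.speaker_name == existing.speaker_name
        && (new.timestamp_secs - existing.timestamp_secs).abs() <= 5
        && similarity(&normalize(&new.text), &normalize(&existing.text)) > 0.8
}
```

One caveat with whole-string comparison: the mic and loopback chunks may not start and end at the same words, so two rows covering mostly the same speech can still fall below 0.8. A windowed or token-overlap comparison might be needed in practice.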
(4) One-time DB cleanup migration
After (1)+(2) ship, sweep existing duplicate-named speakers and reassign their audio_transcriptions.speaker_id to a single survivor. Without (1) first, the dups would just regenerate.
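The sweep reduces to computing an old-id to survivor-id map and then applying it to audio_transcriptions.speaker_id. A minimal sketch of the planning step, assuming speakers are loaded as (id, name) pairs and that the lowest id in each group survives; the survivor policy is an assumption, and picking the speaker with the most embeddings would be an equally reasonable choice:

```rust
use std::collections::{BTreeMap, HashMap};

/// Given (speaker_id, name) pairs, return a map of duplicate id -> survivor id.
/// Unnamed speakers are left alone; only exact-name duplicates are merged.
fn plan_speaker_merges(speakers: &[(i64, Option<String>)]) -> HashMap<i64, i64> {
    // Group speaker ids by exact name.
    let mut by_name: BTreeMap<&str, Vec<i64>> = BTreeMap::new();
    for (id, name) in speakers {
        if let Some(n) = name.as_deref() {
            by_name.entry(n).or_default().push(*id);
        }
    }

    let mut remap = HashMap::new();
    for ids in by_name.values() {
        // Assumption: the lowest (oldest) id in each group survives.
        let survivor = *ids.iter().min().expect("groups are never empty");
        for id in ids.iter().filter(|id| **id != survivor) {
            remap.insert(*id, survivor);
        }
    }
    remap
}
```

Each entry in the returned map corresponds to one reassignment of audio_transcriptions.speaker_id, plus moving the duplicate's embeddings and deleting its speakers row, ideally inside a single transaction.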
Out of scope
- Real-time AEC (acoustic echo cancellation) at capture: covered partially by the OS, but doing it ourselves is a much bigger lift than (1) and offers little additional value once (1) is in place.
Acceptance
- A meeting on speakers no longer produces two transcript rows per utterance.
- DB never has two speakers rows with identical name.
- Existing duplicate louis@screenpi.pe (or any other duplicated email) speakers from past sessions are merged after upgrade.