Acoustic loopback duplicates transcripts during meetings on speakers #3228

@louis030195

Summary

When the user is in a voice/video meeting (Google Meet, Zoom, etc.) and using speakers instead of headphones, every utterance gets transcribed twice — once from the mic, once from the System Audio loopback — and shows up as two near-duplicate lines in the transcript UI, often with different speaker labels.

Repro

  1. Join a Google Meet (or any voice call) with the laptop's built-in speakers active (no headphones).
  2. Speak.
  3. Open the meeting transcript view.

You'll see each utterance rendered twice at the same timestamp, e.g.:

time          label                    text
07:35:07 AM   louis@screenpi.pe (🔊)   "the chat I had when we were going through so current screen pipe is healthy this morning was healthy it did have a couple of…"
07:35:07 AM   You (🎤)                 "So recurrent screen pipe is healthy. This morning was healthy. It did have a couple of drops, one of which I put in there…"

Same speech, two transcriptions, slight wording differences from two separate Whisper/Parakeet passes.

Root cause — chained

Three things happen simultaneously:

  1. Mic (MacBook Pro Microphone (input)) captures the user's voice directly → its own audio_chunks row + transcription.
  2. The conferencing app plays the meeting audio mix (which includes the user's own voice fed back through the call) through the speakers.
  3. System Audio loopback (System Audio (output)) captures that mix → a separate audio_chunks row + transcription of the same speech, this time after it has passed through the Meet codec and room acoustics.

Two distinct audio_chunks.file_path files, two distinct audio_transcriptions rows.

Verified in DB (live install, 2026-05-04 14:35:49Z)

chunk_id=24796  is_input=1  device=MacBook Pro Microphone (input)   speaker_id=2   name=louis@screenpi.pe
chunk_id=24795  is_input=0  device=System Audio (output)            speaker_id=12  name=louis@screenpi.pe

Secondary effect — duplicate speaker entities with the same name

The calendar-speaker-id rules in crates/screenpipe-engine/src/calendar_speaker_id.rs then name both unnamed speakers louis@screenpi.pe:

  • Rule 1 ("input device → user") names the mic-side speaker louis@screenpi.pe → ends up on speaker_id=2.
  • Rule 2 ("1:1 meeting + 1 unnamed output → other attendee") fires on the loopback-side speaker; if both calendar attendees resolve to the user (self-meeting or both accounts being Louis), "other attendee" comes back as Louis again → speaker_id=12.

The embedding manager keeps them as separate clusters: the timbre captured directly by the mic and the Meet-encoded timbre that comes back through the loopback don't fall within the 0.70 cosine threshold of each other. The naming layer also doesn't check for name collisions before writing. Net: two named speaker rows for the same person.
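To make the clustering failure concrete, here is a minimal sketch of threshold-based cluster assignment. It assumes the embedding manager compares each new embedding against per-speaker centroids by cosine similarity and opens a new cluster when nothing clears 0.70. The function names, the centroid representation, and reading 0.70 as a similarity (not distance) cutoff are all assumptions, not the actual screenpipe code.

```rust
/// Cosine similarity between two embeddings (assumed to have equal length).
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical cutoff mirroring the 0.70 value mentioned in the issue.
const MATCH_THRESHOLD: f32 = 0.70;

/// Index of the best-matching centroid if it clears the threshold.
/// `None` means the caller would open a new cluster (a new speaker).
fn assign_cluster(embedding: &[f32], centroids: &[Vec<f32>]) -> Option<usize> {
    centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cosine_similarity(embedding, c)))
        .filter(|&(_, sim)| sim >= MATCH_THRESHOLD)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With the mic-side centroid already enrolled, a loopback embedding at similarity 0.6 to it falls below the cutoff and becomes a second cluster, which the calendar rules then both label louis@screenpi.pe.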

Tertiary effect — UI shows mismatched icons

apps/screenpipe-app-tauri/components/rewind/timeline/audio-transcript.tsx:

const name = speakerName || (item.audio.is_input ? "me" : "speaker");

The meeting popover overrides the displayed name on is_input=true rows, so the mic-side row shows "You" even though the DB has name=louis@screenpi.pe on it. The output-side row shows the literal name. Two icons (🔊 / 🎤) for the same speaker = user confusion.

Impact

  • User confusion: looks like there's a phantom second "Louis" in the meeting.
  • Doubled transcription cost on every speakers-mode meeting.
  • Polluted speaker DB: each loopback session creates new duplicate-name speaker entities (current count: 2 entities both named louis@screenpi.pe plus 1 unnamed cluster, 3000+ embeddings between them, all the same person).
  • Per-chunk health/metrics noise: doubled DB writes, doubled audio file output, doubled disk usage during long meetings.

Fix plan (in priority order)

(1) Capture-time fix — skip System Audio loopback during meetings-on-speakers

When meeting_detector reports an active meeting AND macOS audio output route ≠ headphones, suspend System Audio (output) capture for the duration of the meeting. The loopback only adds noise (it just echoes the user + the remote audio that the mic also picks up via the speakers).

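Detection hook: Core Audio on the default output device. kAudioDevicePropertyDataSource distinguishes built-in speakers from the headphone jack, and kAudioDevicePropertyTransportType identifies Bluetooth (AirPods) and USB devices. AVAudioSessionRouteDescription.outputs is the iOS/Mac Catalyst equivalent and is not available to a native macOS app.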
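The gating rule above reduces to a small decision function. This sketch is hypothetical: OutputRoute, LoopbackDecision, LoopbackGate and its update method are illustrative names, and classifying the route from the Core Audio properties is left out. It follows the rule as written (suspend whenever a meeting is active and the route is not headphones), so an Unknown route is treated like speakers.

```rust
/// Hypothetical classification of the current macOS output route. Deriving it
/// from Core Audio (data source / transport type of the default output device)
/// is omitted from this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputRoute {
    BuiltInSpeakers,
    /// Wired headphones, AirPods, or a USB headset.
    Headphones,
    /// The route could not be classified.
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoopbackDecision {
    Capture,
    Suspend,
}

/// Suspend System Audio (output) capture while a meeting is active and the
/// output route is anything other than headphones.
fn loopback_decision(meeting_active: bool, route: OutputRoute) -> LoopbackDecision {
    if meeting_active && route != OutputRoute::Headphones {
        LoopbackDecision::Suspend
    } else {
        LoopbackDecision::Capture
    }
}

/// Remembers the last decision so capture is only started/stopped on transitions.
struct LoopbackGate {
    current: LoopbackDecision,
}

impl LoopbackGate {
    fn new() -> Self {
        Self { current: LoopbackDecision::Capture }
    }

    /// Call on every meeting start/stop or output-route change event.
    /// Returns Some(decision) only when the capture state must change.
    fn update(&mut self, meeting_active: bool, route: OutputRoute) -> Option<LoopbackDecision> {
        let next = loopback_decision(meeting_active, route);
        if next == self.current {
            None
        } else {
            self.current = next;
            Some(next)
        }
    }
}
```

Re-evaluating on route changes matters because a user can plug in headphones or connect AirPods mid-meeting; returning Some only on a transition keeps the capture pipeline from being restarted on every event.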

(2) Naming-time fix — speaker-name collision check

In calendar_speaker_id.rs, before writing name = X to a speaker, query for any existing speaker with that exact name. If found:

  • If the embeddings are close → merge into the existing one.
  • If the embeddings are far → still merge, but flag for manual review (the user might have multiple voices/devices that legitimately diverged).

The DB should not be allowed to contain two speakers with an identical exact name.
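A sketch of the collision check, assuming each speaker has a centroid embedding and reusing 0.70 as a hypothetical "close" cutoff. Speaker, NamingAction and plan_naming are illustrative names; the real code would run the lookup as a query against the speakers table rather than scanning a slice.

```rust
/// Minimal in-memory view of a speakers row; fields are hypothetical.
#[derive(Debug, Clone)]
struct Speaker {
    id: i64,
    name: Option<String>,
    centroid: Vec<f32>,
}

#[derive(Debug, PartialEq)]
enum NamingAction {
    /// No other speaker carries this name: write it as planned.
    Assign,
    /// Merge into the existing speaker `into`; flag for manual review when
    /// the two voices are far apart.
    Merge { into: i64, needs_review: bool },
}

/// Hypothetical "close enough" cutoff, reusing the 0.70 figure from the issue.
const CLOSE_THRESHOLD: f32 = 0.70;

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Decide what to do before writing `name` onto `candidate`.
fn plan_naming(candidate: &Speaker, name: &str, existing: &[Speaker]) -> NamingAction {
    let owner = existing
        .iter()
        .find(|s| s.id != candidate.id && s.name.as_deref() == Some(name));
    match owner {
        None => NamingAction::Assign,
        Some(owner) => {
            let sim = cosine_similarity(&candidate.centroid, &owner.centroid);
            NamingAction::Merge {
                into: owner.id,
                // Merge either way, but flag distant voices for manual review.
                needs_review: sim < CLOSE_THRESHOLD,
            }
        }
    }
}
```

Under this sketch the loopback-side cluster from the repro would be merged into speaker_id=2 instead of becoming a second louis@screenpi.pe, with needs_review set if its centroid sits far from the mic-side one.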

(3) Cross-device transcription dedup (band-aid)

Cheap last-mile guard at replace_audio_transcription time: if a row exists with is_input != self.is_input, same speaker name, within ±5s, and text Levenshtein similarity > 0.8 → drop the new one. Treats symptom, not cause, but defends the UI even when (1) misses (e.g. user explicitly enables system audio capture).
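The guard spelled out as a sketch. Row and its fields are hypothetical stand-ins for the audio_transcriptions columns. "Levenshtein similarity" is taken here to mean 1 − distance / max(len) over the characters of the lowercased text, which is one common normalization and an assumption; the sketch also requires both rows to carry a name, so two unnamed rows never match.

```rust
/// Hypothetical view of a transcription row.
struct Row {
    is_input: bool,
    speaker_name: Option<String>,
    timestamp_secs: f64,
    text: String,
}

/// Character-level Levenshtein distance (two-row dynamic programming).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// 1 - distance / max_len on lowercased text; two empty strings count as identical.
fn similarity(a: &str, b: &str) -> f64 {
    let (a, b) = (a.to_lowercase(), b.to_lowercase());
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(&a, &b) as f64 / max_len as f64
}

const WINDOW_SECS: f64 = 5.0;
const MIN_SIMILARITY: f64 = 0.8;

/// True if `new` looks like the other device's copy of an existing row.
fn is_cross_device_duplicate(new: &Row, existing: &[Row]) -> bool {
    existing.iter().any(|old| {
        old.is_input != new.is_input
            && old.speaker_name.is_some()
            && old.speaker_name == new.speaker_name
            && (old.timestamp_secs - new.timestamp_secs).abs() <= WINDOW_SECS
            && similarity(&old.text, &new.text) > MIN_SIMILARITY
    })
}
```

Whole-string similarity can fall below 0.8 when one transcription carries extra leading words, as the loopback row in the repro does, which is one more reason this stays a band-aid behind (1).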

(4) One-time DB cleanup migration

After (1)+(2) ship, sweep existing duplicate-named speakers and reassign their audio_transcriptions.speaker_id to a single survivor. Without (1) first, the dups would just regenerate.
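A sketch of the cleanup: group named speakers, pick one survivor per name, and remap transcription rows. Keeping the lowest speaker_id as the survivor is a hypothetical rule (keeping the speaker with the most embeddings would be another reasonable choice), and the in-memory apply_remap stands in for the UPDATE that a real migration would run inside a single transaction.

```rust
use std::collections::HashMap;

/// Given (speaker_id, name) pairs, pick one survivor per duplicated name
/// (hypothetically: the lowest id) and map every other id to it.
fn plan_merges(speakers: &[(i64, Option<String>)]) -> HashMap<i64, i64> {
    let mut by_name: HashMap<&str, Vec<i64>> = HashMap::new();
    for (id, name) in speakers {
        // Unnamed clusters are left alone.
        if let Some(n) = name {
            by_name.entry(n.as_str()).or_default().push(*id);
        }
    }
    let mut remap = HashMap::new();
    for ids in by_name.values() {
        if let Some(&survivor) = ids.iter().min() {
            for &id in ids {
                if id != survivor {
                    remap.insert(id, survivor);
                }
            }
        }
    }
    remap
}

/// Rewrite audio_transcriptions.speaker_id values in memory.
fn apply_remap(transcription_speaker_ids: &mut [i64], remap: &HashMap<i64, i64>) {
    for sid in transcription_speaker_ids.iter_mut() {
        let current = *sid;
        if let Some(&survivor) = remap.get(&current) {
            *sid = survivor;
        }
    }
}
```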

Out of scope

  • Real-time AEC (acoustic echo cancellation) at capture: covered partially by the OS, but doing it ourselves is a much bigger lift than (1) and offers little additional value once (1) is in place.

Acceptance

  • A meeting on speakers no longer produces two transcript rows per utterance.
  • The DB never contains two speakers rows with an identical name.
  • Existing duplicate louis@screenpi.pe (or any other duplicated email) speakers from past sessions are merged after upgrade.

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
