Summary
The audio-transcription reconciliation worker stopped processing pending chunks at some point. Audio capture and live transcription kept working, but chunks that fell into the "pending" state were never picked back up, and ~3000 of them accumulated over ~6 days. The health-check warning fires correctly (`audio transcription backlog stalled — N chunk(s) pending, oldest Ns old`), but the dispatcher does not self-recover. Restarting the app triggered reconciliation to run and drain the backlog at ~50 chunks per 2-3 min.
From the user's point of view this behaves like silent data loss: audio files are written to disk but never transcribed, so they don't appear in search or the UI for days.
Environment
- macOS 26.5 (Tahoe) on Apple Silicon (M1 Pro)
- Screenpipe app v2.4.247 (Tauri shell), engine v0.3.307
- Transcription backend: Deepgram via screenpipe-cloud (api.screenpi.pe), mode Batch
- Pro license active, cloud trial active
- Audio devices: Shure MV7+ (input) + macOS System Audio (output)
- ~/.screenpipe/db.sqlite ~1.2 GB at time of incident
- No exotic config: defaults except retention (mode=media, retention_days=14)
Timeline (from ~/.screenpipe/screenpipe-app.2026-05-21.log)
- ~2026-05-15 (extrapolated from the `oldest 517251s old` value first observed on 2026-05-21): the reconciliation worker stops draining pending chunks. Live capture and transcription continue normally, with no user-visible signal anywhere in the UI.
- 2026-05-21 07:46:42 Z: `capture transcription configured: background_engine=Deepgram ... transcription_mode=Batch`. The app restart on this date triggered reconciliation re-init.
- 2026-05-21 07:49:33 Z: first `audio transcription backlog stalled — 3128 chunk(s) pending, oldest 517251s old | pool: read=4/4 idle, write=2/2 idle` warning (6 days of pending audio).
- 2026-05-21 07:49:48 Z onward: `reconciliation: transcribed 50 orphaned chunks` every 2-3 min; the backlog starts draining.
- 2026-05-21 12:49 Z: backlog down to 31 chunks; last `stalled` warning emitted.
- 2026-05-21 13:50+ Z: only `reconciliation: transcribed N orphaned chunks` log lines and no more stall warnings; system healthy.
Time to drain: ~5 hours from restart, with capture continuously feeding the queue.
Reproduction
I do not know the precise trigger; hypotheses are below. What I observed:
- Run screenpipe-app with cloud Deepgram batch mode for several days.
- At some point (possibly tied to a network drop, laptop sleep, cloud-side rate limiting, or dispatcher state corruption), reconciliation stops processing pending chunks.
- From that moment on, the pending count grows linearly with the capture rate and the oldest pending age grows in real time (one second per second). The health check warns, but the live path is unaffected, so the UI looks healthy.
- Restart the app. Reconciliation re-runs at startup and drains the accumulated backlog.
Possible triggers I cannot rule out from this single incident:
- A macOS sleep/wake cycle interrupting an in-flight HTTP request to api.screenpi.pe
- A cloud-side 429 / 5xx that was silently aborted instead of retried with backoff
- A token refresh failure (the screenpipe-cloud bearer token goes through a refresh flow)
- A DNS / TLS error during a request that left the worker task awaiting a future that never resolves
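The first and last hypotheses both describe a request that never completes. One defensive pattern is to give every cloud call a deadline, so a hung request becomes an error the worker can log and retry instead of a task that waits forever. In async Rust this would typically be `tokio::time::timeout`; the sketch below shows the same idea with only the standard library (a helper thread plus `mpsc::Receiver::recv_timeout`). `call_with_deadline` and `RequestError` are hypothetical names, and the sleeping closure merely stands in for a real HTTP request to api.screenpi.pe.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for a bounded cloud request.
#[derive(Debug, PartialEq)]
enum RequestError {
    /// No result within the deadline (e.g. a request stuck across sleep/wake).
    TimedOut,
    /// The request thread died (panicked) without producing a result.
    Disconnected,
}

/// Run `request` on a helper thread and wait at most `deadline` for its result.
/// Note: the helper thread is not cancelled on timeout; it finishes (or hangs) on its own.
fn call_with_deadline<T, F>(request: F, deadline: Duration) -> Result<T, RequestError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the receiver is gone and send fails; ignore that.
        let _ = tx.send(request());
    });
    match rx.recv_timeout(deadline) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(RequestError::TimedOut),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(RequestError::Disconnected),
    }
}

fn main() {
    // A fast "request" completes within the deadline.
    let ok = call_with_deadline(|| "transcript", Duration::from_millis(500));
    println!("{:?}", ok);

    // A "request" that hangs (simulated with a long sleep) is cut off.
    let hung = call_with_deadline(
        || {
            thread::sleep(Duration::from_secs(5));
            "never seen"
        },
        Duration::from_millis(50),
    );
    println!("{:?}", hung);
}
```

The thread-based version leaks the hung helper thread; with `tokio::time::timeout`, the timed-out future is dropped, which cancels the underlying request.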
Logs around the suspected stall start point are no longer available (only daily log files; 6 days back is rotated out). If you can hint at what to grep for, I can capture it for the next recurrence.
Expected behavior
When pending_count > 20 AND oldest_pending_age > 2 * AUDIO_RECONCILIATION_FRESHNESS_DELAY_SECS (20 min), screenpipe should either:
- Auto-recover by restarting the reconciliation task and re-claiming pending chunks, OR
- Surface a user-visible notification ("transcription stalled, restart recommended") so users know the indexed data they see is days stale. Today the health-check WARN only goes to screenpipe-app.YYYY-MM-DD.log, which most users will never read.
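The two options above could be combined into one escalation policy driven by the health check's existing `stalled` flag: restart automatically first, and notify the user only if the backlog is still stuck afterwards. Because the flag stayed set for hours while the backlog drained (warnings continued until 12:49 Z in the timeline), the sketch also treats a shrinking pending count as progress. It assumes one health check per minute, the cadence seen in the logs; `StallMonitor`, `Action`, and both thresholds are hypothetical, not existing screenpipe code.

```rust
/// What a (hypothetical) supervisor should do after one health check.
#[derive(Debug, PartialEq)]
enum Action {
    Wait,
    RestartReconciliation,
    NotifyUser,
}

/// Escalates on consecutive health checks that report `stalled` while the
/// pending count is not shrinking: auto-restart first, then notify the user.
struct StallMonitor {
    stuck_checks: u32,
    last_pending: Option<u64>,
    restart_after: u32, // stuck checks before an automatic restart
    notify_after: u32,  // stuck checks before a user-visible notification
    notified: bool,     // notify once per stall, not on every check
}

impl StallMonitor {
    fn new(restart_after: u32, notify_after: u32) -> Self {
        Self { stuck_checks: 0, last_pending: None, restart_after, notify_after, notified: false }
    }

    /// Feed one health-check result; returns the action to take now.
    fn observe(&mut self, stalled: bool, pending_count: u64) -> Action {
        // A shrinking backlog means reconciliation is draining, even though the
        // stalled flag can stay set for hours while old chunks are worked off.
        let shrinking = matches!(self.last_pending, Some(prev) if pending_count < prev);
        self.last_pending = Some(pending_count);
        if !stalled || shrinking {
            self.stuck_checks = 0;
            self.notified = false;
            return Action::Wait;
        }
        self.stuck_checks += 1;
        if self.stuck_checks == self.restart_after {
            Action::RestartReconciliation
        } else if self.stuck_checks >= self.notify_after && !self.notified {
            self.notified = true;
            Action::NotifyUser
        } else {
            Action::Wait
        }
    }
}

fn main() {
    // One check per minute: restart after 2 stuck checks, notify after more than 5.
    let mut monitor = StallMonitor::new(2, 6);
    let checks = [(true, 3100), (true, 3101), (true, 3102), (true, 3103),
                  (true, 3104), (true, 3105), (true, 3050), (false, 10)];
    for (minute, (stalled, pending)) in checks.iter().enumerate() {
        let action = monitor.observe(*stalled, *pending);
        if action != Action::Wait {
            println!("minute {}: {:?}", minute, action);
        }
    }
}
```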
Actual behavior
The reconciliation worker logs nothing. The health-check warning fires once per minute, but only to the log file. The pool reported alongside the warning is fully idle (read=4/4, write=2/2), so nothing appears to be dispatched for processing. The live path proceeds independently (new captures are still transcribed promptly), which masks the problem in the UI.
Source-code observations
From crates/screenpipe-engine/src/routes/health.rs:474-518 (the comment is accurate, and the heuristic correctly detects the stall):
```rust
// Direct measurement: count chunks stuck in 'pending' status. This
// replaces the previous pool-idle + stale-metric heuristic, which
// fired false positives whenever the live path's dedup short-circuit
// ate batches of common short words and went silent on the write
// pool. ...
//
// A real stall now means: the reconciliation worker has pending
// chunks older than the freshness window — i.e. they should have
// been processed by now and haven't.
let stalled = pending_count > 20
    && oldest_pending_age_secs
        > (AUDIO_RECONCILIATION_FRESHNESS_DELAY_SECS as u64).saturating_mul(2);
```
crates/screenpipe-audio/src/audio_manager/reconciliation.rs:28:
```rust
const RECONCILIATION_FRESHNESS_DELAY_SECS: i64 = 10 * 60;
```
So a stall is flagged when more than 20 chunks are pending and the oldest one is more than 20 min old. In my incident both conditions held for days, not minutes, so detection works; the reconciliation worker's scheduling / retry logic is the suspect.
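The quoted condition can be checked in isolation against this incident's numbers. A minimal sketch that mirrors the health.rs expression above, assuming the freshness delay is the 10 * 60 seconds defined in reconciliation.rs; `is_stalled` is a hypothetical free function, not screenpipe's API, and `u64` for `pending_count` is an assumed type.

```rust
/// Freshness window, mirroring RECONCILIATION_FRESHNESS_DELAY_SECS (10 min).
const AUDIO_RECONCILIATION_FRESHNESS_DELAY_SECS: i64 = 10 * 60;

/// Same condition as the health.rs snippet: more than 20 pending chunks and an
/// oldest pending chunk older than twice the freshness window (20 min).
fn is_stalled(pending_count: u64, oldest_pending_age_secs: u64) -> bool {
    pending_count > 20
        && oldest_pending_age_secs
            > (AUDIO_RECONCILIATION_FRESHNESS_DELAY_SECS as u64).saturating_mul(2)
}

fn main() {
    // First warning of the incident: 3128 pending, oldest 517251 s (~6 days).
    println!("{}", is_stalled(3128, 517_251)); // true
    // Exactly 20 pending chunks never trips the check, however old they are.
    println!("{}", is_stalled(20, 517_251)); // false
    // Exactly 20 min old is not yet "older than" the threshold.
    println!("{}", is_stalled(3128, 1_200)); // false
}
```

Both guards are strict inequalities, so a backlog of exactly 20 chunks, or one whose oldest chunk is exactly 20 min old, is not reported.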
Workaround
Quit and relaunch screenpipe.app. Reconciliation drains automatically over ~5h depending on backlog size (~50 chunks per 2-3 min in my run, cloud rate-limit dependent).
Suggested fixes (in roughly increasing complexity)
- User-facing notification when the health-check `stalled` flag is true for more than 5 consecutive checks, e.g. "Transcription backlog hasn't drained in 5 minutes; restart screenpipe to recover."
- Watchdog timer on the reconciliation worker. If the task hasn't made progress (touched a row in `audio_chunks WHERE status='pending'`) within 2 * RECONCILIATION_FRESHNESS_DELAY_SECS, the watchdog kills and restarts the task.
- Periodic re-init on a schedule (every 30 min?), not just at app startup. Re-init is what cleared the stall in this incident; doing it proactively would bound a silent failure to one interval instead of several days.
- Bounded retry with exponential backoff on cloud Deepgram failures, then dead-letter-style logging once retries are exhausted. If a chunk fails N times, mark it so the dispatcher moves on instead of retrying it forever (if that is what's happening).
- Trace logs around reconciliation worker start/stop, with the reason. Right now no log line says "reconciliation worker stopped because X"; there is only the WARN saying it has been stopped for a while. Logging at each exit point of the worker task (panic, clean shutdown) would make the root cause obvious next time. A task hung awaiting forever never reaches an exit point, so that case needs the watchdog above.
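The watchdog and the bounded retry can be sketched together with only the standard library. `ProgressWatchdog`, `retry_with_backoff`, `ChunkOutcome`, and all the numbers below are hypothetical; a real implementation would live in the async reconciliation task (where the injected `sleep` would be something like `tokio::time::sleep`). Time is passed in explicitly as seconds so the logic is deterministic. The watchdog uses the 2 * RECONCILIATION_FRESHNESS_DELAY_SECS window proposed above; the retry helper doubles the delay after each failure up to a cap and dead-letters the chunk after `max_attempts`.

```rust
use std::time::Duration;

const RECONCILIATION_FRESHNESS_DELAY_SECS: i64 = 10 * 60;

/// The worker calls `record_progress` whenever it touches a pending row;
/// a supervisor polls `should_restart` and restarts the task when it returns true.
struct ProgressWatchdog {
    last_progress_secs: u64,
    timeout_secs: u64,
}

impl ProgressWatchdog {
    fn new(now_secs: u64) -> Self {
        Self { last_progress_secs: now_secs, timeout_secs: 2 * RECONCILIATION_FRESHNESS_DELAY_SECS as u64 }
    }
    fn record_progress(&mut self, now_secs: u64) {
        self.last_progress_secs = now_secs;
    }
    /// No progress is expected on an empty queue, so only fire while chunks are pending.
    fn should_restart(&self, now_secs: u64, has_pending: bool) -> bool {
        has_pending && now_secs.saturating_sub(self.last_progress_secs) > self.timeout_secs
    }
}

/// Outcome for one chunk after bounded retries.
#[derive(Debug, PartialEq)]
enum ChunkOutcome<T> {
    Transcribed(T),
    /// Retries exhausted: mark the chunk so the dispatcher skips it, and move on.
    DeadLettered { attempts: u32, last_error: String },
}

/// Delay before the retry that follows failed attempt `attempt` (1-based): base * 2^(attempt-1), capped.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt.saturating_sub(1));
    base.saturating_mul(factor).min(cap)
}

/// Call `attempt_fn` up to `max_attempts` times, calling `sleep` between failures.
fn retry_with_backoff<T, F, S>(mut attempt_fn: F, max_attempts: u32, base: Duration, cap: Duration, mut sleep: S) -> ChunkOutcome<T>
where
    F: FnMut(u32) -> Result<T, String>,
    S: FnMut(Duration),
{
    let mut last_error = String::new();
    for attempt in 1..=max_attempts {
        match attempt_fn(attempt) {
            Ok(value) => return ChunkOutcome::Transcribed(value),
            Err(e) => {
                last_error = e;
                if attempt < max_attempts {
                    sleep(backoff_delay(attempt, base, cap));
                }
            }
        }
    }
    // Dead-letter: log once with the reason instead of retrying forever.
    eprintln!("chunk dead-lettered after {} attempts: {}", max_attempts, last_error);
    ChunkOutcome::DeadLettered { attempts: max_attempts, last_error }
}

fn main() {
    // Watchdog: progress at t=300 s, then silence while chunks stay pending.
    let mut dog = ProgressWatchdog::new(0);
    dog.record_progress(300);
    println!("restart at t=1400s? {}", dog.should_restart(1_400, true)); // 1100 s idle
    println!("restart at t=1600s? {}", dog.should_restart(1_600, true)); // 1300 s idle

    // Retry: an upstream that fails twice (e.g. with a 429) and then succeeds.
    let mut delays = Vec::new();
    let outcome = retry_with_backoff(
        |attempt| if attempt < 3 { Err("429".to_string()) } else { Ok("text") },
        5,
        Duration::from_secs(1),
        Duration::from_secs(60),
        |d| delays.push(d),
    );
    println!("{:?} after delays {:?}", outcome, delays);
}
```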
Severity
Medium-High in my judgment. No data lost from disk (audio files remain), but indexed/searchable transcripts can fall days behind without user awareness. For a tool whose value proposition is "every word you said is searchable," silent multi-day staleness is a serious UX failure.
I can help debug next recurrence
I'm a heavy user with a contributor setup ready (mediar-ai/screenpipe cloned, full build environment on Mac Mini). If you want me to drop in additional instrumentation behind a feature flag and run on prod for a week to catch the next stall in the act, happy to.
Incident date: 2026-05-21 (drained), backlog onset estimated 2026-05-15
Filed by: @pleasedodisturb
Related: #3466 (port collision / silent recorder failure) — distinct symptom, same general theme of silent failure modes that the health-check sees but the user doesn't.