fix(db): disable mmap + back off redact worker on SQLite corruption by louis030195 · Pull Request #3889 · screenpipe/screenpipe

louis030195 · 2026-06-06T20:39:07Z

Problem

Recurring SQLite corruption: "database disk image is malformed" (SQLITE_CORRUPT, code 11). It is intermittent and lands on the hottest table (ui_events). When it hits, the redact reconciliation worker retries the failing query every 2s indefinitely, pinning a CPU core and spamming the log. To the user this looks like screenpipe "suddenly using a lot of CPU".

Root cause

Memory-mapped I/O (mmap_size = 256MB) maps the database file writably into screenpipe's address space. screenpipe is a very native-code-dense process (ScreenCaptureKit, CoreAudio taps, the accessibility tree walker, the ONNX runtime for PII/redaction models, the sqlite_vec C extension, FFmpeg). A stray pointer write, buffer overrun, or use-after-free in any of that native code can silently scribble onto the mapped DB pages and write the corruption straight to disk, bypassing SQLite entirely.

This matches every symptom: corruption on the hottest table (its pages are resident in the mmap window), intermittent timing (a rare wild write, not a deterministic bug), and the fact that the otherwise-correct WAL + synchronous=NORMAL configuration does not prevent it. Disabling mmap on corruption is also SQLite's own documented guidance. The other suspects were ruled out by evidence: connection/pragma logic is correct (single-writer semaphore, write queue, WAL pre-conversion), the MCP server only does a brief read-only SELECT, incremental_vacuum is a no-op (auto_vacuum=NONE), and VACUUM INTO writes a separate snapshot file.

Fix

mmap_size = 0 on all device tiers (screenpipe-config/defaults.rs). Buffered I/O through the page cache removes the entire stray-write corruption class. The minor read-throughput cost is the right trade for a capture product where data integrity is paramount (the 64MB page cache stays).
Redact worker backs off on SQLITE_CORRUPT (screenpipe-redact/worker/mod.rs). It now detects non-transient corruption, logs once, and backs off 5 minutes instead of retrying every 2s. This removes the CPU spin even if corruption ever recurs from another source.
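
The backoff rule in the second item can be sketched as below. This is a minimal illustration, not the code in screenpipe-redact/worker/mod.rs: `DbError`, `BackoffState`, and `next_delay` are hypothetical names, and only the 2s retry interval, the 5 minute backoff, and the log-once behavior come from the description above.

```rust
use std::time::Duration;

/// SQLite primary result code for "database disk image is malformed".
const SQLITE_CORRUPT: i32 = 11;

/// Retry interval for ordinary passes (from the PR description).
const NORMAL_RETRY: Duration = Duration::from_secs(2);
/// Backoff applied once corruption is detected (from the PR description).
const CORRUPT_BACKOFF: Duration = Duration::from_secs(5 * 60);

/// Hypothetical error shape; the worker's real error type is not shown in this PR.
#[derive(Debug)]
struct DbError {
    code: i32,
    message: String,
}

/// Corruption is non-transient, so retrying quickly cannot help.
fn is_corruption(err: &DbError) -> bool {
    // Extended result codes keep the primary code in the low 8 bits.
    (err.code & 0xff) == SQLITE_CORRUPT || err.message.contains("malformed")
}

/// Remembers whether corruption was already reported, so it is logged once.
#[derive(Default)]
struct BackoffState {
    corruption_logged: bool,
}

impl BackoffState {
    /// Returns the sleep before the next pass and whether to log corruption now.
    fn next_delay(&mut self, result: &Result<(), DbError>) -> (Duration, bool) {
        match result {
            Ok(()) => {
                // A clean pass re-arms the one-time log message.
                self.corruption_logged = false;
                (NORMAL_RETRY, false)
            }
            Err(e) if is_corruption(e) => {
                let log_now = !self.corruption_logged;
                self.corruption_logged = true;
                (CORRUPT_BACKOFF, log_now)
            }
            // Other errors are treated as transient in this sketch.
            Err(_) => (NORMAL_RETRY, false),
        }
    }
}
```

A worker loop would call `next_delay` after each reconciliation pass, emit the corruption message only when the second tuple element is true, and then sleep for the returned duration.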

Added hardening

Startup integrity check (screenpipe-db/db.rs). DatabaseManager::new() spawns a one-shot background PRAGMA quick_check(1) ~10s after boot. On failure it logs a loud, actionable error pointing at the existing screenpipe db recover command. Backgrounded so it adds zero boot latency on multi-GB databases (quick_check still scans every page). Previously, corruption was only discovered later via worker errors, never surfaced cleanly with the fix command.
Classify SQLITE_NOTADB (screenpipe-db/sqlite_error.rs). "file is not a database" (code 26) is now treated as fatal alongside "malformed", so the write queue drops the poisoned handle instead of cascading errors across the batch.
Test fix (screenpipe-db/tests/db_config_test.rs, by @louis030195). Updated the per-tier assertions to expect mmap_size=0.
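
The SQLITE_NOTADB classification described above might look roughly like this. It is a sketch under assumptions: `ErrorClass` and `classify` are hypothetical names rather than the API of screenpipe-db/sqlite_error.rs; only the result codes (11 and 26) and the two message strings are taken from SQLite and the description.

```rust
/// SQLite primary result codes (values from sqlite3.h).
const SQLITE_CORRUPT: i32 = 11;
const SQLITE_NOTADB: i32 = 26;

/// Hypothetical two-way classification used only for this illustration.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ErrorClass {
    /// The handle is poisoned and should be dropped, not reused.
    Fatal,
    /// Anything else; the caller decides whether to retry.
    Other,
}

/// Classifies an error by result code when available, else by message text.
fn classify(code: Option<i32>, message: &str) -> ErrorClass {
    // Extended result codes keep the primary code in the low 8 bits.
    let primary = code.map(|c| c & 0xff);
    let msg = message.to_ascii_lowercase();
    let fatal = matches!(primary, Some(SQLITE_CORRUPT) | Some(SQLITE_NOTADB))
        || msg.contains("database disk image is malformed")
        || msg.contains("file is not a database");
    if fatal {
        ErrorClass::Fatal
    } else {
        ErrorClass::Other
    }
}
```

Matching on both the code and the message is a defensive choice for this sketch, since some wrappers surface only the message string.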

Deliberately not auto-running recovery in-process: screenpipe db recover is designed to run as a separate process under a PID lock while the app is closed (the app refuses to boot while the lock is held). Detection plus guidance is the safe layer; auto-heal-on-boot would be a larger, separate change.
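
The detection-plus-guidance layer can be illustrated with a one-shot delayed background check. This sketch swaps in `std::thread` for whatever async runtime DatabaseManager::new() actually uses, and `spawn_integrity_check` and `quick_check_passed` are hypothetical names. The caller supplies a closure that runs `PRAGMA quick_check(1)` and returns its rows; the only SQLite fact relied on is that a healthy database returns the single row "ok".

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// A healthy database answers `PRAGMA quick_check(1)` with the single row "ok".
fn quick_check_passed(rows: &[String]) -> bool {
    rows.len() == 1 && rows[0].eq_ignore_ascii_case("ok")
}

/// Runs the check once, off the boot path, after `delay`.
/// The returned handle yields `true` only if the check ran and passed.
fn spawn_integrity_check<F>(delay: Duration, run_quick_check: F) -> JoinHandle<bool>
where
    F: FnOnce() -> Result<Vec<String>, String> + Send + 'static,
{
    thread::spawn(move || {
        // Sleeping here keeps the scan from competing with startup work.
        thread::sleep(delay);
        match run_quick_check() {
            Ok(rows) if quick_check_passed(&rows) => true,
            Ok(rows) => {
                // Detection plus guidance only: no in-process recovery.
                eprintln!(
                    "DATABASE CORRUPTION DETECTED ({:?}): close the app and run `screenpipe db recover`",
                    rows.first()
                );
                false
            }
            Err(e) => {
                eprintln!("startup integrity check could not run: {e}");
                false
            }
        }
    })
}
```

With the roughly 10s delay from the description, boot would call something like `spawn_integrity_check(Duration::from_secs(10), check)` and drop the handle; the handle is returned here only so the behavior can be observed in tests.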

Flow

flowchart TB
    subgraph Before["Before: mmap enabled"]
        A1["DB file mapped WRITABLE into process address space"]
        A2["stray native write (capture, CoreAudio, ONNX/PII, sqlite_vec)"]
        A3["corrupt page on disk (ui_events)"]
        A4["redact worker hits SQLITE_CORRUPT"]
        A5["retry every 2s forever: CPU core pinned, log spam"]
        A1 --> A2 --> A3 --> A4 --> A5
    end
    subgraph After["After: mmap=0 + corrupt backoff + boot check"]
        B1["DB NOT mapped writable: buffered I/O via page cache"]
        B2["stray-write corruption path removed"]
        B3["if corrupt anyway: boot quick_check logs recovery hint; worker backs off 5 min"]
        B1 --> B2
    end

Testing

cargo check -p screenpipe-config -p screenpipe-redact -p screenpipe-db: clean.
cargo test -p screenpipe-config db_config: pass.
cargo test -p screenpipe-db sqlite_error: 2 passed (incl. new NOTADB cases).
cargo test -p screenpipe-db --test db_config_test: 5 passed (constructs DatabaseManager, exercising the new startup path; asserts mmap=0 across tiers).
Field validation: a real corrupted DB showing this exact ui_events btree corruption was recovered via .recover + FTS rebuild (integrity_check = ok), confirming the corruption signature and the recovery path.
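
For readers without the repository at hand, the per-tier mmap assertion could be sketched as follows. `DeviceTier`, `DbConfig`, and their fields are hypothetical stand-ins, not the types in screenpipe-config; the mapping of the former 32/128/256 MB values to tiers and the uniform 64 MB cache are assumptions made for illustration.

```rust
/// Hypothetical device tiers; the real enum in screenpipe-config may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeviceTier {
    Low,
    Mid,
    High,
}

/// Hypothetical subset of the DB settings discussed in this PR.
#[derive(Debug)]
struct DbConfig {
    mmap_size_bytes: u64,
    cache_size_kib: u64,
}

impl DbConfig {
    fn for_tier(tier: DeviceTier) -> Self {
        // Every tier now disables mmap. The comments note the assumed old values.
        let mmap_size_bytes = match tier {
            DeviceTier::Low => 0,  // assumed formerly 32 MB
            DeviceTier::Mid => 0,  // assumed formerly 128 MB
            DeviceTier::High => 0, // assumed formerly 256 MB
        };
        DbConfig {
            mmap_size_bytes,
            cache_size_kib: 64 * 1024, // the 64 MB page cache stays
        }
    }

    /// Renders the connection pragmas mentioned in the description.
    fn pragmas(&self) -> Vec<String> {
        vec![
            "PRAGMA journal_mode = WAL;".to_string(),
            "PRAGMA synchronous = NORMAL;".to_string(),
            format!("PRAGMA mmap_size = {};", self.mmap_size_bytes),
            // SQLite interprets a negative cache_size as a size in KiB.
            format!("PRAGMA cache_size = -{};", self.cache_size_kib),
        ]
    }
}
```

A test in the spirit of db_config_test would loop over all tiers and assert that `mmap_size_bytes` is 0 and that the rendered pragmas contain `PRAGMA mmap_size = 0;`.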

Risk

mmap_size=0 slightly reduces read throughput vs memory mapping, mitigated by the existing page cache. No schema or API changes. Behavior changes are limited to the DB connection pragma, the redact worker's error backoff, and a read-only background integrity check.

🤖 Generated with Claude Code

Memory-mapped I/O (mmap_size=256MB) maps the SQLite DB file *writably* into screenpipe's address space. A stray write from any native component (capture, CoreAudio, ONNX/PII models, sqlite_vec) can silently corrupt DB pages on disk, surfacing as "database disk image is malformed" (SQLITE_CORRUPT). It is intermittent and lands on the hottest table (ui_events), which is the mmap stray-write signature and the reason the otherwise-correct WAL + synchronous=NORMAL config does not prevent it.

- Set mmap_size=0 on all device tiers (defaults.rs). Buffered I/O via the page cache removes the entire corruption class; the minor read-throughput cost is worth it for a capture product where data integrity is paramount.
- Redact reconciliation worker now detects SQLITE_CORRUPT and backs off 5min (logging once) instead of retrying every 2s. The 2s spin on a corrupt DB was pinning a CPU core and spamming the log, which is the user-visible "sudden high CPU".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

db_config_test still asserted the old per-tier mmap_size values (32/128/256 MB) but DbConfig now sets mmap_size=0 on every tier to prevent DB corruption. Update the three assertions and the doc comment to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Corruption previously surfaced only later, via worker query errors (which used to spin a CPU core retrying a malformed DB). Add a one-shot background PRAGMA quick_check(1) ~10s after boot in DatabaseManager::new() that logs a loud, actionable error pointing at `screenpipe db recover` when the DB is malformed. Backgrounded so it adds no boot latency on multi-GB databases (quick_check still scans every page).

Also classify SQLITE_NOTADB ("file is not a database", code 26) as fatal alongside "malformed" so the write queue drops the poisoned handle instead of cascading errors across the batch. Unit-tested.

Deliberately not auto-running recovery in-process: the existing `screenpipe db recover` is designed to run as a separate process under a PID lock while the app is closed (the app refuses to boot while it is held). Detection plus guidance is the safe layer here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-08T14:49:52Z

Diarization eval results

Source: crates/screenpipe-audio-eval/evals/ · VoxConverse dev (CC-BY-4.0) + composed workday templates + screenpipe-shaped LibriSpeech fixtures

fixture	DER	VAD FA	VAD FN	boundary err (s)	continuity	predicted / true spk
interrupted_meeting	0.186	0.01	0.063	20.286	0.833	9 / 5
long_silence_day	0.437	0.011	0.145	11.46	0.7	14 / 10
screenpipe_meeting_rapid_handoffs	0.241	0.196	0.099	2.305	1	5 / 3
screenpipe_background_24_7_day	0.315	0.025	0.159	2.203	1	4 / 3
screenpipe_short_backchannels	0.561	0.915	0.064	0.488	n/a	3 / 3
screenpipe_mic_system_echo_leakage	0.275	0.198	0.084	3.045	0.667	5 / 3
screenpipe_overlap_crosstalk	0.254	0.84	0.042	0.667	n/a	3 / 3
abjxc	0.016	0.098	0.002	1.151	n/a	2 / 1
bxpwa	0.111	0.453	0.029	20.793	0.714	8 / 5
dhorc	0.143	0.461	0.034	3.681	1	5 / 4

DER, VAD FA, VAD FN, boundary err: lower is better. Continuity: higher is better, 1.0 = same hyp cluster across all silence gaps. Composed workday rows and screenpipe_* rows exercise screenpipe-shaped usage: meetings, background gaps, backchannels, echo leakage, and crosstalk. Raw VoxConverse rows score broadcast-quality stems for comparison. See crates/screenpipe-audio-eval/evals/README.md for methodology.

Pipeline replay matrix

Source: generated screenpipe_* fixtures materialized into temp screenpipe SQLite DBs, then read back through search_audio. This catches storage/search regressions that pure DER scoring misses.

scenarios	passed	failed	skipped	avg background DER	avg background speaker err	Deepgram
41	40	0	1	0.329	0.183	skip

The no-secret CI matrix runs local diarization under Parakeet/Whisper engine labels across live/background and mic/system device profiles. Real Deepgram/screenpipe-cloud smoke can be run locally with --deepgram required when credentials are present.

Transcription quality

Source: LibriSpeech test-clean (CC-BY-4.0) · per-model utterance cap · normalized lowercased word-level Levenshtein

model	utterances	WER	CER	throughput (samples/s)
tiny	50	0.085	0.033	68707
whisper-large-v3-turbo-quantized	20	0.042	0.009	1924
parakeet	50	0.04	0.026	107024

WER + CER on read-aloud speech. Per-model utterance caps keep wall time bounded: tiny/parakeet at 50, the heavier large-v3-turbo-quantized at 20. See README for normalization rules.
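
As a reminder of what the WER column measures, here is a minimal sketch of word error rate as a normalized, lowercased, word-level Levenshtein distance divided by the reference word count. The `normalize` rules (lowercase, strip punctuation except apostrophes) are an assumption; the eval's actual rules live in its README and may differ. All function names are illustrative.

```rust
/// Assumed normalization: lowercase, split on whitespace, and keep only
/// alphanumeric characters and apostrophes inside each word.
fn normalize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split_whitespace()
        .map(|w| {
            w.chars()
                .filter(|c| c.is_alphanumeric() || *c == '\'')
                .collect::<String>()
        })
        .filter(|w| !w.is_empty())
        .collect()
}

/// Levenshtein distance over arbitrary token slices, using two rolling rows.
fn edit_distance<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        // cur[0] is the cost of deleting the first i + 1 tokens of `a`.
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, y) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(x != y);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur[j + 1] = substitute.min(delete).min(insert);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// WER = word-level edit distance / number of reference words.
/// Returns `None` when the reference has no words after normalization.
fn wer(reference: &str, hypothesis: &str) -> Option<f64> {
    let r = normalize(reference);
    let h = normalize(hypothesis);
    if r.is_empty() {
        return None;
    }
    Some(edit_distance(&r, &h) as f64 / r.len() as f64)
}
```

CER follows the same pattern with characters instead of words as tokens, which is why `edit_distance` is written generically.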

Louis Beaumont and others added 3 commits June 6, 2026 13:37

louis030195 merged commit bb1457a into main Jun 8, 2026
19 of 23 checks passed

louis030195 mentioned this pull request Jun 9, 2026

fix(db): recover write queue from persistent disk-I/O wedges #3953

Merged

louis030195 merged 3 commits into main from claude/happy-easley-f2eab6
