Problem
DuckDB's storage engine is append-oriented. When ArchiveService DELETEs old rows after exporting to Parquet, the freed space is marked reusable but never returned to the OS, and VACUUM does not reclaim append-fragmented space. Over time, the database file grows monotonically — we observed 3.8GB for just 35MB of real data (110x bloat), causing 3-7 second collector runs and 50-60% CPU usage.
The only fix is export/reimport (compaction), which we did manually to go from 3.8GB → 323MB.
Proposed solution
Add automated daily compaction to prevent bloat from recurring:
- Daily compaction — after the hourly archive cycle, once per day:
  - Pause collection
  - CHECKPOINT (flush the WAL)
  - Export all tables to a temp DuckDB file
  - Close connections, swap files (old → .bak, temp → primary)
  - Reopen connections, resume collection
  - Delete the .bak
- Size watchdog — log a warning if the database file exceeds a threshold (e.g., 1GB) between compaction cycles, so runaway bloat is caught early.
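The compaction steps above could be sketched roughly as follows. This is a minimal sketch, not the actual implementation: it assumes the DuckDB.NET client and DuckDB ≥ 0.10 (for `COPY FROM DATABASE`), and the `Compactor`/`Compact` names, the source-database alias, and the omitted pause/resume hooks are all illustrative.

```csharp
// Rough sketch of the daily compaction swap. Assumes DuckDB.NET.Data.
// Pausing/resuming collection (which lives in CollectionBackgroundService)
// is not shown. All names here are illustrative, not from the repository.
using System.IO;
using DuckDB.NET.Data;

public static class Compactor
{
    public static void Compact(string dbPath)
    {
        string tempPath = dbPath + ".compact";
        string bakPath = dbPath + ".bak";

        using (var conn = new DuckDBConnection($"Data Source={dbPath}"))
        {
            conn.Open();
            using var cmd = conn.CreateCommand();

            // Flush the WAL so the file is self-contained before copying.
            cmd.CommandText = "CHECKPOINT;";
            cmd.ExecuteNonQuery();

            // Copy every table into a fresh, unfragmented file.
            // COPY FROM DATABASE requires DuckDB >= 0.10; the source alias
            // is assumed to be the attached file's basename.
            cmd.CommandText =
                $"ATTACH '{tempPath}' AS compact; " +
                $"COPY FROM DATABASE \"{Path.GetFileNameWithoutExtension(dbPath)}\" TO compact; " +
                "DETACH compact;";
            cmd.ExecuteNonQuery();
        }

        // With all connections closed, swap files: old -> .bak, temp -> primary.
        File.Move(dbPath, bakPath);
        File.Move(tempPath, dbPath);

        // Connections would be reopened and collection resumed here,
        // after which the backup can be dropped.
        File.Delete(bakPath);
    }
}
```

The file swap happens only after the temp export succeeds, so a failure at any step leaves either the original file or the .bak in place.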
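The size watchdog needs nothing beyond the standard library; a minimal sketch, where the `SizeWatchdog` name and the 1GB default are illustrative:

```csharp
// Size watchdog sketch: flag when the DuckDB file outgrows a threshold
// between compaction cycles. Names and the default are illustrative.
using System.IO;

public static class SizeWatchdog
{
    public const long DefaultThresholdBytes = 1L << 30; // 1 GB

    public static bool ExceedsThreshold(
        string dbPath, long thresholdBytes = DefaultThresholdBytes)
    {
        var info = new FileInfo(dbPath);
        return info.Exists && info.Length > thresholdBytes;
    }
}
```

A check like this could run in the existing archive/retention timer callback in CollectionBackgroundService, logging a warning when it returns true.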
Context
- Compaction takes ~2-5 seconds for a 300MB database
- Collection gap is negligible (at most one 1-minute cycle missed)
- `VACUUM` is not an alternative — it does not reclaim append-fragmented space in DuckDB
- The `checkpoint_threshold=1GB` setting (PR #159: DuckDB checkpoint optimization and timing fix) prevents checkpoint stalls but does not address long-term file growth
Files involved
- `Lite/Database/DuckDbInitializer.cs` — compaction logic (owns file path and connection string)
- `Lite/Services/CollectionBackgroundService.cs` — daily compaction timer alongside existing archive/retention timers