
Parquet archive visibility and scheduled compaction #162

Merged
erikdarlingdata merged 1 commit into dev from feature/parquet-archive-queries
Feb 19, 2026

Conversation

@erikdarlingdata
Owner

Summary

  • Display queries now read from v_ views that UNION hot DuckDB tables with archived parquet files via read_parquet() glob scans — users can see the full 90-day retention window instead of only the 7-day hot data window (Lite: Display queries should include archived parquet data #160)
  • Views created at startup and refreshed after each archive cycle; falls back to table-only view if parquet schema doesn't match
  • Daily database compaction prevents file bloat from DuckDB's append-only storage — exports all tables to a fresh database via ATTACH/CREATE TABLE AS, swaps files, recreates indexes and views (Lite: Add scheduled database compaction to prevent DuckDB bloat #161)
  • 1GB size watchdog logs warnings between compaction cycles

Closes #160, closes #161
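
A minimal sketch of the view definition described in the summary, written in Python purely for illustration (the project itself is C#). It only builds the DDL string. The table name `query_stats`, the archive file naming pattern, and the use of `UNION ALL` rather than plain `UNION` are assumptions, not details taken from this PR.

```python
def build_view_sql(table: str, archive_dir: str, has_matching_parquet: bool) -> str:
    """Build DDL for a v_<table> view over hot rows plus archived parquet rows."""
    view = f"v_{table}"
    if not has_matching_parquet:
        # Fallback from the summary: a table-only view when the parquet
        # files are missing or their schema does not match.
        return f"CREATE OR REPLACE VIEW {view} AS SELECT * FROM {table};"

    # Hypothetical naming scheme: every archive file for a table ends in
    # _<table>.parquet. Single quotes are doubled so the path is a valid
    # SQL string literal.
    glob = f"{archive_dir}/*_{table}.parquet".replace("'", "''")
    return (
        f"CREATE OR REPLACE VIEW {view} AS "
        f"SELECT * FROM {table} "
        f"UNION ALL "
        f"SELECT * FROM read_parquet('{glob}');"
    )


if __name__ == "__main__":
    print(build_view_sql("query_stats", "archive", has_matching_parquet=True))
    print(build_view_sql("query_stats", "archive", has_matching_parquet=False))
```

Because the view is replaced rather than altered, re-running the same statement after each archive cycle is one plausible way to implement the "refresh" step.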

Files changed

  • DuckDbInitializer.cs — CreateArchiveViewsAsync(), CompactAsync(), archive path + table list
  • CollectionBackgroundService.cs — daily compaction timer, size watchdog
  • ArchiveService.cs — refresh views after archival
  • MainWindow.xaml.cs — pass DuckDbInitializer to background service
  • 11 LocalDataService.*.cs files — query v_ views instead of raw tables
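
The compaction and watchdog pieces listed above could be sketched roughly as follows. This is a Python illustration that only generates the SQL statements and a warning string. It is not the actual `CompactAsync()` code. The alias `fresh`, the function names, and treating 1 GB as 1024³ bytes are assumptions.

```python
ONE_GB = 1024 ** 3  # assumption: the PR does not say whether 1 GB means 2**30 or 10**9 bytes


def build_compaction_sql(tables, fresh_db_path, alias="fresh"):
    """Return the statements that copy every table into a freshly created database."""
    path = fresh_db_path.replace("'", "''")  # keep the path a valid SQL literal
    statements = [f"ATTACH '{path}' AS {alias};"]
    for table in tables:
        # CREATE TABLE AS writes only live rows into the new file, which is
        # what shrinks it. Indexes and views are not copied and must be
        # recreated afterwards, as the summary notes.
        statements.append(
            f"CREATE TABLE {alias}.main.{table} AS SELECT * FROM {table};"
        )
    statements.append(f"DETACH {alias};")
    return statements


def size_warning(size_bytes, threshold=ONE_GB):
    """Return a warning message when the database file exceeds the threshold."""
    if size_bytes <= threshold:
        return None
    return (
        f"Database file is {size_bytes / ONE_GB:.2f} GB, "
        f"above the {threshold / ONE_GB:.2f} GB watchdog threshold"
    )


if __name__ == "__main__":
    for statement in build_compaction_sql(["query_stats", "wait_stats"], "compact.duckdb"):
        print(statement)
    print(size_warning(2 * ONE_GB))
```

After these statements run, the new file still has to be swapped in place of the live database, which is the step the follow-up commit later hardened.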

Test plan

  • dotnet build -c Debug — 0 warnings, 0 errors
  • All 14 v_ views created successfully
  • Archive round-trip: export 73 rows to parquet, delete from hot table, view still returns all 73 rows
  • Compaction: 19.3 MB → 5.3 MB (73% reduction), all 1,055 rows across 25 tables preserved
  • View fallback: table-only view created when no parquet files exist
  • Launch Lite, verify all tabs populate correctly with views
  • Run for 1+ hour to verify archive cycle refreshes views

🤖 Generated with Claude Code

Display queries now read from views that UNION hot DuckDB tables with
archived parquet files, so users can see the full 90-day retention window
instead of only the 7-day hot data window. Views are created at startup
and refreshed after each archive cycle.

Adds daily database compaction to prevent file bloat from DuckDB's
append-only storage. Compaction exports all tables to a fresh database
via ATTACH/CREATE TABLE AS, swaps files, and recreates indexes/views.
Includes a 1GB size watchdog that logs warnings between compaction cycles.

Tested: 24/24 — archive round-trip (export to parquet, delete hot rows,
verify view still returns all rows) and compaction (19MB → 5MB, all 1055
rows preserved across 25 tables).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 98fcbff into dev Feb 19, 2026
3 checks passed
erikdarlingdata added a commit that referenced this pull request Feb 19, 2026
Three fixes for "DuckDBOpen failed: Cannot open file" errors introduced
by PR #159 (checkpoint) and PR #162 (compaction):

1. Timer initialization: DateTime.MinValue → DateTime.UtcNow prevents
   compaction/archival from firing on the very first collection cycle

2. Inline checkpoint: moved CHECKPOINT to end of RunDueCollectorsAsync
   using the existing connection pool instead of opening a separate
   DuckDB instance that conflicts via OS file locks

3. Atomic file swap: replaced two-step File.Move in CompactAsync with
   File.Replace (single OS operation, no window where the database file
   is missing) plus retry logic for locked files and WAL cleanup
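
Fixes 1 and 3 above can be illustrated with a short Python sketch. It uses `os.replace` as a stand-in for .NET's `File.Replace`, and the function names, retry count, and delay are hypothetical. WAL cleanup is not shown.

```python
import os
import time
from datetime import datetime, timedelta, timezone


def is_due(last_run: datetime, interval: timedelta, now: datetime) -> bool:
    """A job is due once a full interval has elapsed since its last run."""
    return now - last_run >= interval


def swap_database(fresh_path: str, live_path: str, retries: int = 5,
                  delay_seconds: float = 0.2) -> None:
    """Replace the live file with the compacted one in a single rename call."""
    for attempt in range(retries):
        try:
            # One call, so there is no point where live_path does not exist
            # (unlike moving the old file away and then moving the new one in).
            os.replace(fresh_path, live_path)
            return
        except PermissionError:
            # Typically raised on Windows while another handle holds the file.
            if attempt == retries - 1:
                raise
            time.sleep(delay_seconds)


if __name__ == "__main__":
    startup = datetime(2026, 2, 19, 12, 0, tzinfo=timezone.utc)
    daily = timedelta(days=1)
    minimum = datetime.min.replace(tzinfo=timezone.utc)
    # Seeding the timer with the minimum date makes the job due immediately.
    print(is_due(minimum, daily, startup))
    # Seeding it with the startup time defers the first run by one interval.
    print(is_due(startup, daily, startup))
```

The first check returns `True` and the second `False`, which mirrors why initializing the timer to the current time keeps compaction from firing on the very first collection cycle.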

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata deleted the feature/parquet-archive-queries branch February 20, 2026 13:20