
Lite: All collectors should use incremental collection (last collection time cutoff) #147

@erikdarlingdata

Description


Problem

Dashboard's collectors use incremental collection — each cycle only fetches data since the last collection time. Lite's collectors re-scan fixed time windows every cycle, doing redundant work.

This is the single biggest performance difference between Dashboard and Lite collection.

Query Store collector (avg 2,286ms vs Dashboard's 126ms)

Root cause: Lite scans a fixed 60-minute window with GROUP BY + TOP 100 every cycle. Dashboard tracks MAX(utc_last_execution_time) and only fetches rows newer than that — typically 1-5 minutes of data, no aggregation needed.

Fix: Track last collection time in DuckDB, pass as cutoff to SQL query. Store raw Query Store intervals (like Dashboard does) instead of re-aggregating every cycle. Aggregation moves to the display/trend queries.
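A minimal sketch of the incremental shape, assuming `@cutoff_time` is supplied from DuckDB (the `sys.query_store_runtime_stats` catalog view and its columns are real; the parameter plumbing is what Lite would add):

```sql
-- Sketch: fetch only runtime_stats intervals newer than the last collection.
-- No GROUP BY or TOP needed -- raw intervals are stored as-is and
-- aggregated later by the display/trend queries.
SELECT
    qsrs.plan_id,
    qsrs.runtime_stats_interval_id,
    qsrs.count_executions,
    qsrs.avg_duration,
    qsrs.last_execution_time
FROM sys.query_store_runtime_stats AS qsrs
WHERE qsrs.last_execution_time >= @cutoff_time;
```

On a typical cycle this touches only the 1-5 minutes of new intervals instead of re-aggregating a 60-minute window.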

Deadlock collector (avg 1,194ms, spiking to 3,837ms)

Two issues:

  1. Ring buffer full parse: The XE ring_buffer query reads the entire buffer (up to 4MB XML) every cycle, then filters by cutoff time inside .nodes() XQuery. SQL Server must parse the full XML before filtering.
  2. XML parsing in DuckDB timing: ExtractVictimSqlText() does XElement.Parse() on every deadlock graph XML inside the DuckDB stopwatch block. This is C# XML parsing being misattributed as DuckDB time, but more importantly it's O(n) full XML parses per collection. Should be done once, outside the appender loop, or deferred to display time.

Fix: The collector already derives @cutoff_time from DuckDB's MAX(deadlock_time), so the SQL side is already incremental. The real fix is moving XML parsing out of the insert loop (parse victim text at display time, not collection time) and ensuring the ring buffer doesn't accumulate unbounded events.
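One way to bound the ring-buffer cost, sketched with XE's real ring_buffer target options (the session and event names here are illustrative, not Lite's actual session):

```sql
-- Sketch: cap the ring buffer so a worst-case full-buffer XML parse stays
-- bounded. max_memory is in KB; max_events_limit caps retained events.
CREATE EVENT SESSION deadlock_capture ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.ring_buffer (SET max_events_limit = 256, max_memory = 2048)
WITH (STARTUP_STATE = ON);
```

With a bounded target, the per-cycle parse cost has a hard ceiling even if collection falls behind.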

All other collectors

Every collector that uses a fixed time window should switch to incremental:

| Collector | Current Window | Fix |
| --- | --- | --- |
| query_store | Fixed 60 min | Track last collection time, fetch only new runtime_stats intervals |
| query_snapshots | Fixed window | Track last collection_time |
| blocked_process_report | Fixed window | Track last event_time |
| cpu_utilization | Fixed 60 min (ring buffer) | Track last sample_time |
| perfmon_stats | Delta-based (already incremental via cumulative counters) | OK |
| wait_stats | Delta-based (already incremental via cumulative counters) | OK |
| memory_stats | Point-in-time snapshot | OK (no window) |
| file_io_stats | Delta-based | OK |
| tempdb_stats | Point-in-time snapshot | OK |
| memory_clerks | Point-in-time snapshot | OK |

Pattern

Dashboard uses this pattern in every collector:

-- Get last collection time
SELECT @cutoff_time = MAX(last_execution_time) FROM collect.table;

-- Only fetch new data
WHERE rs.last_execution_time >= @cutoff_time

Lite equivalent:

// Query DuckDB for last collection time
var lastTime = await GetLastCollectionTimeAsync(serverId, "table_name");

// Pass to SQL query
command.Parameters.Add(new SqlParameter("@cutoff_time", SqlDbType.DateTime2) 
    { Value = lastTime ?? DateTime.UtcNow.AddMinutes(-10) });

The deadlock collector already does this (lines 361-376 in RemoteCollectorService.Deadlocks.cs) — extend the same pattern to all collectors that scan time windows.
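The shared helper could look roughly like this (a sketch assuming DuckDB.NET's ADO.NET surface; the schema and column names are assumptions, and `tableName` must come from a fixed internal list, never user input):

```csharp
// Sketch of the shared helper named in this issue. Returns null when no
// rows exist yet, so callers can fall back to a bounded default window
// (e.g. DateTime.UtcNow.AddMinutes(-10)) instead of a full scan.
private async Task<DateTime?> GetLastCollectionTimeAsync(int serverId, string tableName)
{
    await using var command = duckDbConnection.CreateCommand();
    // tableName is interpolated from a hardcoded allow-list of collector tables.
    command.CommandText =
        $"SELECT MAX(collection_time) FROM {tableName} WHERE server_id = $server_id";
    command.Parameters.Add(new DuckDBParameter("server_id", serverId));

    var result = await command.ExecuteScalarAsync();
    return result is DateTime dt ? dt : null;
}
```

MAX over an empty table returns NULL, which surfaces here as a null `DateTime?`, so first-run behavior falls out naturally.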

Additional: Deadlock XML parsing

Move ExtractVictimSqlText() out of the collection hot path:

  • Option A: Store raw XML, extract victim text at display time (lazy)
  • Option B: Parse once per batch outside the appender loop, build a dictionary, then look up during append
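Option B can be sketched as follows (`ExtractVictimSqlText()` is the existing method named above; the batch and appender shapes here are assumptions for illustration):

```csharp
// Sketch: one XElement.Parse per deadlock graph, done before the DuckDB
// stopwatch starts, so parse time is neither repeated nor misattributed.
var victimTextByGraph = new Dictionary<string, string>(deadlockXmlBatch.Count);
foreach (var xml in deadlockXmlBatch)
{
    victimTextByGraph[xml] = ExtractVictimSqlText(xml);
}

// Inside the timed appender loop: a dictionary lookup replaces an XML parse.
foreach (var row in rows)
{
    appender.AppendValue(victimTextByGraph[row.DeadlockXml]);
}
```

This keeps the stopwatch measuring DuckDB work only, and drops per-row cost from a full XML parse to a hash lookup.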

Files involved

  • Lite/Services/RemoteCollectorService.QueryStore.cs — biggest win
  • Lite/Services/RemoteCollectorService.Deadlocks.cs — XML parsing
  • Lite/Services/RemoteCollectorService.cs — add shared GetLastCollectionTimeAsync() helper
  • All other RemoteCollectorService.*.cs files with fixed time windows
