Description
Problem
The Dashboard and Lite apps alert on blocking, deadlocks, and high CPU, but don't alert on dangerous wait type spikes that signal imminent trouble — THREADPOOL exhaustion, memory grant starvation, etc. These "poison waits" are leading indicators of severe problems.
Proposed Solution
Monitor deltas of specific dangerous wait types and alert when they increase significantly.
Poison wait types to monitor:
- THREADPOOL: worker thread exhaustion, server stops accepting requests
- RESOURCE_SEMAPHORE: memory grant starvation, queries can't get memory to execute
- RESOURCE_SEMAPHORE_QUERY_COMPILE: can't compile queries due to memory pressure
Potentially also:
- CMEMTHREAD: memory object contention
- SOS_SCHEDULER_YIELD (sustained high delta): CPU scheduler saturation
Alert trigger: Either a configurable delta threshold per collection interval (e.g., delta > N ms in a single interval), or a significant increase over baseline.
Note on THREADPOOL: Alerting on this wait is partly self-defeating. If THREADPOOL is saturated, collections may stop running. The valuable signal is the last successful collection showing THREADPOOL spiking before things go dark. That's the canary in the coal mine.
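A minimal sketch of the alert trigger described above, in Python purely for illustration (the Dashboard and Lite apps may implement it differently). Every name here is hypothetical: `POISON_WAIT_THRESHOLDS_MS`, `PoisonWaitAlert`, `check_poison_wait`, and `baseline_multiplier` do not come from the existing codebase, and the threshold values are placeholders rather than recommendations. It assumes each collection interval yields a wait-time delta in milliseconds per wait type, plus a short history of recent deltas to serve as the baseline.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence

# Hypothetical per-wait-type thresholds (ms of wait time added in one interval).
# The values are placeholders; real defaults would need tuning.
POISON_WAIT_THRESHOLDS_MS = {
    "THREADPOOL": 1_000,
    "RESOURCE_SEMAPHORE": 5_000,
    "RESOURCE_SEMAPHORE_QUERY_COMPILE": 5_000,
}


@dataclass
class PoisonWaitAlert:
    wait_type: str
    delta_ms: int
    reason: str  # "threshold" or "baseline"


def check_poison_wait(
    wait_type: str,
    delta_ms: int,
    recent_deltas_ms: Sequence[int],
    baseline_multiplier: float = 10.0,
) -> Optional[PoisonWaitAlert]:
    """Return an alert if this interval's delta looks dangerous, else None."""
    threshold = POISON_WAIT_THRESHOLDS_MS.get(wait_type)
    if threshold is None:
        return None  # not a monitored poison wait

    # Trigger 1: absolute delta threshold for a single collection interval.
    if delta_ms > threshold:
        return PoisonWaitAlert(wait_type, delta_ms, "threshold")

    # Trigger 2: significant increase over the average of recent intervals.
    if recent_deltas_ms:
        baseline = mean(recent_deltas_ms)
        if baseline > 0 and delta_ms > baseline * baseline_multiplier:
            return PoisonWaitAlert(wait_type, delta_ms, "baseline")

    return None
```

For example, `check_poison_wait("THREADPOOL", 1500, [])` trips the absolute threshold, while `check_poison_wait("RESOURCE_SEMAPHORE", 4000, [100, 200, 300])` stays under the threshold but exceeds ten times the recent average, so it is reported with reason `"baseline"`.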
Context
- Wait stats deltas are already collected in both Dashboard (SQL Server) and Lite (DuckDB)
- Should be implemented before #52 (Alerts History view for centralized alert assessment) so the new alert type is included in the history UI
- CPU alerts currently have minimal context; this gives a richer picture of server distress
Scope
- Full Dashboard
- Lite
- SQL collection scripts (already collected)
- Installer
- Documentation