Closed
Description
Problem
Lite connects to all configured servers sequentially. When servers are offline/unreachable, each connection attempt waits ~30s for the TCP timeout before moving on. With 7 offline servers, this eats ~8 minutes per collection cycle, making the effective refresh interval much longer than intended.
Expected behavior
Offline servers should be detected quickly (short timeout or async probing) so they don't block collection from healthy servers.
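To illustrate the "short timeout or async probing" idea, here is a minimal Python sketch, not Lite's actual code. It assumes a plain TCP connect to the SQL Server port is a good enough reachability check. The names `probe` and `probe_all`, the default port 1433, and the 5-second timeout are all hypothetical choices for this example.

```python
import asyncio


async def probe(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: port 1433 and the 5 s timeout are assumptions.
    """
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, unreachable, DNS failure, or timed out.
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass
    return True


async def probe_all(servers, timeout: float = 5.0) -> dict:
    """Probe every (host, port) pair concurrently; map "host:port" to reachability."""
    results = await asyncio.gather(*(probe(h, p, timeout) for h, p in servers))
    return {f"{h}:{p}": ok for (h, p), ok in zip(servers, results)}
```

Because the probes run concurrently, the whole check takes roughly one timeout regardless of how many servers are offline, instead of one timeout per offline server.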
Possible approaches
- Reduce SQL connection timeout for Lite (e.g. 5s instead of 30s default)
- Skip servers that failed N consecutive times (exponential backoff)
- Collect from servers in parallel instead of sequentially
- Surface server names in the collection log (currently it shows only the integer hash server_id, so you can't tell which server is which)
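The backoff, parallel collection, and server-name ideas above could be combined roughly as follows. This is a hedged Python sketch, not the project's implementation: `ServerState`, `backoff_delay`, `run_cycle`, the `collect` callback, and the 30 s base / 900 s cap are hypothetical. It also backs off after the first failure rather than after N failures, to keep the example short.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ServerState:
    name: str                   # human-readable name, used as the result key
    failures: int = 0           # consecutive failed attempts
    next_attempt: float = 0.0   # earliest time the server may be retried


def backoff_delay(failures: int, base: float = 30.0, cap: float = 900.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return min(cap, base * 2 ** (failures - 1))


def run_cycle(states, collect, now=None, max_workers: int = 8) -> dict:
    """Collect from all due servers in parallel; skip servers still backing off.

    `collect(name)` is a caller-supplied function that raises on failure.
    """
    now = time.monotonic() if now is None else now
    due = [s for s in states if s.next_attempt <= now]
    results = {s.name: "skipped" for s in states if s.next_attempt > now}

    def attempt(state):
        try:
            collect(state.name)
        except Exception as exc:
            state.failures += 1
            state.next_attempt = now + backoff_delay(state.failures)
            return state.name, f"error: {exc}"
        state.failures = 0
        state.next_attempt = 0.0
        return state.name, "ok"

    if due:
        with ThreadPoolExecutor(max_workers=min(max_workers, len(due))) as pool:
            results.update(pool.map(attempt, due))
    return results
```

With this shape, an unreachable server costs at most one timeout per cycle while it is due, and nothing while it is backing off. Healthy servers are collected alongside it rather than behind it.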
Additional context
During testing with 8 configured servers (only sql2022 reachable), Lite completed only 4 collection cycles in 40 minutes. The DuckDB collection_log showed 448 errors, all from unreachable servers.