Closed
Description
The thread watchdog is already an important mechanism for detecting and recovering from coding errors that result in infinite loops, blocking API calls, and very long computations in worker threads. A few simple improvements would make the watchdog even more awesome:
- Option to capture a 5- to 10-second CPU profile after a series of watchdog misses or megamisses, and either write it to disk or make it available via the admin interface. If profiles are written to disk, provide a parameter for the maximum number of profiles to generate so the disk does not fill up.
- Option to capture and log the current stack of the watched thread, or the stacks of all threads, on megamiss.
- Option to terminate the process by sending SIGABRT to the stuck thread instead of calling PANIC on the guarddog thread.
- Registration mechanism for additional callbacks to invoke on watchdog miss or megamiss, which could be used to implement some of the prior ideas and/or integrate with third-party systems. Callback arguments may include the list of threads that have recently experienced megamiss events and information about when each was last reported alive.
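To make the callback idea above concrete, here is a minimal sketch of what such a registration mechanism might look like. Everything in it is hypothetical: `WatchdogEvent`, `StuckThreadInfo`, `WatchdogCallbackRegistry`, and `BoundedProfileTrigger` are illustrative names, not existing Envoy APIs. It assumes the guarddog thread calls `notify()` with the threads that missed, along with the time each was last reported alive. `BoundedProfileTrigger` shows how a callback could enforce the "max number of profiles" limit from the first bullet without actually starting a profiler.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event kinds the guarddog could report (not an Envoy API).
enum class WatchdogEvent { Miss, MegaMiss };

// Hypothetical per-thread info passed to callbacks.
struct StuckThreadInfo {
  std::string thread_name;
  std::chrono::steady_clock::time_point last_alive;  // last time the thread touched its watchdog
};

using WatchdogCallback =
    std::function<void(WatchdogEvent, const std::vector<StuckThreadInfo>&,
                       std::chrono::steady_clock::time_point /*now*/)>;

// Hypothetical registry: callbacks subscribe to one event kind each.
class WatchdogCallbackRegistry {
public:
  void registerCallback(WatchdogEvent event, WatchdogCallback cb) {
    assert(cb && "callback must be callable");
    std::lock_guard<std::mutex> lock(mu_);
    callbacks_.emplace_back(event, std::move(cb));
  }

  // Intended to be called from the guarddog thread when it detects a miss/megamiss.
  void notify(WatchdogEvent event, const std::vector<StuckThreadInfo>& threads,
              std::chrono::steady_clock::time_point now) {
    std::vector<WatchdogCallback> to_run;
    {
      // Copy matching callbacks under the lock, then invoke them without holding it,
      // so a callback can safely call registerCallback() itself.
      std::lock_guard<std::mutex> lock(mu_);
      for (const auto& [e, cb] : callbacks_) {
        if (e == event) to_run.push_back(cb);
      }
    }
    for (auto& cb : to_run) cb(event, threads, now);
  }

private:
  std::mutex mu_;
  std::vector<std::pair<WatchdogEvent, WatchdogCallback>> callbacks_;
};

// Hypothetical policy object: permits at most max_profiles profile captures.
class BoundedProfileTrigger {
public:
  explicit BoundedProfileTrigger(std::size_t max_profiles) : max_profiles_(max_profiles) {}

  // Returns true if the caller may start another profile; safe to call from any thread.
  bool tryStart() {
    std::size_t cur = started_.load();
    while (cur < max_profiles_) {
      // On failure, compare_exchange_weak reloads cur and the loop re-checks the cap.
      if (started_.compare_exchange_weak(cur, cur + 1)) return true;
    }
    return false;
  }

  std::size_t started() const { return started_.load(); }

private:
  const std::size_t max_profiles_;
  std::atomic<std::size_t> started_{0};
};
```

In this sketch, a megamiss callback would check `trigger.tryStart()` before launching a 5- to 10-second profile, so repeated megamisses stop producing profiles once the cap is reached. A real implementation would also need to decide whether the counter is reset over time and where profile files are written.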