watchdog: Touch Watchdog before executing most event loop callbacks to avoid spurious misses when there are many queued callbacks.#13104
Conversation
… avoid spurious misses when there are many queued callbacks. Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
… problems when execution of the original callback ends up deleting the wrapped callback. The original implementation triggered segfaults when compiling with gcc-10. Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
KBaichoo
left a comment
There was a problem hiding this comment.
Nice, thanks for implementing this! It will make the watchdog system more useful.
Curious if have any data on the performance impact
I'm not fully up to speed on available OSS performance tests, so no - I don't know about the full performance implications of this change with or without #13103 |
KBaichoo
left a comment
There was a problem hiding this comment.
lgtm besides a few nits
I'll defer to senior maintainers regarding any performance tests we have / if we want to measure the impact of this
Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
Back from vacation, waiting until #13103 is merged in. |
Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
|
/retest |
|
Retrying Azure Pipelines: |
Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
|
@envoyproxy/senior-maintainers for final review. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for working on this.
qq: did we consider just doing the touch as part of the prepare callback that runs before each event loop iteration? Would that be good enough as opposed to touching before every individual event? I know you made touching faster but there is still cost so just wondering if we can get most of the benefit by just doing it in the prepare callback (or maybe both pre-post callbacks for each iteration)?
/wait-any
I don't think touching on prepare is enough. |
OK fair enough! |
Commit Message: watchdog: Touch Watchdog before executing most event loop callbacks to avoid spurious misses when there are many queued callbacks.
Additional Description:
This change assumes that the primary purposes of the watchdog is to terminate the proxy if worker threads are deadlocked, and aid in the debugging of long-running operations that end up scheduled in worker threads through monitoring for miss/megamiss events and auxiliary watchdog actions like the recently added watchdog profile action.
Risk Level: medium. more frequent calls to the watchdog could have visible performance implications; mitigated by #13103
Testing: unit
Docs Changes: n/a
Release Notes: added
Runtime guard: n/a
Fixes: #11391