per-worker listener and watchdog stats#8263
Conversation
This PR does a few things: 1) Adds per-worker listener stats, useful for viewing worker connection imbalance. 2) Adds per-worker watchdog miss stats, useful for viewing per worker event loop latency. 3) Misc connection handling cleanups. Part of #4602 Signed-off-by: Matt Klein <mklein@lyft.com>
jmarantz
left a comment
There was a problem hiding this comment.
I just did a quick skim of this and I have some high level questions mostly to educate me...
What is the rough cardinality (order of magnitude) of:
- watched dogs (one per thread?)
- active listeners of various flavors
- connection handlers
I see some potential opportunities to factor out or front-load some name lookups but it may be immaterial of the cardinalities are low or they are all created at server-start as opposed to during requests.
I will dive in more today.
1 per worker
1 per listener (so per-worker stats is listeners X workers)
1 per worker and 1 on the main thread
Yeah I agree we could do better here, but given the low cardinality I'm not sure it matters. I think we could make these stats configurable if we think this is going to be an issue though, but I found these stats to be incredibly useful in debugging potential worker imbalance issues so they seem generally useful to me (but I wanted to get @mergeconflict @oschaaf opinion on that) |
Signed-off-by: Matt Klein <mklein@lyft.com>
mergeconflict
left a comment
There was a problem hiding this comment.
Looks pretty good to me; my only nits are about stat names.
|
/azp run envoy-linux |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
|
@danzh2010 @jmarantz @mergeconflict master merged and comments addressed. PTAL. |
jmarantz
left a comment
There was a problem hiding this comment.
lgtm generally modulo the test nit.
Signed-off-by: Matt Klein <mklein@lyft.com>
danzh2010
left a comment
There was a problem hiding this comment.
LGTM other than a few nits!
e1a0c99
|
@danzh2010 updated per comments. |
This PR does a few things: 1) Adds per-worker listener stats, useful for viewing worker connection imbalance. 2) Adds per-worker watchdog miss stats, useful for viewing per worker event loop latency. 3) Misc connection handling cleanups. Part of envoyproxy#4602 Signed-off-by: Matt Klein <mklein@lyft.com>
This PR does a few things: 1) Adds per-worker listener stats, useful for viewing worker connection imbalance. 2) Adds per-worker watchdog miss stats, useful for viewing per worker event loop latency. 3) Misc connection handling cleanups. Part of envoyproxy#4602 Signed-off-by: Matt Klein <mklein@lyft.com>
This PR does a few things: 1) Adds per-worker listener stats, useful for viewing worker connection imbalance. 2) Adds per-worker watchdog miss stats, useful for viewing per worker event loop latency. 3) Misc connection handling cleanups. Part of envoyproxy#4602 Signed-off-by: Matt Klein <mklein@lyft.com>
This PR does a few things:
connection imbalance.
worker event loop latency.
Part of #4602
Risk Level: Medium
Testing: Modified UTs and new integration test
Docs Changes: Added
Release Notes: Added