Skip to content

per-worker listener and watchdog stats#8263

Merged
mattklein123 merged 11 commits intomasterfrom
per_worker_stats
Sep 20, 2019
Merged

per-worker listener and watchdog stats#8263
mattklein123 merged 11 commits intomasterfrom
per_worker_stats

Conversation

@mattklein123
Copy link
Copy Markdown
Member

This PR does a few things:

  1. Adds per-worker listener stats, useful for viewing worker
    connection imbalance.
  2. Adds per-worker watchdog miss stats, useful for viewing per
    worker event loop latency.
  3. Misc connection handling cleanups.

Part of #4602

Risk Level: Medium
Testing: Modified UTs and new integration test
Docs Changes: Added
Release Notes: Added

This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of #4602

Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123
Copy link
Copy Markdown
Member Author

@danzh2010 @mergeconflict @jmarantz PTAL

Signed-off-by: Matt Klein <mklein@lyft.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #8263 was synchronize by mattklein123.

see: more, trace.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a quick skim of this and I have some high level questions mostly to educate me...

What is the rough cardinality (order of magnitude) of:

  • watched dogs (one per thread?)
  • active listeners of various flavors
  • connection handlers

I see some potential opportunities to factor out or front-load some name lookups but it may be immaterial of the cardinalities are low or they are all created at server-start as opposed to during requests.

I will dive in more today.

@mattklein123
Copy link
Copy Markdown
Member Author

watched dogs (one per thread?)

1 per worker

active listeners of various flavors

1 per listener (so per-worker stats is listeners X workers)

connection handlers

1 per worker and 1 on the main thread

I see some potential opportunities to factor out or front-load some name lookups but it may be immaterial of the cardinalities are low or they are all created at server-start as opposed to during requests.

Yeah I agree we could do better here, but given the low cardinality I'm not sure it matters. I think we could make these stats configurable if we think this is going to be an issue though, but I found these stats to be incredibly useful in debugging potential worker imbalance issues so they seem generally useful to me (but I wanted to get @mergeconflict @oschaaf opinion on that)

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Copy link
Copy Markdown

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me; my only nits are about stat names.

@mattklein123
Copy link
Copy Markdown
Member Author

/azp run envoy-linux

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123
Copy link
Copy Markdown
Member Author

@danzh2010 @jmarantz @mergeconflict master merged and comments addressed. PTAL.

mergeconflict
mergeconflict previously approved these changes Sep 19, 2019
Copy link
Copy Markdown

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

jmarantz
jmarantz previously approved these changes Sep 20, 2019
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm generally modulo the test nit.

Signed-off-by: Matt Klein <mklein@lyft.com>
danzh2010
danzh2010 previously approved these changes Sep 20, 2019
Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than a few nits!

Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123 mattklein123 dismissed stale reviews from danzh2010 and jmarantz via e1a0c99 September 20, 2019 16:36
@mattklein123
Copy link
Copy Markdown
Member Author

@danzh2010 updated per comments.

@mattklein123 mattklein123 merged commit 483aa09 into master Sep 20, 2019
@mattklein123 mattklein123 deleted the per_worker_stats branch September 20, 2019 20:05
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Sep 24, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <mklein@lyft.com>
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Oct 4, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <mklein@lyft.com>
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Oct 4, 2019
This PR does a few things:
1) Adds per-worker listener stats, useful for viewing worker
   connection imbalance.
2) Adds per-worker watchdog miss stats, useful for viewing per
   worker event loop latency.
3) Misc connection handling cleanups.

Part of envoyproxy#4602

Signed-off-by: Matt Klein <mklein@lyft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants