Skip to content

hot restart: send used stats only during hot restart#9121

Merged
mattklein123 merged 7 commits intoenvoyproxy:masterfrom
ramaraochavali:fix/hot_restart_stats
Nov 25, 2019
Merged

hot restart: send used stats only during hot restart#9121
mattklein123 merged 7 commits intoenvoyproxy:masterfrom
ramaraochavali:fix/hot_restart_stats

Conversation

@ramaraochavali
Copy link
Copy Markdown
Contributor

@ramaraochavali ramaraochavali commented Nov 23, 2019

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Description: During hot restart, when copying parent stats, we are not checking if stat is used or not. As a result of this, the gauges of type ACCUMULATED, are copied with zero values even though they are not used on the parent Envoy. Following are some of the example stats

upstream_rq_active
upstream_rq_pending_active
upstream_cx_active
upstream_cx_rx_bytes_buffered
upstream_cx_tx_bytes_buffered

Risk Level: low
Testing: Added
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
…tats

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@mattklein123
Copy link
Copy Markdown
Member

@ramaraochavali thanks, this LGTM, but is this fixing a bug or is this just a perf improvement?

/wait-any

@ramaraochavali
Copy link
Copy Markdown
Contributor Author

ramaraochavali commented Nov 25, 2019

It originated as a bug - We are trying to look the cluster usage from stats i.e. whether a cluster has requests from another service (background : we want to filter unused cluster stats (by looking at stats of type upstrea_rq or upsteam_cx from sending it to our stats backend). I know service specific config is the long term solution for it, while we work on it , in the short term we want to reduce the load on stats backend by filtering these unused cluster stats, as it has come up as an issue recently with so many Envoys running across the board :-) ). In that exercise, we found these stats generated for a cluster which is supposed to not have any requests. Digging deeper, found this as issue. But can help with perf as well - but that was not the original motivation.

@mattklein123
Copy link
Copy Markdown
Member

@ramaraochavali the output was still 0, it just counted as "used." Is that right?

@ramaraochavali
Copy link
Copy Markdown
Contributor Author

@mattklein123 Yes. That is correct. We are checking for existence of the stats "upstream_rq_", "upstream_cx" - not the values actually. If a cluster is not used, we do not see any of those stats.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants