Skip to content

stats: Use inlined primitive counter and gauge for HostStats in order to reduce per-host memory usage.#8216

Merged
mattklein123 merged 12 commits intoenvoyproxy:masterfrom
antoniovicente:stats_primitive_stats
Sep 25, 2019
Merged

stats: Use inlined primitive counter and gauge for HostStats in order to reduce per-host memory usage.#8216
mattklein123 merged 12 commits intoenvoyproxy:masterfrom
antoniovicente:stats_primitive_stats

Conversation

@antoniovicente
Copy link
Copy Markdown
Contributor

Description: Introduce low memory overhead counter and gauges classes for use in HostStats. These new isolated stats objects are meant to be instantiated inline on stats structs rather than created by name from a stats storage and ref counted. Low-memory stats objects are important in the case of HostStats due to the large number of hosts that a proxy can end up talking to.
Risk Level: Low
Testing: unit test
Docs Changes: n/a
Release Notes: n/a
Fixes #7525

…uce per-host memory usage.

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
@antoniovicente antoniovicente changed the title Use inlined primitive counter and gauge for HostStats in order to reduce per-host memory usage. stats: Use inlined primitive counter and gauge for HostStats in order to reduce per-host memory usage. Sep 11, 2019
@mattklein123
Copy link
Copy Markdown
Member

@antoniovicente qq: are you still planning on having an option to completely remove host stats (or have them be canary only, a % of hosts, etc.)? Or is this good enough for your current case?

@antoniovicente
Copy link
Copy Markdown
Contributor Author

@antoniovicente qq: are you still planning on having an option to completely remove host stats (or have them be canary only, a % of hosts, etc.)? Or is this good enough for your current case?

Stats overhead is down to 96 bytes per host which seems good enough for now. I'll likely start looking into improvements in other areas next, including possibly thinking about how to share upstream connections across serving threads or optimize route selection.

@mattklein123
Copy link
Copy Markdown
Member

I'll likely start looking into improvements in other areas next, including possibly thinking about how to share upstream connections across serving threads or optimize route selection.

Cool sounds great.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome stuff. I have one minor suggestion and a potential behavior difference to check.

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo minor comments/prefs.

…tats

Signed-off-by: Antonio Vicente <avd@google.com>
…map<string,STAT> to vector<pair<string_view,STAT>>

Signed-off-by: Antonio Vicente <avd@google.com>
…emory usage in non golden build configurations.

Signed-off-by: Antonio Vicente <avd@google.com>
jmarantz
jmarantz previously approved these changes Sep 20, 2019
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few minor suggestions; up to you.

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
…Gauge

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@envoyproxy/senior-maintainers for another pass (need non-googler)

@mattklein123 mattklein123 self-assigned this Sep 24, 2019
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is awesome. Quick question.

/wait-any

Stats::StatNameManagedStorage locality_zone_stat_name_;
Stats::IsolatedStoreImpl stats_store_;
HostStats stats_;
mutable HostStats stats_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: what was the reasoning behind the mutable and const to non-const switch here? Presumably because we were previously violating the constness anyway and this makes it more explicit? It would be pretty cool to not do this. I assume you looked and it's not easy to fix this right now because we modify stats already though a const host?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change HostStats contained non-const references to counter objects, while now it holds the counters themselves as members.

Consider host.stats().rq_success_.inc() before and after that change:
before: The HostStats returned by Host::stats is const, but rq_success_ is not const so the call to inc() is fine.
after: if HostStats is const its members are also const, attempts to call inc() on the const counter object will fail compilation.

We could instead make the counters and gauges inside HostStats as mutable. Making the stats() method non-const seems challenging due to widespread use of HostConstSharedPtr in load balancers. Calls to stats() on a HostConstSharedPtr followed by increment calls on a counter are widespread, see source/common/router/router.cc for some examples.

Should we consider switch from std::shared_ptr to std::shared_ptr as the primary way to reference Host objects in order to avoid use of mutable?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider switch from std::shared_ptr to std::shared_ptr as the primary way to reference Host objects in order to avoid use of mutable?

I'm not sure what the best thing to do here is. This is a pre-existing problem elsewhere, so it's not something you introduced, it's just never thrilled me that we ended up here and I wanted to make sure it's the same issue. I would say let's just merge this and we can consider further cleanups later if you are so inclined?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks!

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mattklein123 mattklein123 merged commit 3bf4b47 into envoyproxy:master Sep 25, 2019
@antoniovicente antoniovicente deleted the stats_primitive_stats branch September 26, 2019 09:52
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Oct 4, 2019
… to reduce per-host memory usage. (envoyproxy#8216)

Signed-off-by: Antonio Vicente <avd@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

stats: high memory use when there are a large number of hosts

3 participants