stats: Consider communicating stats across hot-restart via RPC rather than shared memory #4974

@jmarantz

Description:

The stats architecture has evolved and grown in complexity, which presents barriers to people wishing to improve it. The stats-in-shared-memory feature allows continuity of gauges and counters across binary restarts, including binary upgrades. However, it enforces flat, fixed-size stat structures that don't scale well when there are large numbers of clusters. Truncating dynamically generated stat names that exceed the command-line-specified shared-memory per-stat limit adds complexity as well. The current solution involves alternate class implementations for stat memory.

An alternative approach, suggested by @mattklein123, is to always use heap memory for stats and to use a paginated RPC to send current gauge/counter data from the old process to the new one during hot restart. This issue can be used to capture design ideas and tradeoffs around that approach.
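To make the paginated transfer concrete, here is a minimal sketch in Python (chosen for brevity, not because Envoy would implement it this way). `StatPage`, `paginate_stats`, `receive_stats`, and the `page_size` parameter are all hypothetical names invented for illustration; the actual RPC transport and message format are left open.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class StatPage:
    """One hypothetical RPC response carrying a slice of the old process's stats."""
    entries: List[Tuple[str, int]]  # (stat name, current value); names are variable-length
    last: bool                      # True on the final page


def paginate_stats(stats: Dict[str, int], page_size: int) -> Iterator[StatPage]:
    """Old-process side: yield stats in name order, at most page_size per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    items = sorted(stats.items())
    if not items:
        # Always send at least one page so the receiver sees a terminator.
        yield StatPage(entries=[], last=True)
        return
    for start in range(0, len(items), page_size):
        chunk = items[start:start + page_size]
        yield StatPage(entries=chunk, last=start + page_size >= len(items))


def receive_stats(pages: Iterator[StatPage]) -> Dict[str, int]:
    """New-process side: rebuild a heap-allocated stat map from the received pages."""
    received: Dict[str, int] = {}
    for page in pages:
        received.update(page.entries)
        if page.last:
            break
    return received
```

Because each page carries the full stat name alongside its value, this sketch needs neither a fixed-size slot per stat nor name truncation; the page size only bounds how much is sent per message.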

One variation is to send only gauges and skip counters, since it is the gauges that are important to keep accurate across a restart. In any case, we would avoid sending stats whose delta is zero.
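The filtering described here could look roughly like the following Python sketch. It assumes that "delta" means the change in a stat's value since it was last transferred; the `Stat` record, its `last_sent` field, and the `gauges_only` flag are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class Kind(Enum):
    COUNTER = "counter"
    GAUGE = "gauge"


@dataclass
class Stat:
    name: str
    kind: Kind
    value: int
    last_sent: int = 0  # assumed bookkeeping: value at the previous transfer


def stats_to_send(stats: Iterable[Stat], gauges_only: bool = False) -> List[Tuple[str, int]]:
    """Return (name, delta) pairs worth transferring to the new process."""
    out: List[Tuple[str, int]] = []
    for stat in stats:
        if gauges_only and stat.kind is Kind.COUNTER:
            continue  # variation: counters are not carried across the restart
        delta = stat.value - stat.last_sent
        if delta == 0:
            continue  # nothing changed, so there is nothing to send
        out.append((stat.name, delta))
    return out
```

With `gauges_only=True`, counters in the new process would simply start from zero, which trades counter continuity for a smaller transfer.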

Relevant Links:
https://blog.envoyproxy.io/envoy-stats-b65c7f363342
https://github.com/envoyproxy/envoy/blob/master/source/docs/stats.md

Labels: design proposal, no stalebot
