Skip to content

stats: add new BoolIndicator stat type#5813

Merged
mattklein123 merged 11 commits intoenvoyproxy:masterfrom
fredlas:STT_generic_stat_type
Mar 4, 2019
Merged

stats: add new BoolIndicator stat type#5813
mattklein123 merged 11 commits intoenvoyproxy:masterfrom
fredlas:STT_generic_stat_type

Conversation

@fredlas
Copy link
Copy Markdown
Contributor

@fredlas fredlas commented Feb 1, 2019

Description: Add the BoolIndicator stat type. A followup PR will migrate a number of gauges that only ever set 1 or 0, meaning them as true/false. Beyond adding more meaningful type distinction to stats, this will get us closer to consist logic for migrating/combining gauges during hot restart.

I chose BoolIndicator because Bool/BoolSharedPtr is not obviously something other than slightly-wrapped plain old data, and the stats classes' naming conventions would call for a function named bool(). On the other hand, I wasn't entirely sure if just Indicator would imply boolean strongly enough to make everyone happy.

This PR also converts all applicable 1-or-0 Gauges I could find to BoolIndicator.

Risk Level: medium
Testing: roughly duplicated every bit of test logic involving the word "gauge", added test verifying that gauges convert cleanly to bools across a hot restart.

#4974 #5790

Signed-off-by: Fred Douglas <fredlas@google.com>
…type

Signed-off-by: Fred Douglas <fredlas@google.com>
@fredlas
Copy link
Copy Markdown
Contributor Author

fredlas commented Feb 1, 2019

I tried migrating GrpcMux/GrpcStream's connected_state gauge over to this new type, as an example to confirm that it works in the actual code. I have left those changes out of this PR, though.

…type

Signed-off-by: Fred Douglas <fredlas@google.com>
@fredlas
Copy link
Copy Markdown
Contributor Author

fredlas commented Feb 5, 2019

I left the return type of value() as uint64_t, assuming that people could potentially have hard-to-change stuff built around that. But, taking a look at Google's use of them, I see we were actually already wanting a bool, and just doing a "> 0" to get it.

So, is it reasonable to change the return type to bool, and expect people to adapt their code?

@stale
Copy link
Copy Markdown

stale bot commented Feb 12, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 12, 2019
@mattklein123
Copy link
Copy Markdown
Member

@jmarantz are you planning on taking a pass on this one?

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 13, 2019
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many apologies for the slow review. No excuses.

Are we also going change some existing stat that is boolean in concept to use one of these?

I'm out for a bit so I'm good with this if we change value() to bool. @mattklein123 should take a pass afterward as there may be subtleties I missed.

@fredlas
Copy link
Copy Markdown
Contributor Author

fredlas commented Feb 19, 2019

No problem, there's nothing waiting on this!

Yes, there are several existing stats that will become this type. They are all among #5790 (comment)

@jmarantz
Copy link
Copy Markdown
Contributor

@fredlas I think it's better to change the type now. The call-sites will need to be explicitly changed anyway from Gauge& to BoolIndicator& so that's a good time also change the value() references to expect a bool.

@fredlas
Copy link
Copy Markdown
Contributor Author

fredlas commented Feb 27, 2019

Thanks, that makes sense.

Actually, now having gone further with the main hot restart work, I'm starting to think this change might also need to be part of the big hot-restart-breaking PR. Not sure yet, though.

@jmarantz
Copy link
Copy Markdown
Contributor

I think this doesn't have to be hot-restart breaking. Just lay out the BoolIndicator impl struct exactly the same as Gauge's impl, and ensure in the impl that the bit-patterns are the same.

Leave a TODO to simplify the BoolIndicator impl once hot-restart does not use shared mem.

@fredlas
Copy link
Copy Markdown
Contributor Author

fredlas commented Feb 28, 2019

Oh, yeah, you're totally right; as long as this change goes in before #5910 it's very straightforward. I was too far into the #5910 mindset when I wrote that!

@jmarantz
Copy link
Copy Markdown
Contributor

And in fact you could write an evil unit-test to prove it, by reinterpret_cast-ing a BoolIndicator& to a Gauge& and making sure the data written to one is correctly read by the other :)

…type

Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
@fredlas
Copy link
Copy Markdown
Contributor Author

fredlas commented Feb 28, 2019

I gave that test a try, and it didn't work, I'm guessing because the Gauge and BoolIndicator classes are laid out a bit differently (like BoolIndicator missing add/sub/inc/dec). Anyways, PTAL.

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Mar 1, 2019

@fredlas I think my reinterpret_cast suggestion was a little misleading (ie wrong). Casting between the Gauge* and the BoolIndicator* doesn't make sense. It makes more sense to construct a GaugeImpl<RawStatData> and BoolIndicatorImpl<RawStatData> using the same RawStatData object declared in a test method. You should be able to then use the Gauge's write method and read from the BoolIndicator and vice versa. Does that make more sense?

fredlas added 2 commits March 1, 2019 11:06
Signed-off-by: Fred Douglas <fredlas@google.com>
…type

Signed-off-by: Fred Douglas <fredlas@google.com>
@fredlas
Copy link
Copy Markdown
Contributor Author

fredlas commented Mar 1, 2019

Yup, test added. Seeing the test pass, I think I get it, and am now largely confident that the gauge->indicator auto-conversion will work as desired. This whole stats store setup is very interesting/spooky. I didn't fully understand until now the extent to which stat name strings are important. It's like it's just a string->[raw chunk of memory] map, and the counter vs gauge classes are just syntactic sugar hiding conceptual casts of that raw memory.

In this most recent commit, I also converted every actually-a-bool gauge I could find.

PTAL!

jmarantz
jmarantz previously approved these changes Mar 1, 2019
:widths: 1, 1, 2

connected_state, Gauge, A boolean (1 for connected and 0 for disconnected) that indicates the current connection state with management server
connected_state, BoolIndicator, Current connection state with management server
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you might have accidentally deleted part of the sentence?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way, but please be consistent in lds.rst above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are different columns in a table - the commas are delimiters.

Matt, by consistent do you mean control_plane.connected_state vs just connected_state? If so, that difference was already there, so I left it as is - is it better to have the "control_plane." prefix, or no prefix?

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, super awesome.

/wait

:widths: 1, 1, 2

connected_state, Gauge, A boolean (1 for connected and 0 for disconnected) that indicates the current connection state with management server
connected_state, BoolIndicator, Current connection state with management server
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way, but please be consistent in lds.rst above.

fredlas added 2 commits March 1, 2019 18:21
Signed-off-by: Fred Douglas <fredlas@google.com>
…type

Signed-off-by: Fred Douglas <fredlas@google.com>
…type

Signed-off-by: Fred Douglas <fredlas@google.com>
@fredlas
Copy link
Copy Markdown
Contributor Author

fredlas commented Mar 4, 2019

Merged to get whatever PR fixed the build breakage.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small comment comment. LMK what you think... Thank you!

Signed-off-by: Fred Douglas <fredlas@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mattklein123 mattklein123 merged commit 67d1eb4 into envoyproxy:master Mar 4, 2019
fredlas added a commit to fredlas/envoy that referenced this pull request Mar 5, 2019
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
mattklein123 added a commit that referenced this pull request Mar 13, 2019
This reverts commit 67d1eb4.

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this pull request Mar 13, 2019
This reverts commit 67d1eb4.

Signed-off-by: Matt Klein <mklein@lyft.com>
spenceral added a commit to spenceral/envoy that referenced this pull request Mar 20, 2019
* master: (59 commits)
  http fault: add response rate limit injection (envoyproxy#6267)
  xds: introduce initial_fetch_timeout option to limit initialization time (envoyproxy#6048)
  test: fix cpuset-threads tests (envoyproxy#6278)
  server: add an API for registering for notifications for server instance life… (envoyproxy#6254)
  remove remains of TestBase (envoyproxy#6286)
  dubbo_proxy: Implement the routing of Dubbo requests (envoyproxy#5973)
  Revert "stats: add new BoolIndicator stat type (envoyproxy#5813)" (envoyproxy#6280)
  runtime: codifying runtime guarded features (envoyproxy#6134)
  mysql_filter: fix integration test flakes (envoyproxy#6272)
  tls: update BoringSSL to debed9a4 (3683). (envoyproxy#6273)
  rewrite buffer implementation to eliminate evbuffer dependency (envoyproxy#5441)
  Remove the dependency from TimeSystem to libevent by using the Event::Scheduler abstraction as a delegate. (envoyproxy#6240)
  fuzz: fix use of literal in default initialization. (envoyproxy#6268)
  http: add HCM functionality required for rate limiting (envoyproxy#6242)
  Disable mysql_integration_test until it is deflaked. (envoyproxy#6250)
  test: use ipv6_only IPv6 addresses in custom cluster integration tests. (envoyproxy#6260)
  tracing: If parent span is propagated with empty string, it causes th… (envoyproxy#6263)
  upstream: fix oss-fuzz issue envoyproxy#11095. (envoyproxy#6220)
  Wire up panic mode subset to receive updates (envoyproxy#6221)
  docs: clarify xds docs with warming information (envoyproxy#6236)
  ...
@fredlas fredlas deleted the STT_generic_stat_type branch October 9, 2019 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants