Convert Mixer self-monitoring to use OpenCensus libraries#7989
Convert Mixer self-monitoring to use OpenCensus libraries#7989istio-testing merged 15 commits intoistio:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7989 +/- ##
======================================
+ Coverage 71% 71% +1%
======================================
Files 374 370 -4
Lines 32673 32620 -53
======================================
+ Hits 22906 22937 +31
+ Misses 8760 8692 -68
+ Partials 1007 991 -16
Continue to review full report at Codecov.
|
|
cc: @Ramonza |
geeknoid
left a comment
There was a problem hiding this comment.
This looks pretty good.
Is there a solution to deal with the metrics used by our dependencies? This is an identical problem we had when replacing glog, which led to the istio/glog repo. Do we need something similar here?
mixer/pkg/checkcache/cache.go
Outdated
There was a problem hiding this comment.
Here and elsewhere in this file, I'm not thrilled with "defer" here since it induces an allocation (last I looked)
mixer/pkg/server/server.go
Outdated
There was a problem hiding this comment.
Need to add a unit test to cover this new code.
mixer/pkg/server/server.go
Outdated
mixer/pkg/checkcache/cache.go
Outdated
There was a problem hiding this comment.
I think you want the LastValue aggregation type here. Otherwise, with Sum you will keep adding to a cumulative total and it looks like the actual values you're recording already represent a cumulative total so you really just want to record the value and not total up all the values you ever passed to stats.Record I think.
mixer/pkg/checkcache/cache.go
Outdated
There was a problem hiding this comment.
FWIW, this is the default if no name is specified
mixer/pkg/checkcache/cache.go
Outdated
There was a problem hiding this comment.
This can be coalesced into a single Record call:
cs := cc.cache.Stats()
stats.Record(
context.Background(),
hitsTotal.M(int64(cs.Hits)),
missesTotal.M(int64(cs.Misses)),
...
There was a problem hiding this comment.
Can be coalesced to a single Record call
ffbed4c to
950fb11
Compare
|
Reviewers: PTAL. I believe I have addressed the existing comments (adding unit tests, etc.). I have also updated the |
|
@geeknoid was there something you had in mind re: metrics used by our deps? I was not aware of any dependencies that were using prometheus to monitor anything prior to this PR. I don't know that we need to solve that particular issue right now, but it does raise an interesting question of package instrumentation and playing well with others. @Ramonza does OC have any thoughts on such issues? |
|
/test istio-unit-tests and, of course, linting produces different results on every run... sigh. |
|
Sorry for my unclear comment. I was referring to this from your PR comment:
Can these be addressed too at some point in a systematic way? On a related note, the ControlZ library is currently mining Prometheus metrics for display in the UI. What happens in ControlZ after this PR goes in? |
|
@geeknoid ah, gotcha. Yes, probably. But rather than duplicate that functionality in this PR (the key is really finding some hook to generate them on the fly the same way prometheus does, I believe), I've skipped over providing those in an OpenCensus way. I think, ultimately, it probably makes more sense for OpenCensus to provide those measures than for Istio to build them. For now, skipping them seems fine. @Ramonza thoughts here? RE ControlZ: It still works. We use the default registry (gatherer) for the prometheus exporter here, and that is what is currently powering the |
|
/approve
/lgtm
…On Wed, Aug 22, 2018 at 8:42 AM Douglas Reid ***@***.***> wrote:
@geeknoid <https://github.com/geeknoid> ah, gotcha. Yes, probably. But
rather than duplicate that functionality in this PR (the key is really
finding some hook to generate them on the fly the same way prometheus does,
I believe), I've skipped over providing those in an OpenCensus way. I
think, ultimately, it probably makes more sense for OpenCensus to provide
those measures than for Istio to build them. For now, skipping them seems
fine. @Ramonza <https://github.com/ramonza> thoughts here?
RE ControlZ:
It still works. We use the default registry (gatherer) for the prometheus
exporter here, and that is what is currently powering the metrics portion
of ControlZ.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7989 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AVucHYZ8x4mX7qP9bQWH-hf9VylOus0yks5uTXvvgaJpZM4WAS-9>
.
|
c01f2fb to
c76dd92
Compare
|
Thanks for all the reviews! I've fixed up all of the linter issues and dep diff stuff now. PTAL for final approval. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: geeknoid The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@douglas-reid: The following test failed, say
DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This PR is part of an exploration of moving Istio self-monitoring from pure Prometheus libraries to OpenCensus (OC) libraries. The ultimate goal being explored is the possibility of creating an Istio OpenCensus exporter mechanism that would allow flexible export of Istio self-monitoring metrics to different backends (cloudwatch, datadog, stackdriver, etc.).
This PR does the following:
checkcache,runtime)This PR does not replace the Prometheus collectors that provide additional metrics (not from Mixer). These include:
Reviewers:
I am interested in feedback on a few items. First, does this library change make sense? Second, is the way that OC is being used seem appropriate?
Important items to note when reviewing:
OC models things differently to prometheus and this results in some interesting changes:
view.Count. This is primarily because that results in exporting Prometheus metrics of typeCounter. Usingview.Sumresults in untyped prometheus metrics. This ultimately impacts how we use the Stats, asview.Countrecords the number of measurements, not the total of those measurements. I think this is the right thing to do for now, but maybe not. The Prometheus docs include the following related documentation:0s for things that haven't happened, etc.)