Skip to content

Circonus metrics adapter#1737

Merged
istio-merge-robot merged 11 commits intoistio:masterfrom
redhotpenguin:circonus
Dec 5, 2017
Merged

Circonus metrics adapter#1737
istio-merge-robot merged 11 commits intoistio:masterfrom
redhotpenguin:circonus

Conversation

@redhotpenguin
Copy link
Copy Markdown
Contributor

What this PR does / why we need it:

This PR adds a metrics adapter for Circonus, similar to the Prometheus and Statsd adapters. It draws from my previous PR for the 0.1 based metrics implementation - this has been updated to use the new handler structure. It uses the ScheduleWork() call per the adapter development guide to schedule flushes to Circonus.

The tests are fashioned after the existing adapter tests. I implemented a version using testenv, but I wasn't able to access the adapter. It looks like it is possible, but will require a bit more understanding on my part, and possible additional test methods.

I've verified that the adapter works as advertised both in mixc client environment as well as with the BookInfo sample in GKE. I included a sample operator config - I didn't see that with the additional adapters, but think it will be useful to the end user.

Please let me know how this looks, any suggestions are welcome.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Unit tests don't need an http trap submission url, a fake one is part of the testdata config.

Release note:

This change introduces a metrics adapter for Circonus metrics collection.

@istio-testing
Copy link
Copy Markdown
Collaborator

Hi @redhotpenguin. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ZackButcher
Copy link
Copy Markdown
Contributor

/ok-to-test

@redhotpenguin
Copy link
Copy Markdown
Contributor Author

/retest

@redhotpenguin
Copy link
Copy Markdown
Contributor Author

Dropping a ping to @geeknoid for a review as suggested reviewer. Looks like the circleci tests that failed are an envoy issue - not sure if related.

Copy link
Copy Markdown
Contributor

@geeknoid geeknoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the adapter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra line?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize with struct init syntax instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and additional fmt/linter/doc changes have been pushed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run "go fmt" on the file.

Our linter bot is currently broken, so you won't get dinged during the presubmit tests unfortunately. We're working on fixing that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, surprised Goland didn't gofmt automatically, I thought it had been. Good to know.

Thanks for the feedback; I'll make these requested changes and run the tests again.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to have some logic here to validate your config. is submission_url a well-formed URL? Are metric names supplied, and other fields have valid values, etc.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should give an error if you've been giving a particular MetricType via SetMetricTypes, but your adapter config doesn't specify that particular metric.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adapters don't need to check for this case as it would indicate a bug in the Mixer. So we just assume this works and expect a crash later on due to nil pointers if that's not the case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant map lookup, already did it right above.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be caught at config validation time (assuming you update the Validate method), so at runtime you can just assume it worked.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment applies to other validations here...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if the interaction with the backend were handled synchronously so that errors are reported up. As it is, errors in this flush code will be invisible to the rest of the Istio pipeline.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to tie the circonus-gometrics logger which uses the standard log package into the env.Logger() interface. Log entries are being reported with the async approach for flush events. Let me know if this handles the spirit of what you were after here.

@geeknoid
Copy link
Copy Markdown
Contributor

geeknoid commented Nov 17, 2017 via email

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use standard godoc style here?

// Build does x and y

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted - will make this change. Took this from the adapter developer's guide example.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: we typically don't have large comment headers in our code

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 21, 2017

Codecov Report

Merging #1737 into master will decrease coverage by 0.24%.
The diff coverage is 69.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1737      +/-   ##
==========================================
- Coverage   82.08%   81.84%   -0.25%     
==========================================
  Files         198      192       -6     
  Lines       16320    15919     -401     
==========================================
- Hits        13397    13029     -368     
+ Misses       2480     2428      -52     
- Partials      443      462      +19
Flag Coverage Δ
#broker 47.27% <ø> (ø) ⬆️
#mixer 83.35% <69.23%> (-0.35%) ⬇️
#pilot 80.78% <ø> (+0.2%) ⬆️
#security 92.03% <ø> (ø) ⬆️
Impacted Files Coverage Δ
mixer/adapter/circonus/circonus.go 69.23% <69.23%> (ø)
mixer/adapter/prometheus/server.go 89.58% <0%> (-6.25%) ⬇️
mixer/adapter/denier/denier.go 88.23% <0%> (-3.87%) ⬇️
mixer/adapter/stackdriver/metric/bufferedClient.go 90.32% <0%> (-3.23%) ⬇️
mixer/adapter/list/stringList.go 100% <0%> (ø) ⬆️
mixer/pkg/aspect/descriptors.go 100% <0%> (ø) ⬆️
mixer/adapter/memquota/rollingWindow.go 100% <0%> (ø) ⬆️
mixer/pkg/runtime/init.go
mixer/pkg/runtime/resourceType.go
mixer/pkg/runtime/dispatcher.go
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6889c52...22e1ce3. Read the comment docs.

@redhotpenguin
Copy link
Copy Markdown
Contributor Author

@geeknoid @ijsnellf let me know if you can take a look at the updates I pushed last week. Resolved all fmt/linting issues, and bridged to the env logger so any scheduled events will report in the log (in addition to synchronous events). Thanks.

@geeknoid
Copy link
Copy Markdown
Contributor

geeknoid commented Nov 27, 2017 via email

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the interval be in config?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm manually flushing here, so this needs to be set to 0.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to complain about entries defined in your config, but for which there is no MetricType?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably should synchronously flush here, otherwise your buffered data would never make it out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the spot, will make that update.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized you're flushing for every HandleMetric call, so there is no buffered state to flush in Close...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just for debugging? This could be called at very high QPS, so flushing on every call seems unfortunate.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was intentional; I had the same concern also, but wanted to follow this approach as opposed to letting cgm autoflush and not have the async work wrapped by ScheduleWork as documented in the developer guide. Let me dig into the flush call again and see if there's a way to pass the autoflush call to ScheduleWork.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we use ScheduleWork is so that we can trap and recover from panics in the code, and not crash/deadlock Mixer as a whole for a local adapter failure. This is definitely not very composable with existing packages that spawn their own goroutines.

If you can't wrap their autoflush, then you should probably use env.ScheduleDaemon and start your own auto-flusher goroutine that calls the flush call on a configurable period.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll work on implementing that and push. Thanks for the feedback here; helps a lot.

@istio-merge-robot
Copy link
Copy Markdown

@redhotpenguin PR needs rebase

@istio-merge-robot istio-merge-robot added needs-rebase Indicates a PR needs to be rebased before being merged do-not-merge/post-submit and removed do-not-merge/post-submit labels Nov 29, 2017
@rshriram
Copy link
Copy Markdown
Member

can you also sync with master, so that you run the right set of circleci unit tests?

* Bridge cgm log to mixer glog implementation to set cgm logger
* Cgm constructor using struct declaration, not field assignment
* Rework function documentation
* Add configuration validation
* Remove testenv dep and update istio api dep
* Linter fixes
* Initialize startup background tasks under ScheduleWork()
* Flush on an interval under ScheduleWork()
* Additional config and metrictype validation
* Initialize struct in block syntax
* Remove Close() test which did not test anything meaningful
* Use a context WithCancel approach to shut down the flush goroutine
* Add validation for FlushInterval less than 1 second
* Add unit test for validation error cases
@rshriram
Copy link
Copy Markdown
Member

rshriram commented Dec 5, 2017

and the funky robot commands don't work with circleci :). They are meant only for prow..

@redhotpenguin
Copy link
Copy Markdown
Contributor Author

@rshriram thanks - still learning the ropes here with the bot commands :)

@redhotpenguin
Copy link
Copy Markdown
Contributor Author

@rshriram all tests pass, thanks for the help getting through that. Looks like the only thing remaining is the lgtm label.

@geeknoid
Copy link
Copy Markdown
Contributor

geeknoid commented Dec 5, 2017

/lgtm
/approve

@istio-merge-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: geeknoid

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@istio-merge-robot
Copy link
Copy Markdown

/test all [submit-queue is verifying that this PR is safe to merge]

@istio-merge-robot
Copy link
Copy Markdown

/test all [submit-queue is verifying that this PR is safe to merge]

@istio-merge-robot
Copy link
Copy Markdown

Automatic merge from submit-queue.

@istio-merge-robot istio-merge-robot merged commit 1362e71 into istio:master Dec 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants