Separate bulk upload behavior from CollectorSet #19691

tsullivan merged 36 commits into elastic:master from
Conversation
pickypg
left a comment
Let's see if we can work in the start/stop logic for the collector set (or just an enabled/disabled state).
Is there a reason that this method was added as part of the constructor, while the other two below get called out separately? The old version was its own class method.
The only reason is because I don't want this class to have a this.collectorSet, which it would need if start() wasn't able to reach the collectorSet via enclosure. It's not a big deal, just my personal preference.
What do you think about that reason?
I'm fine with the reasoning. Perhaps add a comment to note that's why?
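The closure approach being discussed can be sketched roughly like this (a minimal illustration, not the PR's actual API — `createStarter` and its shape are assumptions): `start()` reaches the collector set through its enclosing scope, so the object never needs a `this.collectorSet` property.

```javascript
// Hypothetical sketch of the closure-based start/stop discussed above.
// collectorSet is captured by closure; nothing is stored on `this`.
function createStarter(collectorSet, intervalMs) {
  let timer = null;
  return {
    start() {
      if (!timer) {
        timer = setInterval(() => collectorSet.bulkFetch(), intervalMs);
      }
    },
    stop() {
      clearInterval(timer);
      timer = null;
    },
    isRunning() {
      return timer !== null;
    },
  };
}
```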
This is a dangerous race condition. If Kibana starts up before/while Elasticsearch is starting (or happens to be non-responsive for some reason during this time period), then Kibana monitoring is guaranteed to be disabled until Kibana is restarted. Instead, I think that we would be better off stopping/starting the collector based on this logic (or, continuing to blindly start it). Since we can listen to the xpack_main.info object, we should be safe to do that.
Also, by not returning the collectorSet, we force other plugins to validate the CollectorSet's existence before using it, in addition to checking that the monitoring plugin is enabled.
In the next version of this code, collectorSet will be moved to src/server/usage in core Kibana. There, it will be instantiated before any of the plugins and core plugins are. Plugins will access it like:
const { collectorSet } = server.usage;
This part of the code looks like a miss in the refactoring - the CollectorSet constructor only needs server passed in, and it needs that for logging.
collectorSet doesn't need to have conditional instantiation/startup, but the bulk uploader's start method probably does
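That split can be sketched with minimal stand-in classes (all names here are illustrative, not the merged code): the CollectorSet is created unconditionally, while only the BulkUploader's startup depends on configuration.

```javascript
// Minimal stand-ins to illustrate unconditional CollectorSet creation
// vs. conditional BulkUploader startup. Names are assumptions.
class CollectorSet {
  constructor() { this.collectors = []; }
  register(c) { this.collectors.push(c); }
}

class BulkUploader {
  constructor() { this.started = false; }
  start(collectorSet) { this.started = true; } // stand-in ignores the arg
}

function initMonitoring(config) {
  const collectorSet = new CollectorSet(); // always created and exposed
  let bulkUploader = null;
  if (config.kibanaCollectionEnabled) {
    bulkUploader = new BulkUploader(); // only the uploader is conditional
    bulkUploader.start(collectorSet);
  }
  return { collectorSet, bulkUploader };
}
```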
x-pack/plugins/monitoring/init.js (outdated)
Since initKibanaMonitoring (as this PR writes it) can return undefined, doesn't this fail hard in that scenario?
x-pack/plugins/monitoring/init.js (outdated)
Per the previous comments on this area, I still need to handle stopping/restarting the uploader when the xpack_main plugin goes red/back to green, and check the info before starting again
- takes out a lot of behavior from CollectorSet and moves it to a class called BulkUploader
- simplifies kibana monitoring init by taking out the indirection that startCollectorSet / createCollectorSet had
- removes start() method from CollectorSet and calls the collector objects' init() functions from CollectorSet.register()
- removes cleanup method from collectorSet since that was doing work for bulk uploading
Force-pushed from 45c3f1a to 3046ff0
I'm backing off on my comment about breaking out some of the changes to a smaller PR. When I started going that way, the work it took to get those small changes working with no regressions was almost as big as this one.
Force-pushed from 3046ff0 to 486c64e
@pickypg I updated the code per your feedback about needing a start / stop. I only added a stop method to BulkUploader; the CollectorSet doesn't need start / stop methods. If Elasticsearch is down, we'll have to handle that possibility in the API route handlers that call
```diff
   });
 } catch(err) {
-  req.log(['error'], err);
+  req.log(['error'], err); // FIXME doesn't seem to log anything useful if ES times out
```
The FIXME comment here seems to be the case for any API that relies on ES, but I want to look into it more later
```js
 */
collectorSet.register(getOpsStatsCollector(server));
collectorSet.register(getKibanaUsageCollector(server));
collectorSet.register(getSettingsCollector(server));
```
Registering a collector doesn't need to be delayed on anything. It's only when fetch is called on them that we want to try to make sure Elasticsearch is up. The bulkUploader handles this by watching elasticsearch plugin status. This isn't happening robustly from the API caller yet, but I have some plans to make that better in a later PR.
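The register/fetch split described above can be sketched as follows (the shapes are illustrative): `register()` is pure bookkeeping and is safe to call before Elasticsearch is reachable; only fetch-time code touches ES.

```javascript
// Illustrative sketch: registration records collectors without touching ES;
// only bulkFetch (called later) invokes their fetch functions.
class CollectorSet {
  constructor() { this._collectors = []; }
  register(collector) {
    // no ES dependency here: just bookkeeping
    this._collectors.push(collector);
  }
  bulkFetch(callCluster) {
    // ES availability matters only at this point, when fetches actually run
    return this._collectors.map((c) => ({ type: c.type, result: c.fetch(callCluster) }));
  }
}
```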
```js
  );
}

monitoringPlugin.status.green('Ready');
```
It was not a good idea for an earlier PR to add this - having the monitoring plugin status always be green was supposed to be a feature
```js
type: KIBANA_STATS_TYPE,
init,
fetch: buffer.flush,
fetchAfterInit: true,
```
I intended to include removing fetchAfterInit and cleanup as properties of the Collector constructor argument object, but getting that working without regressions of its own turned out to be a huge change that would have been temporary.
x-pack/plugins/monitoring/init.js (outdated)
```js
  ['info', LOGGING_TAG, KIBANA_MONITORING_LOGGING_TAG],
  'Internal collection for Kibana monitoring will is disabled per configuration.'
);
} else if (!xpackMainInfo) {
```
server.plugins.xpack_main.info is a service, so if it's missing it doesn't mean that X-Pack info couldn't be fetched, but that the plugin has changed in some way. I think this block can just be deleted.
```js
 * @return undefined
 */
start(collectorSet) {
  this._log.info('Starting monitoring stats collection');
```
Feels like this should go after the check for this._timer
not sure why this comment didn't become outdated /shrug
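The ordering suggested above could look roughly like this (a hedged sketch; the class mirrors the discussion, not the exact merged code): check `this._timer` first, warn on a double start, and only log "Starting..." when a timer is really being created.

```javascript
// Hypothetical sketch: the this._timer guard comes before the info log,
// so a second start() only warns instead of logging a misleading "Starting".
class BulkUploader {
  constructor(log, intervalMs) {
    this._log = log;
    this._intervalMs = intervalMs;
    this._timer = null;
  }
  start(collectorSet) {
    if (this._timer) {
      this._log.warn('Skipping bulk uploader start: already started');
      return;
    }
    this._log.info('Starting monitoring stats collection');
    this._timer = setInterval(() => collectorSet.bulkFetch(), this._intervalMs);
  }
  stop() {
    clearInterval(this._timer);
    this._timer = null;
  }
}
```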
```js
if (!monitoringBulkEnabled) {
  // debug level to not spam the logs
  this._log.debug(`Skipping fetch and upload of monitoring stats due to monitoring bulk endpoint not being available`);
```
Any reason we shouldn't call this.stop()?
Continuing to check at an interval protects against the possibility that the bulk endpoint in ES can be disabled with a dynamic setting (not sure if that is possible), or that the ES configuration changes without triggering an ES plugin state change in Kibana. That seems possible when hot-swapping ES for one with a different configuration.
@tsullivan The endpoint will not change based on the license, nor can it be changed dynamically (the endpoint either exists, or it doesn't because xpack.monitoring.enabled: false is explicitly set). However, calling this.stop() would require the xpack_main plugin to go from green to another state, then back to green for the service to be restarted, which is not a requirement for a license change.
Relative to the next comment, I think that is the superior option and then we can add this logic there.
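The per-tick check being weighed here can be sketched as a simplified synchronous version (all helper names and shapes are assumptions): instead of stopping the timer, each tick re-checks whether the bulk endpoint is usable and skips quietly when it is not.

```javascript
// Simplified synchronous sketch of the per-tick enabled check. Instead of
// calling stop(), a disabled bulk endpoint just makes the tick a quiet no-op.
function makeFetchAndUpload({ isBulkEnabled, fetchAll, upload, log }) {
  return function fetchAndUpload() {
    if (!isBulkEnabled()) {
      // debug level so a long ES outage does not spam the logs
      log.debug('Skipping fetch and upload: monitoring bulk endpoint unavailable');
      return false;
    }
    upload(fetchAll());
    return true;
  };
}
```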
x-pack/plugins/monitoring/init.js (outdated)
```js
if (kibanaCollectionEnabled && xpackMainInfo) {
  const bulkUploader = initBulkUploader(monitoringPlugin.kbnServer, server, xpackMainInfo);

  xpackMainPlugin.status.on('green', async () => { // any time xpack_main turns green
```
Would be great to switch this over to xpackInfo.onLicenseInfoChange() once #20018 is merged
With that listener support, I think we can safely do that and I think we can then back out the license checking code from the BulkUploader itself per collection and put it back into such a listener.
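The listener-driven wiring could look roughly like this. Note that `onLicenseInfoChange` and `isAvailable` as used here are assumptions based on the #20018 discussion, not a confirmed API; the point is that license changes, rather than raw plugin status, drive the uploader lifecycle.

```javascript
// Hypothetical wiring: license info changes start/stop the uploader.
// The xpackInfo API shape shown here is an assumption, not the merged code.
function wireLicenseListener(xpackInfo, bulkUploader, collectorSet) {
  xpackInfo.onLicenseInfoChange(() => {
    if (xpackInfo.isAvailable()) {
      bulkUploader.start(collectorSet); // start() is assumed idempotent
    } else {
      bulkUploader.stop();
    }
  });
}
```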
```js
const data = await collectorSet.bulkFetch(this._callClusterWithInternalUser);
const usableData = data.filter(d => Boolean(d) && !isEmpty(d.result));
const payload = usableData.map(({ result, type }) => {
  if (!isEmpty(result)) {
```
This condition is redundant because of the .filter() right?
I think it would also become more readable to just change this to:

```js
const payload = data
  .filter(d => Boolean(d) && !isEmpty(d.result))
  .map(({ result, type }) => [{ index: { _type: type } }, result]);
```
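A runnable version of that refactor: with the emptiness check done once in `filter()`, the `map()` no longer needs it. The `isEmpty` here is a minimal stand-in for lodash's, covering only the object/null cases exercised.

```javascript
// Stand-in for lodash isEmpty, limited to the null/object cases used here.
const isEmpty = (v) => v == null || (typeof v === 'object' && Object.keys(v).length === 0);

// The filter removes falsy entries and empty results up front,
// so the map can build the bulk payload pairs unconditionally.
function toBulkPayload(data) {
  return data
    .filter((d) => Boolean(d) && !isEmpty(d.result))
    .map(({ result, type }) => [{ index: { _type: type } }, result]);
}
```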
Unrelated visualization test failure (Vega).

jenkins test this
Different unrelated failures this time.

jenkins test this
💚 Build Succeeded
spalger
left a comment
One last thing, but otherwise looks good!
x-pack/plugins/monitoring/init.js (outdated)
```js
monitoringPlugin.status.green('Ready');
// xpack main plugin status goes red if ES client lost connection
xpackMainPlugin.status.on('red', () => {
  bulkUploader.handleConnectionLost();
```
I think this is unnecessary and will cause problems, because we are stopping the bulkUploader without starting it again. Instead, I think the bulk uploader should just try to talk to Elasticsearch and fail if it can't. If we really want to suppress the errors while Kibana has a red status, we can check xpackMainPlugin.status.state !== 'red' before logging errors about talking to Elasticsearch, but I suggest avoiding toggling the bulkUploader based on plugin status.
> stopping the bulkUploader without starting again
It will get restarted when the xpack main plugin status change causes the license info to change, but yeah - that is a pretty indirect signal.
> we can check xpackMainPlugin.status.state !== 'red'

I would have liked to not log avoidable errors, but checking the xpack main plugin state feels like something that should be handled by the platform. Earlier in this code, I was passing xpackMainInfo into the bulk uploader constructor, and I really like that I'm not doing that in the latest version.
It sounds like checking the status to avoid logging errors is a nice-to-have. I like the idea of simplifying the code, so I'll just remove this status handler.
💚 Build Succeeded
spencer gave verbal lgtm
* Separate bulk upload behavior from CollectorSet
- takes out a lot of behavior from CollectorSet and moves it to a class called BulkUploader
- simplifies kibana monitoring init by taking out the indirection that startCollectorSet / createCollectorSet had
- removes start() method from CollectorSet and calls the collector objects' init() functions from CollectorSet.register()
- removes cleanup method from collectorSet since that was doing work for bulk uploading
* remove cleanup and fetchAfterInit methods
* test for bulk_uploader class
* improve test for collector_set
* fix reporting
* expose collectorSet if there actually is a collectorSet
* comment for enclosed function
* make collectorSet creation/expose unconditional, bulkUploader more conditional
* fix collector_set tests
* lifecycle events
* stab at collectorSet error logging from the API call
* clean up comments
* clean up comments
* fix BulkUploader mocha test
* check kibanaCollectionEnabled config before registering bulk upload and the plugin status listeners
* no singleton timer object
* just log a warning if bulk uploader start called twice
* normal quotes
* check if bulk is enabled inside of the _fetchAndUpload method
* log for stopping bulk stats
* call bulkUploader.start with the collectorSet object
* call bulkUploader.start with the collectorSet object
* roll back change for module scoped variable
* oops I broke init
* init and logging: if / elseif / elseif
* remove unnecessary check/log
* help log
* remove redundant, use data.filter.map
* use xpackInfo.onLicenseInfoChange not xpackMainPlugin.status.on('green')
* help logging
* fix unit test
* remove handler that stops upload when connection is lost
Part of #19534
Closes #19567
This PR prepares the way for the CollectorSet object to be moved to kbn_server in core and instantiated before the Kibana plugins are.
Changes
Testing:
One of the most important things to ensure with this PR is that Kibana monitoring and telemetry still works.
- /api/_xpack/usage HTTP endpoint still shows all the same information as it does in the master branch.
- Kibana Ops monitoring resumes even if Elasticsearch was down for a minute (Kibana server did not restart):
