[data.search] Use incrementCounter for search telemetry#91230
[data.search] Use incrementCounter for search telemetry#91230lukasolson merged 10 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/kibana-app-services (Team:AppServices) |
TinaHeiligers
left a comment
There was a problem hiding this comment.
LGTM from a telemetry POV.
As for
"reset" the values for each of the properties stored in the saved object
You'd need a SO migration for that unless we simply throw the 'old' SO out and recreate the new one after the upgrade. I speak under correction here though, I'm not a SO expert! @Bamieh might be able to give more advice.
| trackSuccess: async (duration: number) => { | ||
| const repository = await getRepository(); | ||
| await repository.incrementCounter(SAVED_OBJECT_ID, SAVED_OBJECT_ID, [ | ||
| { fieldName: 'successCount' }, |
There was a problem hiding this comment.
IIRC, we only migrate SO between versions id the data from one version to the next needs to be persisted (i.e. end-user facing SO). Since telemetry-related SO is fetched once every 24 hours and indexed and that hasn't changed, it doesn't really matter if we blow the SO object out the water and reinitialize a new one when the cluster upgrades. If you did want to persist the data, then I guess you'd need to add a migration. @Bamieh WDYT?
There was a problem hiding this comment.
I agree with tina. if a transformation is not possible of the old data simply using a new saved object id and dropping this one is fine. We can drop the existing data in the same savedobject type via migrations.
const migrate713 = (doc) => {
return {
...doc,
attributes: {},
};
};
There was a problem hiding this comment.
I've updated this PR with a migration as suggested, @Bamieh could you take one more look? Thanks!
|
@elasticmachine merge upstream |
| }, | ||
| ]); | ||
| } catch (e) { | ||
| if (e.statusCode === 409 && retryCount < MAX_RETRY_COUNT) { |
There was a problem hiding this comment.
Can you use SavedObjectsErrorHelpers.isConflictError(err) ?
| { fieldName: 'errorCount' }, | ||
| ]); | ||
| } catch (e) { | ||
| if (e.statusCode === 409 && retryCount < MAX_RETRY_COUNT) { |
There was a problem hiding this comment.
SavedObjectsErrorHelpers.isConflictError(err)
lizozom
left a comment
There was a problem hiding this comment.
I would use SavedObjectsErrorHelpers.isConflictError(err) to detect conflicts, but otherwise LGTM
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: |
* [data.search] Use incrementCounter for search telemetry * Update reported type * Retry conflicts * Fix telemetry check * Use saved object migration to drop previous document * Review feedback * Fix import Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* [data.search] Use incrementCounter for search telemetry * Update reported type * Retry conflicts * Fix telemetry check * Use saved object migration to drop previous document * Review feedback * Fix import Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
) * [data.search] Use incrementCounter for search telemetry * Update reported type * Retry conflicts * Fix telemetry check * Use saved object migration to drop previous document * Review feedback * Fix import Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
) * [data.search] Use incrementCounter for search telemetry * Update reported type * Retry conflicts * Fix telemetry check * Use saved object migration to drop previous document * Review feedback * Fix import Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* master: (111 commits) [Logs UI] Replace dependencies in the infra bundle (elastic#91503) [Search Source] Do not request unmapped fields if source filters are provided (elastic#91921) [APM] Kql Search Bar suggests values outside the selected time range (elastic#91918) Refactored component edit policy tests into separate folders and using client integration testing setup (elastic#91657) [Fleet] Don't error on missing package_assets value (elastic#91744) [Lens] Pass used histogram interval to chart (elastic#91370) [Indexpattern management] Use indexPatterns Service instead of savedObjects client (elastic#91839) [Security Solutions] Fixes Cypress tests for indicator match by making the selectors more specific (elastic#91947) [CI] backportrc can skip CI (elastic#91886) Revert "[SOM] fix flaky suites (elastic#91809)" [Fleet] Install Elastic Agent integration by default during setup (elastic#91676) [Fleet] Silently swallow 404 errors when deleting ingest pipelines (elastic#91778) [data.search] Use incrementCounter for search telemetry (elastic#91230) [Fleet] Bootstrap functional test suite (elastic#91898) [Alerts][Docs] Added API documentation for alerts plugin (elastic#91067) Use correct environment in anomaly detection setup link (elastic#91877) [FTSR] Convert to tasks and add jest/api integration suites (elastic#91770) [CI] Build and publish storybooks (elastic#87701) docs: add PHP agent info to docs (elastic#91773) [DOCS] Adds and updates Visualization advanced settings (elastic#91904) ...
Summary
Resolves #89744.
When reporting
searchtelemetry, we query a saved object (typesearch-telemetry), which, prior to this PR, hadsuccessCount,errorCount, andaverageDuration, and forward these to the telemetry cluster.When search requests complete, prior to this PR, we would query the saved object, and update each of these properties in the saved object. This often resulted in many
409 conflicterrors, because of race conditions in updating the saved object. In addition, there was a bug in how we calculated theaverageDuration.This PR updates the collection to use
incrementCounter, which uses a script to update the counts instead of querying/updating the saved object, which should get rid of the race condition. It also replacesaverageDurationin the collected saved object withtotalDuration, and usesincrementCounterto increment that as well. Then, when reporting to the telemetry cluster, we query the saved object, and calculate theaverageDuration, which is then reported to the telemetry cluster.Open questions
Since
averageDurationwas being calculated incorrectly, and since the saved object will now havetotalDurationinstead, we need to "reset" the values for each of the properties stored in the saved object. How do we accomplish this?