added telemetry with most common error from agent logs #146107
Merged
juliaElastic merged 4 commits into elastic:main on Nov 29, 2022
Contributor, Author
@elasticmachine merge upstream

Contributor
Pinging @elastic/fleet (Team:Fleet)

Contributor, Author
@elasticmachine merge upstream
💚 Build Succeeded
juliaElastic added a commit to juliaElastic/kibana that referenced this pull request on Nov 29, 2022
## Summary

Closes elastic/ingest-dev#1261

Merged: [elasticsearch change](elastic/elasticsearch#91701) to give kibana_system the missing privilege to read logs-elastic_agent* indices.

## Top 3 most common errors in the Elastic Agent logs

Added the most common elastic-agent and fleet-server error logs to telemetry.

The messages are grouped by querying the `message` field with a `sampler` aggregation and a nested `categorize_text` aggregation. This is a workaround, as we can't aggregate directly on the `message` field.

```
GET logs-elastic_agent*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "term": { "log.level": "error" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  },
  "aggregations": {
    "message_sample": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "categories": {
          "categorize_text": { "field": "message", "size": 10 }
        }
      }
    }
  }
}
```

Tested with the latest Elasticsearch snapshot, and verified that the logs are added to telemetry:

```
{
  "agent_logs_top_errors": [
    "failed to dispatch actions error failed reloading q q q nil nil config failed reloading artifact config for composed snapshot.downloader failed to generate snapshot config failed to detect remote snapshot repo proceeding with configured not an agent uri",
    "fleet-server stderr level info time message No applicable limit for agents using default \\n level info time message No applicable limit for agents using default \\n",
    "stderr panic close of closed channel n ngoroutine running Stop"
  ],
  "fleet_server_logs_top_errors": [
    "Dispatch abort response",
    "error while closing",
    "failed to take ownership"
  ]
}
```

Did some measurements locally, and the query took only a few ms. I'll try to check with larger datasets in the Elastic Agent logs too.

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
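
For readers who want to see how the query in the commit message above could be turned into the short list of error strings reported in telemetry, here is a minimal TypeScript sketch. It is not the code from this PR: `buildTopErrorsQuery`, `extractTopErrors`, and both interfaces are hypothetical names chosen for illustration, and the response shape assumes the usual `key`/`doc_count` buckets returned by a `categorize_text` aggregation nested under the `message_sample` sampler. The counts in the sample response are made up.

```typescript
// Hypothetical helpers for illustration only; not taken from the Kibana source.

interface CategoryBucket {
  key: string; // category text produced by categorize_text
  doc_count: number; // number of sampled messages in this category
}

interface TopErrorsResponse {
  aggregations?: {
    message_sample?: {
      categories?: { buckets: CategoryBucket[] };
    };
  };
}

// Builds the same search body as the console request in the commit message.
function buildTopErrorsQuery(shardSize = 200, categories = 10) {
  return {
    size: 0,
    query: {
      bool: {
        must: [
          { term: { 'log.level': 'error' } },
          { range: { '@timestamp': { gte: 'now-1h' } } },
        ],
      },
    },
    aggregations: {
      message_sample: {
        // The sampler limits how many documents per shard reach categorize_text.
        sampler: { shard_size: shardSize },
        aggs: {
          categories: {
            categorize_text: { field: 'message', size: categories },
          },
        },
      },
    },
  };
}

// Returns the keys of the `limit` largest categories.
// A response without aggregations yields an empty list.
function extractTopErrors(response: TopErrorsResponse, limit = 3): string[] {
  const buckets = response.aggregations?.message_sample?.categories?.buckets ?? [];
  return [...buckets]
    .sort((a, b) => b.doc_count - a.doc_count)
    .slice(0, limit)
    .map((bucket) => bucket.key);
}

// Usage with a hand-written response; the doc_count values are invented.
const sampleResponse: TopErrorsResponse = {
  aggregations: {
    message_sample: {
      categories: {
        buckets: [
          { key: 'error while closing', doc_count: 4 },
          { key: 'Dispatch abort response', doc_count: 9 },
          { key: 'failed to take ownership', doc_count: 2 },
        ],
      },
    },
  },
};
console.log(extractTopErrors(sampleResponse));
```

Sorting on `doc_count` before slicing keeps the sketch independent of the order in which buckets arrive, and copying the array first avoids mutating the response object.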
juliaElastic added a commit that referenced this pull request on Nov 29, 2022
Contributor
For reference, this actually made it into 8.6.0.