[8.6] added telemetry with most common error from agent logs (#146107) #146507

Merged
juliaElastic merged 1 commit into elastic:8.6 from
juliaElastic:backport-8.6/telemetry-agent-logs
Nov 29, 2022

@juliaElastic
Contributor

Summary

Backport #146107 to 8.6

## Summary

Closes elastic/ingest-dev#1261

Merged: [elasticsearch
change](elastic/elasticsearch#91701) to give
kibana_system the missing privilege to read logs-elastic_agent* indices.

## Top 3 most common errors in the Elastic Agent logs

Added the most common elastic-agent and fleet-server error logs to telemetry.

The query runs a `categorize_text` aggregation on the `message` field,
nested inside a `sampler` aggregation. This is a workaround, as we can't
aggregate on the `message` field directly.
```
GET logs-elastic_agent*/_search
{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "log.level": "error"
                    }
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": "now-1h"
                        }
                    }
                }
            ]
        }
    },
    "aggregations": {
        "message_sample": {
            "sampler": {
                "shard_size": 200
            },
            "aggs": {
                "categories": {
                    "categorize_text": {
                        "field": "message",
                        "size": 10
                    }
                }
            }
        }
    }
}
```
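For readers who want to reuse the request from code, here is a minimal TypeScript sketch that builds the same search body with the time range, shard sample size, and category count as parameters. The `buildTopErrorsQuery` function and `TopErrorsQueryOptions` interface are hypothetical names chosen for illustration; they are not taken from the PR's implementation.

```typescript
// Hypothetical options type; the field names are illustrative assumptions.
interface TopErrorsQueryOptions {
  timeRange: string; // lower bound for @timestamp, e.g. "now-1h"
  shardSize: number; // number of top documents sampled per shard
  maxCategories: number; // number of text categories to return
}

// Builds the same request body as the console query above.
function buildTopErrorsQuery(opts: TopErrorsQueryOptions) {
  return {
    size: 0, // only the aggregation result is needed, not the hits
    query: {
      bool: {
        must: [
          { term: { 'log.level': 'error' } },
          { range: { '@timestamp': { gte: opts.timeRange } } },
        ],
      },
    },
    aggregations: {
      message_sample: {
        // Limit the documents fed into the categorization step.
        sampler: { shard_size: opts.shardSize },
        aggs: {
          categories: {
            categorize_text: { field: 'message', size: opts.maxCategories },
          },
        },
      },
    },
  };
}

const topErrorsBody = buildTopErrorsQuery({
  timeRange: 'now-1h',
  shardSize: 200,
  maxCategories: 10,
});
console.log(JSON.stringify(topErrorsBody, null, 2));
```

The `sampler` wrapper caps how many documents per shard reach `categorize_text`, which helps keep the cost of the aggregation bounded.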

Tested with the latest Elasticsearch snapshot and verified that the logs
are added to telemetry:
```
   {
      "agent_logs_top_errors": [
         "failed to dispatch actions error failed reloading q q q nil nil config failed reloading artifact config for composed snapshot.downloader failed to generate snapshot config failed to detect remote snapshot repo proceeding with configured not an agent uri",
         "fleet-server stderr level info time message No applicable limit for agents using default \\n level info time message No applicable limit for agents using default \\n",
         "stderr panic close of closed channel n ngoroutine running Stop"
      ],
      "fleet_server_logs_top_errors": [
         "Dispatch abort response",
         "error while closing",
         "failed to take ownership"
      ]
   }
```
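As a rough illustration of how an aggregation response could be reduced to a list like the ones above, the sketch below takes the `categories` buckets, sorts them by `doc_count`, and keeps the top three keys. The `extractTopErrors` helper and the response types are assumptions made for this example, and the `doc_count` values in the sample are invented. How the PR separates elastic-agent from fleet-server logs is not shown here, so the sketch handles a single response.

```typescript
// Minimal shape of one categorize_text bucket (only the fields used here).
interface CategoryBucket {
  key: string;
  doc_count: number;
}

// Hypothetical, trimmed-down view of the search response.
interface TopErrorsResponse {
  aggregations?: {
    message_sample?: {
      categories?: { buckets: CategoryBucket[] };
    };
  };
}

// Returns the keys of the `limit` most frequent categories.
function extractTopErrors(response: TopErrorsResponse, limit: number = 3): string[] {
  const buckets = response.aggregations?.message_sample?.categories?.buckets ?? [];
  return buckets
    .slice() // copy so the caller's array is not mutated by sort
    .sort((a, b) => b.doc_count - a.doc_count)
    .slice(0, limit)
    .map((bucket) => bucket.key);
}

// Sample response; the doc_count values and the fourth key are made up.
const sampleResponse: TopErrorsResponse = {
  aggregations: {
    message_sample: {
      categories: {
        buckets: [
          { key: 'error while closing', doc_count: 7 },
          { key: 'Dispatch abort response', doc_count: 12 },
          { key: 'hypothetical rare error', doc_count: 1 },
          { key: 'failed to take ownership', doc_count: 4 },
        ],
      },
    },
  },
};

console.log({ fleet_server_logs_top_errors: extractTopErrors(sampleResponse) });
```

A response with no matching documents yields an empty array, so the telemetry field can still be reported.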

Measured locally, the query took only a few milliseconds. I'll also try
to check it against larger datasets of Elastic Agent logs.


### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@juliaElastic juliaElastic self-assigned this Nov 29, 2022
@botelastic botelastic bot added the Team:Fleet label (Team label for Observability Data Collection Fleet team) Nov 29, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic juliaElastic added the v8.6.1 and release_note:skip (Skip the PR/issue when compiling release notes) labels Nov 29, 2022
@kibana-ci

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled in files

| id | before | after | diff |
| --- | --- | --- | --- |
| osquery | 1 | 2 | +1 |

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 19 | 21 | +2 |
| fleet | 59 | 65 | +6 |
| osquery | 108 | 113 | +5 |
| securitySolution | 441 | 447 | +6 |
| total | | | +19 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 20 | 22 | +2 |
| fleet | 68 | 74 | +6 |
| osquery | 109 | 115 | +6 |
| securitySolution | 518 | 524 | +6 |
| total | | | +20 |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

@juliaElastic juliaElastic merged commit bfad267 into elastic:8.6 Nov 29, 2022