Skip to content

[8.6] [Fleet] cancel tasks when 3rd retry failed (#147190)#147230

Merged
kibanamachine merged 1 commit intoelastic:8.6from
kibanamachine:backport/8.6/pr-147190
Dec 8, 2022
Merged

[8.6] [Fleet] cancel tasks when 3rd retry failed (#147190)#147230
kibanamachine merged 1 commit intoelastic:8.6from
kibanamachine:backport/8.6/pr-147190

Conversation

@kibanamachine
Copy link
Copy Markdown
Contributor

Backport

This will backport the following commits from main to 8.6:

Questions ?

Please refer to the Backport tool documentation

## Summary

Related to elastic#144161

Found that on a bulk update tags task failure, the task didn't stop
after 3 retries (should be over in less then a minute), the retries kept
happening for 2 hours.
This change removes the retry task if 3 retries are reached.

Also testing in cloud deployment to see if the tags error can be
reproduced with this fix.
I could reproduce the reported error locally, and seeing it goes away
with this fix.

To verify:
- Add at least 50k agents with the `create_agents` script in kibana repo
- open Kibana, select the 50k agents, and open Actions / Add tags
- Try this in a few seconds: add 2 new tags, and remove one of them
- Wait about 30s, the agents should reflect the changes
- Check the logs to see that the tasks are removed after 3rd retry is
reached or successful.
- Check that there are no more running tasks. Any running task can be
found in Kibana Console by running this query: `GET
.kibana_task_manager/_search?q=task.taskType:"fleet:update_agent_tags:retry"`

Locally simulated an error to test that the retry (and check) task is
removed:

```
[2022-12-07T15:52:16.415+01:00][ERROR][plugins.fleet] Retry elastic#3 of task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b failed: failing task
[2022-12-07T15:52:16.416+01:00][WARN ][plugins.fleet] Stopping after 3rd retry. Error: failing task
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:check:848984ab-c11d-4ebe-8d1f-606143dd656b
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b
```

(cherry picked from commit 431c32b)
@kibanamachine kibanamachine added the backport This PR is a backport of another PR label Dec 8, 2022
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Dec 8, 2022
@kibanamachine kibanamachine enabled auto-merge (squash) December 8, 2022 08:18
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kibana-ci
Copy link
Copy Markdown

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 59 65 +6
osquery 108 113 +5
securitySolution 441 447 +6
total +19

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 68 74 +6
osquery 109 115 +6
securitySolution 518 524 +6
total +20

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

@kibanamachine kibanamachine merged commit fd4a6fc into elastic:8.6 Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport This PR is a backport of another PR Team:Fleet Team label for Observability Data Collection Fleet team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants