Support scheduled actions and cancellation#419
Conversation
Support scheduled actions by adding a new queue that actions will be added to/removed from before they are sent to the dispatcher. The queue is a priority queue (ordered by start_time). fleet_gateway is responsible for syncing the queue to storage. Cancellation of an action will be handled by a new action dispatcher that will remove actions from the queue (if any) and update the targetID action status. TODO - cancel handler - action expiration - fleet_gateway tests
|
This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
NOTE: |
|
|
||
| // FleetAction represents an action from fleet-server. | ||
| // should copy the action definition in fleet-server/model/schema.json | ||
| type FleetAction struct { |
There was a problem hiding this comment.
I wanted to separate what we receive from fleet as far as check ins go instead of using ActionApp to better delineate concerns
| errMsg = fmt.Sprintf("failed to persist action_queue, error: %s", err) | ||
| f.log.Error(errMsg) | ||
| f.statusReporter.Update(state.Failed, errMsg, nil) | ||
| // TODO should we handle this failure differently? |
There was a problem hiding this comment.
Do we want to do anything differently if we are unable to save state here?
| return nil | ||
| } | ||
| h.log.Info("Cancel action id: %s target id: %s removed %d action(s) from queue.", action.ActionID, action.TargetID, n) | ||
| // TODO ack action.TargetID as failed |
There was a problem hiding this comment.
What do we want to call when we have cancelled a queued action successfully?
bc04209 to
175973c
Compare
175973c to
5ec423e
Compare
🌐 Coverage report
|
ph
left a comment
There was a problem hiding this comment.
@michel-laterman I answered a few inline question here, Can I do an early review in draft or should I wait?
|
/test |
|
This pull request is now in conflicts. Could you fix it? 🙏 |
AndersonQ
left a comment
There was a problem hiding this comment.
I don't see any blockers, but I left some suggestions.
| } | ||
| // Dispatch cancel actions | ||
| if len(cancelActions) > 0 { | ||
| if err := f.dispatcher.Dispatch(context.Background(), f.acker, cancelActions...); err != nil { |
There was a problem hiding this comment.
[Question | Suggestion]
Do we expect to have a context to pass to Dispatch at some point? If so, I'd suggest to use context.TODO() to indicate that.
There was a problem hiding this comment.
ensuring proper contexts is a larger issue then what I would like to do for this PR. I've made #464 to track it
| assert.Equal(t, "testid", action.ActionID) | ||
| assert.Equal(t, ActionTypePolicyChange, action.ActionType) | ||
| assert.NotNil(t, action.Policy) |
There was a problem hiding this comment.
[Suggestion]
Check the return of StartTime and Expiration to ensure the given start_time and expiration are ignored.
There was a problem hiding this comment.
turns out actions will have expiration times, however that is only used (by elastic agent) when searching ES for actions
|
This pull request is now in conflicts. Could you fix it? 🙏 |
Co-authored-by: Anderson Queiroz <me@andersonq.me>
|
i've ignored the linter in order to get it in for 8.3 |
What does this PR do?
Support scheduled actions by adding a new queue that actions will be
added to/removed from before they are sent to the dispatcher. The queue
is a priority queue (ordered by start_time). fleet_gateway is
responsible for syncing the queue to storage. Cancellation of an action
will be handled by a new action dispatcher that will remove actions from
the queue (if any) and update the targetID action status.
Why is it important?
Fleet-server should be able to inform agents of actions that are scheduled to start at a later time.
Currently the only supported action is an upgrade. This is to allow users to schedule upgrades during maintenance windows.
Checklist
CHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.How to test this PR locally
Note: For testing I altered some logging statements locally to better surface queue interactions (and scheduling in fleet-server.
Setup
Running Elasticsearch, Kibana, and elastic-agent/fleet-server instance locally.
Create a role in Elasticsearch that allowed a user to insert actions into
.fleet-actions:Queue Success
Using a user with the
fleet-accessrole, index a new action for the elastic-agent:Validate
fleet-serverwas propagating start-timeValidate elastic-agent was queuing it locally:
The checkin after
start_timewill execute a queued action:Canceling an action
Note: agent/fleet-server was reinstalled for this test
Created an upgrade request for a future time:
Ensure the agent queued it:
Ensure it is persisted in
state.yml:Send a
CANCELaction:Note that fleet-server queries for actions include an expiration range. All actions must have a valid expiration
Ensure the agent cancels the action:
Check
state.yml, theaction_queueattribute is no longer present.Action expiration
Create an upgrade action with expiration < start time, and expiration at a future date:
Validate that the action is queued:
Validate the action is cancelled after
start_time:Related issues