
[Fleet] Implement rolling upgrades for Agent upgrades #130259

@joshdover

Changes not related to rolling upgrade

UI Remove licence restrictions for bulk selection

@criamico #130981
Currently we restrict bulk actions to users with a Gold+ licence (note: we only restrict this in the UI; the API allows bulk actions without validating the licence).

UI Move the bulk actions from the table to actions button

@criamico #131133
We want to move the bulk actions from the table to an actions button near the "Add agent" button, shown when agents are selected.

  • Move the actions button in the UI
  • Move the "Clear selection" action beside the selection
    Screenshot 2022-04-19 at 15 28 04

UI/API Add additional version option for "Upgrade" action on agent list

We want to let users specify the version they want to upgrade to; any version <= the Kibana version should be allowed.
Also, we should not allow upgrading agents before the Fleet Servers are upgraded.
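
A minimal sketch of the two rules above, for discussion only. The function and field names are hypothetical, and it assumes the strictest reading of the Fleet Server rule: a non-Fleet-Server agent is rejected if any known Fleet Server is still below the target version.

```typescript
// Illustrative sketch only: the names below are hypothetical, not Fleet's actual code.

// Parse "major.minor.patch", ignoring any suffix such as "-SNAPSHOT".
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Invalid version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const [aMajor, aMinor, aPatch] = parseVersion(a);
  const [bMajor, bMinor, bPatch] = parseVersion(b);
  if (aMajor !== bMajor) return aMajor - bMajor;
  if (aMinor !== bMinor) return aMinor - bMinor;
  return aPatch - bPatch;
}

interface UpgradeRequest {
  targetVersion: string;
  kibanaVersion: string;
  isFleetServerAgent: boolean;
  // Versions of the enrolled Fleet Servers (assumed to be known to the caller).
  fleetServerVersions: string[];
}

// Returns an error message if the upgrade must be rejected, otherwise undefined.
function validateUpgrade(req: UpgradeRequest): string | undefined {
  if (compareVersions(req.targetVersion, req.kibanaVersion) > 0) {
    return `Cannot upgrade agent to ${req.targetVersion}: it is higher than the Kibana version ${req.kibanaVersion}`;
  }
  if (!req.isFleetServerAgent) {
    // Assumption: every Fleet Server must already be at or above the target version.
    const outdated = req.fleetServerVersions.filter(
      (v) => compareVersions(req.targetVersion, v) > 0
    );
    if (outdated.length > 0) {
      return `Cannot upgrade agent to ${req.targetVersion}: upgrade each Fleet Server first`;
    }
  }
  return undefined;
}
```

Returning a message instead of throwing keeps the sketch easy to reuse from both the API handler and the UI; the real API would presumably turn it into an explicit error response.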

API

  • In the upgrade API
    @criamico [Fleet] Changes to bulk upgrade api for allowing rolling upgrades #131947
    • Relax the bulk upgrade API restriction from "only the same version as Kibana" to "any version <= the Kibana version".
    • Add a restriction that prevents upgrading a non-Fleet-Server agent to a version > the Fleet Server versions.
    • If the Fleet Server is not upgraded yet, throw an explicit error message in the API; the user will be able to resolve that error by upgrading each Fleet Server.
  • Add a new endpoint that returns the list of available Elastic Agent versions @criamico Enhancement: [Fleet] Maintain versions list for Agent upgrade modal #133309
    Depends on https://github.com/elastic/website-development/issues/9331
    • It provides the list of available versions: GET /api/fleet/agents/available_upgrade_versions. This API should fetch data from an internal Kibana endpoint and fall back to a hardcoded or configured list of versions.
    • Filters out any prerelease versions, like 8.0.0-alpha1
    • Filters out any version > current Kibana version
    • Doesn't have any version < 7.17.0 since only 7.17.0+ is supported against Elastic 8.x. Put in some logic or test that breaks the feature if Kibana is bumped to 9.0 so the oldest allowed version can be updated
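
The filtering rules for the versions endpoint could look roughly like the sketch below. The helper names and the descending sort order are assumptions, not part of the proposal; the thrown error on a 9.x Kibana is one possible way to "break the feature" so the oldest allowed version gets revisited.

```typescript
// Illustrative sketch only: helper names and the sort order are assumptions.
const OLDEST_SUPPORTED_VERSION = "7.17.0";

type Triple = [number, number, number];

// Parse "major.minor.patch", ignoring any suffix such as "-SNAPSHOT".
function toTriple(version: string): Triple {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Invalid version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareTriples(a: Triple, b: Triple): number {
  const [aMajor, aMinor, aPatch] = a;
  const [bMajor, bMinor, bPatch] = b;
  if (aMajor !== bMajor) return aMajor - bMajor;
  if (aMinor !== bMinor) return aMinor - bMinor;
  return aPatch - bPatch;
}

function filterAvailableVersions(allVersions: string[], kibanaVersion: string): string[] {
  const kibana = toTriple(kibanaVersion);
  // Deliberately fail on a major bump so the oldest allowed version is updated.
  if (kibana[0] >= 9) {
    throw new Error("Update OLDEST_SUPPORTED_VERSION for the new Kibana major");
  }
  const oldest = toTriple(OLDEST_SUPPORTED_VERSION);
  return allVersions
    .filter((v) => /^\d+\.\d+\.\d+$/.test(v)) // drops prereleases like 8.0.0-alpha1
    .filter((v) => compareTriples(toTriple(v), kibana) <= 0) // nothing above Kibana
    .filter((v) => compareTriples(toTriple(v), oldest) >= 0) // nothing below 7.17.0
    .sort((a, b) => compareTriples(toTriple(b), toTriple(a))); // newest first
}
```

For example, with Kibana at 8.2.1 the input `["8.3.0", "8.2.1", "8.0.0-alpha1", "7.16.3", "7.17.0", "8.2.0"]` is reduced to `["8.2.1", "8.2.0", "7.17.0"]`.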

Note: the eol json doesn't have information about the elastic agent package, we'll likely need to find another source for this info - currently investigating how to get the info from https://www.elastic.co/downloads/past-releases#elastic-agent

UI Misc

  • Remove the beta label for the feature. Currently there's a badge indicating the "experimental" feature on the modal, this can be removed when updating the modal - @criamico:

Screenshot 2022-04-27 at 15 39 03

Rolling upgrade changes

This depends on the .fleet-agent-actions schema to be updated:

UI

  • Update the agent upgrade modal to add version selection: the user can select a version provided by the Fleet API or type one, and that version is passed when calling the Fleet upgrade API (for both bulk upgrade and single agent upgrade). - @criamico [Fleet] Changes to agent upgrade modal to allow for rolling upgrades #132421
    • Use an EuiComboBox to search among the versions instead of the simple dropdown
    • There should be a warning message, and the upgrade should be aborted, to inform the user that the Fleet Server needs to be upgraded to the same version (in the cloud we don't have this issue).
    • Allow the user to specify the upgrade window (rollout_duration_seconds) (bulk upgrade only) and pass that to the upgrade API
    • The Maintenance Window explanation should read: Defines the duration of time available to perform the upgrade. The agent upgrades are spread uniformly across this duration in order to avoid exhausting network resources.
    • When only one agent is selected, no maintenance window should be shown
    • When the number of selected agents is <= 10, show the option Immediately (no wait between upgrades) in the maintenance window dropdown
    • When the number of selected agents is > 10, don't show the option Immediately (as it might impact upgrades of big batches)
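
The maintenance window rules above can be sketched as follows. The duration choices are placeholders (the issue does not list them), and `uniformStartOffsets` only illustrates what "spread uniformly across this duration" means; it does not claim to show where or how the scheduling is actually done.

```typescript
// Illustrative sketch only: option labels and durations are hypothetical.
interface WindowOption {
  label: string;
  rolloutDurationSeconds: number;
}

const IMMEDIATELY: WindowOption = { label: "Immediately", rolloutDurationSeconds: 0 };

// Placeholder durations; the real list is not specified in this issue.
const DURATION_OPTIONS: WindowOption[] = [
  { label: "1 hour", rolloutDurationSeconds: 3600 },
  { label: "2 hours", rolloutDurationSeconds: 7200 },
  { label: "1 day", rolloutDurationSeconds: 86400 },
];

// Options to show in the maintenance window dropdown for a given selection size.
function maintenanceWindowOptions(selectedAgentCount: number): WindowOption[] {
  if (selectedAgentCount <= 1) {
    return []; // single agent: no maintenance window is shown
  }
  if (selectedAgentCount <= 10) {
    return [IMMEDIATELY, ...DURATION_OPTIONS];
  }
  return [...DURATION_OPTIONS]; // big batches: "Immediately" is hidden
}

// Start offsets (in seconds) that spread agents evenly over the window.
function uniformStartOffsets(agentCount: number, rolloutDurationSeconds: number): number[] {
  if (agentCount <= 0) {
    return [];
  }
  const step = rolloutDurationSeconds / agentCount;
  return Array.from({ length: agentCount }, (_, i) => Math.floor(i * step));
}
```

With 4 agents and a 3600 second window, the offsets are 0, 900, 1800 and 2700 seconds.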

Screenshot 2022-05-18 at 17 27 52

Upgrade_modal_1
Upgrade_,modal_2

Screenshot 2022-04-19 at 15 18 35

API

@nchaulet

.fleet-agent-actions document for an upgrade (@michel-laterman to confirm)

{
  "action_id": "action2",
  "@timestamp": "..",
  "expiration": "END_DATE",
  "start_time": "START_DATE",
  "minimum_execution_period": 123123,
  "type": "UPGRADE",
  "agents": ["agent1"],
  "data": { "version": "8.3.0", "source_uri": "nonmandatory" }
}
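
As a rough illustration, the document above could be typed and built like this. The builder name and its parameters are hypothetical, and mapping start_time to "now" and expiration to "now + rollout_duration_seconds" is an assumption pending confirmation of the schema; minimum_execution_period is passed through unchanged because its exact semantics are not defined here.

```typescript
// Illustrative sketch only: the schema is still to be confirmed.
interface UpgradeActionDoc {
  action_id: string;
  "@timestamp": string;
  expiration: string;
  start_time: string;
  minimum_execution_period: number;
  type: "UPGRADE";
  agents: string[];
  data: { version: string; source_uri?: string };
}

interface UpgradeActionParams {
  actionId: string;
  agentIds: string[];
  version: string;
  sourceUri?: string;
  rolloutDurationSeconds: number;
  minimumExecutionPeriod: number;
  now: Date;
}

function buildUpgradeAction(params: UpgradeActionParams): UpgradeActionDoc {
  // Assumption: the rollout window starts now and ends after the rollout duration.
  const expiration = new Date(params.now.getTime() + params.rolloutDurationSeconds * 1000);
  return {
    action_id: params.actionId,
    "@timestamp": params.now.toISOString(),
    expiration: expiration.toISOString(),
    start_time: params.now.toISOString(),
    minimum_execution_period: params.minimumExecutionPeriod,
    type: "UPGRADE",
    agents: [...params.agentIds],
    data: {
      version: params.version,
      // source_uri is optional, so it is only included when provided.
      ...(params.sourceUri !== undefined ? { source_uri: params.sourceUri } : {}),
    },
  };
}
```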
  • Add a current upgrades API: GET /api/fleet/current-upgrades [Fleet] Add new API to get current upgrades #132276

    • That API could query .fleet-actions for non-expired actions of type UPGRADE to get the action ID and the expected number of agents to upgrade, and then query .fleet-actions-results to find out how many of these agents completed the upgrade.
  • Add an API to abort current upgrades: POST /api/fleet/actions/{upgradeActionId}/cancel. This will create a new .fleet-actions document of type CANCEL with the ID of the action to cancel and the IDs of the agents that should be cancelled (we could reuse the agent IDs from the action being cancelled to populate that).
    .fleet-agent-actions document for a cancellation (@michel-laterman to confirm) [Fleet] Allow to cancel agent actions #132168

{
  "action_id": "action2",
  "@timestamp": "..",
  "expiration": "..",
  "type": "CANCEL",
  "agents": ["agent1"],
  "data": { "target_id": "action1" }
}
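
A sketch of how the two APIs described above might work on already-fetched documents. It works on in-memory arrays rather than real Elasticsearch queries, the `agent_id` field on result documents and all function names are assumptions, and the cancellation reuses the agent IDs of the targeted action as suggested.

```typescript
// Illustrative sketch only: field names on the result documents are assumptions.
interface FleetAction {
  action_id: string;
  "@timestamp": string;
  expiration: string;
  type: "UPGRADE" | "CANCEL";
  agents: string[];
  data: Record<string, string>;
}

interface FleetActionResult {
  action_id: string;
  agent_id: string;
}

interface CurrentUpgrade {
  actionId: string;
  nbAgents: number;
  nbAgentsCompleted: number;
}

// Non-expired UPGRADE actions, with the number of agents that reported a result.
function summarizeCurrentUpgrades(
  actions: FleetAction[],
  results: FleetActionResult[],
  now: Date
): CurrentUpgrade[] {
  const completedByAction = new Map<string, Set<string>>();
  for (const result of results) {
    const agents = completedByAction.get(result.action_id) ?? new Set<string>();
    agents.add(result.agent_id); // a Set avoids counting an agent twice
    completedByAction.set(result.action_id, agents);
  }
  return actions
    .filter((a) => a.type === "UPGRADE" && new Date(a.expiration).getTime() > now.getTime())
    .map((a) => ({
      actionId: a.action_id,
      nbAgents: a.agents.length,
      nbAgentsCompleted: completedByAction.get(a.action_id)?.size ?? 0,
    }));
}

// CANCEL action targeting an existing action, reusing its agent IDs.
function buildCancelAction(
  target: FleetAction,
  cancelActionId: string,
  now: Date,
  expiration: Date
): FleetAction {
  return {
    action_id: cancelActionId,
    "@timestamp": now.toISOString(),
    expiration: expiration.toISOString(),
    type: "CANCEL",
    agents: [...target.agents],
    data: { target_id: target.action_id },
  };
}
```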

Testing

  • Manual test of upgrades at scale; we can probably use Horde to simulate a large number of agents

Metadata

Labels

Team:Fleet (Team label for Observability Data Collection Fleet team), v8.3.0
