Skip to content

fix: send upgrade action to units before adding it to bkgActions#9634

Merged
pkoutsovasilis merged 5 commits intoelastic:mainfrom
pkoutsovasilis:fix/endpoint_upgrade_action_error
Sep 10, 2025
Merged

fix: send upgrade action to units before adding it to bkgActions#9634
pkoutsovasilis merged 5 commits intoelastic:mainfrom
pkoutsovasilis:fix/endpoint_upgrade_action_error

Conversation

@pkoutsovasilis
Copy link
Copy Markdown
Contributor

@pkoutsovasilis pkoutsovasilis commented Aug 28, 2025

What does this PR do?

This PR refactors the Coordinator.Upgrade API to use functional options (UpgradeOpt) instead of long argument lists, and introduces a preUpgradeCallback.

Specifically:

  • Moves the action proxying of Endpoint upgrades into a callback passed to the coordinator. This ensures that action dispatch only happens after the initial upgrade checks succeed (already upgrading, agent upgradeable, agent upgrade capability allowed, etc.), which is a more logical sequence.
  • Reorders checks inside Coordinator.Upgrade to make the flow more consistent (e.g. first verifying that the coordinator is not already upgrading before continuing).
  • Updates tests and server code to use the new options-based API.

All business logic changes are captured here bcdba7b (+90 -29 lines changed)

Why is it important?

Currently, upgrade actions can remain stuck in bkgActions when tamper protection is enabled (see #9629). This occurs because errors returned from the Endpoint action proxying are not correctly handled, leaving stale entries behind.

By moving this logic into a preUpgradeCallback, any error from proxying is now surfaced through the coordinator’s upgrade flow, which ensures that bkgActions are cleaned up correctly.

I am not fully satisfied with these changes — they are a pragmatic fix until we perform a larger rewrite to centralize upgrade action handling in a single place. Still, they are necessary to resolve the immediate issue of stuck upgrade actions.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

None expected - this change only affects the internal proxying of upgrade actions to Endpoint and does not alter the upgrade request API.

How to test this PR locally

mage unitTest

Related issues

@pkoutsovasilis pkoutsovasilis self-assigned this Aug 28, 2025
@pkoutsovasilis pkoutsovasilis added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team skip-changelog backport-active-all Automated backport with mergify to all the active branches labels Aug 28, 2025
@pkoutsovasilis pkoutsovasilis force-pushed the fix/endpoint_upgrade_action_error branch 2 times, most recently from deac63b to ad27fc6 Compare September 5, 2025 10:36
@pkoutsovasilis pkoutsovasilis force-pushed the fix/endpoint_upgrade_action_error branch from ad27fc6 to eb2f6a5 Compare September 5, 2025 13:47
@pkoutsovasilis pkoutsovasilis marked this pull request as ready for review September 8, 2025 05:51
@pkoutsovasilis pkoutsovasilis requested a review from a team as a code owner September 8, 2025 05:51
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Copy link
Copy Markdown
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good to me as a way to fix this without totally rewriting everything. A couple of small comments.

Comment thread internal/pkg/agent/application/actions/handlers/handler_action_upgrade.go Outdated
Comment thread internal/pkg/agent/application/actions/handlers/handler_action_upgrade_test.go Outdated
@cmacknz
Copy link
Copy Markdown
Member

cmacknz commented Sep 10, 2025

After your clarifications my only remaining comments are suggestions for readability or documentation, which I think are necessary but if you do these and get approval from someone else go ahead and merge and don't feel the need to wait for me to review again.

@elastic-sonarqube
Copy link
Copy Markdown

@elasticmachine
Copy link
Copy Markdown
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @pkoutsovasilis

Copy link
Copy Markdown
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow up!

@pkoutsovasilis pkoutsovasilis merged commit bb98f2a into elastic:main Sep 10, 2025
23 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

@Mergifyio backport 8.18 8.19 9.0 9.1

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Sep 10, 2025

backport 8.18 8.19 9.0 9.1

✅ Backports have been created

Details

mergify bot pushed a commit that referenced this pull request Sep 10, 2025
* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion

(cherry picked from commit bb98f2a)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go
mergify bot pushed a commit that referenced this pull request Sep 10, 2025
* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion

(cherry picked from commit bb98f2a)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go
mergify bot pushed a commit that referenced this pull request Sep 10, 2025
* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion

(cherry picked from commit bb98f2a)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go
mergify bot pushed a commit that referenced this pull request Sep 10, 2025
* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion

(cherry picked from commit bb98f2a)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go
pkoutsovasilis added a commit that referenced this pull request Sep 10, 2025
…ng it to bkgActions (#9859)

* fix: send upgrade action to units before adding it to bkgActions (#9634)

* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion

(cherry picked from commit bb98f2a)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go

* fix: resolve conflicts

---------

Co-authored-by: Panos Koutsovasilis <panos.koutsovasilis@elastic.co>
pkoutsovasilis added a commit that referenced this pull request Sep 10, 2025
…g it to bkgActions (#9861)

* fix: send upgrade action to units before adding it to bkgActions (#9634)

* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion

(cherry picked from commit bb98f2a)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go

* fix: resolve conflicts

---------

Co-authored-by: Panos Koutsovasilis <panos.koutsovasilis@elastic.co>
pkoutsovasilis added a commit that referenced this pull request Sep 10, 2025
…ng it to bkgActions (#9860)

* fix: send upgrade action to units before adding it to bkgActions (#9634)

* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion

(cherry picked from commit bb98f2a)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go

* fix: resolve conflicts

---------

Co-authored-by: Panos Koutsovasilis <panos.koutsovasilis@elastic.co>
pkoutsovasilis added a commit that referenced this pull request Sep 10, 2025
…g it to bkgActions (#9862)

* fix: send upgrade action to units before adding it to bkgActions (#9634)

* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion

(cherry picked from commit bb98f2a)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go

* fix: resolve conflicts

---------

Co-authored-by: Panos Koutsovasilis <panos.koutsovasilis@elastic.co>
intxgo pushed a commit to intxgo/elastic-agent that referenced this pull request Sep 24, 2025
…stic#9634)

* fix: refactor coordinator Upgrade to use opts and introduce pre-upgrade callback to properly handle errs and upgrade details

* ci: add unit-tests

* doc: add changelog fragment

* fix: rename notifyUnitsOfProxiedAction to notifyUnitsOfProxiedActionFn

* doc: add comment for notifyUnitsOfProxiedActionFn unit-test assertion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-active-all Automated backport with mergify to all the active branches bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upgrade actions can be ignored due to stale entries in bkgActions

3 participants