
[9.3] (backport #11740) Avoid uninstalling and re-installing service components on policy change#12100

Merged
ycombinator merged 1 commit into 9.3 from mergify/bp/9.3/pr-11740
Jan 6, 2026

@mergify mergify bot commented Jan 6, 2026

What does this PR do?

This PR identifies Service Runtime components by their input type only; the output ID is no longer used.

Why is it important?

Service Runtime components are intended to be kept running (via a service) for as long as possible. We should only install/start or uninstall/stop them if they are being explicitly added or removed, respectively, from the component model. If only their configuration is being updated, we should keep the component running.

If a component's ID changes between the last and current component models, Elastic Agent will ask the component's service to uninstall and then reinstall itself. Prior to this PR, service components' IDs were determined by their input type and output ID. Therefore, if a service component's output was changed, the service would be uninstalled and then reinstalled. This is undesirable behavior, as services should be kept running as long as possible.

With the changes in this PR, we no longer consider the output ID when generating service components' IDs. If a service component's output is changed, its ID remains the same between the last and current component models. Elastic Agent does not uninstall and reinstall the component's service but simply passes the configuration change to it (which it was doing prior to this PR anyway).
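
The effect of the ID change can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Component` struct, the `componentID` and `diffModels` helpers, and the `inputType-outputID` format are hypothetical and are not the types or ID format used in the Elastic Agent code base. The sketch only shows why dropping the output ID from a service component's ID keeps that component out of the added and removed sets when its output changes.

```go
package main

import (
	"fmt"
	"sort"
)

// Component is a simplified, hypothetical stand-in for an entry in the
// component model; the real Elastic Agent type has many more fields.
type Component struct {
	InputType      string
	OutputID       string
	ServiceRuntime bool // true for service runtime components such as Endpoint
}

// componentID uses an illustrative ID scheme (an assumption, not the real
// format): service runtime components are identified by input type alone,
// all other components by input type plus output ID.
func componentID(c Component) string {
	if c.ServiceRuntime {
		return c.InputType
	}
	return c.InputType + "-" + c.OutputID
}

// diffModels compares two component models by ID. IDs only in current are
// added (install/start), IDs only in last are removed (stop/uninstall), and
// IDs in both are kept (they just receive the new configuration).
func diffModels(last, current []Component) (added, removed, kept []string) {
	lastIDs := make(map[string]bool, len(last))
	for _, c := range last {
		lastIDs[componentID(c)] = true
	}
	currentIDs := make(map[string]bool, len(current))
	for _, c := range current {
		id := componentID(c)
		currentIDs[id] = true
		if lastIDs[id] {
			kept = append(kept, id)
		} else {
			added = append(added, id)
		}
	}
	for id := range lastIDs {
		if !currentIDs[id] {
			removed = append(removed, id)
		}
	}
	// Sort for deterministic output, since map iteration order is random.
	sort.Strings(added)
	sort.Strings(removed)
	sort.Strings(kept)
	return added, removed, kept
}

func main() {
	// Same service component, but the output changes from Elasticsearch to Logstash.
	last := []Component{{InputType: "endpoint", OutputID: "elasticsearch", ServiceRuntime: true}}
	current := []Component{{InputType: "endpoint", OutputID: "logstash", ServiceRuntime: true}}

	added, removed, kept := diffModels(last, current)
	fmt.Println("added:", added, "removed:", removed, "kept:", kept)
}
```

In this sketch, a non-service component with the same output change would land in both the added and removed sets, which is the pre-PR behavior described above for service components.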

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

None.

How to test this PR locally

Policy reassign does not uninstall/reinstall Endpoint

  1. Using the Fleet UI, create three Agent policies:
    • default: containing only the system integration
    • tp-es: containing the Elastic Defend integration, with tamper protection enabled, and using the Elasticsearch output.
    • tp-ls: containing the Elastic Defend integration, with tamper protection enabled, and using the Logstash output. Note that you will need to create the Logstash output in Fleet > Settings.
  2. Enroll an Elastic Agent in the tp-es policy and verify the agent is healthy and shipping data.
  3. Assign the Agent to the tp-ls policy.
  4. Check the Agent logs and make sure the Endpoint component is not being uninstalled and reinstalled. Concretely, check that there is no log entry for "uninstall endpoint service".
  5. Check the Endpoint logs (located under /opt/Elastic/Endpoint/state/log/ on Linux) and make sure that Endpoint has connected to Logstash (or has attempted to and failed if there is no actual Logstash endpoint listening).

Removing Endpoint from policy uninstalls Endpoint

  1. Assign the Agent to the default policy.
  2. Check the Agent logs and make sure the Endpoint component is stopped and uninstalled. Concretely, check that there is a log entry for "stopping endpoint service runtime", followed by "uninstall endpoint service", followed by "Stopped: endpoint service runtime".
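
The ordering check in step 2 can be sketched the same way. `containsInOrder` is a hypothetical helper and the log lines in `main` are invented for illustration; only the three message strings are taken from the step above.

```go
package main

import (
	"fmt"
	"strings"
)

// containsInOrder reports whether every message appears in lines, in the
// given order. Unrelated lines between the matches are allowed.
func containsInOrder(lines, messages []string) bool {
	next := 0 // index of the next message still to be found
	for _, line := range lines {
		if next < len(messages) && strings.Contains(line, messages[next]) {
			next++
		}
	}
	return next == len(messages)
}

func main() {
	expected := []string{
		"stopping endpoint service runtime",
		"uninstall endpoint service",
		"Stopped: endpoint service runtime",
	}

	// Hypothetical log lines for illustration only.
	lines := []string{
		"policy change received",
		"stopping endpoint service runtime",
		"uninstall endpoint service",
		"Stopped: endpoint service runtime",
	}

	fmt.Println("uninstall sequence found:", containsInOrder(lines, expected))
}
```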

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

This is an automatic backport of pull request #11740 done by [Mergify](https://mergify.com).

…nge (#11740)

* Add UsesCommandRuntime and UsesServiceRuntime methods on Component

* Use new methods

* Add test case for only output being changed on service component

* Implement logic to not remove and add same service component

* Adding CHANGELOG fragment

* Improve comment

* Fix logic location

* Update unit test

* Update service component naming

* Refactor: extract logic into helper method

* Relocate unit test and add lots of cases

* Remove unnecessary code

* Clarify comments

* Remove unnecessary unit test

* Undo unnecessary changes

* Update component ID in integration test

* Add assertions on lengths of components added, removed, updated

* Add test case for only input ID changing

* Add integration test: TestPolicyReassignWithTamperProtectedEndpoint

* Update replace in go.mod

* Bump up context timeout and use for entire test

* Define fixture

* Fix syntax errors

* Fix installOpts

* Only cleanup Endpoint using first policy's uninstall token until successful policy reassignment

* Clarify log message

* Upgrade endpoint package version

* Use exec.CommandContext and separate out args

* Compare Endpoint policy IDs

* Use agentID from enrollment response

* Install Elastic Defend in second policy

* Add endpoint cleanup after reassigning policy

* Fixing log messages

* Give Endpoint time to receive reassigned policy

* Updating dependency version

* Adding log statements

* Remove replace

* Remove duplicate CHANGELOG fragment

* Remove PID checks

(cherry picked from commit c8deb6d)
@mergify mergify bot added the backport label Jan 6, 2026
@mergify mergify bot requested a review from a team as a code owner January 6, 2026 08:12
@mergify mergify bot requested review from swiatekm and ycombinator and removed request for a team January 6, 2026 08:12
@github-actions github-actions bot added the Team:Elastic-Agent-Control-Plane label Jan 6, 2026
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator ycombinator enabled auto-merge (squash) January 6, 2026 09:35
@ycombinator ycombinator merged commit 37de24d into 9.3 Jan 6, 2026
24 checks passed
@ycombinator ycombinator deleted the mergify/bp/9.3/pr-11740 branch January 6, 2026 10:07
@elasticmachine

💚 Build Succeeded

cc @ycombinator

