[8.18] (backport #9673) Liveness agent state #9955

Merged
nkvoll merged 1 commit into 8.18 from mergify/bp/8.18/pr-9673 on Sep 15, 2025

@mergify mergify bot commented Sep 15, 2025

What does this PR do?

This PR adds the aggregated status of the agent node to the liveness health check.

As a bonus, it also adds status code assertions to the tests, which were missing before. (All liveness/readiness tests were passing without any assertions.)

Why is it important?

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

Liveness probes will now fail if the configuration is invalid, likely causing the container to be restarted (see https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/#liveness-probe).

How to test this PR locally

  1. Create an elastic-agent.yml file with an invalid output, e.g. set use_output: nonexistent on an input.
  2. Start elastic-agent with the relevant monitoring endpoints enabled.
  3. Verify with elastic-agent status that the agent reports a FAILED state:
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   ├─ status: (FAILED) Invalid component model: failed to render components: invalid 'inputs.0.use_output', references an unknown output 'nonexistent'
   └─ info
      ├─ id: e1a1e08b-9b0c-4394-a024-d35b823d415b
      ├─ version: 9.2.0
      └─ commit: ff80471809aca1f2280ce55f0e24f85cefec5d55
  4. Liveness probes with failon=degraded or failon=failed should fail with HTTP 500, while failon=heartbeat still returns HTTP 200:
$ curl -w 'HTTP %{http_code}\n' 'http://localhost:6792/liveness?failon=degraded'
HTTP 500
$ curl -w 'HTTP %{http_code}\n' 'http://localhost:6792/liveness?failon=failed'
HTTP 500
$ curl -w 'HTTP %{http_code}\n' 'http://localhost:6792/liveness?failon=heartbeat'
HTTP 200
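
The three curl checks above can be scripted. The following is a minimal sketch, not part of the PR: the helper names (liveness_code, expect_code, check_failed_agent) and the LIVENESS_URL variable are hypothetical, and the port is assumed to be 6792 as in the commands above. It only wraps the same curl calls and compares the returned status codes with the ones shown.

```shell
#!/bin/sh
# Hypothetical helpers for the manual test above; not part of elastic-agent.
# Base URL is an assumption taken from the curl commands in this PR description.
LIVENESS_URL="${LIVENESS_URL:-http://localhost:6792/liveness}"

# Print only the HTTP status code for a given failon value
# (degraded, failed, or heartbeat). The response body is discarded.
liveness_code() {
  curl -s -o /dev/null -w '%{http_code}' "${LIVENESS_URL}?failon=$1"
}

# Compare the observed status code for a failon value ($1)
# against the expected code ($2); return non-zero on mismatch.
expect_code() {
  actual="$(liveness_code "$1")"
  if [ "$actual" = "$2" ]; then
    echo "ok: failon=$1 -> HTTP $actual"
  else
    echo "FAIL: failon=$1 -> HTTP $actual (expected $2)"
    return 1
  fi
}

# Expected results for an agent in the FAILED state, as listed above.
check_failed_agent() {
  expect_code degraded 500 &&
  expect_code failed 500 &&
  expect_code heartbeat 200
}
```

After sourcing the file with the misconfigured agent running, call check_failed_agent; it stops at the first mismatch and returns a non-zero exit status.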

Related issues


This is an automatic backport of pull request #9673 done by [Mergify](https://mergify.com).

* fix(tests): update liveness/readiness test cases to assert status code and remove unused vars

* fix: correct order of fields in LivenessFailConfig for degraded state

* fix: remove unnecessary check for coordinator mode in liveness handler (already handled)

* fix: add unhealthy coordinator state handling in liveness handler

* add changelog fragment

(cherry picked from commit d3b9427)
@mergify mergify bot added the backport label Sep 15, 2025
@mergify mergify bot requested a review from a team as a code owner September 15, 2025 12:29
@mergify mergify bot requested review from blakerouse and straistaru and removed request for a team September 15, 2025 12:29
@mergify mergify bot assigned nkvoll Sep 15, 2025
@mergify mergify bot mentioned this pull request Sep 15, 2025
@github-actions github-actions bot added the Team:Elastic-Agent-Control-Plane label Sep 15, 2025
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@elasticmachine

💚 Build Succeeded

cc @nkvoll

@nkvoll nkvoll merged commit 6c47528 into 8.18 Sep 15, 2025
21 checks passed
@nkvoll nkvoll deleted the mergify/bp/8.18/pr-9673 branch September 15, 2025 14:07