[otelmanager] emit starting state in the beginning #11234

Merged
VihasMakwana merged 10 commits into elastic:main from VihasMakwana:emity-startingstate
Nov 21, 2025

Conversation

@VihasMakwana
Contributor

@VihasMakwana VihasMakwana commented Nov 18, 2025

What does this PR do?

This PR introduces support for emitting a STARTING state when the collector is expected to start.

Why is it important?

Right now, we default to a STOPPED state whenever an error occurs while accessing the healthcheck port. As a result, the elastic-agent status output does not show any monitoring components. If this error occurs consistently every time the collector starts, those components will never appear in elastic-agent status.
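The fallback described above can be illustrated with a minimal sketch. It is not the code from this PR: the `CollectorStatus` and `ComponentState` types, their constants, and the `translate` function are hypothetical names chosen for illustration. The sketch only shows why an explicit STARTING case matters when every unrecognized status otherwise falls through to STOPPED.

```go
package main

import "fmt"

// CollectorStatus is a hypothetical stand-in for the status the collector reports.
type CollectorStatus int

const (
	StatusNone CollectorStatus = iota // nothing known yet, e.g. the health check failed
	StatusStarting
	StatusOK
	StatusFatalError
)

// ComponentState is a hypothetical stand-in for the state shown in the status output.
type ComponentState string

const (
	StateStarting ComponentState = "STARTING"
	StateHealthy  ComponentState = "HEALTHY"
	StateFailed   ComponentState = "FAILED"
	StateStopped  ComponentState = "STOPPED"
)

// translate maps a collector status to a component state.
// Without the StatusStarting case, a starting collector would fall
// through to the default branch and be reported as STOPPED.
func translate(s CollectorStatus) ComponentState {
	switch s {
	case StatusStarting:
		return StateStarting
	case StatusOK:
		return StateHealthy
	case StatusFatalError:
		return StateFailed
	default:
		return StateStopped
	}
}

func main() {
	for _, s := range []CollectorStatus{StatusStarting, StatusOK, StatusFatalError, StatusNone} {
		fmt.Printf("%d -> %s\n", s, translate(s))
	}
}
```

In this sketch only a status that is genuinely unknown ends up as STOPPED; a collector that has been asked to start is reported as STARTING until a later status replaces it.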

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

On macOS:

  1. Build agent from this branch
  2. Sign out of docker desktop.
  3. Install and enroll the agent into fleet.
  4. Check the status. You should first see the Starting state, then the Failed state with the message OTel manager failed ... process exited with status 1.

Related issues

@mergify
Contributor

mergify bot commented Nov 18, 2025

This pull request does not have a backport label. Could you fix it @VihasMakwana? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@VihasMakwana VihasMakwana added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team skip-changelog backport-8.19 Automated backport to the 8.19 branch backport-9.2 Automated backport to the 9.2 branch labels Nov 19, 2025
@VihasMakwana VihasMakwana marked this pull request as ready for review November 19, 2025 06:17
@VihasMakwana VihasMakwana requested a review from a team as a code owner November 19, 2025 06:17
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@VihasMakwana VihasMakwana changed the title emit starting state if healthcheck port is not yet available [otelmanager] emit starting state if healthcheck is not yet available Nov 19, 2025
@VihasMakwana VihasMakwana requested review from blakerouse, cmacknz and swiatekm and removed request for michel-laterman and ycombinator November 19, 2025 06:17
@VihasMakwana
Contributor Author

CI is failing due to a known issue. It will be fixed via #11238.

Member

@swiatekm swiatekm left a comment

I'm not sure I like this approach.

If we need to always emit a STARTING state, then we should do so explicitly, without waiting for the health check first. And if there's an error getting the status, we should set it as an error on the collector instead of doing this. Finally, if some errors from the collector need to be turned into a specific status for all the components, then that should happen separately somewhere in the otel manager.

Does that make sense?

@VihasMakwana VihasMakwana changed the title [otelmanager] emit starting state if healthcheck is not yet available [otelmanager] emit starting state in the beginning Nov 19, 2025
@VihasMakwana
Contributor Author

The CI has been failing but the test passes on my Mac consistently. Weird.
I've given it a retry.

@VihasMakwana
Contributor Author

We already emit a STARTING state here:

currentStatus := aggregateStatus(componentstatus.StatusStarting, nil)
r.reportSubprocessCollectorStatus(ctx, statusCh, currentStatus)

We just need to handle that case during the agent status translation.

@elastic elastic deleted a comment from elasticmachine Nov 20, 2025
Member

@swiatekm swiatekm left a comment

I'd like an additional unit test, other than that LGTM.

@VihasMakwana
Contributor Author

/test

@VihasMakwana VihasMakwana merged commit 14c87c2 into elastic:main Nov 21, 2025
21 checks passed
@VihasMakwana VihasMakwana added the backport-9.1 Automated backport to the 9.1 branch label Nov 21, 2025
mergify bot pushed a commit that referenced this pull request Nov 21, 2025
* feat: emit starting state

* test

* rename method

* emit starting state

* fix test

* comments

* fix case

* ut

(cherry picked from commit 14c87c2)
mergify bot pushed a commit that referenced this pull request Nov 21, 2025
(cherry picked from commit 14c87c2)
mergify bot pushed a commit that referenced this pull request Nov 21, 2025
(cherry picked from commit 14c87c2)

# Conflicts:
#	internal/pkg/otel/manager/execution_subprocess.go
#	internal/pkg/otel/manager/manager_test.go
hayotbisonai pushed a commit to hayotbisonai/elastic-agent that referenced this pull request Nov 23, 2025
VihasMakwana added a commit that referenced this pull request Nov 24, 2025
(cherry picked from commit 14c87c2)

Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>
VihasMakwana added a commit that referenced this pull request Nov 24, 2025
(cherry picked from commit 14c87c2)

Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>
swiatekm pushed a commit that referenced this pull request Nov 24, 2025

Labels

  • backport-8.19: Automated backport to the 8.19 branch
  • backport-9.1: Automated backport to the 9.1 branch
  • backport-9.2: Automated backport to the 9.2 branch
  • skip-changelog
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team

Development

Successfully merging this pull request may close these issues.

[beats receivers] OTel manager failed: supervised collector exited with error: Error response from daemon: Sign in to continue using Docker Desktop

4 participants