
fix(controller): Prevent duplicate loaders from being created #5446

Merged
kgeckhart merged 3 commits into main from kgeckhart/component-loading-prevent-duplicate-metric-registration on Feb 19, 2026


@kgeckhart (Contributor)

Brief description of Pull Request

This PR prevents a panic that occurs when we create multiple loaders for the same controller. We can detect a duplicate loader because the Prometheus metrics it owns are already registered; attempting to register those metrics a second time is what caused the panic.

Issue(s) fixed by this Pull Request

Fixes: #2801

Notes to the Reviewer

This issue can be reproduced consistently as follows:

  1. Run Alloy in Docker with minimal resources and remotecfg.
  2. Create a pipeline in Fleet Management that uses a component which binds a resource on Build (e.g., loki.source.api binds a port).
  3. Rename the pipeline in Fleet Management.
  4. The new config fails to load because the port is already bound.
  5. remotecfg attempts to reload the old (cached) config.
  6. A duplicate loader is created, resulting in a panic.
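Step 4 comes down to two listeners competing for the same TCP address. The sketch below reproduces only that part using Go's standard `net` package. `bindTwice` is a hypothetical helper, and it uses an OS-assigned loopback port instead of the `0.0.0.0:9999` address seen in the logs so that it can run anywhere.

```go
package main

import (
	"fmt"
	"net"
)

// bindTwice opens a TCP listener on an OS-assigned loopback port, then tries to
// open a second listener on the same address, mimicking a renamed pipeline
// whose new component binds a port that the old component still holds.
func bindTwice() error {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		return err // expected: the address is still held by `first`
	}
	second.Close()
	return nil
}

func main() {
	// On Linux the error typically ends in "bind: address already in use".
	fmt.Println(bindTwice())
}
```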

Abridged logs showing what happens now:

alloy-1  | ts=2026-02-04T19:23:55.07699636Z level=debug msg="fetching remote configuration" service=remotecfg
alloy-1  | ts=2026-02-04T19:23:55.202305809Z level=info msg="attempting to parse and load new remote configuration" service=remotecfg config_hash=bace1e80
alloy-1  | ts=2026-02-04T19:23:55.34085171Z level=info msg="finished node evaluation" controller_path=/remotecfg controller_id=source_api.default trace_id=166fa51994d6f7723fe329ca53ea9b9a node_id=loki.write.grafana_cloud_loki duration=92.039756ms
alloy-1  | ts=2026-02-04T19:23:55.345658677Z level=info msg="starting push API server" component_path=/remotecfg/source_api.default component_id=loki.source.api.loki_push_api
alloy-1  | ts=2026-02-04T19:23:55.345713885Z level=info msg="starting server" component_path=/remotecfg/source_api.default component_id=loki.source.api.loki_push_api
alloy-1  | ts=2026-02-04T19:23:55.349334402Z level=error msg="failed to evaluate config" controller_path=/remotecfg controller_id=source_api.default trace_id=166fa51994d6f7723fe329ca53ea9b9a node=loki.source.api.loki_push_api err="building component: failed to run embedded server: listen tcp 0.0.0.0:9999: bind: address already in use"
alloy-1  | ts=2026-02-04T19:23:55.349425319Z level=info msg="finished node evaluation" controller_path=/remotecfg controller_id=source_api.default trace_id=166fa51994d6f7723fe329ca53ea9b9a node_id=loki.source.api.loki_push_api duration=8.420068ms
alloy-1  | ts=2026-02-04T19:23:55.349562734Z level=error msg="failed to evaluate config" controller_path=/ controller_id=remotecfg trace_id=747e92eb01b59f96fe555b578fc9fb6c node=source_api.default err="updating custom component: 129:2: Failed to build component: building component: failed to run embedded server: listen tcp 0.0.0.0:9999: bind: address already in use"
alloy-1  | ts=2026-02-04T19:23:55.349575859Z level=info msg="finished node evaluation" controller_path=/ controller_id=remotecfg trace_id=747e92eb01b59f96fe555b578fc9fb6c node_id=source_api.default duration=101.567234ms
alloy-1  | ts=2026-02-04T19:23:55.353856414Z level=error msg="failed to parse and load configuration" service=remotecfg config_size=8280 err="129:2: Failed to build component: building component: failed to run embedded server: listen tcp 0.0.0.0:9999: bind: address already in use"
alloy-1  | ts=2026-02-04T19:23:55.353934996Z level=error msg="failed to parse and load new remote configuration" service=remotecfg received_hash=bace1e80 loaded_hash=775b38f6 err="129:2: Failed to build component: building component: failed to run embedded server: listen tcp 0.0.0.0:9999: bind: address already in use"
alloy-1  | ts=2026-02-04T19:23:55.353956746Z level=info msg="attempting to reload cached configuration to restore component health" service=remotecfg
alloy-1  | ts=2026-02-04T19:23:55.354616409Z level=info msg="finished node evaluation" controller_path=/ controller_id=remotecfg trace_id=cd603bde9ef19de4c6bc63e6f03685ed node_id=declare.source_api_rename duration=2.5µs
alloy-1  | ts=2026-02-04T19:23:55.446986412Z level=error msg="failed to evaluate config" controller_path=/ controller_id=remotecfg trace_id=cd603bde9ef19de4c6bc63e6f03685ed node=source_api_rename.default err="creating custom component controller: failed to create module controller: failed to build loader: a loader exists already exists for \"remotecfg/source_api_rename.default\""
alloy-1  | ts=2026-02-04T19:23:55.447288993Z level=info msg="finished node evaluation" controller_path=/ controller_id=remotecfg trace_id=cd603bde9ef19de4c6bc63e6f03685ed node_id=source_api_rename.default duration=92.65121ms
alloy-1  | ts=2026-02-04T19:23:55.450888052Z level=error msg="failed to parse and load configuration" service=remotecfg config_size=8294 err="157:1: Failed to build component: creating custom component controller: failed to create module controller: failed to build loader: a loader exists already exists for \"remotecfg/source_api_rename.default\""
alloy-1  | ts=2026-02-04T19:23:55.452185752Z level=error msg="failed to reload cached configuration" service=remotecfg err="157:1: Failed to build component: creating custom component controller: failed to create module controller: failed to build loader: a loader exists already exists for \"remotecfg/source_api_rename.default\""
alloy-1  | ts=2026-02-04T19:23:55.45219946Z level=error msg="failed to fetch remote config, continuing with current config" service=remotecfg err="157:1: Failed to build component: creating custom component controller: failed to create module controller: failed to build loader: a loader exists already exists for \"remotecfg/source_api_rename.default\""
alloy-1  | ts=2026-02-04T19:23:55.452297418Z level=debug msg="making immediate GetConfig call to report status update" service=remotecfg

The PR also includes some small changes to add more support for structured logging in the controller.

PR Checklist

  • Tests updated

@kgeckhart kgeckhart requested a review from a team as a code owner February 4, 2026 21:11
// MustRegisterOrReturnExisting attempts to register the supplied collector with the registerer. If an equal collector
// is already registered, it returns the existing collector; otherwise it returns nil.
// If registration fails for any reason other than the collector already being registered, it panics.
func MustRegisterOrReturnExisting(reg prometheus.Registerer, c prometheus.Collector) prometheus.Collector {

@dehaansa (Contributor), Feb 6, 2026


This function is identical to the MustRegisterOrGet function above it, right? Except that it doesn't return c. Is that intentional?

Contributor Author


Yeah, it's intentional. I only need to know whether it's already registered. If I used the other implementation, I'd get a non-nil value every time, so I couldn't tell whether the collector already existed or was just registered. Since I'm the only caller, I could inline it too. WDYT?

@kgeckhart kgeckhart merged commit a6ab764 into main Feb 19, 2026
47 checks passed
@kgeckhart kgeckhart deleted the kgeckhart/component-loading-prevent-duplicate-metric-registration branch February 19, 2026 18:19
@grafana-alloybot grafana-alloybot bot mentioned this pull request Feb 19, 2026
jharvey10 pushed a commit that referenced this pull request Feb 26, 2026
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 6, 2026
Development

Successfully merging this pull request may close these issues.

duplicate metrics collector registration panic when using remotecfg

2 participants