
fix(scheduling): Shutdown runnables with a timeout before starting new ones #5401

Merged
kgeckhart merged 3 commits into main from kgeckhart/scheduler-shutdown-old-before-running-new
Feb 4, 2026


@kgeckhart
Contributor

Brief description of Pull Request

Adjust the ordering of component scheduling so that we give components time to shut down before we start new components. If components take too long we will still move on, but we wait for all of them to shut down during scheduling.

Pull Request Details

We have run into issues with components that bind shared resources not being able to release those resources when the component is rescheduled due to a module rename or component rename. An example is the loki.source.syslog component, which binds one or more ports depending on the listener config. This causes issues with our current implementation, where shutdown/startup order is not defined:

  1. Old components are told to shut down
  2. As soon as all old components have been told to shut down, new components are started
  3. If the syslog component did not shut down fast enough, the new component will fail to bind ports
  4. Failing to bind ports does not cause the syslog component to exit; we just won't open the listener, forcing the user to restart Alloy to resolve the problem
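
A minimal sketch of the bind conflict behind steps 3 and 4, using only the Go standard library. It is not Alloy code: `bindTwice` is a hypothetical helper, and a plain TCP listener on localhost stands in for the syslog listener.

```go
package main

import (
	"fmt"
	"net"
)

// bindTwice opens a TCP listener on an ephemeral localhost port, then tries to
// bind the same address again: once while the old listener is still open, and
// once after it has been closed. It returns the error from each attempt.
func bindTwice() (whileOpen, afterClose error) {
	old, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err, err
	}
	addr := old.Addr().String()

	// The "new component" starts before the old one has released the port.
	if l, err := net.Listen("tcp", addr); err != nil {
		whileOpen = err
	} else {
		l.Close()
	}

	// The old component finishes shutting down and releases the port.
	old.Close()

	// The same bind is attempted again once the port is free.
	if l, err := net.Listen("tcp", addr); err != nil {
		afterClose = err
	} else {
		l.Close()
	}
	return whileOpen, afterClose
}

func main() {
	whileOpen, afterClose := bindTwice()
	fmt.Println("bind while old listener is open:", whileOpen)
	fmt.Println("bind after old listener is closed:", afterClose)
}
```

The first attempt is the situation the old ordering could produce; the second is what waiting for shutdown is meant to guarantee.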

Issue(s) fixed by this Pull Request

I need to look to see if there are issues which describe this sort of behavior

Notes to the Reviewer

This PR is intended to work with components which bind resources when Run is called. We have multiple components which bind resources when the component is newly created, before running. That case causes a panic instead, which will be resolved in a separate PR.
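
To illustrate the distinction, here is a rough sketch of a component that binds only inside Run. The `listener` type, `newListener`, and `handover` are hypothetical names for this example and are not Alloy's component API; the point is only that construction does not touch the network, so a replacement can be created while the old instance is still running.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// listener is a hypothetical component that owns a TCP port.
type listener struct{ addr string }

// newListener only records configuration. It does not bind anything, so it
// cannot collide with an instance that is still running.
func newListener(addr string) *listener { return &listener{addr: addr} }

// Run binds the port, reports the bound address on ready, blocks until ctx is
// cancelled, and releases the port before returning.
func (l *listener) Run(ctx context.Context, ready chan<- string) error {
	ln, err := net.Listen("tcp", l.addr)
	if err != nil {
		return err
	}
	defer ln.Close()
	ready <- ln.Addr().String()
	<-ctx.Done()
	return nil
}

// handover runs one listener, builds its replacement while the first is still
// running, waits for the first to exit, and then runs the replacement on the
// same address.
func handover() error {
	ctx1, stop1 := context.WithCancel(context.Background())
	ready1 := make(chan string, 1)
	done1 := make(chan error, 1)
	first := newListener("127.0.0.1:0")
	go func() { done1 <- first.Run(ctx1, ready1) }()

	var addr string
	select {
	case addr = <-ready1:
	case err := <-done1:
		stop1()
		return err
	}

	second := newListener(addr) // safe: nothing is bound at construction time
	stop1()
	if err := <-done1; err != nil { // wait until the port has been released
		return err
	}

	ctx2, stop2 := context.WithCancel(context.Background())
	defer stop2()
	ready2 := make(chan string, 1)
	done2 := make(chan error, 1)
	go func() { done2 <- second.Run(ctx2, ready2) }()
	select {
	case <-ready2:
		stop2()
		return <-done2
	case err := <-done2:
		return err
	}
}

func main() {
	fmt.Println("handover error:", handover())
}
```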

I chose to leave an escape hatch on shutdown because our default component shutdown timeout is 10 minutes. This is incredibly long to leave a pipeline offline. Using the warning log window of 1 minute felt much safer.
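
A minimal sketch of the ordering with the escape hatch, under stated assumptions: `task`, `start`, `replace`, and `demo` are hypothetical names, not the scheduler's real types, and the final wait is left unbounded here for brevity rather than modelling the component shutdown timeout.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// task is a hypothetical handle for one running component.
type task struct {
	cancel context.CancelFunc
	done   chan struct{} // closed once run has returned
}

// start launches run in its own goroutine.
func start(run func(context.Context)) *task {
	ctx, cancel := context.WithCancel(context.Background())
	t := &task{cancel: cancel, done: make(chan struct{})}
	go func() {
		defer close(t.done)
		run(ctx)
	}()
	return t
}

// replace tells every old task to stop, waits up to grace for them to exit,
// starts the new tasks, and finally waits for any stragglers. The bool result
// reports whether all old tasks exited within grace.
func replace(old []*task, next []func(context.Context), grace time.Duration) ([]*task, bool) {
	for _, t := range old {
		t.cancel()
	}
	allDone := make(chan struct{})
	go func() {
		for _, t := range old {
			<-t.done
		}
		close(allDone)
	}()

	inTime := true
	timer := time.NewTimer(grace)
	defer timer.Stop()
	select {
	case <-allDone:
	case <-timer.C:
		inTime = false // escape hatch: stop holding up the new tasks
	}

	started := make([]*task, 0, len(next))
	for _, run := range next {
		started = append(started, start(run))
	}
	<-allDone // still wait for every old task before returning
	return started, inTime
}

// demo replaces one old task, which needs stopDelay to exit after being
// cancelled, with one new task. It reports whether the old task had exited
// when the new one started, and whether shutdown finished within grace.
func demo(stopDelay, grace time.Duration) (oldExitedFirst, withinGrace bool) {
	var exited atomic.Bool
	old := start(func(ctx context.Context) {
		<-ctx.Done()
		time.Sleep(stopDelay)
		exited.Store(true)
	})
	seen := make(chan bool, 1)
	next := func(ctx context.Context) {
		seen <- exited.Load()
		<-ctx.Done()
	}
	started, ok := replace([]*task{old}, []func(context.Context){next}, grace)
	oldExitedFirst = <-seen
	for _, t := range started {
		t.cancel()
		<-t.done
	}
	return oldExitedFirst, ok
}

func main() {
	fmt.Println(demo(0, time.Second))
	fmt.Println(demo(200*time.Millisecond, 20*time.Millisecond))
}
```

In the first call the old task exits promptly, so the new task only starts after it is gone. In the second call the grace window expires first, so the new task starts while the old one is still stopping, which is the trade-off the shorter window accepts.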

PR Checklist

  • Tests updated

@kgeckhart kgeckhart requested a review from a team as a code owner January 29, 2026 22:19
@kgeckhart kgeckhart requested a review from kalleep February 2, 2026 18:28
Contributor

@kalleep kalleep left a comment


LGTM, thanks for walking me through this.

I think the comments are sufficient as is 👍

@kgeckhart kgeckhart added the backport/v1.13 Backport to release/v1.13 label Feb 4, 2026
@kgeckhart kgeckhart merged commit 0fb959d into main Feb 4, 2026
52 checks passed
@kgeckhart kgeckhart deleted the kgeckhart/scheduler-shutdown-old-before-running-new branch February 4, 2026 17:05
grafana-alloybot bot pushed a commit that referenced this pull request Feb 4, 2026
…w ones (#5401)

(cherry picked from commit 0fb959d)
kgeckhart added a commit that referenced this pull request Feb 12, 2026
…w ones [backport] (#5443)

## Backport of #5401

This PR backports #5401 to release/v1.13.

### Original PR Author
@kgeckhart


---
*This backport was created automatically.*

Co-authored-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 19, 2026

Labels

backport/v1.13 Backport to release/v1.13 frozen-due-to-age
