Skip to content

Fix syslog shutdown race conditions#4352

Merged
thampiotr merged 2 commits intomainfrom
thampiotr/fix-syslog-shutdown
Sep 8, 2025
Merged

Fix syslog shutdown race conditions#4352
thampiotr merged 2 commits intomainfrom
thampiotr/fix-syslog-shutdown

Conversation

@thampiotr
Copy link
Contributor

@thampiotr thampiotr commented Sep 4, 2025

PR Description

loki.source.syslog had a few race conditions:

  • on config reload the following happens: new components are created, old components shut down and new components are run. syslog was binding port in the New function, so this led to port conflict because the old component didn't shut down yet. So what I do here is make sure we don't try to bind ports when there's a call to New, but on call to Run.
  • on shutdown, calling Stop on targets drains the pending messages - this won't work if there is nothing reading from the handler channel - so I add a similar approach to loki.source.file, where we start a temporary goroutine that will continue draining when we Stop the targets.

Which issue(s) this PR fixes

Internal escalation https://github.com/grafana/support-escalations/issues/18202

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@thampiotr thampiotr marked this pull request as ready for review September 4, 2025 15:37
@thampiotr thampiotr requested a review from a team as a code owner September 4, 2025 15:37
Copy link
Contributor

@kalleep kalleep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just left one comment

Comment on lines +156 to +175
c.mut.RLock()
var rcs []*relabel.Config
if len(c.args.RelabelRules) > 0 {
rcs = alloy_relabel.ComponentToPromRelabelConfigs(c.args.RelabelRules)
}

return nil
for _, l := range c.targets {
err := l.Stop()
if err != nil {
level.Error(c.opts.Logger).Log("msg", "error while stopping syslog listener", "err", err)
}
}
c.mut.RUnlock()

// Stop draining routine
cancel()

// Create new targets
c.mut.Lock()
defer c.mut.Unlock()
Copy link
Contributor

@kgeckhart kgeckhart Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 is there a potential race condition / panic here if a reload happens at the same time as shutdown? Shutdown is going to trigger

defer func() {
// Start draining routine to prevent potential deadlock if targets attempt to send during Stop().
cancel := c.startDrainingRoutine()
defer cancel()
// Stop all targets
c.mut.RLock()
defer c.mut.RUnlock()
level.Info(c.opts.Logger).Log("msg", "loki.source.syslog component shutting down, stopping listeners")
for _, l := range c.targets {
err := l.Stop()
if err != nil {
level.Error(c.opts.Logger).Log("msg", "error while stopping syslog listener", "err", err)
}
}
}()
where both only take a readlock so they could simultaneously try to stop targets right? The target stop function has the potential to close an already close channel from what I can tell (hopefully something else would error ahead of it).

I don't know that it's a realistic/possible scenario but I do wonder if it's safer to take a write lock before we attempt to mutate the state of the targets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a closer look.

if a reload happens at the same time as shutdown

if I understood right and by "reload" you mean call to reloadTargets - then this is not possible to happen at the same time as shutdown deferred function, because reloadTargets happens on the same thread as the Run function uses.

Let me know if you were thinking of a different scenario

@thampiotr thampiotr force-pushed the thampiotr/fix-syslog-shutdown branch from cf995f0 to 60dc7a8 Compare September 8, 2025 10:51
@thampiotr thampiotr merged commit dc83a8c into main Sep 8, 2025
43 of 44 checks passed
@thampiotr thampiotr deleted the thampiotr/fix-syslog-shutdown branch September 8, 2025 13:47
@thampiotr thampiotr mentioned this pull request Oct 8, 2025
4 tasks
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants