Skip to content

feat: add --pid-file option to write PID files#25474

Merged
gwossum merged 6 commits intomain-2.xfrom
gw/25473/pidfile
Oct 24, 2024
Merged

feat: add --pid-file option to write PID files#25474
gwossum merged 6 commits intomain-2.xfrom
gw/25473/pidfile

Conversation

@gwossum
Copy link
Copy Markdown
Member

@gwossum gwossum commented Oct 17, 2024

Add --pid-file option to write PID files on startup. The PID filename is specified by the argument after --pid-file.

Example: influxd --pid-file /var/lib/influxd/influxd.pid

PID files are automatically removed when the influxd process is shutdown.

Closes: #25473

Add `--pid-file` option to write PID files on startup. The PID filename
is specified by the argument after `--pid-file`.

Example: `influxd --pid-file /var/lib/influxd/influxd.pid`

PID files are automatically removed when the influxd process is shutdown.

Closes: 25473
@gwossum gwossum added area/configuration kind/feature area/2.x OSS 2.0 related issues and PRs labels Oct 17, 2024
@gwossum gwossum self-assigned this Oct 17, 2024
@gwossum gwossum marked this pull request as ready for review October 18, 2024 14:32
o.PIDFile = pidFilename
})
defer func() {
l.ShutdownOrFail(t, ctx)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran in to a similar thing as this recently: #25398 (comment)

I think that this method call ShutdownOrFail will get called immediately? Might be good to check but this may not be the case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's inside a lambda, so it will not be evaluated until the defer calls the lambda. Also note that if it got called immediately the require.FileExists() on line 184 would fail because the PID file gets removed when l.ShutdownOrFail(t, ctx) is called.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devanbenz - the arguments to a deferred function are evaluated on the execution of the defer line, and those evaluated arguments are stored for the eventual call, but the deferred call itself doesn't happen until after the end of the closing method.

@devanbenz devanbenz self-requested a review October 18, 2024 17:34
Copy link
Copy Markdown
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some error message changes, and a question about handling a (currently) undetected error.

if writeErr := os.WriteFile(pidFile, []byte(pidStr), 0666); writeErr != nil {
// Let's make sure we don't leave a PID file behind on error.
removeErr := os.Remove(pidFile)
return fmt.Errorf("write file: %w; remove file: %w", writeErr, removeErr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add the path to the error message

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we want to use something lower level than WriteFile like OpenFile so we can pass in O_EXCL and detect existing PID files that weren't cleaned up (or multiple InfluxDB instances). I think that might be an error we want to detect, report, and possibly abort on.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to find using the PID file as a lock file more annoying than useful (looking at you PostgreSQL!). It also generates tons of support questions.

The less annoying way of preventing multiple instances is to check for for already listening sockets. We kind of do that today by aborting if we can't bind to the bind addresses. We do the abort after we've opened the PID file, so we would blow away the PID file on shutdown in those situations even though the other (presumed) influxd instance is running.

If we want more robust handling of the PID file if you accidentally launch multiple influxd instances with the same configuration, my suggestion would be a feature request to try connecting to the bind addresses very early on to see if anything is listening on them, or use a Unix domain socket / named pipe for the check. This way when the application dies, there is nothing listening or responding on the socket. With a PID or lock file, the file can be stay around if the application is not shut down cleanly.

Or we could simply write the PID after we bind the sockets.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I don't like the idea of dropping the PID file after we're listening for connections, because that could put a long delay between influxd starts and when the PID file drops.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion then, to avoid support questions, would be to check for existence, issue a warning if present, and then continue, rather than a fatal error. That will let the customer know that the WAL flush may not have happened on shutdown, for instance.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can do a compromise between the PID approaches. If you specify --pid-file, the default behavior is to abort with an error if the PID file already exists and the PID file is not touched. If you additionally specify a binary flag --overwrite-pid-file, then a warning is issued if the PID file is already present, but it will be overwritten and startup will continue.

Adding an extra parameter does add complexity, but the impact of --overwrite-pid-file would be extremely localized and simple to exhaustively test in the automated unit tests.

Thoughts?

label: "pidfile",
closer: func(context.Context) error {
if err := os.Remove(pidFile); err != nil {
return fmt.Errorf("removing PID file %q: %w", pidFile, err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the way!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the caller to the cleanup function has no context, so the error must contain the filename.

Change default behavior to abort with an error if the PID file is
requested but it already exists. Add `-ooverwrite-pid-file` flag
to change behavior to attempt to overwrite PID file if the file exists
instead of aborting.
Copy link
Copy Markdown
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gwossum gwossum merged commit c35321b into main-2.x Oct 24, 2024
@gwossum gwossum deleted the gw/25473/pidfile branch October 24, 2024 20:20
@gwossum
Copy link
Copy Markdown
Member Author

gwossum commented Oct 29, 2024

Docs PR created: influxdata/docs-v2#5662

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants