Make one-shot units more robust #311
Description
Current situation
We have some one-shot units, like coreos-metadata, that don't get retried if they fail on their first run. They just stay in the failed state.
Impact
For coreos-metadata this means that if the metadata service is unavailable when the machine boots, but later becomes available, the machine never recovers.
Ideal future situation
To make this type of unit more robust, we should add Restart=on-failure, together with some delay, say RestartSec=10 or maybe 1m. Unfortunately, there's no exponential backoff.
Additionally, we should consider adding RemainAfterExit=yes, so that these units don't get executed more than once if they get pulled in again as wanted/required. Otherwise, a later run while the server is unavailable could cause an existing file to be lost.
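As a rough sketch, the settings proposed above could be applied to coreos-metadata through a drop-in like the one below. The drop-in path and file name are assumptions for illustration, not something this issue specifies, and the RestartSec value is just the first of the two delays suggested above. It also assumes a systemd version that accepts Restart=on-failure on Type=oneshot services.

```ini
# Hypothetical drop-in location (an assumption, not taken from this issue):
# /etc/systemd/system/coreos-metadata.service.d/10-retry.conf

[Service]
# Retry the unit if it exits with a failure, for example because the
# metadata service was not reachable at boot.
Restart=on-failure

# Fixed delay between attempts; 10 seconds here, 1m is the other
# value suggested in the proposal.
RestartSec=10

# Keep the unit in the "active" state after a successful run so that it
# is not executed again when another unit wants or requires it.
RemainAfterExit=yes
```

After adding such a drop-in, `systemctl daemon-reload` would be needed for systemd to pick it up.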