libct/cg/sd: reconnect and retry on dbus connection error#2923
Merged
cyphar merged 5 commits into opencontainers:master on Apr 28, 2021
@wzshiming PTAL
mrunalp previously approved these changes on Apr 27, 2021
I was wrong in my initial analysis that this is a regression caused by PR #2203. In fact it is not; it has always worked that way (see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1634092 from 2018).
[@kolyshkin: documentation nits] Signed-off-by: Shiming Zhang <wzshiming@foxmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Generalize isUnitExists as isDbusError, and use errors.As while at it (which can handle wrapped errors as well). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
[@kolyshkin: doc nits, use dbus.ErrClosed and isDbusError] Signed-off-by: Shiming Zhang <wzshiming@foxmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Instead of reconnecting to dbus after some failed operations, and returning an error (so a caller has to retry), reconnect AND retry in place for all such operations. This should fix issues caused by a stale dbus connection after e.g. a dbus daemon restart. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
mrunalp approved these changes on Apr 27, 2021
cyphar reviewed on Apr 28, 2021
Comment on lines +31 to +36

	d.RLock()
	if conn := d.conn; conn != nil {
		d.RUnlock()
		return conn, nil
	}
	d.RUnlock()

This could be

	d.RLock()
	conn := d.conn
	d.RUnlock()
	if conn != nil { ... }

But it doesn't really matter.
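For context, the read-locked check discussed in this comment is typically the fast path of a lazily created, shared connection. The sketch below shows one way such a getter might be completed, using the reviewer's suggested form for the fast path. The `conn` type, the `dials` counter, and the slow path are assumptions for illustration; only the fast-path lines come from the diff under review.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a dbus connection.
type conn struct{ id int }

// connManager caches a single shared connection (illustrative type).
type connManager struct {
	sync.RWMutex
	conn  *conn
	dials int // how many connections were opened; for demonstration only
}

func (d *connManager) getConnection() (*conn, error) {
	// Fast path: a read lock lets many callers fetch the cached
	// connection concurrently.
	d.RLock()
	c := d.conn
	d.RUnlock()
	if c != nil {
		return c, nil
	}

	// Slow path: take the write lock and check again, since another
	// goroutine may have connected between RUnlock and Lock.
	d.Lock()
	defer d.Unlock()
	if d.conn != nil {
		return d.conn, nil
	}
	d.dials++ // a real implementation would dial dbus here and may fail
	d.conn = &conn{id: d.dials}
	return d.conn, nil
}

func main() {
	d := &connManager{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := d.getConnection(); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("dials:", d.dials) // dials: 1
}
```

Both forms of the fast path behave the same; the suggested one simply keeps a single RUnlock call.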
PR #2203 started to reuse the same dbus connection. While this improves performance, if the dbus daemon is ever restarted, the connection is no longer valid and every operation fails. This is a minor concern for short-lived runc, but much more of an issue when a long-running daemon (e.g. cri-o) is using runc's libcontainer, as the connection is never retried and the only remedy is to restart the daemon.

The solution to the above is to check the returned errors for the "dbus: connection closed by user" error, and try to reconnect on that. This is what PR #2862 does.

This is a carry of #2862, implementing the idea of retry-in-place (first described at #2862 (comment) and #2862 (comment)) on top of what it does.

For more info, see commit messages as well as #2862.
Fixes:
Changelog entry