ceph-volume: system.get_mounts() refactor#47514
Merged
Conversation
5d6203a to
ba089ed
Compare
guits
commented
Aug 9, 2022
When a network mount is present in `/proc/mounts` but for any reason the corresponding server is down, this function hangs forever. In a cluster deployed with cephadm, the consequence is that it triggers `ceph-volume inventory` commands that hang and stay in D state. The idea here is to use a thread with a timeout to abort the call if the timeout is reached. `get_mounts()` is now a method of a class so we can exclude a path altogether during the whole `inventory` execution (otherwise, ceph-volume would try to access it as many devices there is on the host which could slow down the inventory execution) Fixes: https://tracker.ceph.com/issues/57070 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ba089ed to
89cad1f
Compare
Contributor
Author
|
jenkins test ceph-volume tox |
Contributor
Author
|
jenkins test windows |
1 similar comment
Contributor
Author
|
jenkins test windows |
adk3798
approved these changes
Aug 9, 2022
Comment on lines
+306
to
+307
| if t.is_alive(): | ||
| return None |
Contributor
There was a problem hiding this comment.
out of curiosity, does this mean the thread just hangs around until the whole process exits if it is hanging?
Contributor
Author
There was a problem hiding this comment.
t.join(timeout) blocks the current thread until the target thread (t) finishes or reaches the timeout
This was referenced Aug 10, 2022
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a network mount is present in
/proc/mountsbut for any reasonthe corresponding server is down, this function hangs forever.
In a cluster deployed with cephadm, the consequence is that
it triggers
ceph-volume inventorycommands that hang and stay in Dstate.
The idea here is to use a thread with a timeout to abort the call if the
timeout is reached.
get_mounts()is now a method of a class so we can exclude a pathaltogether during the whole
inventoryexecution (otherwise,ceph-volume would try to access it as many devices there is on the
host which could slow down the inventory execution)
Fixes: https://tracker.ceph.com/issues/57070
Signed-off-by: Guillaume Abrioux gabrioux@redhat.com