Skip to content

ceph-volume: system.get_mounts() refactor#47514

Merged
guits merged 1 commit intoceph:mainfrom
guits:c-v-thread-inventory-2
Aug 10, 2022
Merged

ceph-volume: system.get_mounts() refactor#47514
guits merged 1 commit intoceph:mainfrom
guits:c-v-thread-inventory-2

Conversation

@guits
Copy link
Contributor

@guits guits commented Aug 9, 2022

When a network mount is present in /proc/mounts but for any reason
the corresponding server is down, this function hangs forever.
In a cluster deployed with cephadm, the consequence is that
it triggers ceph-volume inventory commands that hang and stay in D
state.

The idea here is to use a thread with a timeout to abort the call if the
timeout is reached.
get_mounts() is now a method of a class so we can exclude a path
altogether during the whole inventory execution (otherwise,
ceph-volume would try to access it as many devices there is on the
host which could slow down the inventory execution)

Fixes: https://tracker.ceph.com/issues/57070

Signed-off-by: Guillaume Abrioux gabrioux@redhat.com

@guits guits requested a review from a team as a code owner August 9, 2022 09:25
@guits guits force-pushed the c-v-thread-inventory-2 branch from 5d6203a to ba089ed Compare August 9, 2022 09:31
When a network mount is present in `/proc/mounts` but for any reason
the corresponding server is down, this function hangs forever.
In a cluster deployed with cephadm, the consequence is that
it triggers `ceph-volume inventory` commands that hang and stay in D
state.

The idea here is to use a thread with a timeout to abort the call if the
timeout is reached.
`get_mounts()` is now a method of a class so we can exclude a path
altogether during the whole `inventory` execution (otherwise,
ceph-volume would try to access it as many devices there is on the
host which could slow down the inventory execution)

Fixes: https://tracker.ceph.com/issues/57070

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
@guits guits force-pushed the c-v-thread-inventory-2 branch from ba089ed to 89cad1f Compare August 9, 2022 09:44
@guits
Copy link
Contributor Author

guits commented Aug 9, 2022

jenkins test ceph-volume tox

@guits
Copy link
Contributor Author

guits commented Aug 9, 2022

jenkins test windows

1 similar comment
@guits
Copy link
Contributor Author

guits commented Aug 9, 2022

jenkins test windows

Copy link
Contributor

@adk3798 adk3798 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +306 to +307
if t.is_alive():
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, does this mean the thread just hangs around until the whole process exits if it is hanging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.join(timeout) blocks the current thread until the target thread (t) finishes or reaches the timeout

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants