osdc: add timeout configs for mons/osds #37529
retest this please
jenkins test api
dillaman left a comment
osdc changes lgtm (though, going by the PR title, the PR has been intermixed with apparently unrelated changes)
retest this please
jenkins test api
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20201007.214100 failures unrelated.
jenkins test make check
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Have the Objecter track the rados_(mon|osd)_op_timeout configs so that they can be configured at runtime/startup. This is useful for the MDS/ceph-fuse so that we can avoid waiting forever for a response from the Monitors that will never come (statfs on a deleted file system's pools). Also: make these configs take a time value rather than a double. This is simpler to deal with in the code and allows time units to be used (e.g. "5m" for 5 minutes). Fixes: https://tracker.ceph.com/issues/47734 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
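For illustration only, here is a minimal C++ sketch of the two ideas in that commit message: parsing a time value such as "5m" into seconds, and an observer that keeps the current mon/OSD op timeouts and updates them when the options change at runtime. It is not code from this PR. The names `parse_timespan` and `OpTimeoutTracker`, the accepted units (`s`, `m`, `h`), and the string-based `handle_conf_change` signature are hypothetical; in Ceph the real hook is a config observer (`md_config_obs_t`), whose interface differs.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse "30", "30s", "5m" or "1h" into whole seconds.
// A bare number is treated as seconds. Not Ceph's actual option parser.
std::chrono::seconds parse_timespan(const std::string& s) {
  std::size_t pos = 0;
  long long n = std::stoll(s, &pos);  // throws std::invalid_argument on junk
  if (n < 0) throw std::invalid_argument("negative timespan: " + s);
  const std::string unit = s.substr(pos);
  if (unit.empty() || unit == "s") return std::chrono::seconds(n);
  if (unit == "m") return std::chrono::minutes(n);  // converts to seconds
  if (unit == "h") return std::chrono::hours(n);
  throw std::invalid_argument("unknown time unit: " + unit);
}

// Hypothetical observer: notified whenever an option changes, it stores
// the new mon/OSD op timeouts so later operations pick them up without a
// restart. Keys it does not track are ignored.
class OpTimeoutTracker {
 public:
  void handle_conf_change(const std::string& key, const std::string& value) {
    if (key == "rados_mon_op_timeout") {
      mon_timeout_.store(parse_timespan(value).count());
    } else if (key == "rados_osd_op_timeout") {
      osd_timeout_.store(parse_timespan(value).count());
    }
  }

  std::chrono::seconds mon_timeout() const {
    return std::chrono::seconds(mon_timeout_.load());
  }
  std::chrono::seconds osd_timeout() const {
    return std::chrono::seconds(osd_timeout_.load());
  }

 private:
  // Atomics let other threads read the latest value without extra locking.
  // 0 is assumed to mean "no timeout".
  std::atomic<long long> mon_timeout_{0};
  std::atomic<long long> osd_timeout_{0};
};
```

Using atomics here is a choice made for the sketch so readers on other threads see updates without a lock; it is not a claim about how the Objecter stores these values.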
Otherwise we have unnecessary timeout waits. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The mount.cleanup method will remove the mount point, so this `rm -rf` will always fail (albeit with exit status 0). Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Now that the osdc Objecter obeys updates to these configs, let's use them to avoid blocking forever on operations that may never complete (or that should complete in a timely manner). Fixes: https://tracker.ceph.com/issues/47734 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
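How a configured timeout changes what a waiting caller sees can be sketched in C++ as follows. This is a hypothetical illustration, not MDS or ceph-fuse code: `ReplySlot` is an invented type, and treating a timeout of 0 as "wait forever" is an assumption modeled on the 0 defaults of `rados_mon_op_timeout`/`rados_osd_op_timeout`. With a nonzero timeout, a reply that never arrives (such as a statfs on a deleted file system's pools) yields `-ETIMEDOUT` instead of a hang.

```cpp
#include <cerrno>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical reply slot: one thread waits for a response, another
// delivers it via complete().
struct ReplySlot {
  std::mutex lock;
  std::condition_variable cond;
  bool done = false;
  int result = 0;

  void complete(int r) {
    {
      std::lock_guard<std::mutex> l(lock);
      done = true;
      result = r;
    }
    cond.notify_all();
  }

  // Wait for the reply. A zero timeout waits forever (assumed semantics);
  // otherwise return -ETIMEDOUT if no reply arrives in time.
  int wait(std::chrono::seconds timeout) {
    std::unique_lock<std::mutex> l(lock);
    if (timeout == std::chrono::seconds::zero()) {
      cond.wait(l, [this] { return done; });
      return result;
    }
    if (!cond.wait_for(l, timeout, [this] { return done; })) {
      return -ETIMEDOUT;
    }
    return result;
  }
};
```

The predicate overloads of `wait`/`wait_for` guard against spurious wakeups and against a reply that was delivered before the caller started waiting.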
@dillaman this is ready for a final review. I've sorted out the API test failures and cleaned up the commits.
The dashboard QA test run was successful: https://pulpito.ceph.com/tdehler-2020-10-16_09:25:17-rados:dashboard-wip-tdehler-testing-37529-37564-distro-basic-smithi/
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20201013.174240 failures unrelated
@batrick https://tracker.ceph.com/issues/48030 is a regression introduced by this PR; @sseshasa has root-caused it here: https://tracker.ceph.com/issues/48030#note-11.
This issue is really caused by commit 306eebe. The RadosClient seemed to be plugged into the conf change interface as an observer, but the […] Of course, the changes in this PR exposed the problem, so I'll fix it.
DNM: I want to test this with Octopus before merging. The problem is avoided when testing on master.
See: https://tracker.ceph.com/issues/47734