Skip to content

osdc: add timeout configs for mons/osds#37529

Merged
batrick merged 7 commits intoceph:masterfrom
batrick:i47734
Oct 20, 2020
Merged

osdc: add timeout configs for mons/osds#37529
batrick merged 7 commits intoceph:masterfrom
batrick:i47734

Conversation

@batrick
Copy link
Member

@batrick batrick commented Oct 3, 2020

DNM because I want to test this with Octopus before merging. The problem is avoided on testing in master.

See: https://tracker.ceph.com/issues/47734

@batrick
Copy link
Member Author

batrick commented Oct 6, 2020

retest this please

@batrick
Copy link
Member Author

batrick commented Oct 6, 2020

#37530 (comment)

@batrick
Copy link
Member Author

batrick commented Oct 6, 2020

jenkins test api

dillaman
dillaman previously approved these changes Oct 7, 2020
Copy link

@dillaman dillaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

osdc changes lgtm (PR has been intermixed with apparent unrelated changes per the PR title, though)

@batrick
Copy link
Member Author

batrick commented Oct 7, 2020

retest this please

@batrick
Copy link
Member Author

batrick commented Oct 8, 2020

jenkins test api

@batrick
Copy link
Member Author

batrick commented Oct 8, 2020

@batrick
Copy link
Member Author

batrick commented Oct 13, 2020

jenkins test make check

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Have the Objecter track the rados_(mon|osd)_op_timeout configs so that
it can be configured at runtime/startup. This is useful for the
MDS/ceph-fuse so that we can avoid waiting forever for a response from
the Monitors that will never come (statfs on a deleted file system's
pools).

Also: make these configs take a time value rather than double. This is
simpler to deal with in the code and allows time units to be used (e.g.
"5m" for 5 minutes).

Fixes: https://tracker.ceph.com/issues/47734
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Otherwise we have unnecessary timeout waits.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The mount.cleanup method will remove the mount point. This `rm -rf` will
always fail (with exit status 0).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Now that the osdc Objecter obeys updates to these configs, let's use
them to avoid having them block forever on operations that may never
complete (or should complete in a timely manner).

Fixes: https://tracker.ceph.com/issues/47734
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick
Copy link
Member Author

batrick commented Oct 13, 2020

@dillaman this is ready for a final review. I've sorted out the API test failures and cleaned up the commits.

Copy link

@dillaman dillaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@s0nea
Copy link
Member

s0nea commented Oct 19, 2020

@batrick
Copy link
Member Author

batrick commented Oct 20, 2020

@batrick batrick merged commit 008eaca into ceph:master Oct 20, 2020
@batrick batrick deleted the i47734 branch October 20, 2020 02:27
@neha-ojha
Copy link
Member

@batrick https://tracker.ceph.com/issues/48030 is a regression introduced by this PR, @sseshasa has root-caused it here https://tracker.ceph.com/issues/48030#note-11.

@batrick
Copy link
Member Author

batrick commented Nov 30, 2020

@batrick https://tracker.ceph.com/issues/48030 is a regression introduced by this PR, @sseshasa has root-caused it here https://tracker.ceph.com/issues/48030#note-11.

This issue is really caused by this commit 306eebe. The RadosClient seemed to be plugged into the conf change interface as an observer but the add_observer call in the cons was forgotten.

Of course, the changes in this PR exposed the problem so I'll fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants