mon/MonClient: resurrect original client_mount_timeout handling#52124
Merged
mon/MonClient: resurrect original client_mount_timeout handling#52124
Conversation
While reducing a "waiting for config" timeout from 30 seconds to 3 (mon_client_hunt_interval default) and instead introducing 10 retries, commit 3c2b30e ("mon/MonClient: apply timeout while fetching config") also subjected authenticate() to these retries. However, authenticate() is going by client_mount_timeout which defaults to 5 minutes. As a result, when the monitors are unreachable or there are other connectivity issues, we end up taking 50 minutes to return ETIMEDOUT from rados_connect(). Fixes: https://tracker.ceph.com/issues/61733 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Over the years, with commits 8bc1974 ("ceph_test_librados_api_misc: fix stupid LibRadosMiscConnectFailure.ConnectFailure test") and f357459 ("test/librados: modify LibRadosMiscConnectFailure.ConnectFailure to comply with new seconds unit"), this test has lost its original meaning. Any ability to time out is definitely gone since a 1 second timeout is too high to kick in a normally functioning setup, not to mention that the timeout was being ballooned to 10 seconds until the previous commit. The first connection attempt would normally succeed while the second one would immediately fail with "cannot identify monitors to contact" error due do the cluster handle getting recreated. The 16 iteration loop is dead code. This commit just codifies the above to avoid the appearance that this test has anything to do with timeouts. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Contributor
Author
|
jenkins test api |
Contributor
Author
|
jenkins test windows |
rzarzynski
approved these changes
Jun 21, 2023
|
|
||
| utime_t dur = end - start; | ||
| ASSERT_GE(dur, utime_t(2, 0)); | ||
| ASSERT_LT(dur, utime_t(4, 0)); |
Member
|
@idryomov I'm seeing a failure that looks related to this PR:
The failure reproduced several other times in this test link: |
Member
|
Please re-add "needs-qa" when this needs testing again. |
Contributor
Author
|
@batrick and I agree that this appears to be caused by #50503 that was present in the same batch. What I see happening is:
Please retest without #50503 in the mix. |
Member
|
Sure, thanks for looking into it @idryomov |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows