objecter: request OSDMap after idle ticks#63425
Conversation
There was a problem hiding this comment.
Pull Request Overview
Adds logic to re-request an OSDMap if the objecter goes two consecutive ticks without receiving a fresh map by tracking when the last map was requested.
- Introduce
last_osdmap_receivedtimestamp inObjecter - Update timestamp in
_maybe_request_map() - In
tick(), check for stale map after two intervals and re-request if needed
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/osdc/Objecter.h | Added last_osdmap_received member to track map timing |
| src/osdc/Objecter.cc | Set timestamp on map request and stale-check in tick() |
Comments suppressed due to low confidence (2)
src/osdc/Objecter.cc:2248
- [nitpick] The name
windowis generic and may be unclear in context. Consider a more descriptive name likestale_windoworrequest_threshold.
auto window = ceph::make_timespan(cct->_conf->objecter_tick_interval) * 2;
src/osdc/Objecter.cc:2245
- The new stale-check path in
tick()isn't covered by existing tests. Consider adding unit or integration tests to verify that an OSDMap is re-requested after two idle ticks.
} else if (last_osdmap_received != ceph::coarse_mono_clock::time_point()) {
95d9df6 to
025a984
Compare
|
thanks @NitzanMordhai! this bug was initially caught by an api test timeout, and we temporarily disabled the failing test case in #63126. can you please reenable it in this pr so we can verify that it reduces the request latency? -@skip_unless_dashboard_pr
class RgwSiteTest(RgwTestCase): |
| double thresh_s = std::chrono::duration<double>(stale_window).count(); | ||
| ldout(cct, 10) << __func__ << ": osdmap stale: " << elapsed_s | ||
| << "s > " << thresh_s << "s, maybe requesting map" << dendl; | ||
| _maybe_request_map(); |
There was a problem hiding this comment.
I very like the high-level idea of spreading the work across time, even if (slightly) more needs to be done.
sure, will revert the commit |
025a984 to
ae84da7
Compare
@NitzanMordhai, thanks for the fix. Btw, reverting the dashboard commit will make the decorator go away which we don't want because that's being reused in other parts as well. So let's just only remove the decorator above the |
If the objecter goes two ticks without receiving an OSDMap, it may fall behind on incremental updates. Call `maybe_request_osdmap()` in the tick handler whenever we detect that two consecutive ticks have elapsed without a fresh map, so we minimize the time needed to catch up on missed changes. Fixes: https://tracker.ceph.com/issues/71261 Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
ae84da7 to
9e35235
Compare
|
@nizamial09 done |
|
radosgw.8000.log shows very low latency👍
will try again a couple times to make sure it's consistent |
|
jenkins test api |
|
https://jenkins.ceph.com/job/ceph-api/96543/artifact/build/out/radosgw.8000.log
|
|
jenkins test make check |
|
jenkins test make check arm64 |
|
@NitzanMordhai dashboard lint error is a valid one. Looks like the unused import |
|
@NitzanMordhai ping^ |
thanks, will handle it now |
9e35235 to
54df886
Compare
|
@nizamial09 done |
|
make check is failing on a random error I guess but the arm64 is still failing on lint might be easier if you run You can verify it by running |
Fixes: https://tracker.ceph.com/issues/71261 Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
54df886 to
adaab09
Compare
Config Diff Tool Output- removed: breakpad
! changed: osd_deep_scrub_interval: old: Deep scrub each PG (i.e., verify data checksums) at least this often. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_deep_scrub_interval: new: Deep scrub each PG (i.e., verify data checksums) at least this often
! changed: osd_scrub_min_interval: old: The desired interval between scrubs of a specific PG. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_scrub_min_interval: new: The desired interval between scrubs of a specific PG.
! changed: osd_scrub_max_interval: old: Scrub each PG no less often than this interval. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_scrub_max_interval: new: Scrub each PG no less often than this interval
! changed: osd_deep_scrub_interval_cv: old: deep scrub intervals are varied by a random amount to prevent stampedes. This parameter determines the amount of variation. Technically ``osd_deep_scrub_interval_cv`` is the coefficient of variation for the deep scrub interval.
! changed: osd_deep_scrub_interval_cv: new: deep scrub intervals are varied by a random amount to prevent stampedes. This parameter determines the amount of variation. Technically - osd_deep_scrub_interval_cv is the coefficient of variation for the deep scrub interval.
! changed: osd_deep_scrub_interval_cv: old: The coefficient of variation for the deep scrub interval, specified as a ratio. On average, the next deep scrub for a PG is scheduled osd_deep_scrub_interval after the last deep scrub . The actual time is randomized to a normal distribution with a standard deviation of osd_deep_scrub_interval * osd_deep_scrub_interval_cv (clamped to within 2 standard deviations). The default value guarantees that 95% of deep scrubs will be scheduled in the range [0.8 * osd_deep_scrub_interval, 1.2 * osd_deep_scrub_interval].
! changed: osd_deep_scrub_interval_cv: new: The coefficient of variation for the deep scrub interval, specified as a ratio. On average, the next deep scrub for a PG is scheduled osd_deep_scrub_interval after the last deep scrub . The actual time is randomized to a normal distribution with a standard deviation of osd_deep_scrub_interval * osd_deep_scrub_interval_cv (clamped to within 2 standard deviations). The default value guarantees that 95% of the deep scrubs will be scheduled in the range [0.8 * osd_deep_scrub_interval, 1.2 * osd_deep_scrub_interval].
The above configuration changes are found in the PR. Please update the relevant release documentation if necessary. |
|
@nizamial09 thanks for your review and pointing all the issues, the PR is ready now, I don't think there are any errors related to it |
nizamial09
left a comment
There was a problem hiding this comment.
Thank you too @NitzanMordhai !
|
can we merge this or is it waiting on teuthology? |
i added needs-qa, will ask that it will be picked and tested |
If the objecter goes two ticks without receiving an OSDMap, it may fall behind on incremental updates. Call
maybe_request_osdmap()in the tick handler whenever we detect that two consecutive ticks have elapsed without a fresh map, so we minimize the time needed to catch up on missed changes.Fixes: https://tracker.ceph.com/issues/71261
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job Definition