Bug #70811
closed
osd: Recovery latency related perf counters are calculated incorrectly.
% Done:
0%
Source:
Development
Backport:
reef, squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Tags (freeform):
backport_processed
Merge Commit:
Fixed In:
v20.0.0-1384-g99d9fb5558
Released In:
v20.2.0~603
Upkeep Timestamp:
2025-11-01T01:28:44+00:00
Description
This was noticed while analyzing logs and inspecting the code.
This affects PGRecovery, PGRecoveryContext and PGRecoveryMsg objects.
Example of incorrect calculation for PGRecovery object in OpSchedulerItem.cc:
void PGRecovery::run(
  OSD *osd,
  OSDShard *sdata,
  PGRef& pg,
  ThreadPool::TPHandle &handle)
{
  osd->logger->tinc(
    l_osd_recovery_queue_lat,
    time_queued - ceph_clock_now());
  osd->do_recovery(pg.get(), epoch_queued, reserved_pushes, priority, handle);
  pg->unlock();
}
The correct latency calculation is ceph_clock_now() - time_queued, i.e. the current time minus the time at which the item was queued.
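The operand order can be illustrated outside of Ceph with a minimal sketch. It uses std::chrono in place of Ceph's own time types, and the names Clock, Seconds, queue_latency and reversed_latency are hypothetical, introduced only for this example; it is not the code from the fix.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-ins for illustration only; the OSD code uses
// Ceph's own time type and ceph_clock_now() instead of std::chrono.
using Clock = std::chrono::steady_clock;
using Seconds = std::chrono::duration<double>;

// Queue latency: how long an item waited between being queued and being
// dequeued. The later timestamp is the left operand, so the result is
// non-negative whenever now >= time_queued.
inline Seconds queue_latency(Clock::time_point time_queued,
                             Clock::time_point now) {
  assert(now >= time_queued);
  return std::chrono::duration_cast<Seconds>(now - time_queued);
}

// The reversed operand order described in this report, kept for contrast.
// With std::chrono this yields a negative duration.
inline Seconds reversed_latency(Clock::time_point time_queued,
                                Clock::time_point now) {
  return std::chrono::duration_cast<Seconds>(time_queued - now);
}
```

For an item queued 250 ms before it is dequeued, queue_latency returns roughly 0.25 s, while reversed_latency returns roughly -0.25 s in this sketch.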
Output from perf dump showing the incorrect values:
"l_osd_recovery_push_queue_latency": {
    "avgcount": 55093,
    "sum": 6247005052.341587249,
    "avgtime": 113390.177560517
},
"l_osd_recovery_push_reply_queue_latency": {
    "avgcount": 130713,
    "sum": 18297766598.029444451,
    "avgtime": 139984.290759369
},
"l_osd_recovery_pull_queue_latency": {
    "avgcount": 130713,
    "sum": 18297766598.029444451,
    "avgtime": 139984.290759369
},
"l_osd_recovery_backfill_queue_latency": {
    "avgcount": 130723,
    "sum": 5907207331.184129853,
    "avgtime": 45188.737492133
},
"l_osd_recovery_backfill_remove_queue_latency": {
    "avgcount": 130723,
    "sum": 5907207331.184129853,
    "avgtime": 45188.737492133
},
"l_osd_recovery_scan_queue_latency": {
    "avgcount": 130764,
    "sum": 15980169803.779663494,
    "avgtime": 122206.186746961
},
"l_osd_recovery_queue_latency": {
    "avgcount": 80616,
    "sum": 16144014518.570876928,
    "avgtime": 200258.193393009
},
"l_osd_recovery_context_queue_latency": {
    "avgcount": 0,
    "sum": 0.000000000,
    "avgtime": 0.000000000
},
Updated by Sridhar Seshasayee 12 months ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 62704
Updated by Sridhar Seshasayee 12 months ago
The counters after applying the fix:
"l_osd_recovery_push_queue_latency": {
    "avgcount": 17977,
    "sum": 1.066402826,
    "avgtime": 0.000059320
},
"l_osd_recovery_push_reply_queue_latency": {
    "avgcount": 68344,
    "sum": 3.422215369,
    "avgtime": 0.000050073
},
"l_osd_recovery_pull_queue_latency": {
    "avgcount": 68386,
    "sum": 9.632210712,
    "avgtime": 0.000140850
},
"l_osd_recovery_backfill_queue_latency": {
    "avgcount": 68386,
    "sum": 9.632210712,
    "avgtime": 0.000140850
},
"l_osd_recovery_backfill_remove_queue_latency": {
    "avgcount": 68386,
    "sum": 9.632210712,
    "avgtime": 0.000140850
},
"l_osd_recovery_scan_queue_latency": {
    "avgcount": 68386,
    "sum": 9.632210712,
    "avgtime": 0.000140850
},
"l_osd_recovery_queue_latency": {
    "avgcount": 52590,
    "sum": 5.761666190,
    "avgtime": 0.000109558
},
"l_osd_recovery_context_queue_latency": {
    "avgcount": 0,
    "sum": 0.000000000,
    "avgtime": 0.000000000
},
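In the dumps in this report, avgtime matches sum divided by avgcount (for example, 1.066402826 / 17977 is about 0.0000593). A rough sanity check over such counters could look like the sketch below. LatencyCounter, avg_time and looks_plausible are hypothetical names for this example, not Ceph APIs, and the threshold is an arbitrary assumption chosen by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of one latency entry from a perf dump;
// not a Ceph type.
struct LatencyCounter {
  std::uint64_t avgcount;  // number of samples
  double sum;              // accumulated latency in seconds
};

// Average latency as sum / avgcount; returns 0 when there are no
// samples, matching the all-zero entries shown in the dumps.
inline double avg_time(const LatencyCounter& c) {
  return c.avgcount == 0 ? 0.0 : c.sum / static_cast<double>(c.avgcount);
}

// Flags averages outside [0, max_avg_seconds]. The upper bound is an
// assumption supplied by the caller, not a value defined by Ceph.
inline bool looks_plausible(const LatencyCounter& c, double max_avg_seconds) {
  assert(max_avg_seconds >= 0.0);
  const double avg = avg_time(c);
  return avg >= 0.0 && avg <= max_avg_seconds;
}
```

With an assumed bound of 60 seconds, the pre-fix l_osd_recovery_push_queue_latency entry (average of about 113390) is flagged as implausible, while the post-fix entry (average of about 0.000059) passes.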
Updated by Sridhar Seshasayee 11 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to reef, squid
Updated by Upkeep Bot 11 months ago
- Copied to Backport #70903: reef: osd: Recovery latency related perf counters are calculated incorrectly. added
Updated by Upkeep Bot 11 months ago
- Copied to Backport #70904: squid: osd: Recovery latency related perf counters are calculated incorrectly. added
Updated by Sridhar Seshasayee 10 months ago
- Status changed from Pending Backport to Resolved
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 99d9fb5558ccdc34deacf2541587ee4775329ed1
- Fixed In set to v20.0.0-1384-g99d9fb5558c
- Upkeep Timestamp set to 2025-07-09T18:11:33+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.0.0-1384-g99d9fb5558c to v20.0.0-1384-g99d9fb5558
- Upkeep Timestamp changed from 2025-07-09T18:11:33+00:00 to 2025-07-14T18:12:15+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~603
- Upkeep Timestamp changed from 2025-07-14T18:12:15+00:00 to 2025-11-01T01:28:44+00:00