Bug #72778
Status: Closed
crimson-debug: Health check failure - SLOW_OPS (osd.0 blocked for 32s)
Description
failure_reason: '"2025-08-26T13:05:11.032708+0000 mon.a (mon.0) 1538 : cluster [WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, osd.0 has slow ops (SLOW_OPS)" in cluster log' flavor: crimson-debug
Result: Tests passed, but slow ops reported.
Updated by Matan Breizman 7 months ago
Thanks for reporting this tracker!
I would recommend providing as much context as possible:
1) What was the slow op reported on osd.0?
The attached logs only show that we had one, without additional information.
For the actual information we would need to look at the osd log of this job and grep for `slow request`, `slow_ops`, and so on.
2) In which job did the error occur? (e.g., rbd_api_tests)
3) If relevant, which backend was used for the reported job? This could potentially further help with debugging the issue.
See for example an older slow op report which we had: https://tracker.ceph.com/issues/69612
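The grep pass suggested above can be sketched as follows. The sample log is inlined here so the snippet is self-contained; in a real investigation, point the grep at the job's osd.N.log in the teuthology run directory (the path below is illustrative only):

```shell
# Build a tiny sample OSD log (lines modeled on the excerpts in this tracker).
cat > /tmp/osd.sample.log <<'EOF'
DEBUG 2025-08-26 13:05:27,456 [shard 0:main] ms - GOT AckFrame: seq=53987
WARN 2025-08-26 13:05:28,100 [shard 0:main] osd - OSD::get_health_metrics: slow request m=[osd_op(client.4462.0:1 ...)]
WARN 2025-08-26 13:05:28,100 [shard 0:main] monc - log [WRN] : slow requests (most affected pool [ 'test-pool' : 1 ])
EOF

# Pull out the slow-op markers with line numbers.
grep -nE 'slow request|slow_ops' /tmp/osd.sample.log
```

The `-n` flag keeps line numbers so hits can be located quickly in multi-gigabyte teuthology logs.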
Updated by Matan Breizman 6 months ago
- Tags changed from crimson to crimson, frequent-failure
- Priority changed from Normal to High
Updated by Jose J Palacios Perez 6 months ago
- This https://pulpito.ceph.com/jjperez-2025-09-17_20:02:04-crimson-rados-wip-perezjos-crimson-only-17-09-2025-PR65166-distro-crimson-debug-smithi/8506473/ looks like a duplicate:
"2025-09-17T22:10:41.129589+0000 osd.2 (osd.2) 1 : cluster [WRN] slow requests (most affected pool [ 'test-librbd-smithi114-35807-49' : 1 ])" in cluster log
Executed as:
teuthology-suite -m smithi -s crimson-rados -p 101 --force-priority -c wip-perezjos-crimson-only-17-09-2025-PR65166 -f crimson-debug --filter seastore --distro centos --distro-version 9 --suite-branch main --suite-repo https://github.com/ceph/ceph.git #PR 65166
Updated by Matan Breizman 6 months ago
- Assignee set to Shraddha Agrawal
Hey Shraddha, would you like to take a look at this one?
Please feel free to ping me or Aishwarya if you have any questions.
Thanks! :)
Updated by Shraddha Agrawal 6 months ago
Hey Matan, thanks for assigning this tracker to me. I'll investigate and get back with my findings!
Updated by Shraddha Agrawal 5 months ago
I see mon.a reported SLOW_OPS:
2025-08-26T13:05:18.908+0000 7f43849f1640 20 mon.a@0(leader).mgrstat health checks:
{
"SLOW_OPS": {
"severity": "HEALTH_WARN",
"summary": {
"message": "1 slow ops, oldest one blocked for 37 sec, osd.0 has slow ops",
"count": 1
},
"detail": []
}
}
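A small sketch (not official Ceph tooling) of machine-reading the mgrstat health-check blob above, to pull out which OSD is slow and for how long. The JSON shape is copied from the log excerpt; the regex over the message text is my own assumption about its format:

```python
import json
import re

# Health-check payload as quoted from the mon.a debug log above.
payload = json.loads("""
{
  "SLOW_OPS": {
    "severity": "HEALTH_WARN",
    "summary": {
      "message": "1 slow ops, oldest one blocked for 37 sec, osd.0 has slow ops",
      "count": 1
    },
    "detail": []
  }
}
""")

# Extract the blocked duration and the affected OSD from the summary message.
msg = payload["SLOW_OPS"]["summary"]["message"]
m = re.search(r"blocked for (\d+) sec, (osd\.\d+)", msg)
blocked_sec, osd = int(m.group(1)), m.group(2)
print(f"{osd} slow for {blocked_sec}s")
```

The same shape comes out of `ceph health detail -f json`, so the snippet doubles as a quick triage helper.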
In osd.0 logs I see:
DEBUG 2025-08-26 13:05:27,456 [shard 0:main] ms - [0x7c7ab5c7d1c0 osd.0(cluster) v2:172.21.15.50:6804/668993626 >> osd.1 v2:172.21.15.50:6800/3018203893@62110] GOT AckFrame: seq=53987
DEBUG 2025-08-26 13:05:28,100 [shard 0:main] seastore - SeaStore::stat: ...
DEBUG 2025-08-26 13:05:28,100 [shard 0:main] seastore - SeaStoreS::stat: stat=store_statfs(0x63be49000/0x0/0x63c000000, data 0x1b7000/0x1b7000, compress 0x0/0x0/0x0, omap 0x0, meta 0x0)
WARN 2025-08-26 13:05:28,100 [shard 0:main] osd - OSD::get_health_metrics: slow request m=[osd_op(client.4462.0:1 51.1a 51:5bdd5772:::image1.rbd:head [stat] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e1209)] initiated 2025-08-26T13:04:37.130137+0000
ERROR 2025-08-26 13:05:28,100 [shard 0:main] osd - OSD::get_health_metrics: get_health_metrics reporting 1 slow ops, oldest is m=[osd_op(client.4462.0:1 51.1a 51:5bdd5772:::image1.rbd:head [stat] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e1209)]
WARN 2025-08-26 13:05:28,100 [shard 0:main] osd - OSD::get_health_metrics: slow requests (most affected pool [ 'test-librbd-smithi050-36017-49' : 1 ])
WARN 2025-08-26 13:05:28,100 [shard 0:main] monc - log [WRN] : slow requests (most affected pool [ 'test-librbd-smithi050-36017-49' : 1 ])
DEBUG 2025-08-26 13:05:28,101 [shard 1:main] seastore - SeaStoreS::stat: stat=store_statfs(0x63bf02000/0x0/0x63c000000, data 0xfe000/0xfe000, compress 0x0/0x0/0x0, omap 0x0, meta 0x0)
DEBUG 2025-08-26 13:05:28,101 [shard 0:main] seastore - SeaStore::stat: done, stat=store_statfs(0xc77d4b000/0x0/0xc78000000, data 0x2b5000/0x2b5000, compress 0x0/0x0/0x0, omap 0x0, meta 0x0)
At first look, this looks similar to https://tracker.ceph.com/issues/69612. I am still trying to determine whether that is indeed the case.
Updated by Shraddha Agrawal 4 months ago · Edited
Here is the summary of all the above instances:
| No. | Job | Slow op time | Object store | Operation |
|-----|-----|--------------|--------------|-----------|
| 1 | crimson-rados/rbd/{clusters/fixed-1 crimson-supported-all-distro/centos_latest crimson_qa_overrides deploy/ceph objectstore/seastore/seastore-segmented tasks/rbd_api_tests} | 24s, started at 13:05:08 and ended at 13:05:32 | seastore | image1.rbd:head |
| 2 | crimson-rados/rbd/{clusters/fixed-1 crimson-supported-all-distro/centos_latest crimson_qa_overrides deploy/ceph objectstore/seastore/seastore-segmented tasks/rbd_api_tests} | 4m 48s, started at 23:03:05 and ended at 23:07:53 | seastore | undecoded |
| 3 | crimson-rados/rbd/{clusters/fixed-1 crimson-supported-all-distro/centos_latest crimson_qa_overrides deploy/ceph objectstore/seastore/seastore-segmented tasks/rbd_api_tests} | 22s, started at 22:10:41 and ended at 22:11:03 | seastore | undecoded |
| 4 | crimson-rados/singleton/{all/osd-pg-splitting crimson-supported-all-distro/centos_latest crimson_qa_overrides objectstore/bluestore rados} | 1m 27s, started at 06:20:09 and ended at 06:21:36 | bluestore | benchmark_data_smithi032_39257_object10731 |
| 5 | crimson-rados/perf/{clusters/{fixed-2} crimson-supported-all-distro/centos_latest crimson_qa_overrides deploy/ceph objectstore/seastore/seastore-segmented settings/optimized workloads/radosbench_4K_rand_read} | 2s, started at 5:37:58 and ended at 5:38:00 | seastore | benchmark_data_smithi083_37953_object9185 |
| 6 | crimson-rados/rbd/{clusters/crimson-fixed-1 crimson-supported-all-distro/centos_latest crimson_qa_overrides deploy/ceph objectstore/seastore/segmented$/{crimson_seastore_segmented_2q} | 1m 9s, started at 17:32:36 and ended at 17:33:45 | seastore | undecoded |
Updated by Matan Breizman 4 months ago · Edited
Shraddha Agrawal wrote in #note-11:
> Here is the summary of all the above instances:
> | 6 | crimson-rados/rbd/{clusters/crimson-fixed-1 crimson-supported-all-distro/centos_latest crimson_qa_overrides deploy/ceph objectstore/seastore/segmented$/{crimson_seastore_segmented_2q} | 1m 9s, started at 17:32:36 and ended at 17:33:45 | seastore | undecoded |
Based on the summary above, we can see that the slow_op reports are quite random (object store, workload and so on). I've tracked the latest report below:
Looking at:
https://pulpito.ceph.com/matan-2025-11-09_14:54:31-crimson-rados-main-distro-crimson-debug-smithi/8590288/
With:
"2025-11-09T17:32:43.041212+0000 mon.a (mon.0) 1538 : cluster [WRN] Health check failed: 1 slow ops, oldest one blocked for 34 sec, osd.1 has slow ops (SLOW_OPS)" in cluster log
From mon.0 log:
"message": "1 slow ops, oldest one blocked for 34 sec, osd.1 has slow ops",
...
"message": "1 slow ops, oldest one blocked for 94 sec, osd.1 has slow ops",
"message": "1 slow ops, oldest one blocked for 94 sec, osd.1 has slow ops",
While the reported message says "for 34 sec", the operation is blocked for longer than that; due to the way we scan the log for warnings, the first message is the one that gets reported.
From osd.1 log:
DEBUG 2025-11-09 17:33:40,744 [shard 0:main] seastore - SeaStore::stat: ...
DEBUG 2025-11-09 17:33:40,744 [shard 0:main] seastore - SeaStoreS::stat: stat=store_statfs(0x63a461000/0x0/0x63c000000, data 0x1b9f000/0x1b9f000, compress 0x0/0x0/0x0, omap 0x0, meta 0x0)
WARN 2025-11-09 17:33:40,744 [shard 0:main] osd - OSD::get_health_metrics: slow request m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)] initiated 2025-11-09T17:32:05.773478+0000
ERROR 2025-11-09 17:33:40,745 [shard 0:main] osd - OSD::get_health_metrics: get_health_metrics reporting 1 slow ops, oldest is m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]
WARN 2025-11-09 17:33:40,745 [shard 0:main] osd - OSD::get_health_metrics: slow requests (most affected pool [ 'test-librbd-smithi022-35747-49' : 1 ])
WARN 2025-11-09 17:33:40,745 [shard 0:main] monc - log [WRN] : slow requests (most affected pool [ 'test-librbd-smithi022-35747-49' : 1 ])
Step 1: Tracking client.4429.0:2 51.17 51.ba46737:
DEBUG 2025-11-09 17:32:05,773 [shard 0:main] ms - [0x7cb2d7513840 osd.1(client) v2:172.21.15.22:6809/3486956418 >> client.4429 172.21.15.22:0/4286543537@45138] <== #1 === osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222) v9 (42)
DEBUG 2025-11-09 17:32:05,773 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): starting start_pg_operation
DEBUG 2025-11-09 17:32:05,773 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): start_pg_operation in await_active stage
DEBUG 2025-11-09 17:32:05,774 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): start_pg_operation active, entering await_map
DEBUG 2025-11-09 17:32:05,774 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): start_pg_operation await_map stage
DEBUG 2025-11-09 17:32:07,905 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): got map 1222, entering get_pg_mapping
DEBUG 2025-11-09 17:32:07,908 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): can_create=false, target-core=0
DEBUG 2025-11-09 17:32:07,908 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): entering create_or_wait_pg
... Operation is blocked for more than one minute! ...
DEBUG 2025-11-09 17:33:41,611 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): have_pg
Step 2: Why was create_or_wait_pg blocked?
Following get_pg_mapping:
DEBUG 2025-11-09 17:32:07,905 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): got map 1222, entering get_pg_mapping
DEBUG 2025-11-09 17:32:07,906 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 51.17 to the expected core 4294967295
DEBUG 2025-11-09 17:32:07,906 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 51.17 to core 0 (primary): num_pgs 23
DEBUG 2025-11-09 17:32:07,907 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 51.17 to core 0 (others)
DEBUG 2025-11-09 17:32:07,908 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 51.17 mapping to core 0 after broadcasted
Following PG 51.17:
DEBUG 2025-11-09 17:33:41,582 [shard 0:main] osd - PGMap::set_creating: Creating 51.17
DEBUG 2025-11-09 17:33:41,583 [shard 0:main] seastore - SeaStoreS::create_new_collection: cid=51.17_head
DEBUG 2025-11-09 17:33:41,597 [shard 0:main] osd - PGAdvanceMap: pg_advance_map(id=0, detail=PGAdvanceMap(pg=51.17 from=4294967295 to=1230 do_init)): created
DEBUG 2025-11-09 17:33:41,611 [shard 0:main] osd - client_request(id=209132, detail=m=[osd_op(client.4429.0:2 51.17 51.ba46737 (undecoded) ondisk+read+known_if_redirected+supports_pool_eio e1222)]): have_pg
It seems that pg 51.17 was created only after a minute or so (at 17:33:41), which is exactly when the client_request moved to the "have_pg" stage and was unblocked.
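The stage transitions traced above can also be extracted mechanically from the log. A small sketch, with sample lines trimmed down from the excerpt (the parsing regex is my own assumption, not a crimson log format guarantee):

```python
import re

# Trimmed client_request lines modeled on the osd.1 log above.
log = """\
2025-11-09 17:32:05,773 client_request(id=209132): starting start_pg_operation
2025-11-09 17:32:05,774 client_request(id=209132): start_pg_operation await_map stage
2025-11-09 17:32:07,908 client_request(id=209132): entering create_or_wait_pg
2025-11-09 17:33:41,611 client_request(id=209132): have_pg
"""

# Capture (timestamp, stage) pairs for the op being tracked.
timeline = re.findall(
    r"^(\S+ \S+) client_request\(id=209132\): (.+)$", log, re.M
)
for ts, stage in timeline:
    print(ts, "->", stage)
```

Reading the timeline this way makes the 90-second gap between create_or_wait_pg and have_pg jump out immediately.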
Conclusions:
The reported operation was blocked for more than a minute. Tracking down the cause of the block, we found it was waiting for the pg to be created. The positive side is that everything is working as expected; there is no underlying issue with the behavior. However, we should still understand how to avoid these kinds of reports. There are two possible approaches:
1) Increase slow op report threshold
2) Bump reactor number or cores used for testing
For 1, we've seen in the report above that the op was blocked for almost 2 minutes. Increasing the threshold that high would possibly a) make us miss valuable reports and b) still not be high enough for future reports that are blocked for even longer.
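For context on option 1: the slow-op complaint threshold in Ceph is the `osd_op_complaint_time` setting (30 s by default, which matches the "blocked for 32 sec" trigger in this tracker). A teuthology-style override bumping it might look like the sketch below; the value is illustrative only, and whether raising it is the right trade-off is exactly the concern stated above.

```yaml
# Illustrative only: raise the slow-op complaint threshold via a
# ceph.conf override in the suite yaml. 30 is the Ceph default.
overrides:
  ceph:
    conf:
      osd:
        osd op complaint time: 120
```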
For 2: currently we use 2 or 3 reactors per OSD, which is low. As seen in the mapping above:
mapping pg 51.17 to core 0 (primary): num_pgs 23
Core 0 has 23 pgs mapped onto it! This should explain the delay in creating the pg. Increasing the core/reactor count should better distribute the pgs across cores and possibly resolve the delays we see in the suite. See how we currently set the number of reactors to use with "crimson cpu num:" in the crimson-rados suite.
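The effect of the core count on pg placement can be illustrated with a toy model. This is NOT the real PGShardMapping policy, just a least-loaded sketch under my own assumptions:

```python
from collections import Counter

def map_pgs(num_pgs: int, num_cores: int) -> Counter:
    """Assign each pg to the currently least-loaded core (toy policy)."""
    load = Counter({core: 0 for core in range(num_cores)})
    mapping = {}
    for pg in range(num_pgs):
        core = min(load, key=load.get)  # least-loaded core wins
        mapping[pg] = core
        load[core] += 1
    # Return per-core pg counts.
    return Counter(mapping.values())

# With a single usable core, all 23 pgs land on core 0, as in the log above.
print(map_pgs(23, 1))
# With 4 cores the worst per-core load drops to 6.
print(map_pgs(23, 4))
```

Even this crude model shows why serialized pg creation on one overloaded core can stall an incoming op long enough to trip the slow-op warning.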
Feel free to reach out if any of the above is unclear! By the way, my favorite tool for log investigation, used above, is "klogg".
Updated by Shraddha Agrawal 4 months ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 66307
Updated by Upkeep Bot 4 months ago
- Status changed from Fix Under Review to Resolved
- Merge Commit set to d1eb2440991ec1337a2213c426fd7ec581817c59
- Fixed In set to v20.3.0-4255-gd1eb244099
- Upkeep Timestamp set to 2025-11-19T07:23:06+00:00