Bug #70502
radosbench-high-concurrency: [Backfill] Single PG is stuck in waiting
Status: Closed
% Done:
0%
Source:
Backport:
tentacle
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
backport_processed
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:
Description
Note: reproducible with radosbench-high-concurrency
Found with a newly added test case, "radosbench-even-higher-concurrency", combined with "short_pg_log":
short_pg_log objectstore/bluestore thrashers/default thrashosds-health workloads/radosbench-even-higher-concurrency}
PG 5.d does not recover and the test times out:
2025-03-16T16:13:19.593 INFO:tasks.ceph.ceph_manager.ceph:dumping pgs not recovered yet
2025-03-16T16:13:19.948 INFO:tasks.ceph.ceph_manager.ceph:PG 5.d is not active+clean
2025-03-16T16:13:19.948 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '5.d', 'version': "221'727", 'reported_seq': 927, 'reported_epoch': 222, 'state': 'active+undersized+remapped+backfilling', 'last_fresh': '2025-03-16T15:33:09.740620+0000', 'last_change': '2025-03-16T15:33:07.895829+0000', 'last_active': '2025-03-16T15:33:09.740620+0000', 'last_peered': '2025-03-16T15:33:09.740620+0000', 'last_clean': '2025-03-16T15:29:52.796377+0000', 'last_became_active': '2025-03-16T15:33:04.842816+0000', 'last_became_peered': '2025-03-16T15:33:04.842816+0000', 'last_unstale': '2025-03-16T15:33:09.740620+0000', 'last_undegraded': '2025-03-16T15:33:09.740620+0000', 'last_fullsized': '2025-03-16T15:33:04.717503+0000', 'mapping_epoch': 217, 'log_start': "209'648", 'ondisk_log_start': "209'648", 'created': 150, 'last_epoch_clean': 151, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2025-03-16T15:29:14.210686+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2025-03-16T15:29:14.210686+0000', 'last_clean_scrub_stamp': '2025-03-16T15:29:14.210686+0000', 'objects_scrubbed': 0, 'log_size': 79, 'log_dups_size': 648, 'ondisk_log_size': 79, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 0, 'scrub_schedule': '--', 'scrub_duration': 0, 'objects_trimmed': 0, 'snaptrim_duration': 0, 'stat_sum': {'num_bytes': 24, 'num_objects': 1, 'num_object_clones': 0, 'num_object_copies': 3, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 1, 'num_read_kb': 1, 'num_write': 727, 'num_write_kb': 5161, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 22, 
'num_bytes_recovered': 1155072, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [0, 2, 3], 'acting': [0, 3], 'avail_no_missing': ['0', '2', '3'], 'object_location_counts': [{'shards': '0,3', 'objects': 1}], 'blocked_by': [], 'up_primary': 0, 'acting_primary': 0, 'purged_snaps': []}
2025-03-16T17:05:47.123 INFO:tasks.ceph.ceph_manager.ceph:waiting for all up
2025-03-16T17:07:47.282 INFO:tasks.ceph.ceph_manager.ceph:waiting for clean
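The PG dump above already identifies the backfill target: it is the OSD listed in 'up' but missing from 'acting'. The following is a minimal Python sketch of that check, assuming a dict with the same keys as the dump logged by ceph_manager. The helper names `backfill_targets` and `is_active_clean` are hypothetical and are not part of the teuthology code.

```python
def backfill_targets(pg):
    """Return the OSD ids that are in 'up' but not (yet) in 'acting'."""
    acting = set(pg["acting"])
    return [osd for osd in pg["up"] if osd not in acting]


def is_active_clean(pg):
    """True only when the '+'-separated state contains both 'active' and 'clean'."""
    flags = set(pg["state"].split("+"))
    return {"active", "clean"} <= flags


# Fields copied from the PG 5.d dump above; all other keys are omitted.
pg_5d = {
    "pgid": "5.d",
    "state": "active+undersized+remapped+backfilling",
    "up": [0, 2, 3],
    "acting": [0, 3],
}

print(backfill_targets(pg_5d))  # [2]
print(is_active_clean(pg_5d))   # False
```

The result [2] agrees with the `backfill=[2]` field in the osd.0 log lines for this PG.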
From the osd.0 log: the PG enters BackfillState::Waiting and never exits it:
DEBUG 2025-03-16 15:33:07,960 [shard 2:main] osd - background_recovery_sub(id=1578, detail=pg_scan(digest 5.d 5:bc54ca8d:::benchmark_data_smithi059_45406_object1197:head-MAX e 220/217)): entering create_or_wait_pg
DEBUG 2025-03-16 15:33:07,960 [shard 2:main] osd - background_recovery_sub(id=1578, detail=pg_scan(digest 5.d 5:bc54ca8d:::benchmark_data_smithi059_45406_object1197:head-MAX e 220/217)): have_pg
DEBUG 2025-03-16 15:33:07,960 [shard 2:main] osd - 0x50300098ed30 RecoverySubRequest::with_pg: RecoverySubRequest::with_pg: background_recovery_sub(id=1578, detail=pg_scan(digest 5.d 5:bc54ca8d:::benchmark_data_smithi059_45406_object1197:head-MAX e 220/217))
DEBUG 2025-03-16 15:33:07,960 [shard 2:main] osd - pg_epoch 220 pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling RecoveryBackend::handle_scan:
DEBUG 2025-03-16 15:33:07,960 [shard 2:main] osd - pg_epoch 220 pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling RecoveryBackend::handle_scan_digest:
DEBUG 2025-03-16 15:33:07,961 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::ReplicasScanning::react::ReplicaScanned: got scan result from osd=2, result=BackfillInfo(5:bc54ca8d:::benchmark_data_smithi059_45406_object1197:head-MAX 13 objects {5:bc54ca8d:::benchmark_data_smithi059_45406_object1197:head=202'632,5:bc93b8ad:::benchmark_data_smithi059_45406_object144:head=152'96,5:bd21253a:::benchmark_data_smithi059_45406_object384:head=162'232,5:bd61efc7:::benchmark_data_smithi059_45406_object1158:head=196'616,5:bd6e5dab:::benchmark_data_smithi059_45406_object431:head=163'256,5:bda978a3:::benchmark_data_smithi059_45406_object977:head=190'544,5:be0f38db:::benchmark_data_smithi059_45406_object998:head=190'560,5:be238c71:::benchmark_data_smithi059_45406_object70:head=152'32,5:be621623:::benchmark_data_smithi059_45406_object526:head=166'296,5:bf6596f4:::benchmark_data_smithi059_45406_object927:head=190'496,5:bf6d875c:::benchmark_data_smithi059_45406_object564:head=171'304,5:bf8760b3:::benchmark_data_smithi059_45406_object1096:head=195'592,5:bfd42a22:::benchmark_data_smithi059_45406_object856:head=182'456})
DEBUG 2025-03-16 15:33:07,961 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::StateHelper: enter crimson::osd::BackfillState::Enqueuing
DEBUG 2025-03-16 15:33:07,962 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::maybe_update_range: maybe_update_range(lambda): updating from version 220'701
DEBUG 2025-03-16 15:33:07,963 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::remove_on_peers: BACKFILL removing 5:bc93b8ad:::benchmark_data_smithi059_45406_object144:head from peers {2}
DEBUG 2025-03-16 15:33:07,963 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::remove_on_peers: BACKFILL removing 5:bd21253a:::benchmark_data_smithi059_45406_object384:head from peers {2}
DEBUG 2025-03-16 15:33:07,963 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::update_on_peers: check=5:bd61efc7:::benchmark_data_smithi059_45406_object1158:head
DEBUG 2025-03-16 15:33:07,963 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::remove_on_peers: BACKFILL removing 5:bd6e5dab:::benchmark_data_smithi059_45406_object431:head from peers {2}
DEBUG 2025-03-16 15:33:07,963 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::update_on_peers: check=5:bda978a3:::benchmark_data_smithi059_45406_object977:head
DEBUG 2025-03-16 15:33:07,964 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::update_on_peers: check=5:be0f38db:::benchmark_data_smithi059_45406_object998:head
DEBUG 2025-03-16 15:33:07,964 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::remove_on_peers: BACKFILL removing 5:be238c71:::benchmark_data_smithi059_45406_object70:head from peers {2}
DEBUG 2025-03-16 15:33:07,964 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::remove_on_peers: BACKFILL removing 5:be621623:::benchmark_data_smithi059_45406_object526:head from peers {2}
DEBUG 2025-03-16 15:33:07,964 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::update_on_peers: check=5:bf6596f4:::benchmark_data_smithi059_45406_object927:head
DEBUG 2025-03-16 15:33:07,964 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::remove_on_peers: BACKFILL removing 5:bf6d875c:::benchmark_data_smithi059_45406_object564:head from peers {2}
DEBUG 2025-03-16 15:33:07,965 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::update_on_peers: check=5:bf8760b3:::benchmark_data_smithi059_45406_object1096:head
DEBUG 2025-03-16 15:33:07,965 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::update_on_peers: check=5:bfd42a22:::benchmark_data_smithi059_45406_object856:head
DEBUG 2025-03-16 15:33:07,965 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::Enqueuing::Enqueuing: reached end for both local and all peers but still has in-flight operations
DEBUG 2025-03-16 15:33:07,965 [shard 2:main] osd - pg[5.d( v 220'701 (209'648,220'701] local-lis/les=217/218 n=27 ec=150/150 lis/c=217/150 les/c/f=218/151/0 sis=217) [0,2,3]/[0,3] backfill=[2] r=0 lpr=217 pi=[150,217)/2 pct=209'648 lua=209'648 crt=220'701 lcod 209'648 mlcod 0'0 active+undersized+remapped+backfilling BackfillState::StateHelper: enter crimson::osd::BackfillState::Waiting
DEBUG 2025-03-16 15:33:07,965 [shard 2:main] osd - 0x50300098ed30 RecoverySubRequest::with_pg: background_recovery_sub(id=1578, detail=pg_scan(digest 5.d 5:bc54ca8d:::benchmark_data_smithi059_45406_object1197:head-MAX e 220/217)): complete
DEBUG 2025-03-16 15:33:07,965 [shard 2:main] osd - background_recovery_sub(id=1578, detail=pg_scan(digest 5.d 5:bc54ca8d:::benchmark_data_smithi059_45406_object1197:head-MAX e 220/217)): exit
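When scanning a full OSD log for this pattern, it can help to extract only the backfill state transitions per PG. The sketch below assumes the "BackfillState::StateHelper: enter crimson::osd::BackfillState::<State>" line format seen in the excerpt above. The function names are hypothetical and the sample lines are abbreviated with "...". A PG whose last logged state is Waiting is only a candidate: it is stuck only if the log continues well past that point without another transition.

```python
import re

# Matches lines such as:
#   "... osd - pg[5.d( v ... BackfillState::StateHelper: enter crimson::osd::BackfillState::Waiting"
ENTER_RE = re.compile(
    r"pg\[(?P<pgid>\d+\.[0-9a-f]+)\(.*"
    r"BackfillState::StateHelper: enter crimson::osd::BackfillState::(?P<state>\w+)"
)


def backfill_transitions(lines):
    """Map each pgid to the ordered list of backfill states it entered."""
    seen = {}
    for line in lines:
        m = ENTER_RE.search(line)
        if m:
            seen.setdefault(m.group("pgid"), []).append(m.group("state"))
    return seen


def last_state_is_waiting(lines):
    """Return the pgids whose most recent logged backfill state is Waiting."""
    return sorted(
        pgid
        for pgid, states in backfill_transitions(lines).items()
        if states[-1] == "Waiting"
    )


# Abbreviated copies of three lines from the osd.0 excerpt above.
sample = [
    "DEBUG 2025-03-16 15:33:07,961 [shard 2:main] osd - pg[5.d( v 220'701 ... "
    "BackfillState::StateHelper: enter crimson::osd::BackfillState::Enqueuing",
    "DEBUG 2025-03-16 15:33:07,965 [shard 2:main] osd - pg[5.d( v 220'701 ... "
    "BackfillState::Enqueuing::Enqueuing: reached end for both local and all "
    "peers but still has in-flight operations",
    "DEBUG 2025-03-16 15:33:07,965 [shard 2:main] osd - pg[5.d( v 220'701 ... "
    "BackfillState::StateHelper: enter crimson::osd::BackfillState::Waiting",
]

print(backfill_transitions(sample))   # {'5.d': ['Enqueuing', 'Waiting']}
print(last_state_is_waiting(sample))  # ['5.d']
```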
Updated by Matan Breizman 12 months ago
- Has duplicate Bug #70765: Single PG is not active+clean (backfill hang) added
Updated by Matan Breizman 12 months ago
- Priority changed from Normal to High
Seems like a duplicate of: https://tracker.ceph.com/issues/70765
2025-04-06T15:20:02.729 INFO:tasks.ceph.ceph_manager.ceph:PG 9.e is not active+clean
Updated by Matan Breizman 12 months ago
Matan Breizman wrote in #note-3:
Seems like a duplicate of: https://tracker.ceph.com/issues/70765
PG 6.5 is stuck in the backfilling state, and the cluster reports HEALTH_WARN:
256 slow ops, oldest one blocked for 2460 sec, osd.2 has slow ops
2025-04-01T08:41:32.712 INFO:teuthology.orchestra.run.gibba029.stdout:{"fsid":"531c7267-5cab-4a52-b3c4-42ccffe77b46","health":{"status":"HEALTH_WARN","checks":{"PG_DEGRADED":{"severity":"HEALTH_WARN","summary":{"message":"Degraded data redundancy: 15/933 objects degraded (1.608%), 1 pg degraded, 1 pg undersized","count":2},"muted":false},"SLOW_OPS":{"severity":"HEALTH_WARN","summary":{"message":"256 slow ops, oldest one blocked for 2460 sec, osd.2 has slow ops","count":1},"muted":false}},"mutes":[]},"election_epoch":3,"quorum":[0],"quorum_names":["a"],"quorum_age":2870,"monmap":{"epoch":1,"min_mon_release_name":"tentacle","num_mons":1},"osdmap":{"epoch":140,"num_osds":4,"num_up_osds":4,"osd_up_since":1743494475,"num_in_osds":4,"osd_in_since":1743494024,"num_remapped_pgs":1},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":24},{"state_name":"active+undersized+degraded+remapped+backfilling","count":1}],"num_pgs":25,"num_pools":3,"num_objects":311,"data_bytes":20685363,"bytes_used":157880320,"bytes_avail":386389176320,"bytes_total":386547056640,"degraded_objects":15,"degraded_total":933,"degraded_ratio":0.01607717041800643},"fsmap":{"epoch":1,"btime":"2025-04-01T07:53:42:173127+0000","by_rank":[],"up:standby":0},"mgrmap":{"available":true,"num_standbys":0,"modules":["iostat","nfs"],"services":{}},"servicemap":{"epoch":5,"modified":"2025-04-01T08:39:48.449259+0000","services":{}},"progress_events":{}}
2025-04-01T08:41:33.267 INFO:tasks.ceph.ceph_manager.ceph:PG 6.5 is not active+clean
2025-04-01T08:41:33.267 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '6.5', 'version': "119'117", 'reported_seq': 185, 'reported_epoch': 140, 'state': 'active+undersized+degraded+remapped+backfilling', 'last_fresh': '2025-04-01T08:01:27.248878+0000', 'last_change': '2025-04-01T08:00:34.593337+0000', 'last_active': '2025-04-01T08:01:27.248878+0000', 'last_peered': '2025-04-01T08:01:27.248878+0000', 'last_clean': '2025-04-01T08:00:25.231745+0000', 'last_became_active': '2025-04-01T08:00:32.305057+0000', 'last_became_peered': '2025-04-01T08:00:32.305057+0000', 'last_unstale': '2025-04-01T08:01:27.248878+0000', 'last_undegraded': '2025-04-01T08:00:32.834211+0000', 'last_fullsized': '2025-04-01T08:00:32.236978+0000', 'mapping_epoch': 121, 'log_start': "119'73", 'ondisk_log_start': "119'73", 'created': 116, 'last_epoch_clean': 0, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2025-04-01T08:00:25.231745+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2025-04-01T08:00:25.231745+0000', 'last_clean_scrub_stamp': '2025-04-01T08:00:25.231745+0000', 'objects_scrubbed': 0, 'log_size': 44, 'log_dups_size': 73, 'ondisk_log_size': 44, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 0, 'scrub_schedule': '--', 'scrub_duration': 0, 'objects_trimmed': 0, 'snaptrim_duration': 0, 'stat_sum': {'num_bytes': 958464, 'num_objects': 15, 'num_object_clones': 0, 'num_object_copies': 45, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 15, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 117, 'num_write_kb': 936, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 
3, 'num_bytes_recovered': 131072, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [2, 3, 0], 'acting': [2, 3], 'avail_no_missing': ['2', '3'], 'object_location_counts': [{'shards': '2,3', 'objects': 15}], 'blocked_by': [], 'up_primary': 2, 'acting_primary': 2, 'purged_snaps': []}
2025-04-01T08:41:33.268 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_285a99e52975a4e4219727322052d6b5714e03b7/qa/tasks/ceph_manager.py", line 192, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_285a99e52975a4e4219727322052d6b5714e03b7/qa/tasks/ceph_manager.py", line 1488, in _do_thrash
    self.ceph_manager.wait_for_recovery(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_285a99e52975a4e4219727322052d6b5714e03b7/qa/tasks/ceph_manager.py", line 3015, in wait_for_recovery
    assert now - start < timeout, \
AssertionError: wait_for_recovery: failed before timeout expired
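The assertion in the traceback comes from a poll-until-timeout loop: the harness keeps checking for recovery and asserts once the elapsed time reaches the timeout. The sketch below illustrates that general pattern only. It is not the actual ceph_manager.py implementation, and `wait_until`, its parameters, and the example predicate are hypothetical.

```python
import time


def wait_until(predicate, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True.

    Raises AssertionError if `timeout` seconds elapse first, mirroring the
    `assert now - start < timeout` check seen in the traceback.
    """
    start = clock()
    while not predicate():
        now = clock()
        assert now - start < timeout, "predicate still false after timeout"
        sleep(interval)


# Example: a stand-in for "all PGs are active+clean" that succeeds on the third poll.
calls = {"n": 0}


def clean_after_three_polls():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until(clean_after_three_polls, timeout=5, interval=0)
print(calls["n"])  # 3
```

In the failing runs the predicate never becomes true because the PG stays in backfilling, so the loop ends in the AssertionError.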
Updated by Matan Breizman 12 months ago
- Subject changed from Backfill: backfill stuck in waiting to Backfill: Single PG is stuck in waiting
Updated by Matan Breizman 12 months ago
- Subject changed from Backfill: Single PG is stuck in waiting to radosbench-high-concurrency: [Backfill] Single PG is stuck in waiting
- Description updated (diff)
Updated by MOHIT AGRAWAL 11 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 62760
Updated by MOHIT AGRAWAL 11 months ago
During mclock scheduler testing I was able to reproduce the issue.
The steps to reproduce it are below:
1) Set up a Crimson cluster on the latest main branch
MGR=1 MON=1 OSD=4 MDS=0 ../src/vstart.sh -n --bluestore --crimson --crimson-smp 4 --without-dashboard --debug
2) Set the below configuration
./bin/ceph config set osd crimson_osd_scheduler_concurrency 5
./bin/ceph config set osd osd_min_pg_log_entries 1
./bin/ceph config set osd osd_max_pg_log_entries 2
./bin/ceph config set osd osd_pg_log_trim_min 0
3) Create a small pool
./bin/ceph osd pool create test 128 128 replicated
4) Stop osd.0
./bin/ceph osd out 0
./bin/ceph osd stop 0
5) Populate some data
./bin/rados -c /nvme0/ceph/build/ceph.conf -p test bench -b 4096 300 write --concurrent-ios 8 --no-cleanup
6) Mark osd.0 as in
./bin/ceph osd in 0
7) Kill the previously running osd.0 process
8) Start a new osd.0
/bin/sh /nvme0/ceph/src/ceph-run /nvme0/ceph/build/bin/crimson-osd --debug -i 0 -c /nvme0/ceph/build/ceph.conf -f
9) Monitor backfill progress via ./bin/ceph -s
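For step 9, checking the status by eye can be replaced by parsing the JSON form of the status output (`./bin/ceph -s --format json`). The sketch below assumes the `pgmap.pgs_by_state` layout shown in the status JSON quoted earlier in this ticket. The function name `non_clean_pg_states` is hypothetical, and the sample is a trimmed copy of that output rather than a live query.

```python
import json


def non_clean_pg_states(status):
    """Return {state_name: count} for every PG state other than active+clean."""
    by_state = status.get("pgmap", {}).get("pgs_by_state", [])
    return {
        entry["state_name"]: entry["count"]
        for entry in by_state
        if entry["state_name"] != "active+clean"
    }


# Trimmed copy of the pgmap section from the status output quoted in this ticket.
sample = json.loads("""
{"pgmap": {"pgs_by_state": [
    {"state_name": "active+clean", "count": 24},
    {"state_name": "active+undersized+degraded+remapped+backfilling", "count": 1}
], "num_pgs": 25}}
""")

print(non_clean_pg_states(sample))
# {'active+undersized+degraded+remapped+backfilling': 1}
```

If the same non-clean state and count persist across many polls while no client I/O is running, backfill is likely stuck rather than merely slow.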
Updated by Matan Breizman 11 months ago
- Has duplicate Bug #71003: crimson: backfill stuck in enquueing added
Updated by Matan Breizman 11 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot 11 months ago
- Copied to Backport #71145: tentacle: radosbench-high-concurrency: [Backfill] Single PG is stuck in waiting added
Updated by MOHIT AGRAWAL 10 months ago
- Status changed from Pending Backport to Closed