Bug #56149
openthrash-erasure-code: AssertionError: wait_for_recovery timeout due to "active+recovering+undersized" pg
0%
Description
Description:
rados/thrash-erasure-code/{ceph clusters/{fixed-2 openstack} fast/normal mon_election/connectivity msgr-failures/few objectstore/filestore-xfs rados recovery-overrides/{more-async-recovery} supported-random-distro$/{centos_8} thrashers/fastread thrashosds-health workloads/ec-radosbench}
/a/yuriw-2022-06-17_13:58:31-rados-wip-yuri7-testing-2022-06-16-1051-pacific-distro-default-smithi/6884394
2022-06-17T15:36:17.287 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '5.13', 'version': "880'5475", 'reported_seq': 1214, 'reported_epoch': 1855, 'state': 'active+recovery_wait+undersized+degraded+remapped', 'last_fresh': '2022-06-17T15:36:11.235599+0000', 'last_change': '2022-06-17T14:53:18.577038+0000', 'last_active': '2022-06-17T15:36:11.235599+0000', 'last_peered': '2022-06-17T15:36:11.235599+0000', 'last_clean': '2022-06-17T14:49:28.443280+0000', 'last_became_active': '2022-06-17T14:53:17.177381+0000', 'last_became_peered': '2022-06-17T14:53:17.177381+0000', 'last_unstale': '2022-06-17T15:36:11.235599+0000', 'last_undegraded': '2022-06-17T14:53:17.116488+0000', 'last_fullsized': '2022-06-17T14:53:16.633990+0000', 'mapping_epoch': 832, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 811, 'last_epoch_clean': 705, 'parent': '0.0', 'parent_split_bits': 5, 'last_scrub': "0'0", 'last_scrub_stamp': '2022-06-17T14:46:48.316282+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2022-06-17T14:46:48.316282+0000', 'last_clean_scrub_stamp': '2022-06-17T14:46:48.316282+0000', 'log_size': 5475, 'ondisk_log_size': 5475, 'stats_invalid': True, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 111214592, 'num_objects': 1697, 'num_object_clones': 0, 'num_object_copies': 5091, 'num_objects_missing_on_primary': 2665, 'num_objects_missing': 2665, 'num_objects_degraded': 7995, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 1697, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 1849, 'num_write_kb': 113451, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 1286, 'num_bytes_recovered': 84257451, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [3, 0, 1], 'acting': [2147483647, 0, 1], 'avail_no_missing': ['0(0)', '3(1)', '6(2)'], 'object_location_counts': [{'shards': '0(0),3(1),5(2),6(2)', 'objects': 1748}, {'shards': '0(0),3(1),6(2)', 'objects': 868}, {'shards': '0(1),1(2)', 'objects': -919}], 'blocked_by': [], 'up_primary': 0, 'acting_primary': 0, 'purged_snaps': []}
2022-06-17T15:36:17.287 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_583dad03a4c486407143b2fa31042148953bda62/qa/tasks/ceph_manager.py", line 188, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph-c_583dad03a4c486407143b2fa31042148953bda62/qa/tasks/ceph_manager.py", line 1404, in _do_thrash
timeout=self.config.get('timeout')
File "/home/teuthworker/src/github.com_ceph_ceph-c_583dad03a4c486407143b2fa31042148953bda62/qa/tasks/ceph_manager.py", line 2809, in wait_for_recovery
'wait_for_recovery: failed before timeout expired'
AssertionError: wait_for_recovery: failed before timeout expired
Last pg map before Traceback:
{
"pgs_by_state": [
{
"state_name": "active+clean",
"count": 47
},
{
"state_name": "active+recovery_wait+undersized+degraded+remapped",
"count": 4
},
{
"state_name": "active+recovering+undersized+degraded+remapped",
"count": 2
},
{
"state_name": "active+recovering+undersized",
"count": 1
},
{
"state_name": "active+recovery_wait+undersized+degraded",
"count": 1
}
],
"num_pgs": 55,
"num_pools": 3,
"num_objects": 80600,
"data_bytes": 5282005016,
"bytes_used": 15347728384,
"bytes_avail": 656289427456,
"bytes_total": 671637155840,
"degraded_objects": 21240,
"degraded_total": 241798,
"degraded_ratio": 0.08784191763372733,
"misplaced_objects": 9003,
"misplaced_total": 241798,
"misplaced_ratio": 0.037233558590228207
}
Updated by Laura Flores almost 4 years ago
- Related to Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC backend. added
Updated by Laura Flores almost 4 years ago
- Subject changed from thrash-erasure-code to thrash-erasure-code: AssertionError: wait_for_recovery: failed before timeout expired
Updated by Laura Flores over 3 years ago
- Priority changed from Normal to Low
According to some tests I ran on main and pacific, this kind of failure does not happen very frequently:
Main:
http://pulpito.front.sepia.ceph.com/lflores-2022-06-21_17:43:27-rados:thrash-erasure-code-main-distro-default-smithi/ - 100/100 tests passed
Pacific:
http://pulpito.front.sepia.ceph.com/lflores-2022-06-21_17:44:35-rados:thrash-erasure-code-pacific-distro-default-smithi/ - 100/100 tests passed
This failure is definitely annoying and something to look into, but the low frequency of it means that we can lower the priority a bit.
Updated by Matan Breizman over 3 years ago
/a/yuriw-2022-09-14_13:16:11-rados-wip-yuri6-testing-2022-09-13-1352-distro-default-smithi/7032356
Updated by Laura Flores over 3 years ago
Matan Breizman wrote:
/a/yuriw-2022-09-14_13:16:11-rados-wip-yuri6-testing-2022-09-13-1352-distro-default-smithi/7032356
This one is https://tracker.ceph.com/issues/57546:
active+clean+remapped+laggy
Updated by Laura Flores over 3 years ago
- Subject changed from thrash-erasure-code: AssertionError: wait_for_recovery: failed before timeout expired to thrash-erasure-code: AssertionError: wait_for_recovery timeout due to "active+recovering+undersized" pg
Updated by Laura Flores over 3 years ago
- Copied to Bug #57599: thrash-erasure-code: AssertionError: wait_for_recovery timeout due to "recovering+forced_recovery+undersized+remapped+peered" pg added
Updated by Laura Flores over 3 years ago
- Copied to Bug #57600: thrash-erasure-code: AssertionError: wait_for_recovery timeout due to "active+recovery_unfound+degraded+remapped" pg added
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-08-09_02:35:19-rados-squid-release-distro-default-smithi/7845280