Bug #72945
closed
Data digests are inconsistent during scrubbing
Description
/a/teuthology-2025-09-07_20:00:25-rados-main-distro-default-smithi/8485449
From the mon log:
2025-09-07T20:53:55.158+0000 7f6644c59640 7 mon.a@0(leader).log v978 update_from_paxos applying incremental log 978 2025-09-07T20:46:41.837545+0000 osd.3 (osd.3) 337 : cluster [DBG] 3.b0 scrub starts
2025-09-07T20:53:55.158+0000 7f6644c59640 7 mon.a@0(leader).log v978 update_from_paxos applying incremental log 978 2025-09-07T20:46:41.839526+0000 osd.3 (osd.3) 338 : cluster [DBG] 3.b0 scrub ok
2025-09-07T20:53:55.158+0000 7f6644c59640 7 mon.a@0(leader).log v978 update_from_paxos applying incremental log 978 2025-09-07T20:46:43.825545+0000 osd.3 (osd.3) 339 : cluster [DBG] 3.b3 scrub starts
2025-09-07T20:53:55.158+0000 7f6644c59640 7 mon.a@0(leader).log v978 update_from_paxos applying incremental log 978 2025-09-07T20:46:43.829306+0000 osd.3 (osd.3) 340 : cluster [DBG] 3.b3 scrub ok
2025-09-07T20:53:55.158+0000 7f6644c59640 7 mon.a@0(leader).log v978 update_from_paxos applying incremental log 978 2025-09-07T20:46:44.836992+0000 osd.3 (osd.3) 341 : cluster [DBG] 3.48 scrub starts
2025-09-07T20:53:55.158+0000 7f6644c59640 7 mon.a@0(leader).log v978 update_from_paxos applying incremental log 978 2025-09-07T20:46:44.847499+0000 osd.3 (osd.3) 342 : cluster [ERR] 3.48s0 3:12813970:::smithi14520280-46:4adata digests are inconsistent
Comparing a good commit against a bad commit turned up this set of changes:
$ git log --pretty=oneline --no-merges 346846543c6bfc93a360476c739580cd2344fec0..f96567578976c2c84a31d366a04c28fb95ceb0d9 src/osd
aaa198692734666459ef4110c0ebf26b8499707f osd/scrub: clear m_ec_digest_map between objects
6b85e4d453f829c69f6441007bc3a6893b6b3d99 osd/scrub: reinstate one-warning-per-chunk behaviour
5e59c521f8dcd3a5d86ee5cf1f0576a7be6c274e osd/scrub: modify OMAP stats collection
547d13f7f88652e5a96f2a432f5c53358cd07cf3 osd/scrub: avoid using moved-from auth_n_errs
100c20b7d6588295f539208a2812ba7fd3fb5222 osd/scrub: fix heap-buffer-overflow when checking digest emptiness
b6f50d5f89b66188d3fafcf58a535dc43aecae9c osd: add missing includes
I suspect the regression comes from one of these.
Updated by Ronen Friedman 6 months ago
(update: currently being investigated by Jonathan Bailey)
Updated by Shraddha Agrawal 6 months ago
/a/skanta-2025-09-11_16:30:11-rados-wip-bharath7-testing-2025-09-11-1359-distro-default-smithi/8494747
Updated by Laura Flores 6 months ago
- Assignee changed from Ronen Friedman to Jonathan Bailey
Updated by Radoslaw Zarzynski 6 months ago
@Jonathan Bailey: would you mind taking a look and judging whether it's EC-related?
Updated by Laura Flores 6 months ago
/a/yuriw-2025-09-12_19:42:42-rados-wip-yuri3-testing-2025-09-12-0906-distro-default-smithi/8496787
Several more on this run.
Updated by Jonathan Bailey 6 months ago · Edited
@Radoslaw Zarzynski I am already investigating. From what I have seen, this is isolated to runs that have EC optimizations turned on. I am trying to put together a fix and am adding more logging to find the cause of the failure.
Updated by Jonathan Bailey 6 months ago
To expand further, this bug appears to occur only with profiles that use the ISA plugin and have erasure coding optimizations enabled.
This should be isolated to main, as the code for EC checking during scrubbing is not part of the code going into Tentacle.
Updated by Jonathan Bailey 6 months ago
- We were comparing CRC buffers beyond the end of the CRCs.
- There was a double call to logical_to_ondisk_size when creating the CRCs for zero buffers, causing them to be mis-sized.
- The code was not padding smaller shards, even though shards must all be the same size when used for parity comparison.
I'm currently running the code through some testing to make sure these are all the issues and will do some tidying up of my currently very messy and verbose code before I create a PR to check this in and fix the issue.
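A minimal sketch of why the padding point above matters, written in Python for illustration only. It is not the Ceph scrub code and not the code in the PR: plain XOR parity stands in for the ISA plugin's erasure coding, and the names pad_shard, xor_parity and parity_matches are hypothetical. The idea shown is that shorter shards are zero-padded to a common size before parity is computed and compared, so the comparison never reads past the end of a shorter buffer and never silently ignores the tail of a longer one.

```python
from functools import reduce


def pad_shard(shard: bytes, size: int) -> bytes:
    """Zero-pad a shard on the right up to `size` bytes (hypothetical helper)."""
    if len(shard) > size:
        raise ValueError("shard is larger than the target size")
    return shard + b"\x00" * (size - len(shard))


def xor_parity(shards: list[bytes]) -> bytes:
    """XOR parity over all shards, zero-padding shorter shards first.

    XOR parity is an assumption used here as a stand-in for real
    erasure coding; it is enough to show the equal-size requirement.
    """
    size = max(len(s) for s in shards)
    padded = [pad_shard(s, size) for s in shards]
    # zip(*padded) walks the shards column by column (one byte from each).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))


def parity_matches(data_shards: list[bytes], stored_parity: bytes) -> bool:
    """Compare recomputed parity with stored parity at a common size."""
    size = max(len(s) for s in [*data_shards, stored_parity])
    return pad_shard(xor_parity(data_shards), size) == pad_shard(stored_parity, size)


shards = [b"\x01\x02\x03\x04", b"\x10\x20"]  # second shard is shorter
parity = xor_parity(shards)

# Without padding, zip() stops at the shortest shard and drops the tail.
unpadded = bytes(a ^ b for a, b in zip(*shards))
```

In this sketch the unpadded result is two bytes long while the padded parity covers all four bytes, which is the kind of size mismatch that makes a digest comparison report an inconsistency even though the data is intact.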
Updated by Jonathan Bailey 6 months ago
Created a PR with a fix here: https://github.com/ceph/ceph/pull/65623
Updated by Laura Flores 6 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 65623
Updated by Radoslaw Zarzynski 6 months ago
The EC scrubbing optimizations aren't in Tentacle, so likely we don't need to backport the fix.
Updated by Jonathan Bailey 6 months ago
Agreed. Just to further expand, the proposed PR to fix this only changes parts of the code that are in main and nothing that is in Tentacle.
Updated by Laura Flores 6 months ago · Edited
/a/skanta-2025-09-07_23:32:26-rados-wip-bharath2-testing-2025-09-07-1916-distro-default-smithi/8486564
Updated by Aishwarya Mathuria 5 months ago
/a/skanta-2025-10-07_22:45:50-rados-wip-bharath1-testing-2025-10-06-2038-distro-default-smithi/8540411
8540414, 8540415, 8540416, 8540422, 8540423, 8540426, 8540428, 8540430
Updated by Kamoltat (Junior) Sirivadhna 5 months ago
/a/skanta-2025-09-08_23:33:07-rados-wip-bharath2-testing-2025-09-07-1916-distro-default-smithi/
[8488706, 8488712, 8488718, 8488719, 8488723, 8488727, 8488729]
Updated by Kamoltat (Junior) Sirivadhna 5 months ago
suite watcher: this tracker is in progress, currently being tested in teuthology
Updated by Jaya Prakash 5 months ago
Updates from Rados Watcher :
yuriw-2025-10-22_23:56:36-rados-wip-yuri5-testing-2025-10-22-1314-distro-default-smithi
10 jobs: ['8566156', '8566144', '8566000', '8565920', '8566079', '8565992', '8566056', '8565995', '8566045', '8566089']
teuthology-2025-10-26_20:00:25-rados-main-distro-default-smithi
11 jobs: ['8569767', '8569680', '8569595', '8569620', '8569558', '8569590', '8569605', '8569711', '8569662', '8569763', '8569512']
Updated by Aishwarya Mathuria 5 months ago
/a/skanta-2025-11-01_01:03:27-rados-wip-bharath1-testing-2025-10-31-0445-distro-default-smithi/
10 jobs: ['8578565', '8578556', '8578574', '8578566', '8578580', '8578579', '8578567', '8578573', '8578564', '8578576']
Updated by Radoslaw Zarzynski 5 months ago
The PR needs a rebase and maybe some rework.
Updated by Jonathan Bailey 5 months ago
I am currently looking into the failures and will update the PR with a fix once I have done so.
Updated by Kamoltat (Junior) Sirivadhna 4 months ago
RADOS bug scrub: bump (waiting for rebasing)
Updated by Aishwarya Mathuria 4 months ago
RADOS main watcher update:
Quite a few failures in:
/a/teuthology-2025-11-09_20:00:24-rados-main-distro-default-smithi
/a/teuthology-2025-11-16_20:00:21-rados-main-distro-default-smithi
The PR has been rebased, and the needs-qa label has been added again.
Updated by Radoslaw Zarzynski 4 months ago
scrub note: ACK, waiting for QA to pick it up!
Updated by Sridhar Seshasayee 4 months ago
/a/skanta-2025-11-13_10:26:04-rados-wip-bharath3-testing-2025-11-12-2038-distro-default-smithi/
[8601365, 8601366, 8601367, 8601371, 8601372, 8601381, 8601387]
Updated by Laura Flores 4 months ago
/a/lflores-2025-11-19_18:47:12-rados-wip-lflores-testing-2-2025-11-19-1711-distro-default-smithi
['8613505', '8613391', '8613310', '8613307', '8613316', '8613454', '8613357', '8613468', '8613367']
Updated by Kamoltat (Junior) Sirivadhna 4 months ago
/a/skanta-2025-11-01_02:37:10-rados-wip-bharath4-testing-2025-10-31-1459-distro-default-smithi/
[8578618, 8578627, 8578628, 8578637, 8578638, 8578641, 8578645, 8578646]
Updated by Laura Flores 4 months ago
QA ticket in progress here: https://tracker.ceph.com/issues/73898
Updated by Radoslaw Zarzynski 4 months ago
scrub note: QA results under analysis, should be ready soon.
Updated by Laura Flores 4 months ago
/a/lflores-2025-12-02_17:29:40-rados-wip-lflores-testing-4-2025-12-01-1527-distro-default-smithi/8636005
Updated by Aishwarya Mathuria 4 months ago
/a/yuriw-2025-12-03_15:44:36-rados-wip-yuri5-testing-2025-12-02-1256-distro-default-smithi/8639534
Updated by Laura Flores 3 months ago
Note from bug scrub: In second round of testing.
Updated by Naveen Naidu 3 months ago
/a/skanta-2025-11-21_10:17:34-rados-wip-bharath11-testing-2025-11-21-0531-distro-default-smithi
7 jobs: ['8617903', '8617816', '8617840', '8617756', '8617917', '8617806', '8617958']
Updated by Sridhar Seshasayee 3 months ago
/a/skanta-2025-12-03_02:50:04-rados-wip-bharath5-testing-2025-12-02-1511-distro-default-smithi
7 Jobs
[8638385, 8638391, 8638392, 8638397, 8638401, 8638404, 8638410]
Updated by Radoslaw Zarzynski 3 months ago
scrub note: an unrelated failure is delaying the merge after the lab migration (see https://tracker.ceph.com/issues/73898).
Updated by Laura Flores 2 months ago · Edited
Scrub note: QA in progress (delays due to lab migration)
Updated by Sridhar Seshasayee about 2 months ago
/a/skanta-2026-01-27_05:35:03-rados-wip-bharath1-testing-2026-01-26-1242-distro-default-trial/
7 jobs: ['19749', '19748', '19759', '19766', '19781', '19752', '19757']
Updated by Aishwarya Mathuria about 2 months ago
/a/skanta-2026-01-30_23:46:16-rados-wip-bharath7-testing-2026-01-29-2016-distro-default-trial
['28583', '28567', '28560', '28592', '28561', '28573', '28568', '28586', '28564', '28571']
Updated by Connor Fawcett about 1 month ago
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19851
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19860
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19887
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19865
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19850
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19879
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19875
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19859
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19853
Updated by Upkeep Bot about 1 month ago
- Status changed from Fix Under Review to Resolved
- Merge Commit set to dfa42f16005f96c47ba21da048edf9c5294b3871
- Fixed In set to v20.3.0-5156-gdfa42f1600
- Upkeep Timestamp set to 2026-02-05T22:49:06+00:00
Updated by Lee Sanders about 1 month ago
/a/skanta-2026-01-29_02:19:11-rados-wip-bharath5-testing-2026-01-28-2018-distro-default-trial/
['24696', '24636', '24645', '24720', '24834', '24686', '24797', '24639']
Updated by Lee Sanders about 1 month ago
/a/skanta-2026-01-29_13:05:02-rados-wip-bharath5-testing-2026-01-28-2018-distro-default-trial/
['25719', '25730', '25732', '25712', '25705', '25704', '25707', '25711', '25739']
Updated by Jaya Prakash about 1 month ago
8 jobs: ['38087', '38172', '38056', '38157', '38084', '38101', '38045', '38174']
jayaprakash-2026-02-06_12:54:34-rados-jaya-bs-testing-05-02-2026-distro-default-trial
Updated by Aishwarya Mathuria about 1 month ago
/a/skanta-2026-02-05_03:38:32-rados-wip-bharath2-testing-2026-02-03-0542-distro-default-trial
['35643', '35655', '35651', '35644', '35674', '35668', '35646', '35650']
Updated by Naveen Naidu about 1 month ago
/a/skanta-2026-01-26_08:54:40-rados-wip-bharath4-testing-2026-01-26-1300-distro-default-trial/
9 jobs: ['17847', '17759', '17686', '17736', '17695', '17770', '17884', '17689', '17746']