Bug #68060
open_deferred_replay can overwrite BlueFS
Description
_deferred_replay avoids writing to BlueFS locations.
However, the set of BlueFS allocations is captured before the procedure starts.
During the procedure, keys are removed from the DB, which might trigger new BlueFS allocations.
These newly allocated extents are not protected and can be overwritten.
Fixed algorithm:
1. open BlueFS read-only
2. write blocks
3. open BlueFS read-write
4. delete keys
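The hazard and the reordering above can be sketched with a toy C++ model. Everything in it (`Extent`, `Disk`, `replay_old`, `replay_fixed`, and the rule that deleting a key while BlueFS is writable hands BlueFS the next free extent) is a hypothetical simplification, not BlueStore or BlueFS code. In the old flow the BlueFS snapshot goes stale as soon as a key deletion allocates; in the fixed flow BlueFS cannot allocate until all deferred writes are done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model for illustration only; none of these names are
// BlueStore or BlueFS APIs.
struct Extent {
  std::uint64_t off;
  std::uint64_t len;
};

bool overlaps(const Extent& a, const Extent& b) {
  assert(a.len > 0 && b.len > 0);  // zero-length extents are not modeled
  return a.off < b.off + b.len && b.off < a.off + a.len;
}

bool overlaps_any(const Extent& e, const std::vector<Extent>& set) {
  for (const Extent& s : set)
    if (overlaps(e, s)) return true;
  return false;
}

struct Disk {
  std::vector<Extent> bluefs;           // extents currently owned by BlueFS
  std::vector<Extent> free_for_bluefs;  // extents BlueFS would allocate next
  std::vector<Extent> clobbered;        // BlueFS extents hit by a replayed write
  bool bluefs_writable = false;

  // Assumption: deleting a key makes RocksDB write, so BlueFS allocates
  // the next free extent, but only when it is open read-write.
  void delete_key() {
    if (bluefs_writable && !free_for_bluefs.empty()) {
      bluefs.push_back(free_for_bluefs.back());
      free_for_bluefs.pop_back();
    }
  }

  // A replayed deferred write; records any live BlueFS extent it hits.
  void write(const Extent& e) {
    for (const Extent& b : bluefs)
      if (overlaps(e, b)) clobbered.push_back(b);
  }
};

// Old flow: BlueFS writable throughout, snapshot taken once up front,
// each write followed by its key deletion.
void replay_old(Disk& d, const std::vector<Extent>& txns) {
  d.bluefs_writable = true;
  const std::vector<Extent> snapshot = d.bluefs;  // never refreshed
  for (const Extent& t : txns) {
    if (!overlaps_any(t, snapshot)) d.write(t);
    d.delete_key();  // may allocate space the snapshot does not know about
  }
}

// Fixed flow, following the four steps in the description.
void replay_fixed(Disk& d, const std::vector<Extent>& txns) {
  d.bluefs_writable = false;                      // 1. open BlueFS read-only
  const std::vector<Extent> snapshot = d.bluefs;  // cannot go stale now
  for (const Extent& t : txns)
    if (!overlaps_any(t, snapshot)) d.write(t);   // 2. write blocks
  d.bluefs_writable = true;                       // 3. open BlueFS read-write
  for (std::size_t i = 0; i < txns.size(); ++i)
    d.delete_key();                               // 4. delete keys
}
```

For example, with a deferred write targeting a free extent that BlueFS is about to allocate during a key deletion, `replay_old` records a clobbered BlueFS extent while `replay_fixed` records none.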
Updated by Igor Fedotov over 1 year ago
- Priority changed from Normal to High
- Severity changed from 3 - minor to 2 - major
Updated by Adam Kupczyk over 1 year ago
Reproducer unittest: https://github.com/ceph/ceph/pull/60732
Updated by Adam Kupczyk over 1 year ago
Two flavors of corruption are possible.
1.
In _deferred_replay, it might happen that after
bool has_some = _eliminate_outdated_deferred(deferred_txn, bluefs_extents);
has_some is false, meaning that no IO should be executed.
In this case we skip sending deferred_txn through the BlueStore state machine,
and thereby also skip its last step: removing the corresponding L entry from RocksDB.
The remaining deferred transaction lingers in the DB until some later restart at which part of it can be applied.
At that later restart, the target could be:
- BlueFS (ok - we skip this)
- empty space (ok - it's empty)
- Object data (bad - corruption)
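Flavor 1 can be illustrated with a similarly hypothetical model: each deferred transaction is reduced to a single target extent keyed by sequence number, `has_io_left` stands in for `_eliminate_outdated_deferred`, and `replay` stands in for one mount's `_deferred_replay` pass. With `delete_when_empty` set to false, a transaction whose only write was eliminated keeps its L entry; on a later mount, after the extent has left BlueFS and possibly been reused for object data, the stale write is issued. Dropping the entry even when `has_some` is false is one way to close that gap; whether the actual fix does exactly this is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model for illustration only; not BlueStore code.
struct Extent {
  std::uint64_t off;
  std::uint64_t len;
};

// Simplification: one target extent per deferred transaction.
struct DeferredTxn {
  Extent target;
};

bool overlaps(const Extent& a, const Extent& b) {
  assert(a.len > 0 && b.len > 0);  // zero-length extents are not modeled
  return a.off < b.off + b.len && b.off < a.off + a.len;
}

// Stand-in for _eliminate_outdated_deferred: true if IO remains after
// dropping writes that land on BlueFS.
bool has_io_left(const DeferredTxn& t, const std::vector<Extent>& bluefs) {
  for (const Extent& b : bluefs)
    if (overlaps(t.target, b)) return false;
  return true;
}

// One mount's replay pass over the L entries in `db`.
// Returns how many deferred writes were issued.
int replay(std::map<std::uint64_t, DeferredTxn>& db,
           const std::vector<Extent>& bluefs, bool delete_when_empty) {
  int issued = 0;
  for (auto it = db.begin(); it != db.end();) {
    if (has_io_left(it->second, bluefs)) {
      ++issued;           // the write goes through the state machine...
      it = db.erase(it);  // ...whose last step removes the L entry
    } else if (delete_when_empty) {
      it = db.erase(it);  // nothing to write, but drop the entry anyway
    } else {
      ++it;               // reported behavior: the entry lingers in the DB
    }
  }
  return issued;
}
```

Running the unfixed variant twice, first while the extent is still owned by BlueFS and then after it has been released, issues the stale write on the second pass; the variant that drops the entry issues nothing.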
2.
The originally reported case, where BlueFS is compacting in the background.
The deferred replay takes a snapshot of BlueFS allocations at some point,
but it does not track later changes.
The target of a deferred write could be:
- BlueFS as originally allocated (ok - we skip this)
- empty space (ok - it's empty)
- Object data (ok - it's what we are supposed to do)
- newly allocated BlueFS space (bad! corrupts BlueFS files)
Updated by Adam Kupczyk over 1 year ago
- Status changed from New to Fix Under Review
Updated by Igor Fedotov 11 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to squid, reef
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 9ef9c124511eeafb09eb4cbdfea083dc00cae106
- Fixed In set to v20.0.0-232-g9ef9c124511
- Upkeep Timestamp set to 2025-07-08T18:45:00+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.0.0-232-g9ef9c124511 to v20.0.0-232-g9ef9c124511e
- Upkeep Timestamp changed from 2025-07-08T18:45:00+00:00 to 2025-07-14T15:45:20+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.0.0-232-g9ef9c124511e to v20.0.0-232-g9ef9c12451
- Upkeep Timestamp changed from 2025-07-14T15:45:20+00:00 to 2025-07-14T21:09:48+00:00
Updated by Igor Fedotov 8 months ago
- Status changed from Pending Backport to New
Reverted by https://github.com/ceph/ceph/pull/62214
Updated by Upkeep Bot 7 months ago
- Status changed from New to Pending Backport
- Upkeep Timestamp changed from 2025-07-14T21:09:48+00:00 to 2025-08-13T14:01:32+00:00
Updated by Patrick Donnelly 7 months ago · Edited
Igor Fedotov wrote in #note-13:
Reverted by https://github.com/ceph/ceph/pull/62214
Should the backports be Rejected then? The bot is confused.
Updated by Igor Fedotov 7 months ago
- Status changed from Pending Backport to New
- Tags (freeform) deleted (backport_processed)
Updated by Igor Fedotov 7 months ago
Patrick Donnelly wrote in #note-15:
Igor Fedotov wrote in #note-13:
Reverted by https://github.com/ceph/ceph/pull/62214
Should the backports be Rejected then? The bot is confused.
Done, thanks for pointing.
Updated by Upkeep Bot 7 months ago
- Status changed from New to Pending Backport
- Upkeep Timestamp changed from 2025-08-13T14:01:32+00:00 to 2025-08-21T11:39:21+00:00
Updated by Upkeep Bot 7 months ago
- Status changed from Pending Backport to Resolved
- Upkeep Timestamp changed from 2025-08-21T11:39:21+00:00 to 2025-08-25T19:43:40+00:00
Updated by Upkeep Bot 7 months ago
- Status changed from New to Pending Backport
- Upkeep Timestamp changed from 2025-08-25T19:43:40+00:00 to 2025-08-29T00:56:41+00:00
Updated by Upkeep Bot 7 months ago
- Status changed from Pending Backport to Resolved
- Upkeep Timestamp changed from 2025-08-29T00:56:41+00:00 to 2025-08-29T01:36:33+00:00
Updated by Patrick Donnelly 7 months ago
- Status changed from Resolved to New
- Pull request ID deleted (60753)
- Tags (freeform) deleted (backport_processed)
- Merge Commit deleted (9ef9c124511eeafb09eb4cbdfea083dc00cae106)
- Fixed In deleted (v20.0.0-232-g9ef9c12451)
- Upkeep Timestamp deleted (2025-08-29T01:36:33+00:00)
Igor, in this situation just remove the PR number.
Updated by Igor Fedotov 7 months ago
Patrick Donnelly wrote in #note-24:
Igor, in this situation just remove the PR number.
Ah, OK. Good to know.
Updated by Yuma Ogami 7 months ago
I have two questions about this issue.
Q1: Is the following understanding correct?
- If object data is corrupted: if there are other healthy replicas, the corruption will be detected and fixed on scrub or read.
- If BlueFS is corrupted: the OSD would crash during operation or fail to start at the next BlueFS open.
Q2: Is a new fix being developed now that the following PR has been reverted?
https://github.com/ceph/ceph/pull/60753
Updated by Stefan Kooman 6 months ago
I'm also interested in how this will be fixed. This issue has a severity of major. Since it involves data corruption, I would expect a high priority. Is the likelihood of hitting this issue really low?
Updated by Adam Kupczyk 4 months ago
See also: