Bug #68060


_deferred_replay can overwrite BlueFS

Added by Adam Kupczyk over 1 year ago. Updated 4 months ago.

Status: New
Priority: High
Assignee:
Target version: -
% Done: 0%
Source:
Backport: squid, reef
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

_deferred_replay avoids writing to BlueFS locations,
but the set of BlueFS allocations is captured before the procedure starts.
During the procedure, keys are removed from the DB, which might trigger new BlueFS allocations.
These newly allocated extents are not in the captured set and can be overwritten.

Fixed algorithm:
1. open BlueFS read-only
2. write blocks
3. open BlueFS read-write
4. delete keys
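The reordering matters because steps 1-2 run while BlueFS cannot allocate. A toy Python model contrasting the old and fixed orderings (all names are hypothetical; this is a sketch, not actual Ceph code):

```python
# Toy model of the race; all names are hypothetical, not Ceph code.
# disk: list where None = free, otherwise a label of what occupies it.

def allocate(disk):
    """Pretend BlueFS allocator: take the lowest free offset."""
    off = next(o for o in range(len(disk)) if disk[o] is None)
    disk[off] = "bluefs"
    return off

def replay_old(disk, bluefs_extents, deferred_writes, db_deletes):
    protected = set(bluefs_extents)         # snapshot captured up front
    for _key in db_deletes:                 # key removal may trigger...
        bluefs_extents.add(allocate(disk))  # ...a fresh BlueFS allocation
    for off in deferred_writes:
        if off not in protected:            # stale snapshot misses new extent
            disk[off] = "deferred"          # may clobber live BlueFS data

def replay_fixed(disk, bluefs_extents, deferred_writes, db_deletes):
    # steps 1-2: BlueFS is open read-only, so nothing can allocate here
    for off in deferred_writes:
        if off not in bluefs_extents:
            disk[off] = "deferred"
    # steps 3-4: reopen read-write and delete keys; allocations are now safe
    for _key in db_deletes:
        bluefs_extents.add(allocate(disk))
```

In `replay_old`, a delete-triggered allocation can land on an offset the stale snapshot does not protect; in `replay_fixed`, every deferred write completes before any allocation can occur.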

Actions #1

Updated by Igor Fedotov over 1 year ago

  • Priority changed from Normal to High
  • Severity changed from 3 - minor to 2 - major
Actions #2

Updated by Adam Kupczyk over 1 year ago

Actions #3

Updated by Adam Kupczyk over 1 year ago

There are two flavors of corruption possible.

1.
In _deferred_replay, it might happen that after

bool has_some = _eliminate_outdated_deferred(deferred_txn, bluefs_extents);

has_some is false, meaning that no I/O should be executed.
In this case we skip sending deferred_txn through the BlueStore state machine,
and with it the last step of removing the corresponding L entry from RocksDB.
The leftover deferred record lurks in the DB until a restart at which some part of it can be applied.
By then its target could be:
- BlueFS (ok - we skip this)
- empty space (ok - it's empty)
- object data (bad - corruption)
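The lurking-record scenario can be sketched as a toy Python model (hypothetical names, not Ceph code): a fully eliminated transaction is skipped together with its cleanup, so its L entry survives restarts and fires once the target extent is reused for object data.

```python
# Toy model; all names hypothetical, not Ceph code.
# db maps an L entry key to the offset its deferred write targets.

def restart_replay(db, disk, bluefs_extents):
    for key, off in list(db.items()):
        if off in bluefs_extents:
            continue            # txn eliminated -> skipped entirely,
                                # so the L entry is NOT removed
        disk[off] = "deferred"  # fine for empty space, bad for object data
        del db[key]             # cleanup only happens when the write runs

db = {"L1": 0}
disk = {0: "bluefs-data"}
restart_replay(db, disk, bluefs_extents={0})    # target is BlueFS: skipped
# the record is still in the DB; later the extent is reused for object data
disk[0] = "object-data"
restart_replay(db, disk, bluefs_extents=set())  # now the stale write fires
```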

2.
The originally reported case: BlueFS is compacting in the background.
The deferred replay took a snapshot of BlueFS allocations at some point,
but it does not track subsequent changes.
The target of a deferred write could be:
- BlueFS as originally allocated (ok - we skip this)
- empty space (ok - it's empty)
- object data (ok - it's what we are supposed to write)
- newly allocated BlueFS (bad! - corrupts the files)
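A minimal Python sketch of this flavor (hypothetical names, not Ceph code): compaction allocates a new extent mid-replay, and the stale snapshot lets a deferred write land on it.

```python
# Toy model; hypothetical names, not Ceph code.
def replay_with_compaction(disk, writes, snapshot, compact):
    for i, off in enumerate(writes):
        if i == 1:
            compact(disk)            # background compaction allocates
        if off not in snapshot:      # snapshot is stale by now
            disk[off] = "deferred"   # lands on the fresh BlueFS extent

def compact(d):
    d[1] = "bluefs-new"              # compaction writes a new BlueFS extent

disk = {0: "bluefs", 1: None}
snapshot = {0}                       # captured before replay started
replay_with_compaction(disk, [0, 1], snapshot, compact)
```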

Actions #4

Updated by Adam Kupczyk over 1 year ago

  • Pull request ID set to 60753
Actions #5

Updated by Adam Kupczyk over 1 year ago

  • Status changed from New to Fix Under Review
Actions #6

Updated by Igor Fedotov 11 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to squid, reef
Actions #9

Updated by Upkeep Bot 11 months ago

  • Tags (freeform) set to backport_processed
Actions #10

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to 9ef9c124511eeafb09eb4cbdfea083dc00cae106
  • Fixed In set to v20.0.0-232-g9ef9c124511
  • Upkeep Timestamp set to 2025-07-08T18:45:00+00:00
Actions #11

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.0.0-232-g9ef9c124511 to v20.0.0-232-g9ef9c124511e
  • Upkeep Timestamp changed from 2025-07-08T18:45:00+00:00 to 2025-07-14T15:45:20+00:00
Actions #12

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.0.0-232-g9ef9c124511e to v20.0.0-232-g9ef9c12451
  • Upkeep Timestamp changed from 2025-07-14T15:45:20+00:00 to 2025-07-14T21:09:48+00:00
Actions #13

Updated by Igor Fedotov 8 months ago

  • Status changed from Pending Backport to New
Actions #14

Updated by Upkeep Bot 7 months ago

  • Status changed from New to Pending Backport
  • Upkeep Timestamp changed from 2025-07-14T21:09:48+00:00 to 2025-08-13T14:01:32+00:00
Actions #15

Updated by Patrick Donnelly 7 months ago · Edited

Igor Fedotov wrote in #note-13:

Reverted by https://github.com/ceph/ceph/pull/62214

Should the backports be Rejected then? The bot is confused.

Actions #16

Updated by Igor Fedotov 7 months ago

  • Status changed from Pending Backport to New
  • Tags (freeform) deleted (backport_processed)
Actions #17

Updated by Igor Fedotov 7 months ago

Patrick Donnelly wrote in #note-15:

Igor Fedotov wrote in #note-13:

Reverted by https://github.com/ceph/ceph/pull/62214

Should the backports be Rejected then? The bot is confused.

Done, thanks for pointing it out.

Actions #18

Updated by Upkeep Bot 7 months ago

  • Status changed from New to Pending Backport
  • Upkeep Timestamp changed from 2025-08-13T14:01:32+00:00 to 2025-08-21T11:39:21+00:00
Actions #19

Updated by Upkeep Bot 7 months ago

  • Tags (freeform) set to backport_processed
Actions #20

Updated by Upkeep Bot 7 months ago

  • Status changed from Pending Backport to Resolved
  • Upkeep Timestamp changed from 2025-08-21T11:39:21+00:00 to 2025-08-25T19:43:40+00:00
Actions #21

Updated by Igor Fedotov 7 months ago

  • Status changed from Resolved to New
Actions #22

Updated by Upkeep Bot 7 months ago

  • Status changed from New to Pending Backport
  • Upkeep Timestamp changed from 2025-08-25T19:43:40+00:00 to 2025-08-29T00:56:41+00:00
Actions #23

Updated by Upkeep Bot 7 months ago

  • Status changed from Pending Backport to Resolved
  • Upkeep Timestamp changed from 2025-08-29T00:56:41+00:00 to 2025-08-29T01:36:33+00:00
Actions #24

Updated by Patrick Donnelly 7 months ago

  • Status changed from Resolved to New
  • Pull request ID deleted (60753)
  • Tags (freeform) deleted (backport_processed)
  • Merge Commit deleted (9ef9c124511eeafb09eb4cbdfea083dc00cae106)
  • Fixed In deleted (v20.0.0-232-g9ef9c12451)
  • Upkeep Timestamp deleted (2025-08-29T01:36:33+00:00)

Igor, in this situation just remove the PR number.

Actions #25

Updated by Igor Fedotov 7 months ago

Patrick Donnelly wrote in #note-24:

Igor, in this situation just remove the PR number.

ah, ok. Good to know.

Actions #26

Updated by Yuma Ogami 7 months ago

I have two questions about this issue.

Q1: Is my understanding below correct?

- If object data is corrupted: if there are other healthy replicas, the corruption will be detected and fixed on scrub or read.
- If BlueFS is corrupted: the OSD would crash during operation, or fail to start at the next BlueFS open.

Q2: Is a new fix being developed now that the following PR has been reverted?
https://github.com/ceph/ceph/pull/60753

Actions #27

Updated by Yuma Ogami 6 months ago

Any updates on this?

Actions #28

Updated by Stefan Kooman 6 months ago

I'm also interested in how this will be fixed. This issue has a severity of "2 - major"; since it involves data corruption, I would expect a high priority. Is the likelihood of hitting this issue really low?
