
os/bluestore: Fix corruption in BlueFS allocator caused by No-Column-B #43583

Merged
liewegas merged 4 commits into ceph:master from benhanokh:ncb_fixes
Oct 28, 2021
@benhanokh
Contributor

os/bluestore/BlueStore::NCB - Fix corruption in BlueFS allocator

On startup, the NCB code loads extents from the allocation file into shared_alloc.a. While walking the allocation file it might find a corruption, in which case the load is aborted.
We then run a full recovery, building a temp allocator from RocksDB::ONodes and finally copying it into shared_alloc.a, which still holds some allocations from the first attempt.

This issue was fixed by changing the restore-allocator code to load allocations into a temp allocator (like we already do in the recovery flow) and to copy the allocations into shared_alloc.a only once everything was found to be valid.
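The staged-load idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual BlueStore code: `ToyAllocator`, `Extent`, `load_extents`, and `restore_allocator` are hypothetical stand-ins, and the validity check is an assumed placeholder for whatever the real allocation-file verification does.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real BlueStore types.
struct Extent {
  uint64_t offset;
  uint64_t length;
};

// Toy allocator that records free extents keyed by offset.
class ToyAllocator {
public:
  void init_add_free(uint64_t offset, uint64_t length) {
    assert(length > 0);
    free_[offset] = length;
  }
  uint64_t get_free() const {
    uint64_t total = 0;
    for (const auto& kv : free_) total += kv.second;
    return total;
  }
  const std::map<uint64_t, uint64_t>& extents() const { return free_; }

private:
  std::map<uint64_t, uint64_t> free_;
};

// Walks the stored extents and adds them to `out`.
// Returns false at the first invalid entry; `out` may then be partially filled.
// The check used here (non-empty, inside the device) is an assumption.
bool load_extents(const std::vector<Extent>& file, uint64_t device_size,
                  ToyAllocator& out) {
  for (const Extent& e : file) {
    if (e.length == 0 || e.offset > device_size ||
        e.length > device_size - e.offset) {
      return false;
    }
    out.init_add_free(e.offset, e.length);
  }
  return true;
}

// Staged restore: load into a temporary allocator first, and copy into the
// shared allocator only after the whole file was found to be valid.
bool restore_allocator(const std::vector<Extent>& file, uint64_t device_size,
                       ToyAllocator& shared) {
  ToyAllocator temp;
  if (!load_extents(file, device_size, temp)) {
    return false;  // shared allocator is untouched; caller can run recovery
  }
  for (const auto& kv : temp.extents()) {
    shared.init_add_free(kv.first, kv.second);
  }
  return true;
}
```

With this shape, a failed restore leaves the shared allocator empty, so a subsequent full recovery starts from a clean state instead of inheriting partial allocations from the aborted attempt.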

Fixes: https://tracker.ceph.com/issues/52399
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>

… only copy the allocation to the shared-allocator after the file was verified and all extents were cleared

Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
bluefs->sync_metadata(false);
dout(1) << "Remove Allocation File ret_code=" << ret << dendl;
}
return ret;
Contributor

The interesting case, !(ret == 0), did not get any douts.

Contributor

@aclamk aclamk left a comment

The code fixes the problem with interrupted allocation restoring.
Log and error messages need polishing.

Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
@benhanokh benhanokh requested a review from aclamk October 20, 2021 07:29
@ceph ceph deleted a comment from aclamk Oct 20, 2021
@benhanokh
Contributor Author

jenkins test api

@ceph ceph deleted a comment from neha-ojha Oct 21, 2021
@benhanokh
Contributor Author

jenkins test api

@liewegas liewegas changed the title Fix corruption in BlueFS allocator caused by No-Column-B os/bluestore: Fix corruption in BlueFS allocator caused by No-Column-B Oct 27, 2021
@liewegas
Member

jenkins test api

@liewegas liewegas merged commit 3298e49 into ceph:master Oct 28, 2021
5 participants