Skip to content

os/bluestore: Fix race in BlueFS truncate / remove#62588

Merged
aclamk merged 2 commits intoceph:mainfrom
aclamk:wip-aclamk-bluefs-remove-truncate
Apr 9, 2025
Merged

os/bluestore: Fix race in BlueFS truncate / remove#62588
aclamk merged 2 commits intoceph:mainfrom
aclamk:wip-aclamk-bluefs-remove-truncate

Conversation

@aclamk
Copy link
Contributor

@aclamk aclamk commented Mar 31, 2025

Commit 2:
Fixes race between BlueFS::truncate and BlueFS::unlink.

Fixes: https://tracker.ceph.com/issues/70747

Commit 1:
Created test that exposes race between BlueFS::truncate and BlueFS::unlink.
Test requires injection of 1ms sleep to BlueFS::truncate.
Therefore, in this form, it is unsuitable for merge.

Running:

./bin/ceph_test_objectstore --gtest_filter=\*truncate_remove_race\*

It will assert.

Observe results:

$ ./bin/ceph-bluestore-tool --path bluestore.test_temp_dir/ fsck
2025-03-31T20:53:54.241+0000 7efd9b60ccc0 -1 bluefs _check_allocations OP_FILE_UPDATE_INC invalid extent 1: 0x1ec2000~20000: duplicate reference, ino 28
2025-03-31T20:53:54.241+0000 7efd9b60ccc0 -1 bluefs mount failed to replay log: (14) Bad address

But if we skip checking allocations:

$ ./bin/ceph-bluestore-tool --path bluestore.test_temp_dir/ --bluefs_log_replay_check_allocations=false fsck
2025-03-31T20:55:26.522+0000 7f0a81e18cc0 -1 bluefs _replay file with link count 0: file(ino 27 size 0xa mtime 2025-03-31T20:53:41.666479+0000 allocated 10000 alloc_commit 10000 extents [1:0x1ec2000~10000])
2025-03-31T20:55:26.522+0000 7f0a81e18cc0 -1 bluefs mount failed to replay log: (5) Input/output error

And look at BlueFS log produced:

$ ./bin/ceph-bluestore-tool --path bluestore.test_temp_dir/ bluefs-log-dump
...
 0x50000: txn(seq 81 len 0x20 crc 0x1c4ecd6b)
 0x50000:  op_dir_unlink  dir/test-file-0
 0x50017:  op_file_remove 27
 0x51000: txn(seq 82 len 0x9e crc 0x66a45296)
 0x51000:  op_file_update  file(ino 27 size 0xa mtime 2025-03-31T20:53:41.666479+0000 allocated 10000 alloc_commit 10000 extents [1:0x1ec2000~10000])
...

We can see "op_file_remove 27" and then resurrection in "op_file_update file(ino 27...) "

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@aclamk aclamk requested a review from a team as a code owner March 31, 2025 20:59
@aclamk aclamk marked this pull request as draft March 31, 2025 21:00
Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

if (string(GetParam()) != "bluestore")
GTEST_SKIP();

BlueStore* bstore = dynamic_cast<BlueStore*> (store.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this suggests there might a better, already BlueStore-specific test to accommodate this testcase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but we do not have ceph-test-bluestore.
But with the amount of BlueStore specific tests in ceph-test-objectstore there are, maybe we should have.

ceph_abort_msg("truncate up not supported");
}

usleep(1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks there is a need for instrumentation infrastructure in BlueStore.
I can imagine something like a dedicated build of libos.a for tests carrying an extra compile-time flag supplemented with dynamic hooks to allow specific test cases to injecting some code (like sleep) along the paths.

#ifdef BLUESTORE_INSTRUMENTED
if (g_bluefs_truncate_hook) {
  *g_bluefs_truncate_hook();
}
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/THINKING LOUDLY
Perhaps we want randomized-delays-in-strategic-places also in debug builds to let race conditions like the replicated one to pop-up in our Teuthology testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already something that could be used: debug_point_t
https://github.com/ceph/ceph/blob/main/src/os/bluestore/BlueFS.h#L219-L244

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think random delays will be very useful.
But I have in (short) backlog a Synthetic tests that actually kills BlueStore mid-flight.
We will have much better testing of recovery this way.

@aclamk aclamk changed the title test/store_test: Expose race in BlueFS truncate / remove os/bluestore: Fix race in BlueFS truncate / remove Apr 1, 2025
@aclamk aclamk marked this pull request as ready for review April 1, 2025 14:12
@aclamk aclamk requested a review from ifed01 April 1, 2025 14:18
Created test that exposes race between BlueFS::truncate and BlueFS::unlink.
Test requires injection of 1ms sleep to BlueFS::truncate.
Therefore, in this form, it is unsuitable for merge.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
@aclamk aclamk force-pushed the wip-aclamk-bluefs-remove-truncate branch from 3335907 to 7abf99d Compare April 1, 2025 15:50
logger->set(l_bluefs_num_files, nodes.file_map.size());
file->deleted = true;

{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

funaction name to get _F substr in the name here and there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should and it will.

@aclamk aclamk force-pushed the wip-aclamk-bluefs-remove-truncate branch 2 times, most recently from b4b33bd to 80c9e3b Compare April 2, 2025 19:38
@aclamk aclamk added the aclamk-testing-ganymede bluestore testing label Apr 2, 2025
@aclamk
Copy link
Contributor Author

aclamk commented Apr 7, 2025

Approved by https://tracker.ceph.com/issues/70787.

It was possible for unlink() to interrupt ongoing truncate().
As the result, unlink() finishes properly, but truncate() is not aware
of it and does:
1) updates file that is already removed
2) releases same allocations again

Now fixed by checking if file is deleted under FILE lock.

https://tracker.ceph.com/issues/70747

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
@aclamk aclamk force-pushed the wip-aclamk-bluefs-remove-truncate branch from 80c9e3b to e5f8892 Compare April 8, 2025 15:52
@aclamk
Copy link
Contributor Author

aclamk commented Apr 9, 2025

jenkins test make check arm64

@aclamk
Copy link
Contributor Author

aclamk commented Apr 9, 2025

jenkins test make check

@aclamk
Copy link
Contributor Author

aclamk commented Apr 9, 2025

jenkins test make check arm64

@aclamk aclamk merged commit f826318 into ceph:main Apr 9, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants