os/bluestore: Fix problem with deferred writes replay#60753

Merged
aclamk merged 7 commits into ceph:main from
aclamk:wip-aclamk-more-deferred-overwrite-fix
Mar 10, 2025

Conversation

aclamk commented Nov 15, 2024

Fixes known problems with deferred write replay.

Based on unittest for deferred writes #60732

Fixes: https://tracker.ceph.com/issues/68060

@aclamk aclamk requested a review from a team as a code owner November 15, 2024 15:24
@aclamk aclamk force-pushed the wip-aclamk-more-deferred-overwrite-fix branch from 602b98c to deef599 Compare November 15, 2024 16:26
if (debug_deferred_replay_end) debug_deferred_replay_end();
if (fake_ch) {
new_coll_map.clear();
bdev->aio_submit(&ioctx);
Contributor

Are we sure that it's safe enough to try to accumulate all the deferred writes in a single IO/DB txc batch? Couldn't it go beyond some bounds?

Contributor Author

It is limited by the number of deferred writes pending, that is, the number of deferred writes BlueStore had queued before the crash, plus any stale ones.

I will take a look.

Contributor Author

Added batching of bdev ops.
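
The thread does not show how the batching was implemented. As a minimal sketch of the general idea only, the following splits a list of pending writes into groups of bounded size so that no single submission grows without limit. `WriteOpSketch`, `submit_in_batches`, and the `submit_batch` callback are hypothetical names for illustration and are not BlueStore APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical description of one pending device write.
struct WriteOpSketch {
  uint64_t offset;
  uint64_t length;
};

// Calls submit_batch once per group of at most max_batch ops.
// Returns the number of batches submitted.
std::size_t submit_in_batches(
    const std::vector<WriteOpSketch>& ops,
    std::size_t max_batch,
    const std::function<void(const WriteOpSketch*, std::size_t)>& submit_batch)
{
  assert(max_batch > 0);
  std::size_t batches = 0;
  for (std::size_t i = 0; i < ops.size(); i += max_batch) {
    // The last group may be smaller than max_batch.
    const std::size_t n = std::min(max_batch, ops.size() - i);
    submit_batch(ops.data() + i, n);
    ++batches;
  }
  return batches;
}
```

With ten pending ops and `max_batch = 4`, the callback runs three times, with 4, 4, and 2 ops. A real implementation would presumably wait for each batch to complete before submitting the next.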

github-actions bot commented Dec 6, 2024

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Added debug hooks for:
- init_alloc
- deferred start/stop/operation

This creates a framework for targeted unit tests.
As their names suggest, these functions are debug-level and intended only for unit tests.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

Two possible cases of deferred-write-related corruption are reproduced:
1. The case when L entries remain.
2. The case when a deferred write overwrites newly created BlueFS files.

Provides unittest for:
https://tracker.ceph.com/issues/68060

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

Modify _deferred_replay to execute directly, without involving
the BlueStore state machine.
As a result, the kv-sync thread is no longer needed.
RocksDB L entries (deferred writes) are removed directly.

Fixes: https://tracker.ceph.com/issues/68060,
  part responsible for stale L entries.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
@aclamk aclamk force-pushed the wip-aclamk-more-deferred-overwrite-fix branch from deef599 to 3e49cd4 Compare December 18, 2024 11:53
aclamk commented Jan 23, 2025

jenkins test make check

aclamk commented Jan 23, 2025

jenkins test make check arm64

};
}

// small deferred writes over object
Contributor

The comment isn't correct now, is it?

Contributor Author

Yes, it is; it checks out.
16 IOs of 8 bytes each and then one 64K write.

Contributor

I meant that we don't overwrite the previous object any more.

}
}
} else {
delete deferred_txn;
Contributor

Looks like deferred_txn leaks if we don't get here.
Previously TransContext took care of that...

Contributor

In fact, we don't need it to be allocated on the heap any more.

Contributor Author

No longer allocated.
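
The exchange above concerns a raw `new`/`delete` pair in which some control-flow paths skip the `delete`. As an illustrative sketch only, not the actual BlueStore code, the following shows why giving the object automatic storage removes that class of leak: the destructor runs on every path, including an early `continue`. `DeferredTxnSketch`, `decode_sketch`, and `replay_sketch` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoded deferred transaction record.
// The static counter tracks live instances so leaks would be observable.
struct DeferredTxnSketch {
  static int live;
  std::string payload;
  DeferredTxnSketch() { ++live; }
  ~DeferredTxnSketch() { --live; }
  DeferredTxnSketch(const DeferredTxnSketch&) = delete;
  DeferredTxnSketch& operator=(const DeferredTxnSketch&) = delete;
};
int DeferredTxnSketch::live = 0;

// Hypothetical decoder: an empty record is treated as a decode failure.
bool decode_sketch(const std::string& raw, DeferredTxnSketch& out) {
  if (raw.empty()) {
    return false;
  }
  out.payload = raw;
  return true;
}

// Replays the records and returns how many were applied.
int replay_sketch(const std::vector<std::string>& records) {
  int applied = 0;
  for (const auto& raw : records) {
    DeferredTxnSketch txn;  // automatic storage: destroyed on every path
    if (!decode_sketch(raw, txn)) {
      continue;             // early exit, and nothing to delete
    }
    ++applied;
  }
  return applied;
}
```

After `replay_sketch` returns, `DeferredTxnSketch::live` is back to zero regardless of how many records failed to decode.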

Modify _deferred_replay to separate:
- applying IO to the disk
- DB transaction to remove keys

Changed _open_db_and_around. It now calls _deferred_replay.
Adapted callers, including fsck.

Fixed: https://tracker.ceph.com/issues/68060, original report.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
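
The commit message above describes splitting replay into two independent steps: applying IO to the disk and removing keys in a DB transaction. A minimal sketch of that separation follows, under stated assumptions: `ReplayStateSketch` and `deferred_replay_sketch` are hypothetical, the "device" and "DB" are in-memory containers, and the source does not say which callers use which flag combination.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model: deferred entries in the DB and a "device"
// that simply records the payloads written to it.
struct ReplayStateSketch {
  std::map<std::string, std::string> db;  // deferred key -> payload
  std::vector<std::string> device;        // payloads applied to disk
};

// Walks all deferred entries. Applying IO and removing keys are controlled
// independently. Returns the number of deferred entries seen.
std::size_t deferred_replay_sketch(ReplayStateSketch& st,
                                   bool apply_deferred,
                                   bool remove_deferred) {
  std::vector<std::string> keys;
  for (const auto& kv : st.db) {
    if (apply_deferred) {
      st.device.push_back(kv.second);  // step 1: apply IO
    }
    keys.push_back(kv.first);
  }
  // A real implementation would make the IO durable before touching the DB.
  if (remove_deferred) {
    for (const auto& k : keys) {
      st.db.erase(k);                  // step 2: remove keys
    }
  }
  return keys.size();
}
```

Calling the function with `apply_deferred = true, remove_deferred = false` writes the payloads but leaves the keys in place; a later call with the flags reversed removes the keys without writing anything again.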
@aclamk aclamk force-pushed the wip-aclamk-more-deferred-overwrite-fix branch from 3e49cd4 to f2b33bc Compare January 24, 2025 11:39
aclamk commented Feb 7, 2025

jenkins test make check arm64

aclamk commented Feb 11, 2025

jenkins test make check

aclamk commented Feb 12, 2025

jenkins test make check

aclamk commented Feb 25, 2025

jenkins test make check arm64

aclamk added 3 commits March 1, 2025 09:32
Add code to print times; this is for make check arm64.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
aclamk commented Mar 4, 2025

jenkins test make check arm64

aclamk commented Mar 4, 2025

jenkins test make check arm64

aclamk commented Mar 4, 2025

jenkins test make check arm64

bool read_only,
bool to_repair,
bool apply_deferred,
bool remove_deferred)
Contributor

ronen-fr commented Mar 5, 2025

Not a must for this specific PR, but can we create types for these 4 boolean values? Having a function accept 4 boolean params is an invitation for problems...
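
One common way to act on this suggestion is to give each flag its own scoped enum, so that swapping two arguments at a call site becomes a compile error instead of a silent bug. The sketch below is illustrative only: the enum names and `open_db_sketch` are hypothetical and do not exist in BlueStore; only the four parameter meanings come from the diff above.

```cpp
#include <cassert>
#include <string>

// Hypothetical strong types, one per flag. The names are illustrative.
enum class ReadOnly : bool { no, yes };
enum class ToRepair : bool { no, yes };
enum class ApplyDeferred : bool { no, yes };
enum class RemoveDeferred : bool { no, yes };

// Stand-in for a function that takes the four flags.
// It returns a description of the flags so the effect is observable.
std::string open_db_sketch(ReadOnly read_only,
                           ToRepair to_repair,
                           ApplyDeferred apply_deferred,
                           RemoveDeferred remove_deferred) {
  std::string s = (read_only == ReadOnly::yes) ? "ro" : "rw";
  if (to_repair == ToRepair::yes) s += ",repair";
  if (apply_deferred == ApplyDeferred::yes) s += ",apply";
  if (remove_deferred == RemoveDeferred::yes) s += ",remove";
  return s;
}
```

A call such as `open_db_sketch(ReadOnly::no, ToRepair::no, ApplyDeferred::yes, RemoveDeferred::yes)` documents itself at the call site, whereas `f(false, false, true, true)` does not. Passing `ToRepair::yes` where `ReadOnly` is expected fails to compile, because scoped enums do not convert implicitly.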

@ronen-fr
Contributor

Please note - this issue is affecting multiple CI runs, and is a real nuisance...

@aclamk aclamk merged commit 9ef9c12 into ceph:main Mar 10, 2025
11 checks passed
Matan-B commented Mar 11, 2025

Could this possibly break Crimson w/ BlueStore? See the dead jobs, all due to being unable to mount BlueStore:
https://pulpito.ceph.com/matan-2025-03-10_15:16:52-crimson-rados-wip-matanb-backfill-scan-distro-crimson-smithi/

ERROR 2025-03-10 15:39:32,855 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x7c7a2766, expected 0x2ce10c3d, device location [0x432000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
ERROR 2025-03-10 15:39:32,855 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x7c7a2766, expected 0x2ce10c3d, device location [0x432000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
ERROR 2025-03-10 15:39:32,855 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x7c7a2766, expected 0x2ce10c3d, device location [0x432000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
ERROR 2025-03-10 15:39:32,855 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x7c7a2766, expected 0x2ce10c3d, device location [0x432000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
