
os/bluestore: get rid off statfs update on each txc #46036

Merged
ljflores merged 10 commits into ceph:main from ifed01:wip-ifed-new-statfs-update on Nov 10, 2022

Conversation


ifed01 (Contributor) commented Apr 26, 2022

This removes the statfs update on each transaction; instead it relies on the NCB machinery to recover bluestore stats in case of a non-graceful shutdown.
This functionality is a prerequisite for the upcoming new WAL implementation. It might also provide some performance improvement, since the DB gets less load.
What's more, the redesigned NCB recovery is itself more performant.

Signed-off-by: Igor Fedotov igor.fedotov@croit.io

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows


ifed01 commented Apr 26, 2022

jenkins test make check

aclamk (Contributor) left a comment

This is great stuff.
I think refactor of decode_some should be made differently, but I am open to discussion.

} catch (ceph::buffer::error& e) {
  derr << "fsck error: failed to decode Pool StatFS record"
       << pretty_binary_string(key) << dendl;
string key;
aclamk (Contributor):

I feel like we can make check/repair of statfs much simpler.
We could build a map of {key -> stat} from the stats we just scanned.
This should ideally be reflected by the state of the DB:
If the DB is missing an entry - we complain and add it (in repair mode).
If the DB has a different value - we complain and fix it (in repair mode).
If the DB has an extra key - we complain and delete it (in repair mode).

ifed01 (Contributor Author):

The tricky thing is that now we might have statfs persisted in the DB in two different ways, depending on whether the previous shutdown was graceful or not. This implementation checks the in-memory stats (loaded either from the relevant DB records or via recovery) against the stats collected by fsck.
For your suggestion we'd need to perform a DB update immediately after the recovery and then deserialize it back for fsck (and then finally update the DB once again on shutdown), which doesn't make much sense to me...

@ifed01 ifed01 force-pushed the wip-ifed-new-statfs-update branch 4 times, most recently from 7032f8a to 3301422 Compare April 28, 2022 20:36
@ifed01 ifed01 force-pushed the wip-ifed-new-statfs-update branch from 3301422 to d088026 Compare May 12, 2022 12:54

ifed01 commented May 12, 2022

@aclamk - I think I resolved most of your comments and responded to the rest; would you take another look?

@djgalloway djgalloway changed the base branch from master to main May 25, 2022 20:00
aclamk (Contributor) left a comment

Strangely, I found no errors.
I wonder how many times you rechecked it yourself, Igor.

@ifed01 ifed01 force-pushed the wip-ifed-new-statfs-update branch from d088026 to 54ec858 Compare June 17, 2022 09:35
@ifed01 ifed01 requested a review from a team as a code owner June 17, 2022 09:35

ifed01 commented Jun 17, 2022

jenkins test make check

3 similar comments

ifed01 commented Jun 17, 2022

jenkins test make check


ifed01 commented Jun 20, 2022

jenkins test make check


ifed01 commented Jun 20, 2022

jenkins test make check

@ifed01 ifed01 force-pushed the wip-ifed-new-statfs-update branch 2 times, most recently from 7286b73 to 350bbf0 Compare July 4, 2022 11:49

ifed01 commented Jul 8, 2022

jenkins test make check


ifed01 commented Aug 11, 2022

@aclamk - mind taking another look?

aclamk (Contributor) left a comment

I thought I already approved this.
I already have some cleanup work that depends on this one.


sseshasa commented Sep 5, 2022

Teuthology Test Result
http://pulpito.front.sepia.ceph.com/?branch=wip-yuri5-testing-2022-08-18-0812

@ifed01 Across both the runs from the link above, the following failed jobIDs appear to be related to this PR.

The tests are thrashosds-related, and in each case the failure is that an OSD does not come up after a restart because of fsck errors reported by _fsck_check_statfs(). Could you please take a look?

Some logs from JobID: 6978863

...
2022-08-18T18:28:09.247+0000 7f83ac952440 10 bluefs _read h 0x5580b7c03500 0x7faa~8000 from file(ino 117 size 0xb79d mtime 2022-08-18T18:27:19.033067+0000 allocated 10000 alloc_commit 10000 extents [1:0x60000~10000]) prefetch
2022-08-18T18:28:09.247+0000 7f83ac952440 20 bluefs _read reaching (or past) eof, len clipped to 0x37f3
2022-08-18T18:28:09.247+0000 7f83ac952440 20 bluefs _read left 0x4056 len 0x37f3
2022-08-18T18:28:09.247+0000 7f83ac952440 20 bluefs _read got 14323
2022-08-18T18:28:09.248+0000 7f83902d9700 20 bluestore.MempoolThread(0x5580b7c26b38) _resize_shards cache_size: 563370167 kv_alloc: 197132288 kv_used: 204864 kv_onode_alloc: 142606336 kv_onode_used: 7497280 meta_alloc: 130023424 meta_used: 136 data_alloc: 75497472 data_used: 0
2022-08-18T18:28:09.256+0000 7f83ac952440  1 bluestore(/var/lib/ceph/osd/ceph-3) _fsck_on_open checking pool_statfs
2022-08-18T18:28:09.256+0000 7f83ac952440 -1 bluestore(/var/lib/ceph/osd/ceph-3) _fsck_check_statfs::fsck error: pool 3 has got no statfs to match against: store_statfs(0x0/0x0/0x0, data 0xeb9d000/0x975a000, compress 0x0/0x0/0x0, omap 0x0, meta 0x0)

...
...

2022-08-18T18:28:09.264+0000 7f83ac952440  1 bluestore(/var/lib/ceph/osd/ceph-3) _fsck_on_open checking deferred events
2022-08-18T18:28:09.264+0000 7f83ac952440  2 bluestore(/var/lib/ceph/osd/ceph-3) _fsck_on_open 464 objects, 0 of them sharded.
2022-08-18T18:28:09.264+0000 7f83ac952440  2 bluestore(/var/lib/ceph/osd/ceph-3) _fsck_on_open 692 extents to 692 blobs, 0 spanning, 0 shared.
2022-08-18T18:28:09.264+0000 7f83ac952440  1 bluestore(/var/lib/ceph/osd/ceph-3) _fsck_on_open <<<FINISH>>> with 1 errors, 0 warnings, 0 repaired, 1 remaining in 0.036685 seconds

...

2022-08-18T18:28:09.265+0000 7f83ac952440  1 bluefs umount
2022-08-18T18:28:09.265+0000 7f83ac952440 10 bluefs sync_metadata - no pending log events
2022-08-18T18:28:09.265+0000 7f83ac952440 10 bluefs _drain_writer 0x5580b7c40600 type 0
2022-08-18T18:28:09.265+0000 7f83ac952440 20 bluefs _stop_alloc
2022-08-18T18:28:09.265+0000 7f83ac952440  1 bdev(0x5580b89a1c00 /var/lib/ceph/osd/ceph-3/block) close
2022-08-18T18:28:09.462+0000 7f83ac952440 10 bluestore(/var/lib/ceph/osd/ceph-3) _close_fm
2022-08-18T18:28:09.462+0000 7f83ac952440  1 freelist shutdown
2022-08-18T18:28:09.462+0000 7f83ac952440  1 bdev(0x5580b89a0400 /var/lib/ceph/osd/ceph-3/block) close
2022-08-18T18:28:09.681+0000 7f83ac952440 -1 bluestore(/var/lib/ceph/osd/ceph-3) _mount fsck found 1 errors
2022-08-18T18:28:09.681+0000 7f83ac952440 -1 osd.3 0 OSD:init: unable to mount object store
2022-08-18T18:28:09.681+0000 7f83ac952440 -1 ^[[0;31m ** ERROR: osd init failed: (5) Input/output error^[[0m


yuriw commented Sep 5, 2022

Per @sseshasa

"@yuriweinstein I looked into the pending jobIDs (as mentioned by @Nehaojha ) and all the other failures reported in the re-run (http://pulpito.front.sepia.ceph.com/lflores-2022-08-19_21:39:29-rados-wip-yuri5-testing-2022-08-18-0812-distro-default-smithi/).

Except for PR: os/bluestore: get rid off statfs update on each txc, the rest
of the PRs are Rados approved."

ifed01 and others added 10 commits October 3, 2022 16:09
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
This implements a basis for statfs recovery from persistent Onode metadata,
plus some redesign to make the procedure more lightweight and performant
by avoiding a full Onode rebuild.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Refine the actions taken in close_db_environment. Its role is to close the
DB handle and environment when the DB was used in special modes (repair/reshard)
and is not actually open for typical r/w.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Signed-off-by: Igor Fedotov <ifedotov@croit.io>
@ifed01 ifed01 force-pushed the wip-ifed-new-statfs-update branch from bef157e to 18cc766 Compare October 4, 2022 13:33
@ifed01 ifed01 removed the TESTED label Oct 4, 2022
ljflores (Member):

Rados suite review: http://pulpito.front.sepia.ceph.com/?branch=wip-yuri7-testing-2022-10-17-0814

Failures (unrelated):
1. https://tracker.ceph.com/issues/57311
2. https://tracker.ceph.com/issues/52321
3. https://tracker.ceph.com/issues/52657
4. https://tracker.ceph.com/issues/57935

Details:
1. rook: ensure CRDs are installed first - Ceph - Orchestrator
2. qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Orchestrator
2. rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+laggy" pgs - Ceph - RADOS
3. MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)' - Ceph - RADOS
4. all test jobs get stuck at "Running task ansible.cephlab..." - Infrastructure - Sepia

@ljflores ljflores merged commit b0d73c5 into ceph:main Nov 10, 2022
@ifed01 ifed01 deleted the wip-ifed-new-statfs-update branch November 10, 2022 21:01
ifed01 added a commit to ifed01/ceph that referenced this pull request Feb 10, 2023
readability.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 3df4a8d)

 Conflicts:
	src/os/bluestore/BlueStore.cc
	src/os/bluestore/BlueStore.h
 <lack of ceph#46036 backporting>
ifed01 added a commit to ifed01/ceph that referenced this pull request Feb 10, 2023
This should eliminate duplicate onode releases that could happen before.
Additionally onode pinning is performed during cache trimming not onode
ref count increment.

[Hopefully] fixes: https://tracker.ceph.com/issues/53002

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit a3057f4)

 Conflicts:
	src/os/bluestore/BlueStore.cc
	src/os/bluestore/BlueStore.h
 <lack of ceph#46036 and
  ceph#43299 backporting>
mkogan1 pushed a commit to mkogan1/ceph that referenced this pull request Mar 25, 2024
readability.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 3df4a8d)

 Conflicts:
	src/os/bluestore/BlueStore.cc
	src/os/bluestore/BlueStore.h
 <lack of ceph#46036 backporting>

(cherry picked from commit 816b4fc)

Resolves: rhbz#2218445
mkogan1 pushed a commit to mkogan1/ceph that referenced this pull request Mar 25, 2024
This should eliminate duplicate onode releases that could happen before.
Additionally onode pinning is performed during cache trimming not onode
ref count increment.

[Hopefully] fixes: https://tracker.ceph.com/issues/53002

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit a3057f4)

 Conflicts:
	src/os/bluestore/BlueStore.cc
	src/os/bluestore/BlueStore.h
 <lack of ceph#46036 and
  ceph#43299 backporting>

(cherry picked from commit 4a80641)

Resolves: rhbz#2218445