
os/bluestore: improve cache_onode and cache_buffer mempool accounting #40066

Merged
yuriw merged 1 commit into ceph:main from ideepika:wip-mempool-buffers
Dec 15, 2022

Conversation

@ideepika
Member

  • move physical extents from the cache_other mempool to the bluestore_extent mempool
  • move bluestore_shared_blob_t from the cache_other mempool to the shared_blob mempool

Signed-off-by: Deepika Upadhyay dupadhya@redhat.com
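The two moves above re-charge allocations from one accounting pool to another. As background, here is a minimal self-contained sketch of what per-pool accounting looks like; pool_stats, pool_new, pool_delete, and extent_t are hypothetical names for illustration only, not Ceph's actual mempool.h API (which exposes per-pool allocated_items() and allocated_bytes() counters):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Each pool tracks live items and bytes with atomic counters.
struct pool_stats {
  std::atomic<long> items{0};
  std::atomic<long> bytes{0};
};

// Illustrative pools standing in for bluestore_extent and cache_other.
pool_stats bluestore_extent_pool;
pool_stats bluestore_cache_other_pool;

// An allocation hook charges a pool on allocate...
template <typename T>
T* pool_new(pool_stats& pool) {
  pool.items.fetch_add(1, std::memory_order_relaxed);
  pool.bytes.fetch_add(sizeof(T), std::memory_order_relaxed);
  return new T{};
}

// ...and credits it on free, so counters reflect live objects.
template <typename T>
void pool_delete(pool_stats& pool, T* p) {
  pool.items.fetch_sub(1, std::memory_order_relaxed);
  pool.bytes.fetch_sub(sizeof(T), std::memory_order_relaxed);
  delete p;
}

// A physical extent record: the kind of object this PR moves from the
// catch-all cache_other pool into its own bluestore_extent pool.
struct extent_t {
  unsigned long offset = 0;
  unsigned long length = 0;
};
```

Charging a type to a dedicated pool makes its memory footprint visible on its own line in the stats, instead of being lumped into cache_other.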

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


Contributor

@aclamk aclamk left a comment


Nice stuff.

@ideepika ideepika force-pushed the wip-mempool-buffers branch from 27c1d56 to 15c7c87 Compare March 22, 2021 13:19
@ideepika ideepika requested a review from neha-ojha March 22, 2021 13:19
@tchaikov
Contributor

What is a mempool?
-----------------
/home/jenkins-build/build/workspace/ceph-pr-docs/doc/dev/mempool_accounting.rst:52: WARNING: Unknown directive type "see_more".

.. see_more:: https://github.com/ceph/ceph/blob/fccbdc0905e3868666fbb10803bac6b73f687cb1/src/include/mempool.h

Contributor

@tchaikov tchaikov left a comment



/home/jenkins-build/build/workspace/ceph-pr-docs/doc/dev/mempool_accounting.rst:32: WARNING: Unexpected indentation.
looking for now-outdated files... none found
pickling environment... done

@ideepika ideepika force-pushed the wip-mempool-buffers branch 3 times, most recently from f1db451 to f5ce81f Compare March 29, 2021 10:21
@ideepika ideepika requested a review from tchaikov March 30, 2021 09:57
@tchaikov tchaikov dismissed their stale review March 30, 2021 11:06


@ideepika
Member Author

ideepika commented Apr 6, 2021

@tchaikov @neha-ojha should we merge this?

@neha-ojha
Member

> @tchaikov @neha-ojha should we merge this?

Barring one nit, this looks good to go with tests passing.

@ideepika ideepika force-pushed the wip-mempool-buffers branch 2 times, most recently from 92e719f to a57b0cb Compare April 7, 2021 11:30
@ifed01
Contributor

ifed01 commented Sep 12, 2022

jenkins test make check

@kamoltat
Member

kamoltat commented Sep 30, 2022

@ideepika please confirm if this PR caused this failure
/a/yuriw-2022-09-27_23:37:28-rados-wip-yuri2-testing-2022-09-27-1455-distro-default-smithi/7046379/

2022-09-28T07:11:23.610 INFO:teuthology.orchestra.run.smithi050.stdout:[  FAILED  ] ObjectStore/StoreTestSpecificAUSize.OnodeSizeTracking/2, where GetParam() = "bluestore"
2022-09-28T07:11:23.611 INFO:teuthology.orchestra.run.smithi050.stdout:[  FAILED  ] ObjectStore/StoreTestSpecificAUSize.BlobReuseOnOverwrite/2, where GetParam() = "bluestore"
2022-09-28T07:11:23.611 INFO:teuthology.orchestra.run.smithi050.stdout:
2022-09-28T07:11:23.611 INFO:teuthology.orchestra.run.smithi050.stdout: 2 FAILED TESTS
2022-09-28T07:11:23.619 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-09-28T07:11:23.621 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/run_tasks.py", line 103, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/run_tasks.py", line 82, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/task/exec.py", line 72, in task
    c],
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/orchestra/remote.py", line 525, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi050 with status 1: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c \'mkdir $TESTDIR/archive/ostest && cd $TESTDIR/archive/ostest && ulimit -Sn 16384 && CEPH_ARGS="--no-log-to-stderr --log-file $TESTDIR/archive/ceph_test_objectstore.log --debug-bluestore 20" ceph_test_objectstore --gtest_filter=*/2:-*SyntheticMatrixC* --gtest_catch_exceptions=0\''
2022-09-28T07:11:23.801 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=4f2e242dc2fa43f381e7e73370ef102f
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/run_tasks.py", line 103, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/run_tasks.py", line 82, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/task/exec.py", line 72, in task
    c],
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_a773b0612ada07695bef890b85208fa310b79d30/teuthology/orchestra/remote.py", line 525, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)

@ideepika
Member Author

@kamoltat looks related to me, i will test and work on a fix for this during the weekend, thanks for the mention!

@kamoltat
Member

> @kamoltat looks related to me, i will test and work on a fix for this during the weekend, thanks for the mention!

Thank you, Deepika 😊

@ideepika
Member Author

ideepika commented Oct 3, 2022

Sorry, I have to delay this a bit; I will try to finish by this week, thanks!

@ideepika
Member Author

ASSERT_TRUE(bl_eq(expected, bl));
ASSERT_EQ(logger->get(l_bluestore_blobs), 1u);
ASSERT_EQ(logger->get(l_bluestore_extents), 2u);
ASSERT_EQ(mempool::bluestore_blob::allocated_items(), 1u);
Contributor


@ideepika - do you have any ideas how this modification could fix the following issue @kamoltat shared before:
g057e8043/rpm/el8/BUILD/ceph-18.0.0-88-g057e8043/src/test/objectstore/store_test.cc:7165: Failure
2022-09-28T06:47:07.984 INFO:teuthology.orchestra.run.smithi050.stdout:Expected equality of these values:
2022-09-28T06:47:07.984 INFO:teuthology.orchestra.run.smithi050.stdout: mempool::bluestore_blob::allocated_items()
2022-09-28T06:47:07.984 INFO:teuthology.orchestra.run.smithi050.stdout: Which is: 9
2022-09-28T06:47:07.985 INFO:teuthology.orchestra.run.smithi050.stdout: 1u
2022-09-28T06:47:07.985 INFO:teuthology.orchestra.run.smithi050.stdout: Which is: 1
2022-09-28T06:47:08.498 INFO:teuthology.orchestra.run.smithi050.stdout:==> rm -r bluestore.test_temp_dir

To me this rather looks like some volatile behavior that causes the mempool stats to vary. Wouldn't it be more reliable to get rid of these assertions instead?
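For context on why exact-equality checks against mempool counters can be fragile: Ceph's mempool counters are sharded (writers update one of several per-shard slots to avoid cache-line contention, and a read sums them all), and the summed total covers every allocation charged to the pool by the whole process, not just the code under test. A hypothetical sketch of that counter shape, not the real mempool.h implementation:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Sketch: a counter split across shards, summed on read.
// Writers pick a shard (e.g. keyed by thread id) so hot allocation
// paths do not contend on a single cache line.
constexpr std::size_t kShards = 4;

struct sharded_counter {
  std::array<std::atomic<long>, kShards> shards{};

  void add(std::size_t shard_id, long n) {
    shards[shard_id % kShards].fetch_add(n, std::memory_order_relaxed);
  }

  // The read is not a snapshot: it sums shards one by one while other
  // threads may still be allocating, and the result includes
  // allocations made by *any* code in the process. Asserting exact
  // equality against such a counter in a unit test is inherently racy.
  long total() const {
    long sum = 0;
    for (const auto& s : shards)
      sum += s.load(std::memory_order_relaxed);
    return sum;
  }
};
```

A test asserting `total() == 1u` can therefore observe 9 if other live objects are charged to the same pool at the moment of the read, which matches the failure mode seen above.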

Member Author


I think it is due to the onode_cache_shard vector, which is where we account for onodes in this PR.
I checked as well, and this seems to be true:
https://github.com/ceph/ceph/pull/40066/files#diff-157b0c4b175141aba18fe37a72d49ca9175e01f69a266595d495df86b5240e86R7010
@ifed01 is there a way we can shift this accounting to a later point?

Contributor


@ideepika - not sure I understand your point/question :( The above error refers to the bluestore_blob mempool; onode_cache_shard looks irrelevant in this case...
Anyway, my suggestion is simply not to introduce these two new assertions for the bluestore_blob and bluestore_extent mempools, and hence avoid this unclear failure whose root cause we don't know yet.

Member Author


Sure, updated the changes.

* move physical extent to mempool bluestore_extent from cache_other
* move bluestore_shared_blob_t from cache_other to shared_blob mempool

Signed-off-by: Deepika Upadhyay <deepika@koor.tech>
@ideepika ideepika force-pushed the wip-mempool-buffers branch from 7d82664 to 0bf3519 Compare November 1, 2022 07:03
@ideepika
Member Author

ideepika commented Nov 8, 2022

jenkins make check arm64

@ideepika
Member Author

ideepika commented Nov 8, 2022

jenkins test make check arm64

@ljflores
Member

Rados suite review:

https://pulpito.ceph.com/?branch=wip-yuri-testing-2022-12-06-1204
https://pulpito.ceph.com/?branch=wip-yuri-testing-2022-12-12-1136

Failures, unrelated:
1. https://tracker.ceph.com/issues/58096
2. https://tracker.ceph.com/issues/52321
3. https://tracker.ceph.com/issues/58173
4. https://tracker.ceph.com/issues/52129
5. https://tracker.ceph.com/issues/58097
6. https://tracker.ceph.com/issues/57546
7. https://tracker.ceph.com/issues/58098
8. https://tracker.ceph.com/issues/57731
9. https://tracker.ceph.com/issues/55606
10. https://tracker.ceph.com/issues/58256
11. https://tracker.ceph.com/issues/58258

Details:
1. test_cluster_set_reset_user_config: NFS mount fails due to missing ceph directory - Ceph - Orchestrator
2. qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Orchestrator
3. api_aio_pp: failure on LibRadosAio.SimplePoolEIOFlag and LibRadosAio.PoolEIOFlag - Ceph - RADOS
4. LibRadosWatchNotify.AioWatchDelete failed - Ceph - RADOS
5. qa/workunits/post-file.sh: kex_exchange_identification: read: Connection reset by peer - Ceph - RADOS
6. rook: ensure CRDs are installed first - Ceph - Orchestrator
7. qa/workunits/rados/test_crash.sh: crashes are never posted - Ceph - RADOS
8. Problem: package container-selinux conflicts with udica < 0.2.6-1 provided by udica-0.2.4-1 - Infrastructure
9. [ERR] Unhandled exception from module ''devicehealth'' while running on mgr.y: unknown - Ceph - CephSqlite
10. ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2: Expected: (logger->get(l_bluefs_slow_used_bytes)) >= (16 * 1024 * 1024), actual: 0 vs 16777216 - Ceph - Bluestore
11. rook: kubelet fails from connection refused - Ceph - Orchestrator

@yuriw yuriw merged commit 4c3212d into ceph:main Dec 15, 2022