Skip to content

Add memory usage logging during the tick for both OSD and MON components#54698

Closed
shreekarSS wants to merge 3 commits intoceph:mainfrom
shreekarSS:log-memory-during-tick-fix
Closed

Add memory usage logging during the tick for both OSD and MON components#54698
shreekarSS wants to merge 3 commits intoceph:mainfrom
shreekarSS:log-memory-during-tick-fix

Conversation

@shreekarSS
Copy link

@shreekarSS shreekarSS commented Nov 28, 2023

Description

This pull request adds memory usage logging during the tick operation for OSD and MON components. The aim is to improve visibility into memory consumption trends, facilitating monitoring and debugging.

Changes

  • Implemented memory usage logging with the MemoryModel class.
  • Established a baseline for reference.
  • Included detailed memory metrics (total, rss, heap) in the tick log output.

Context

This enhancement addresses the need for better memory insights in OSD and MON components, enhancing overall system health and performance monitoring.

Related Issue

Fixes: https://tracker.ceph.com/issues/54525

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@shreekarSS shreekarSS requested a review from a team as a code owner November 28, 2023 19:44
@shreekarSS shreekarSS force-pushed the log-memory-during-tick-fix branch from 0ba6a23 to 92681bd Compare November 28, 2023 20:54
@shreekarSS
Copy link
Author

From OSD log:
2023-11-29T05:03:42.283+0000 ffff7cbec140 2 osd.0 56 OSD Memory usage: total 748500, rss 194400, heap 518416, baseline 518416

From MON log:
2023-11-29T14:29:08.984+0000 ffff9394bb00 2 mon.a@0(leader) e1 MON Memory usage: total 718084, rss 417584, heap 199104, baseline 190912

static MemoryModel mm(g_ceph_context);
static MemoryModel::snap last;
mm.sample(&last);
static MemoryModel::snap baseline = last;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does appear to identical to the existing code in MDCache, but there are a few things I don't understand:

  1. Won't last and baseline always match?
  2. Why do mm, last, and baseline need to be static? I'm not really ok with making these static without a specific reason.

It seems like mm and last can simply be local variables. I presume that the intent is for baseline to contain the last recorded value so that the log line can contain the last value and the current value, so baseline should be a member of Monitor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @athanatos,
I have implemented the recommended changes. The 'baseline' variable is now set as a private member in the Monitor, OSD, and MDS classes. This enhancement streamlines our approach to memory tracking and analysis. Thank you for your constructive feedback.

@shreekarSS shreekarSS force-pushed the log-memory-during-tick-fix branch 2 times, most recently from 92681bd to 997f35d Compare January 10, 2024 05:52
shreekarSS added a commit to shreekarSS/ceph that referenced this pull request Jan 12, 2024
…sion in ceph#54698 (comment).

- 'baseline' is now a member of the Monitor class for enhanced tracking and logging.

Signed-off-by: ShreekarSS <Shreekara.ss@gmail.com>
@shreekarSS shreekarSS force-pushed the log-memory-during-tick-fix branch from eecd2d9 to 850983a Compare January 12, 2024 20:52
@github-actions github-actions bot added the cephfs Ceph File System label Jan 22, 2024
shreekarSS added a commit to shreekarSS/ceph that referenced this pull request Jan 23, 2024
…sion in ceph#54698 (comment).

- 'baseline' is now a member of the Monitor class for enhanced tracking and logging.

Signed-off-by: ShreekarSS <Shreekara.ss@gmail.com>
@shreekarSS shreekarSS force-pushed the log-memory-during-tick-fix branch 5 times, most recently from a23ff89 to 6185b9f Compare January 26, 2024 23:24
@shreekarSS shreekarSS force-pushed the log-memory-during-tick-fix branch from 6185b9f to e7a0d23 Compare January 27, 2024 06:38
@athanatos athanatos requested review from a team and athanatos and removed request for athanatos January 31, 2024 23:19
@athanatos
Copy link
Contributor

@ceph/cephfs There's a minor cleanup to the MDCache in this PR -- can you take a look? @Ceph/core I ended up rewriting several of the commits here, so I probably shouldn't review it.

Copy link
Contributor

@dparmar18 dparmar18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MDCache code changes LGTM.

Copy link
Contributor

@vshankar vshankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from cephfs POV.

@shreekarSS shreekarSS force-pushed the log-memory-during-tick-fix branch from b613075 to 3cc340e Compare July 31, 2024 21:52
@github-actions
Copy link

github-actions bot commented Aug 5, 2024

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@yuriw
Copy link
Contributor

yuriw commented Aug 8, 2024

@shreekarSS pls rebase

We're going to add this into the OSD and the Mon.

Signed-off-by: Samuel Just <sjust@redhat.com>
Fixes: https://tracker.ceph.com/issues/54525
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Shreekara <sshreeka@redhat.com>
Fixes: https://tracker.ceph.com/issues/54525
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Shreekara <sshreeka@redhat.com>
@shreekarSS shreekarSS force-pushed the log-memory-during-tick-fix branch from 3cc340e to 0e6d46d Compare August 9, 2024 05:34
void new_tick();

// Memory usage baseline snapshot for monitoring
MemoryModel::snap baseline;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also fix MDCache to not have a static duration variable: #56812 (comment)

<< " total " << mm.last.get_total()
<< ", rss " << mm.last.get_rss()
<< ", heap " << mm.last.get_heap()
<< ", baseline " << baseline.get_heap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add void MemoryModel::mem_snap_t::print(std::ostream&) and make the printing consistent for all of these douts so you can:

dout(2) << "MON memory usage: " << mm.last << dendl;

and add units (i.e. kiB IIRC). The unit-less prints have been a source of confusion in the past for support engineers.

@yuriw
Copy link
Contributor

yuriw commented Oct 1, 2024

jenkins test this please

@ljflores
Copy link
Member

ljflores commented Oct 3, 2024

@shreekarSS ping

@mchangir mchangir removed the wip-mchangir-testing not yet production ready label Nov 22, 2024
@yuriw
Copy link
Contributor

yuriw commented Jan 17, 2025

@shreekarSS do we still want this?

@ceph-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions github-actions bot removed the stale label Apr 16, 2025
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jun 15, 2025
@ronen-fr
Copy link
Contributor

@shreekarSS ping

@github-actions github-actions bot removed the stale label Jun 16, 2025
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Aug 15, 2025
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Sep 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.