Add memory usage logging during the tick for both OSD and MON components #54698
shreekarSS wants to merge 3 commits into ceph:main
From OSD log:
From MON log:
src/mon/Monitor.cc
Outdated
    static MemoryModel mm(g_ceph_context);
    static MemoryModel::snap last;
    mm.sample(&last);
    static MemoryModel::snap baseline = last;
This does appear to be identical to the existing code in MDCache, but there are a few things I don't understand:
- Won't last and baseline always match?
- Why do mm, last, and baseline need to be static? I'm not really ok with making these static without a specific reason.
It seems like mm and last can simply be local variables. I presume that the intent is for baseline to contain the last recorded value so that the log line can contain the last value and the current value, so baseline should be a member of Monitor.
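On the first question, a function-local static is initialized only once, so in the quoted pattern `baseline` would keep the very first sample while `last` changes on every call. The sketch below illustrates both that behaviour and the member-variable alternative. It is only an illustration: `Snap`, `tick_with_static`, and `Daemon` are hypothetical stand-ins, not the real `MemoryModel` or `Monitor` types, and the member variant assumes the intent presumed above (baseline holds the previously recorded value).

```cpp
#include <iostream>

// Hypothetical stand-in for MemoryModel::snap; only a heap figure is modeled.
struct Snap {
  long heap_kb = 0;
};

// Variant A: mirrors the pattern under review. `baseline` is a function-local
// static, so its initializer runs exactly once, on the first call.
long tick_with_static(long current_heap_kb) {
  static Snap last;
  last.heap_kb = current_heap_kb;  // stands in for mm.sample(&last)
  static Snap baseline = last;     // copied from `last` on the first call only
  return baseline.heap_kb;
}

// Variant B: the sample is a local and the baseline is a member of the owner.
class Daemon {
 public:
  // Returns the previously recorded heap value, then records the current one.
  long tick(long current_heap_kb) {
    Snap last;
    last.heap_kb = current_heap_kb;  // stands in for mm.sample(&last)
    long previous = baseline.heap_kb;
    baseline = last;
    return previous;
  }

 private:
  Snap baseline;  // zero until the first tick has run
};
```

With Variant A, every call after the first still reports the first sample as the baseline. With Variant B, each daemon instance carries its own baseline and no state outlives the object.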
hi @athanatos,
I have implemented the recommended changes. The 'baseline' variable is now set as a private member in the Monitor, OSD, and MDS classes. This enhancement streamlines our approach to memory tracking and analysis. Thank you for your constructive feedback.
…sion in ceph#54698 (comment). - 'baseline' is now a member of the Monitor class for enhanced tracking and logging. Signed-off-by: ShreekarSS <Shreekara.ss@gmail.com>
@ceph/cephfs There's a minor cleanup to the MDCache in this PR -- can you take a look? @Ceph/core I ended up rewriting several of the commits here, so I probably shouldn't review it.
dparmar18 left a comment
MDCache code changes LGTM.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
@shreekarSS pls rebase
We're going to add this into the OSD and the Mon. Signed-off-by: Samuel Just <sjust@redhat.com>
Fixes: https://tracker.ceph.com/issues/54525 Signed-off-by: Samuel Just <sjust@redhat.com> Signed-off-by: Shreekara <sshreeka@redhat.com>
Fixes: https://tracker.ceph.com/issues/54525 Signed-off-by: Samuel Just <sjust@redhat.com> Signed-off-by: Shreekara <sshreeka@redhat.com>
    void new_tick();

    // Memory usage baseline snapshot for monitoring
    MemoryModel::snap baseline;
Please also fix MDCache to not have a static duration variable: #56812 (comment)
    << " total " << mm.last.get_total()
    << ", rss " << mm.last.get_rss()
    << ", heap " << mm.last.get_heap()
    << ", baseline " << baseline.get_heap()
Please add void MemoryModel::mem_snap_t::print(std::ostream&) and make the printing consistent for all of these douts so you can write:
dout(2) << "MON memory usage: " << mm.last << dendl;
Also add units (i.e. kiB IIRC). The unit-less prints have been a source of confusion in the past for support engineers.
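A rough sketch of what such a `print` method and stream operator might look like follows. The `MemSnap` struct, its field names, the `format_usage` helper, and the choice of kiB are assumptions for illustration; the real `MemoryModel::mem_snap_t` layout and units would need to be checked against the Ceph source.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the snapshot type; field names and kiB units
// are assumptions, not the actual Ceph definition.
struct MemSnap {
  long total_kib = 0;
  long rss_kib = 0;
  long heap_kib = 0;

  // Single place that decides field order and units for every log line.
  void print(std::ostream& out) const {
    out << "total " << total_kib << " kiB"
        << ", rss " << rss_kib << " kiB"
        << ", heap " << heap_kib << " kiB";
  }
};

// Lets a snapshot be streamed directly, e.g. `out << snap`.
inline std::ostream& operator<<(std::ostream& out, const MemSnap& s) {
  s.print(out);
  return out;
}

// Illustration helper: builds the text a dout-style statement would emit.
std::string format_usage(const std::string& who, const MemSnap& s) {
  std::ostringstream oss;
  oss << who << " memory usage: " << s;
  return oss.str();
}
```

Keeping the formatting in one method means the OSD, MON, and MDS log lines cannot drift apart, and the unit appears next to every number.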
jenkins test this please
@shreekarSS ping
@shreekarSS do we still want this?
Can one of the admins verify this patch?
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
@shreekarSS ping
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
Description
This pull request adds memory usage logging during the tick operation for OSD and MON components. The aim is to improve visibility into memory consumption trends, facilitating monitoring and debugging.
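As a rough illustration of per-tick memory sampling, the sketch below parses a resident-set figure from text in the Linux `/proc/<pid>/status` format and builds a log line from it. This is an assumption-laden example, not the code in this PR: `read_status_kb` and `tick_log_line` are hypothetical names, and the PR itself relies on Ceph's `MemoryModel` rather than this parser.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns the numeric value for a key such as "VmRSS:" from text in the
// /proc/<pid>/status format, or -1 if the key is not present.
long read_status_kb(std::istream& in, const std::string& key) {
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string name;
    long value = 0;
    // Lines whose second field is not a number (e.g. "Name:") are skipped.
    if (fields >> name >> value && name == key) {
      return value;
    }
  }
  return -1;
}

// Hypothetical tick body: reports the current RSS next to the previous one.
std::string tick_log_line(std::istream& status, long previous_rss_kb) {
  long rss_kb = read_status_kb(status, "VmRSS:");
  std::ostringstream oss;
  oss << "memory usage: rss " << rss_kb << " kB"
      << ", previous rss " << previous_rss_kb << " kB";
  return oss.str();
}
```

On Linux, passing a `std::ifstream` opened on `/proc/self/status` would sample the calling process; taking a `std::istream&` keeps the parser testable with canned input.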
Changes
Context
This change gives operators per-tick memory figures for the OSD and MON daemons, which helps with health monitoring and with investigating memory growth.
Related Issue
Fixes: https://tracker.ceph.com/issues/54525