Conversation
0698f24 to
0b72ffa
Compare
|
Excellent job diagnosing this one Nitzan! |
There was a problem hiding this comment.
The 'finish' function previously retained memory, leading to extended holding periods by the garbage collector.
i am not sure i understand the connection between "cPython" function and memory retained by "finish" function.
could you please update the commit message to explain why we don't need to release the reference obtained by PyObject_GetAttrString()?
@tchaikov ,I'll add back the set_fn and the Py_DECREF of it, yes, its also leaking memory if we don't decrement it as well. |
0b72ffa to
66e4220
Compare
66e4220 to
202f3b3
Compare
202f3b3 to
d31c03c
Compare
There was a problem hiding this comment.
This prolonged retention caused certain modules, such as 'rest'
to encounter out-of-memory (OOM) errors when the returned results consumed
substantial memory.
prolonged retention of what? and why is it connected to the rest module? why does this issue manifest itself in pacific? is it a regression introduced by pacific?
The suggested improvement involves employing cPython calls with more efficient
memory management, addressing the issue of prolonged memory retention during
the 'finish' function execution.
please note, if we want to use capitalised character in the name, we should make it right. CPython is a Python implementation written in the C language and Python. where C should be capitalised.
and i don't quite understand how this change addresses the "prolonged memory retention during the 'finish' function execution". what is the memory use for? and how you make the GC more efficient? or shorten the life cycle of what ?
and did you manage to reproduce this issue? how is this fix connected to https://tracker.ceph.com/issues/59580 ?
EDIT, i see you managed to reproduce this issue. but i cannot find the connection between MonCommandCompletion::finish and the splot.
src/pybind/mgr/restful/module.py
Outdated
| with self.requests_lock: | ||
| request = CommandsRequest(_request) | ||
| self.requests.append(request) | ||
| self.requests = self.requests[-self.max_requests:] |
There was a problem hiding this comment.
i'd like to see the analysis of the impact of this change.
There was a problem hiding this comment.
I ran few thousand pg dump command with and without the trimming of the requests array, there is a big improvement with the trimming, users that already checked that fix as patch mentioned that it helped a bit, but the manger was still consuming more memory and finally OOM and killed.
There was a problem hiding this comment.
@NitzanMordhai could you please extract this commit into another PR? the massif output is great. but it is not analysis. by analysis, i mean, the pros and cons of having an unbounded trim limit, and pros and cons of having a bounded trim limit. why 500 is good enough. and explain the reason why we need to keep self.requests to reviewers.
There was a problem hiding this comment.
I think I have the same question as @tchaikov -- as a reviewer, I don't actually know what the purpose of self.requests is. If it's safe to trim it, do we need it at all? Why is it mandatory to maintain 500 of them, but not 501?
so this is but a guess. but the commit message does not reflect the fact that it is a guess. and it closes the issue. what i can find in the plot and in the backtrace is that
in your original version you were still constructing Python strings and feeding them to what i'd suggest is
|
|
It's not obvious to me what the leak being fixed by "mgr/BaseMgrModule: Optimize cPython Call in Finish Function" is. Your change does plausibly reduce the time during which the arguments are retained, but it doesn't obviously explain why users are observing monotonically growing memory as I don't see a path where any of those objects are leaked. Perhaps I'm missing something? |
|
Is it possible that the main benefit of this PR is the second commit limiting the size of that requests array? |
d31c03c to
9ac4c2c
Compare
|
@tchaikov @athanatos i split the PR as you suggested: #54984 |
9ac4c2c to
8b3b77d
Compare
Remove CPython overhead packing tuple during the 'finish' function to improve memory consumption when we deal with long-string outputs. When modules like Restful return large amounts of output the use of PyObject_CallFunction without createing PyObject will reduce the time the memory held by the mgr. Fixes: https://tracker.ceph.com/issues/59580 Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
8b3b77d to
247ace1
Compare
|
lgtm |
|
i am not sure if we should close this issue with this change though. Better off "Refs" it. |
|
jenkins retest this please |




Remove CPython overhead packing tuple during the 'finish' function to
improve memory consumption when we deal with long-string outputs.
When modules like Restful return large amounts of output the use
of PyObject_CallFunction without createing PyObject will reduce the
time the memory held by the mgr.
Fixes: https://tracker.ceph.com/issues/59580
Validation Process:
During the validation process, I conducted tests on a cluster consisting of 12 OSDs and approximately 5000 PGs. As a result, the PG DUMP commands returned extensive outputs. To enhance memory usage and facilitate conversion, I utilized the "pg dump_json" command.
The script, when run against the Manager (mgr) without the fix, generated the following Massif output:

In this output, the last snapshot indicated a memory usage of 1.1GB. Notably, one of the function calls contributing to this memory usage was identified as follows:
from 722.3 MiB: MonCommandCompletion::finish(int) (BaseMgrModule.cc:97)
Subsequently, I ran the same script against the Manager with the optimized code in BaseMgrModule, resulting in the following Massif output:

In this optimized scenario, the maximum allocated memory was approximately 400MB, and it did not exceed this threshold. This demonstrates a significant improvement in memory usage compared to the version without the fix.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e