mgr/PyFormatter: prevent segfault on non-printable characters#67022
mgr/PyFormatter: prevent segfault on non-printable characters#67022NitzanMordhai wants to merge 1 commit intoceph:mainfrom
Conversation
src/mgr/PyFormatter.cc
Outdated
| void PyFormatter::dump_string(std::string_view name, std::string_view s) | ||
| { | ||
| dump_pyobject(name, PyUnicode_FromString(s.data())); | ||
| PyObject *p = PyUnicode_FromString(s.data()); |
There was a problem hiding this comment.
std::string::data() is guaranteed to be null terminated (https://en.cppreference.com/w/cpp/string/basic_string/data.html), but std::string_view::data() is not (https://en.cppreference.com/w/cpp/string/basic_string_view/data.html). Thus, PyUnicode_FromString is inherently inappropriate here, use PyUnicode_FromStringAndSize instead. std::string_view doesn't guarantee null termination because it can be a slice into an arbitrary portion of another string and forcing null termination would require reallocating memory for the slice. We likely got away with it before because the string_view was likely an entire string and was therefore null-terminated anyway or something similar. What was it called on that caused a problem? Is it actually something that should really be a python string?
There was a problem hiding this comment.
crash module trying to load the crash entries from the store during restart:
def _load_crashes(self) -> None:
raw = self.get_store_prefix('crash/')
self.crashes = {k[6:]: json.loads(m) for (k, m) in raw.items()}
There was a problem hiding this comment.
f.dump_string(p->first.c_str() + base_prefix.size(), p->second);
Oddly, both of those string_view instances would be null terminated since they extend to the end of a std::string (by accident, to be clear, never rely on that behavior), so while we should still switch to using PyUnicode_FromStringAndSize I don't think that's the actual reason for the crash. Have you been able to force this crash in vstart? If it's an encoding problem, it should be pretty easy to force?
There was a problem hiding this comment.
Latin-1 still has invalid codepoints and can also fail to decode, I'm skeptical of blindly trying to decode with Latin-1 if Unicode failed unless we have some specific reason to do so. If the strings in question really aren't valid Unicode or Latin-1, it's going to be a problem putting them into a python string. Likely wise to work out why this is happening in the first place. Is the crash module putting invalid things into the crash/ prefix?
There was a problem hiding this comment.
Other options would include skipping the undecodable entry or raising an exception.
There was a problem hiding this comment.
@athanatos, yes, I was able to enforce the crash by having non-printable characters in the value, that cause the utf8 conversion to fail and Latin-1 to be able to decode.
i agree that we need to use PyUnicode_FromStringAndSize for string_view, i'll change that.
Skipping the undecodable is better, raising an exception can cause the mgr to not be able to initialize (like in crash module starting issue)
There was a problem hiding this comment.
Likely wise to work out why this is happening in the first place. Is the crash module putting invalid things into the crash/ prefix?
I think it happened because of debugging code that was sent to the user, the matedata contained part of hex dumpped decoded message that it was tried to catch before crash happened, during the restart of the ceph-mgr, the crash module tried to load the crash matedata and failed .
There was a problem hiding this comment.
I think it's a coincidence that Latin-1 worked -- it's just harder to construct invalid Latin-1 sequences. What does the crash module actually use this for? If the crash module needs to be able to deal with arbitrary byte sequences (particularly if it needs to pass it back in...) without a particular encoding, the python side should be a bytes object, not a string.
There was a problem hiding this comment.
@athanatos crash module expect return of valid JSON (stored via json.dumps).
We can skip the value in case of any encoding issue, but then I won't be able to issue a message in PyFormatter (unless i add return code from dump_string that will tell me that something went wrong
If PyFormatter::dump_string attempting to format non-printable characters can result in a nullptr being passed to dump_pyobject, causing a segfault. It occured in the following case - crash module tried to load the crash entries containing matedata with non-printable characters. when these were passed to PyFormatter, the resulting crash prevented ceph-mgr from starting. Fix this by attempting to create a python object using PyUnicode_FromString. If it returns nullptr (indicating a decoding failure), the code now falls back to decoding using Latin1. Additionally, add a guard in dump_pyobject to prevent processing nullptr objects, providing a secondary layer of protection against segfaults. Fixes: https://tracker.ceph.com/issues/74379 Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
bb5d536 to
a7ad271
Compare
| */ | ||
| void PyFormatter::dump_pyobject(std::string_view name, PyObject *p) | ||
| { | ||
| if (!p) { |
There was a problem hiding this comment.
| if (!p) { | |
| ceph_assert(p); |
is fine IMO.
|
jenkins test windows |
If PyFormatter::dump_string attempting to format non-printable characters can result in a nullptr being passed to dump_pyobject, causing a segfault.
It occured in the following case - crash module tried to load the crash entries containing matedata with non-printable characters. when these were passed to PyFormatter, the resulting crash prevented ceph-mgr from starting.
Fix this by attempting to create a python object using PyUnicode_FromString. If it returns nullptr (indicating a decoding failure), the code now falls back to decoding using Latin1.
Additionally, add a guard in dump_pyobject to prevent processing nullptr objects, providing a secondary layer of protection against segfaults.
Fixes: https://tracker.ceph.com/issues/74379
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.