Skip to content

mgr/PyFormatter: prevent segfault on non-printable characters#67022

Open
NitzanMordhai wants to merge 1 commit intoceph:mainfrom
NitzanMordhai:wip-nitzan-pyformatter-dump-string-non-utf8
Open

mgr/PyFormatter: prevent segfault on non-printable characters#67022
NitzanMordhai wants to merge 1 commit intoceph:mainfrom
NitzanMordhai:wip-nitzan-pyformatter-dump-string-non-utf8

Conversation

@NitzanMordhai
Copy link
Contributor

@NitzanMordhai NitzanMordhai commented Jan 21, 2026

If PyFormatter::dump_string attempting to format non-printable characters can result in a nullptr being passed to dump_pyobject, causing a segfault.

It occured in the following case - crash module tried to load the crash entries containing matedata with non-printable characters. when these were passed to PyFormatter, the resulting crash prevented ceph-mgr from starting.

Fix this by attempting to create a python object using PyUnicode_FromString. If it returns nullptr (indicating a decoding failure), the code now falls back to decoding using Latin1.
Additionally, add a guard in dump_pyobject to prevent processing nullptr objects, providing a secondary layer of protection against segfaults.

Fixes: https://tracker.ceph.com/issues/74379

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

void PyFormatter::dump_string(std::string_view name, std::string_view s)
{
dump_pyobject(name, PyUnicode_FromString(s.data()));
PyObject *p = PyUnicode_FromString(s.data());
Copy link
Contributor

@athanatos athanatos Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string::data() is guaranteed to be null terminated (https://en.cppreference.com/w/cpp/string/basic_string/data.html), but std::string_view::data() is not (https://en.cppreference.com/w/cpp/string/basic_string_view/data.html). Thus, PyUnicode_FromString is inherently inappropriate here, use PyUnicode_FromStringAndSize instead. std::string_view doesn't guarantee null termination because it can be a slice into an arbitrary portion of another string and forcing null termination would require reallocating memory for the slice. We likely got away with it before because the string_view was likely an entire string and was therefore null-terminated anyway or something similar. What was it called on that caused a problem? Is it actually something that should really be a python string?

Copy link
Contributor Author

@NitzanMordhai NitzanMordhai Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crash module trying to load the crash entries from the store during restart:

def _load_crashes(self) -> None:
        raw = self.get_store_prefix('crash/')
        self.crashes = {k[6:]: json.loads(m) for (k, m) in raw.items()}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    f.dump_string(p->first.c_str() + base_prefix.size(), p->second);

Oddly, both of those string_view instances would be null terminated since they extend to the end of a std::string (by accident, to be clear, never rely on that behavior), so while we should still switch to using PyUnicode_FromStringAndSize I don't think that's the actual reason for the crash. Have you been able to force this crash in vstart? If it's an encoding problem, it should be pretty easy to force?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latin-1 still has invalid codepoints and can also fail to decode, I'm skeptical of blindly trying to decode with Latin-1 if Unicode failed unless we have some specific reason to do so. If the strings in question really aren't valid Unicode or Latin-1, it's going to be a problem putting them into a python string. Likely wise to work out why this is happening in the first place. Is the crash module putting invalid things into the crash/ prefix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other options would include skipping the undecodable entry or raising an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@athanatos, yes, I was able to enforce the crash by having non-printable characters in the value, that cause the utf8 conversion to fail and Latin-1 to be able to decode.
i agree that we need to use PyUnicode_FromStringAndSize for string_view, i'll change that.

Skipping the undecodable is better, raising an exception can cause the mgr to not be able to initialize (like in crash module starting issue)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely wise to work out why this is happening in the first place. Is the crash module putting invalid things into the crash/ prefix?

I think it happened because of debugging code that was sent to the user, the matedata contained part of hex dumpped decoded message that it was tried to catch before crash happened, during the restart of the ceph-mgr, the crash module tried to load the crash matedata and failed .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a coincidence that Latin-1 worked -- it's just harder to construct invalid Latin-1 sequences. What does the crash module actually use this for? If the crash module needs to be able to deal with arbitrary byte sequences (particularly if it needs to pass it back in...) without a particular encoding, the python side should be a bytes object, not a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@athanatos crash module expect return of valid JSON (stored via json.dumps).
We can skip the value in case of any encoding issue, but then I won't be able to issue a message in PyFormatter (unless i add return code from dump_string that will tell me that something went wrong

If PyFormatter::dump_string attempting to format non-printable characters
can result in a nullptr being passed to dump_pyobject, causing a segfault.

It occured in the following case - crash module tried to load the crash entries
containing matedata with non-printable characters.
when these were passed to PyFormatter, the resulting crash prevented ceph-mgr
from starting.

Fix this by attempting to create a python object using PyUnicode_FromString.
If it returns nullptr (indicating a decoding failure), the code now falls
back to decoding using Latin1.
Additionally, add a guard in dump_pyobject to prevent processing nullptr objects,
providing a secondary layer of protection against segfaults.

Fixes: https://tracker.ceph.com/issues/74379
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-pyformatter-dump-string-non-utf8 branch from bb5d536 to a7ad271 Compare January 26, 2026 12:57
*/
void PyFormatter::dump_pyobject(std::string_view name, PyObject *p)
{
if (!p) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!p) {
ceph_assert(p);

is fine IMO.

@rzarzynski
Copy link
Contributor

jenkins test windows

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants