common/ceph_context.h: reserve space for breakpad in CephContext#64829
rzarzynski
left a comment
Generally LGTM apart from a tiny nit.
src/common/ceph_context.h
  std::unique_ptr<google_breakpad::ExceptionHandler> _ex_handler;
  static_assert(sizeof(std::unique_ptr<google_breakpad::ExceptionHandler>) == sizeof(std::unique_ptr<char>));
#else
  std::unique_ptr<char> _ex_handler;
I would add a comment saying it has a purpose. Perhaps `[[maybe_unused]]` would be enough to communicate "do not remove!" to humans.
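A minimal sketch of how the suggested annotation could look on a placeholder member. `DemoContext`, `DemoExceptionHandler` and the `DEMO_HAVE_HANDLER` macro are hypothetical stand-ins rather than names from the Ceph tree, and the attribute here is meant purely as a signal to readers alongside the comment.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real handler type (assumption, not breakpad's API).
struct DemoExceptionHandler { int fd = -1; };

struct DemoContext {
#ifdef DEMO_HAVE_HANDLER  // hypothetical build switch standing in for HAVE_BREAKPAD
  std::unique_ptr<DemoExceptionHandler> _ex_handler;
#else
  // Placeholder only: reserves the same space as the real handler so that
  // every build variant agrees on the offsets of the members that follow.
  // Do not remove, even though nothing reads it.
  [[maybe_unused]] std::unique_ptr<char> _ex_handler;
#endif
  int* _plugin_registry = nullptr;
};

// The placeholder only works if both smart pointers occupy the same space.
static_assert(sizeof(std::unique_ptr<DemoExceptionHandler>) ==
              sizeof(std::unique_ptr<char>),
              "placeholder must match the real member's size");

// Small self-check: the placeholder is empty and has the expected size.
inline void check_placeholder() {
  DemoContext ctx;
  assert(!ctx._ex_handler);
  assert(sizeof(ctx._ex_handler) == sizeof(std::unique_ptr<DemoExceptionHandler>));
}
```

The comment carries the actual explanation; the attribute only reinforces that the member is unused on purpose.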
For cases when HAVE_BREAKPAD is off, supply exactly the same space in
CephContext struct.
While it should happen, jenkins seems to link binaries with different variants.
The noticeable artefacts of this misbehaviour are:
208 - unittest_bluefs (Bus error)
209 - unittest_bluefs_ex (Failed)
211 - unittest_bdev (Bus error)
Above mentioned unittests are failing because
ceph_context.h :
ceph::PluginRegistry *get_plugin_registry() {
return _plugin_registry;
}
^ _plugin_registry returned is at !!!offset off by 8 bytes!!! to the location of _plugin_registry as constructed at
ceph_context.cc :
743: _plugin_registry = new PluginRegistry(this);
This causes fatal error in
src/extblkdev/ExtBlkDevPlugin.cc :
227 auto registry = cct->get_plugin_registry();
228 std::lock_guard l(registry->lock);
Sometimes lock_guard hangs, sometimes lock_guard segfaults.
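The offset mismatch described in the commit message can be illustrated with a small sketch. All type and member names below are hypothetical stand-ins, and the sketch assumes that the non-breakpad build previously had no handler member at all, which is what the added `#else` branch in the diff suggests. On a typical 64-bit platform the gap is one pointer, which matches the 8 bytes reported.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins; none of these names come from the Ceph tree.
struct FakeHandler { int fd = -1; };
struct FakeRegistry { int lock = 0; };

// Layout seen by code compiled with the handler member present.
struct ContextWithHandler {
  void* earlier_member = nullptr;
  std::unique_ptr<FakeHandler> ex_handler;
  FakeRegistry* plugin_registry = nullptr;
};

// Layout seen by code compiled without the handler member (assumed old state).
struct ContextNoHandler {
  void* earlier_member = nullptr;
  FakeRegistry* plugin_registry = nullptr;
};

// Layout with a same-sized placeholder, as in the proposed fix.
struct ContextPlaceholder {
  void* earlier_member = nullptr;
  std::unique_ptr<char> ex_handler;
  FakeRegistry* plugin_registry = nullptr;
};

static_assert(sizeof(std::unique_ptr<FakeHandler>) == sizeof(std::unique_ptr<char>),
              "placeholder must occupy the same space as the real member");

// Byte offset of plugin_registry inside a given context object.
template <class Ctx>
std::size_t registry_offset(const Ctx& ctx) {
  const char* base = reinterpret_cast<const char*>(&ctx);
  const char* member = reinterpret_cast<const char*>(&ctx.plugin_registry);
  return static_cast<std::size_t>(member - base);
}

// The two original variants disagree; the placeholder variant agrees.
inline void check_layouts() {
  ContextWithHandler a;
  ContextNoHandler b;
  ContextPlaceholder c;
  assert(registry_offset(a) != registry_offset(b));
  assert(registry_offset(a) == registry_offset(c));
}
```

If one object file writes `plugin_registry` using the first layout and another reads it using the second, the reader picks up whatever lives one slot earlier, which would be consistent with the hangs and segfaults seen in `lock_guard`.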
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
@aclamk is it possible to reproduce this issue?
in the commit message:
might want to put "While it should not happen". but still, i'd like to understand why it happens. in the title of the commit:
i'd suggest putting: "common/ceph_context.h: reserve space for breakpad in CephContext"
@tchaikov
@aclamk Just curious, if you weren't able to reproduce, how did you arrive at this conclusion from
Is there a way to extract a core dump along with the matching binaries from Jenkins?
It certainly makes me very nervous not to have a root cause. Who knows what will break next?
this issue is tracked by https://tracker.ceph.com/issues/71547, and i already have a pull request addressing it, see #64273.
recently, the test failure surfaced three times in a row when testing another pull request at #65075
- https://jenkins.ceph.com/job/ceph-pull-requests/165332/
- https://jenkins.ceph.com/job/ceph-pull-requests/165281/
- https://jenkins.ceph.com/job/ceph-pull-requests/165325/
after including #64273 in #65075, the test result is now green again. see
based on the observation above, i'm inclined to reject this change.
All of these jobs are x86, but #64273 talks exclusively about arm64. Are you sure the green job isn't just a coincidence?
I logged in to the specific jenkins host where "make check" had hung in one of the unittests, and did:
Ah, I didn't realize this was allowed/possible. Thanks for the explanation!
Not really, one needs blessing (and access) from David Galloway.
these load libceph-common.so as a shared library. so if something is installing ceph system packages, it's possible that the (pre-breakpad) system version gets loaded instead. but surely that would break lots of other tests - any theories why only these targets seem to be affected?
Like
@djgalloway i was thinking https://packages.ubuntu.com/jammy/main/ceph-common, which provides a libceph-common.so from the quincy release long before the WITH_BREAKPAD stuff was added
The installed package version and size/etc. of libcommon can certainly be logged |
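One way to log which copy of the library a test process actually mapped is to scan `/proc/self/maps`. The sketch below is Linux-specific and only an illustration; the helper names `mapped_paths` and `loaded_ceph_common` are hypothetical and not part of Ceph.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Collect the distinct mapped file paths that contain `needle`.
// `maps` must hold text in the /proc/<pid>/maps format:
//   address perms offset dev inode [pathname]
inline std::set<std::string> mapped_paths(std::istream& maps,
                                          const std::string& needle) {
  std::set<std::string> found;
  std::string line;
  while (std::getline(maps, line)) {
    std::istringstream fields(line);
    std::string addr, perms, offset, dev, inode, path;
    if (!(fields >> addr >> perms >> offset >> dev >> inode)) {
      continue;  // not a well-formed mapping line
    }
    // The pathname is optional; anonymous mappings leave it empty.
    std::getline(fields >> std::ws, path);
    if (!path.empty() && path.find(needle) != std::string::npos) {
      found.insert(path);
    }
  }
  return found;
}

// Convenience wrapper for the current process (hypothetical helper).
inline std::set<std::string> loaded_ceph_common() {
  std::ifstream maps("/proc/self/maps");
  return mapped_paths(maps, "libceph-common");
}
```

Printing the result of `loaded_ceph_common()` at test start-up would show whether a system-installed copy or the freshly built one was picked up.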
I remember I checked shared libraries. I think I would have noticed linking against system ceph libraries. |
@cbodley So it fails on the libceph-common.so it compiled itself.
This PR was tested as part of QA Run: Unfortunately there were quite a few new failures as documented in the wiki here: New Issues raised: @rzarzynski @ljflores fyi
New trackers analysed; new issues unrelated. Rados approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrackercephcomissues72627
@aclamk pls merge at will when @tchaikov comments are resolved |
The mentioned race on ARM64 cannot affect the underlying offset mismatch.
For cases when HAVE_BREAKPAD is off, supply exactly the same space in CephContext struct.
While it should not happen, jenkins seems to link binaries with different variants.
The noticeable artefacts of this misbehaviour are:
208 - unittest_bluefs (Bus error)
209 - unittest_bluefs_ex (Failed)
211 - unittest_bdev (Bus error)
Above mentioned unittests are failing because
ceph_context.h :
ceph::PluginRegistry *get_plugin_registry() {
return _plugin_registry;
}
^ _plugin_registry returned is at !!!offset off by 8 bytes!!! to the location of _plugin_registry as constructed at ceph_context.cc :
743: _plugin_registry = new PluginRegistry(this);
This causes fatal error in
src/extblkdev/ExtBlkDevPlugin.cc :
227 auto registry = cct->get_plugin_registry();
228 std::lock_guard l(registry->lock);
Sometimes lock_guard hangs, sometimes lock_guard segfaults.