osdc/Journaler: derr log when header update get (possibly) overwritten out-of-order#55265
osdc/Journaler: derr log when header update get (possibly) overwritten out-of-order#55265
Conversation
|
jenkins test make check |
|
@rzarzynski ping? |
batrick
left a comment
There was a problem hiding this comment.
BTW, the mds is the only user of this Journaler (rbd uses src/journal). It's in our scope to maintain it, not RADOS.
src/osdc/Journaler.cc
Outdated
| if (wrote.write_pos < last_committed.write_pos || | ||
| wrote.expire_pos < last_committed.expire_pos || | ||
| wrote.trimmed_pos < last_committed.trimmed_pos) { | ||
| lderr(cct) << __func__ << ": not updating last_committed" << dendl; |
There was a problem hiding this comment.
I'm wary of just skipping the header update. Any objection to an actual assertion?
There was a problem hiding this comment.
Fair enough - I'll add the assert.
There was a problem hiding this comment.
@batrick why are you wary of skipping the header update here ?
There was a problem hiding this comment.
@batrick why are you wary of skipping the header update here ?
Because the Journaler had previously, successfully written a header with last_committed beyond the header write that just succeeded. This may indicate that we have clobbered the newest update to head. We cannot happily continue after having done so.
Looking at this code again, there's little to prevent this kind of error. It really should be harnessing the smarts of the OSD by encoding a sequence code -- the current write_pos -- in the write (probably stored in the omap) which fails the op if the write's sequence code is behind what the OSD has.
There was a problem hiding this comment.
Looking at this code again, there's little to prevent this kind of error. It really should be harnessing the smarts of the OSD by encoding a sequence code -- the current write_pos -- in the write (probably stored in the omap) which fails the op if the write's sequence code is behind what the OSD has.
I agree and IMO this is something we would want to add at least when flushing journal header.
In that case we should get the journaler sources in src/mds (just as an logical dependency). |
| if (wrote.write_pos < last_committed.write_pos || | ||
| wrote.expire_pos < last_committed.expire_pos || | ||
| wrote.trimmed_pos < last_committed.trimmed_pos) { | ||
| lderr(cct) << __func__ << ": not updating last_committed: " | ||
| << "(wrote.write_pos/last_committed.write_pos=" | ||
| << wrote.write_pos << "," << last_committed.write_pos < "), " | ||
| << "(wrote.expire_pos/last_committed.expire_pos=" | ||
| << wrote.expire_pos << "," << last_committed.expire_pos < "), " | ||
| << "(wrote.trimmed_pos/last_committed.trimmed_pos=" | ||
| << wrote.trimmed_pos << "," << last_committed.trimmed_pos < ")" | ||
| << dendl; | ||
| ceph_abort(); | ||
| } else { | ||
| last_committed = wrote; | ||
| } |
There was a problem hiding this comment.
With this change, the old values are already written to the disk and we have no clue whether the write op is out of sync or the just the update notification or both. But the log should help as a reference for setting the header field values, using cephfs-journal-tool, when the assertion failure ceph_assert(trim_to >= trimming_pos) is hit.
There was a problem hiding this comment.
but the assertion failure in MDLog::_trim() will not happen due to the ceph_abort() here
There was a problem hiding this comment.
Yeh, right. This is just a hunch and right now we have no idea what could have happened that caused the journal pointer values.
|
jenkins retest this please |
|
@ceph/cephfs gentle nudge on this... |
|
@ceph/cephfs gentle nudge (needs approval to put this through testing) |
|
jenkins retest this please |
batrick
left a comment
There was a problem hiding this comment.
LGTM but we'll want a follow-up ticket for
|
I'll fix that up real quick. |
…overwritten out-of-order See: https://tracker.ceph.com/issues/58878#note-4 Signed-off-by: Venky Shankar <vshankar@redhat.com>
|
jenkins test make check |
|
jenkins test make check arm64 |
|
jenkins test make check arm64 |
|
This PR is under test in https://tracker.ceph.com/issues/66521. |
osdc/Journaler: derr log when header update get (possibly) overwritten out-of-order Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
See: https://tracker.ceph.com/issues/58878#note-4
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e