osd/../client_request: Fix clone replicated reads#62535
Conversation
| pg->get_perf_logger().inc(l_osd_replica_read_redirect_conflict); | ||
| co_await reply_op_error(pg, -EAGAIN); | ||
| co_return; | ||
| } |
There was a problem hiding this comment.
Same idea -- let's instead change can_serve_replica_read to simply check for any head or clone of the object and leave it where it was.
There was a problem hiding this comment.
IIUC we have 2 options here:
- Move
can_serve_replica_readto check the resolved object. - Check any head or clone of the object
For 2, we would need to iterate over each possible head/clone combination (similar to the previous version of this PR). Am I right?
IMO, checking only the resolved object seems more elegant, unless I'm missing a better way to implement the 2nd option above.
Note: Classic seems to call can_serve_replica_read for the unresolved object. I'll address this in a followup PR.
There was a problem hiding this comment.
One downside with moving can_serve_replica_read is that we check for writes much later in the flow, meaning we could have bounce this read earlier.
There was a problem hiding this comment.
I'm suggesting 2, but you can do it with one pass by simply ignoring the clone field, right?
There was a problem hiding this comment.
Looking again, checking writes on the clone actually checks writes on the head only. This actually makes sense as clone objects can't have any independent log entries. In any log entry of a clone (which is a creation of it or deletion/update of it when trimming) there must be also a separate log entry of the head object.
I kept the can_resolve in the original place of it - see last commit for clarification notes. What do you think?
bb65713 to
3673f06
Compare
e9ce2fc to
a1ae3b7
Compare
|
jenkins test make check |
|
jenkins test make check arm64 |
When checking if a client requested object is missing, we must first obtain the actual (resolved) object from the OSD prespective. See how Crimson handles this logic with resolve_oid (Classic does the same). For us to resolve an object we *must* obtain its object context. There are some scenarios (e.g serving replicated reads) where checking if the object (head or clone) is missing prior to obtaining the obc could be useful to avoid further lookups or unnecessarily complicated/ bug-prone pathways. With this function, we can check if any head or clone of this object is missing, without resolving the (client) requested object. Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
When handling a client request on a clone object, m->get_hobj() might not actually represent the "actual" object from the OSD's prespective. See: ObjectContext::resolve_oid for how this logic is handled. If this replica has any missing clone objects that are either this object head or any of this object's clones - drop this replicted read and bounce back to primary. We could possibly check if the head is not missing and then wait for resolve_oid to lookup for the actual clone object (Which might exist!). However, this seems to not be worth the complexity as the niche scenario of being able to serve reads while having (possibly) releated missing objects. Fixes (2/2): https://tracker.ceph.com/issues/70639 Signed-off-by: Matan Breizman <mbreizma@redhat.com>
For every clone object operation, we call recover_missing_snaps to make sure that the resolved object is not missing. For us to resolve_oid we need to lock the obc for RWREAD. See ClientRequest::recover_missing_snaps. Let's avoid locking the object if I don't have any possible missing clone objects. Otherwise, try to recover as before. This should positively affect reads even in an undegraded cluster state. Signed-off-by: Matan Breizman <mbreizma@redhat.com>
can_serve_replica_read uses PGLog::has_write_since, when checking for writes we actually check if any pglog entry belongs to the head object. The only two pglog entries that are of a clone object are: 1) At creation (pg_log_entry_t::CLONE) 2) At trimming (See remove_or_update) In both cases, the there would be another pg log entry of the head. --- Add assertion in prepare_head_update to assert that the above is true. The obc passed to prepare_head_update (by OpsExecuter) could also be a clone object (after being resolved). However, write operations should only occur to head - so let's verify that. Signed-off-by: Matan Breizman <mbreizma@redhat.com>
a1ae3b7 to
480b7c6
Compare
|
jenkins test make check |
|
jenkins test api |
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job Definition