Skip to content

crimson/osd: execute submit_error_log as an ExclusivePhase#54287

Merged
Matan-B merged 13 commits intoceph:mainfrom
Matan-B:wip-matanb-crimson-do_osd_ops_execute-v3
Nov 29, 2023
Merged

crimson/osd: execute submit_error_log as an ExclusivePhase#54287
Matan-B merged 13 commits intoceph:mainfrom
Matan-B:wip-matanb-crimson-do_osd_ops_execute-v3

Conversation

@Matan-B
Copy link
Contributor

@Matan-B Matan-B commented Nov 1, 2023

Blocked by: #54040

Introduced: #39772

Fixes: https://tracker.ceph.com/issues/61651

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@Matan-B Matan-B force-pushed the wip-matanb-crimson-do_osd_ops_execute-v3 branch from 3d7efa6 to 1aad6da Compare November 1, 2023 10:43
@Matan-B
Copy link
Contributor Author

Matan-B commented Nov 1, 2023

@Matan-B Matan-B force-pushed the wip-matanb-crimson-do_osd_ops_execute-v3 branch 11 times, most recently from 9a7618a to cc0b061 Compare November 6, 2023 14:19
@Matan-B Matan-B requested a review from xxhdx1985126 November 6, 2023 14:25
@cyx1231st cyx1231st self-requested a review November 7, 2023 01:14
@Matan-B Matan-B force-pushed the wip-matanb-crimson-do_osd_ops_execute-v3 branch from cc0b061 to b02676b Compare November 7, 2023 11:12
@Matan-B Matan-B requested a review from athanatos November 7, 2023 11:16
@Matan-B
Copy link
Contributor Author

Matan-B commented Nov 7, 2023

@athanatos, the call to error_log_fut was moved to do_osd_ops_execute and the original split between do_osd_ops_execute and submitted_fut is reverted. The diff is much cleaner now. However, IIUC, it still doesn't solve the issue described in b02676b.

@Matan-B Matan-B force-pushed the wip-matanb-crimson-do_osd_ops_execute-v3 branch from b02676b to 73016ca Compare November 7, 2023 16:48
Comment on lines +936 to +956
return (*error_func_ptr)(e, rep_tid).then(
[failure_func_ptr, e, rep_tid] {
return PG::do_osd_ops_iertr::make_ready_future<pg_rep_op_fut_t<Ret>>(
std::move(seastar::now()),
std::move((*failure_func_ptr)(e, rep_tid))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/2: This part lets submit_error_log run from within do_osd_ops_execute which resolves the original issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is chained after the future returned from submit_transaction, which would otherwise resolve into the submitted and completed futures. This one, I think, as you observe is the one that resolves the bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that error_func_ptr could return an eversion_t, pass it into failure_func_ptr, and thereby avoid the log_entry_version map.

If we retain the log_entry_version map, it probably needs to be cleared during interval change.

Copy link
Contributor Author

@Matan-B Matan-B Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that error_func_ptr could return an eversion_t, pass it into failure_func_ptr, and thereby avoid the log_entry_version map.

After the last commit, it is no longer possible. (It's still possible in a way, but much less elegant than before)

If we retain the log_entry_version map, it probably needs to be cleared during interval change.

Fixed.

Comment on lines +910 to +915
[this, e, failure_func_ptr, error_func_ptr] {
auto rep_tid = shard_services.get_tid();
return (*error_func_ptr)(e, rep_tid)
.then([failure_func_ptr, e, rep_tid] {
return (*failure_func_ptr)(e , rep_tid);
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2/2: The second submit_error_log call, since it's being run from within do_osd_ops_execute it should also be safe . Even though it's packed inside all_completed_fut below (in case of an error), it will be executed before returning pg_rep_op_fut_t to the caller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this handler is actually chained after _all_complete_fut and could therefore happen after we leave the exclusive phase. On the other hand, I think this case is actually impossible -- any error here would have to come from the _all_completed_fut returned from submit_transaction, which would have to be an error with the actual transaction submitted to the backing store or somehow with the messages received from peers. I think any error here could be treated as fatal to the OSD.

Copy link
Contributor Author

@Matan-B Matan-B Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this handler is actually chained after _all_complete_fut and could therefore happen after we leave the exclusive phase. On the other hand, I think this case is actually impossible -- any error here would have to come from the _all_completed_fut returned from submit_transaction, which would have to be an error with the actual transaction submitted to the backing store or somehow with the messages received from peers. I think any error here could be treated as fatal to the OSD.

This case was possible since there were non-fatal errors that could have been returned as well. I added an assert to verify that we handle only those.
Moreover, I moved the call to error_func_ptr earlier - before actually returning the non-fatal errors. That way, it will be executed during the exclusive phase.

@Matan-B Matan-B marked this pull request as ready for review November 7, 2023 16:52
@Matan-B Matan-B requested a review from a team as a code owner November 7, 2023 16:52
@Matan-B Matan-B force-pushed the wip-matanb-crimson-do_osd_ops_execute-v3 branch 2 times, most recently from a1bf4b8 to 4692a09 Compare November 8, 2023 16:21
@Matan-B
Copy link
Contributor Author

Matan-B commented Nov 8, 2023

https://pulpito.ceph.com/matan-2023-11-08_11:46:17-crimson-rados-wip-matanb-crimson-do_osd_ops_execute-v3-distro-crimson-smithi/

30/34 passed.
Results looks good (!)
The issue was previously seen consistently in the suite and seems to be resolved.

Failures look unrelated and are new since they were previously hidden by the issue resolved in this PR:

…func_t

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Payload is already decoded in IOHandler::read_message (decode_message).

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
* '!log_entries.empty()' assert instead of if-case.
  log_entries entry is inserted right before.

* 'version != eversion_t()' assert instead of if-case.
  since op_info.may_write() is true, we should have a non-empty version.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
`submit_error_log()` was returning `version` to be used later in
`failure_func` call to `complete_write()`.

Maintain the version returned from `submit_error_log()` in a dedicated map
to avoid handling the lifetime of 'version'.

Note: This change is crucial to the following change that will
      return 'error_fut' separately.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
This change is crucial for the next commits,
submit_error_log and failure_func should share the same
rep_tid.

to be shared later with error_log call

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
@Matan-B
Copy link
Contributor Author

Matan-B commented Nov 16, 2023

@Matan-B Matan-B force-pushed the wip-matanb-crimson-do_osd_ops_execute-v3 branch from 740ce28 to 38a6b7c Compare November 16, 2023 10:06
@Matan-B Matan-B added the TESTED label Nov 16, 2023
Use chained futurized `send_to_osd()` instead of voided `send_cluster_message()`.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
```
submit_error_log records the result of an IO into the pg log so that we can return
the same error code if the client resends the request.
This should only be relevant for logical errors resulting from the target object state
-- for example, EEXIST returned on an exclusive create -- because there is application
logic built to rely on them.
In classic, the only such site is if the return value from do_osd_ops is negative
(or the transaction is empty) -- see PrimaryLogPG::prepare_transaction,
specifically where we set update_log_only to true.

We do not want to record space usage errors or errors specific to conditions on the primary
OSD such as IO errors -- submit_error_log isn't a catch-all error path.
```

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
@Matan-B Matan-B removed the TESTED label Nov 19, 2023
Previously, submit_error_log was chained to failure_func
returned future.
Now submit_error_log is called from within do_osd_ops_execute

Fixes: https://tracker.ceph.com/issues/61651

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
@Matan-B Matan-B force-pushed the wip-matanb-crimson-do_osd_ops_execute-v3 branch from 38a6b7c to 2c5fbee Compare November 19, 2023 12:27
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
@Matan-B Matan-B force-pushed the wip-matanb-crimson-do_osd_ops_execute-v3 branch from 2c5fbee to 4531290 Compare November 19, 2023 12:31
@Matan-B Matan-B requested a review from athanatos November 19, 2023 12:32
@Matan-B Matan-B added the TESTED label Nov 20, 2023
@Matan-B
Copy link
Contributor Author

Matan-B commented Nov 20, 2023

Copy link
Contributor

@athanatos athanatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more simplification to be done, but I'd be ok with doing it in a subsequent PR if this one is passing tests. Up to you.

return maybe_rollback_fut.then_interruptible(
[error_func_ptr, e, rep_tid, failure_func_ptr] {
// record error log
return (*error_func_ptr)(e, rep_tid).then(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be the only actual user of error_func_ptr -- why not simply move the declaration of maybe_submit_error_log here and avoid needing error_func_ptr entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I preferred capturing all of the dependencies here instead of moving them later on

peering_state.complete_write(log_entry_version[rep_tid], last_complete);
log_entry_version.erase(rep_tid);
logger().debug("do_osd_ops_execute::failure_func write complete,"
" erasing rep_tid {}", rep_tid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else branch seems like it should have been moved into do_osd_ops_execute as well. I'd probably factor it into a helper (complete_error_log) and chain it after the error_func_ptr invocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Matan-B Matan-B merged commit 3d760f1 into ceph:main Nov 29, 2023
@Matan-B
Copy link
Contributor Author

Matan-B commented Nov 29, 2023

There's more simplification to be done, but I'd be ok with doing it in a subsequent PR if this one is passing tests. Up to you.

I prefer addressing the cleanups in a subsequent PR to let main branch not to have this issue anymore. Thanks!

@Matan-B Matan-B mentioned this pull request Dec 4, 2023
14 tasks
return maybe_rollback_fut.then_interruptible(
[error_func_ptr, e, rep_tid, failure_func_ptr] {
// record error log
return (*error_func_ptr)(e, rep_tid).then(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, submit_error_log() is moved from the concurrent wait_repop phase to the exclusive process phase, which should address the ordering issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not encountered since merge yet :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants