Out with blkin, in with OpenTelemetry#59365
Conversation
Signed-off-by: Omri Zeneva <ozeneva@redhat.com> Signed-off-by: Adam Emerson <aemerson@redhat.com>
We create a static default span context for funcitons that do not have trace info, but must pass it forward. so with this object we won't create and copy the context each time Signed-off-by: Omri Zeneva <ozeneva@redhat.com> Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Omri Zeneva <ozeneva@redhat.com> Signed-off-by: Adam Emerson <aemerson@redhat.com>
this commit replaces the blkin param from api methods by opentelemetry span context Signed-off-by: Omri Zeneva <ozeneva@redhat.com> Co-authored-by: Adam Emerson <aemerson@redhat.com> Signed-off-by: Adam Emerson <aemerson@redhat.com>
instead of blkin trace info that passes with messages, we replace it with o-tel trace info we must support backward compatibility so we need to keep decode the old trace info, but new messages will have o-tel trace, so version increase is mandatory Signed-off-by: Omri Zeneva <ozeneva@redhat.com> Signed-off-by: Adam Emerson <aemerson@redhat.com>
since we removed the blkin submodule, we can't use it in file/bluestore also Signed-off-by: Omri Zeneva <ozeneva@redhat.com> Signed-off-by: Adam Emerson <aemerson@redhat.com>
|
This probably needs some kind of upgrade testing due to a new Message version. |
This commit removes the blkin submodule and references to it. The only part remaining is the dummy `blkin_trace_info` data structure and its coded, now moved into `Message.cc`. Signed-off-by: Adam Emerson <aemerson@redhat.com>
bb52cfc to
dc1bfa5
Compare
|
jenkins retest this please |
| want_to_read(std::move(_want_to_read)), | ||
| to_read(std::move(_to_read)) { | ||
| to_read(std::move(_to_read)), | ||
| otel_trace(tracing::osd::tracer.add_span("EC ReadOp", op->osd_trace)) { |
There was a problem hiding this comment.
is this a new span that did not exists in blkin?
| _op->get_nonconst_req()); | ||
| parent->maybe_preempt_replica_scrub(op->op.soid); | ||
| handle_sub_write(op->op.from, _op, op->op, _op->pg_trace, *get_parent()->get_eclistener()); | ||
| auto sub_span = tracing::osd::tracer.add_span("handle_sub_write", |
There was a problem hiding this comment.
are the span names the original names used in blkin?
There was a problem hiding this comment.
@yuvalif I've struggled to understand this for a while so let me know if I'm talking rubbish here.
It seems that in blkin you could add an "event" to a remote span reference which is what this code was doing.
In OTEL that isn't possible and instead you have to create a new local span and call AddEvent on that. In this case, the local span uses the span_context (retrieved from the message) to reference the remote span.
So I don't think that there is an "original name" that I could compare this against?
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
unstale please |
|
@adamemerson / @yuvalif - is this something we still plan to do? |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
i hope so... |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
@adamemerson I have rebased the commits in this PRt on the latest head. I am now working my way through the comments from @yuvalif trying to address them, and have still got some more work to do there. In the meantime I was wondering what would be the best way of delivering these fixes? Is the normal approach to update this PR with the new commits or to create a new PR with references to the original? |
You should close this and make a new PR, that way you can be in charge of it. Thank you! |
| static constexpr int HEAD_VERSION = 3; | ||
| static constexpr int COMPAT_VERSION = 1; | ||
| static constexpr int HEAD_VERSION = 4; | ||
| static constexpr int COMPAT_VERSION = 2; |
There was a problem hiding this comment.
why did you change the compat version?
if the trace is the last encoded item, and we were had support 1-3, why can't we have 1-4 support?
There was a problem hiding this comment.
I didn't understand why that change was made either. Have changed it back to 1 in the latest set of commits
There was a problem hiding this comment.
@jagombar was this originally your change? or was it done in the original PR by @adamemerson ?
There was a problem hiding this comment.
@yuvalif I haven't pushed any of my changes to this PR (I assume that I don't have permission to do so either). I'm not sure if it was Adam or Omri Zeneva that made this change originally.
You can see the latest state of my changes at https://github.com/jagombar/ceph/tree/wip-agombar-old-otel-without-zipkin
There was a problem hiding this comment.
ok, got it. in this case, probably best to create a new PR based on your branch (we will close this one, and reference the new PR).
| static constexpr int HEAD_VERSION = 2; | ||
| static constexpr int COMPAT_VERSION = 1; | ||
| static constexpr int HEAD_VERSION = 3; | ||
| static constexpr int COMPAT_VERSION = 2; |
There was a problem hiding this comment.
| static constexpr int HEAD_VERSION = 2; | ||
| static constexpr int COMPAT_VERSION = 1; | ||
| static constexpr int HEAD_VERSION = 3; | ||
| static constexpr int COMPAT_VERSION = 2; |
There was a problem hiding this comment.
| @@ -52,6 +52,9 @@ class MOSDECSubOpWriteReply : public MOSDFastDispatchOp { | |||
| decode(map_epoch, p); | |||
| decode(op, p); | |||
| if (header.version >= 2) { | |||
There was a problem hiding this comment.
| if (header.version >= 2) { | |
| if (header.version >= 3) { |
| static constexpr int HEAD_VERSION = 2; | ||
| static constexpr int COMPAT_VERSION = 1; | ||
| static constexpr int HEAD_VERSION = 3; | ||
| static constexpr int COMPAT_VERSION = 2; |
There was a problem hiding this comment.
| static constexpr int HEAD_VERSION = 8; | ||
| static constexpr int COMPAT_VERSION = 2; | ||
| static constexpr int HEAD_VERSION = 9; | ||
| static constexpr int COMPAT_VERSION = 3; |
There was a problem hiding this comment.
| header.version = HEAD_VERSION; | ||
| encode(min_epoch, payload); | ||
| encode_trace(payload, features); | ||
| encode_otel_trace(payload, features); |
There was a problem hiding this comment.
is this controlled by the features bitmap?
(i guess that otherwise, it will be breaking backward compatibility)
| // the heartbeat sessions. | ||
| bool require_authorizer = true; | ||
|
|
||
| tracing::Tracer tracer{cct, "io.ceph.Messenger."}; |
There was a problem hiding this comment.
why should the messanger have its own tracer?
There was a problem hiding this comment.
Removed. The only span that it was tracing wasn't traced by blkin
| decode(map_epoch, p); | ||
| decode(op, p); | ||
| if (header.version >= 2) { | ||
| if (header.version >= 4) { |
There was a problem hiding this comment.
I think that this should be >= 3. Fixed in latest set of commits
| static constexpr int HEAD_VERSION = 3; | ||
| static constexpr int COMPAT_VERSION = 1; | ||
| static constexpr int HEAD_VERSION = 4; | ||
| static constexpr int COMPAT_VERSION = 2; |
There was a problem hiding this comment.
reverted COMPAT_VERSION change
| static constexpr int HEAD_VERSION = 2; | ||
| static constexpr int COMPAT_VERSION = 1; | ||
| static constexpr int HEAD_VERSION = 3; | ||
| static constexpr int COMPAT_VERSION = 2; |
There was a problem hiding this comment.
reverted COMPAT_VERSION change here too
|
I have opened #67902 with an updated version of the changes in this PR. |
Remove all uses of the blkin library, replacing them with OpenTelemetry and preserving traces.
This PR is a partial reworking of #48971 .
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e