Skip to content

Out with blkin, in with OpenTelemetry#59365

Open
adamemerson wants to merge 8 commits intoceph:mainfrom
adamemerson:wip-blkin-cleanup
Open

Out with blkin, in with OpenTelemetry#59365
adamemerson wants to merge 8 commits intoceph:mainfrom
adamemerson:wip-blkin-cleanup

Conversation

@adamemerson
Copy link
Contributor

@adamemerson adamemerson commented Aug 21, 2024

Remove all uses of the blkin library, replacing them with OpenTelemetry and preserving traces.

This PR is a partial reworking of #48971 .

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Omri Zeneva and others added 7 commits August 20, 2024 20:47
Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
We create a static default span context for funcitons
that do not have trace info, but must pass it forward.
so with this object we won't create and copy the context
each time

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
this commit replaces the blkin param from api methods by opentelemetry
span context

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>

Co-authored-by: Adam Emerson <aemerson@redhat.com>

Signed-off-by: Adam Emerson <aemerson@redhat.com>
instead of blkin trace info that passes with messages,
we replace it with o-tel trace info
we must support backward compatibility so we need
to keep decode the old trace info, but new messages
will have o-tel trace, so version increase is mandatory

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
since we removed the blkin submodule, we can't use it
in file/bluestore also

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
@adamemerson
Copy link
Contributor Author

This probably needs some kind of upgrade testing due to a new Message version.

This commit removes the blkin submodule and references to it. The only
part remaining is the dummy `blkin_trace_info` data structure and its
coded, now moved into `Message.cc`.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
@adamemerson adamemerson requested a review from a team as a code owner August 21, 2024 03:59
@adamemerson
Copy link
Contributor Author

jenkins retest this please

want_to_read(std::move(_want_to_read)),
to_read(std::move(_to_read)) {
to_read(std::move(_to_read)),
otel_trace(tracing::osd::tracer.add_span("EC ReadOp", op->osd_trace)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a new span that did not exists in blkin?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

_op->get_nonconst_req());
parent->maybe_preempt_replica_scrub(op->op.soid);
handle_sub_write(op->op.from, _op, op->op, _op->pg_trace, *get_parent()->get_eclistener());
auto sub_span = tracing::osd::tracer.add_span("handle_sub_write",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the span names the original names used in blkin?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuvalif I've struggled to understand this for a while so let me know if I'm talking rubbish here.
It seems that in blkin you could add an "event" to a remote span reference which is what this code was doing.
In OTEL that isn't possible and instead you have to create a new local span and call AddEvent on that. In this case, the local span uses the span_context (retrieved from the message) to reference the remote span.
So I don't think that there is an "original name" that I could compare this against?

@github-actions github-actions bot removed the stale label Nov 4, 2024
@github-actions
Copy link

github-actions bot commented Jan 3, 2025

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jan 3, 2025
@yuvalif
Copy link
Contributor

yuvalif commented Jan 6, 2025

unstale please

@github-actions github-actions bot removed the stale label Jan 6, 2025
@jmundack
Copy link
Contributor

@adamemerson / @yuvalif - is this something we still plan to do?

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Apr 25, 2025
@yuvalif
Copy link
Contributor

yuvalif commented May 5, 2025

@adamemerson / @yuvalif - is this something we still plan to do?

i hope so...

@github-actions github-actions bot removed the stale label May 5, 2025
@github-actions
Copy link

github-actions bot commented Jul 4, 2025

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jul 4, 2025
@adamemerson adamemerson added pinned Use this label if you want to exempt a PR from being stalled and removed stale labels Jul 4, 2025
@jagombar
Copy link
Contributor

jagombar commented Mar 4, 2026

@adamemerson I have rebased the commits in this PRt on the latest head.  I am now working my way through the comments from @yuvalif trying to address them, and have still got some more work to do there. In the meantime I was wondering what would be the best way of delivering these fixes?  Is the normal approach to update this PR with the new commits or to create a new PR with references to the original?

@adamemerson
Copy link
Contributor Author

@adamemerson I have rebased the commits in this PRt on the latest head. I am now working my way through the comments from @yuvalif trying to address them, and have still got some more work to do there. In the meantime I was wondering what would be the best way of delivering these fixes? Is the normal approach to update this PR with the new commits or to create a new PR with references to the original?

You should close this and make a new PR, that way you can be in charge of it. Thank you!

@adamemerson adamemerson removed the pinned Use this label if you want to exempt a PR from being stalled label Mar 5, 2026
static constexpr int HEAD_VERSION = 3;
static constexpr int COMPAT_VERSION = 1;
static constexpr int HEAD_VERSION = 4;
static constexpr int COMPAT_VERSION = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change the compat version?
if the trace is the last encoded item, and we were had support 1-3, why can't we have 1-4 support?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why that change was made either. Have changed it back to 1 in the latest set of commits

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jagombar was this originally your change? or was it done in the original PR by @adamemerson ?

Copy link
Contributor

@jagombar jagombar Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuvalif I haven't pushed any of my changes to this PR (I assume that I don't have permission to do so either). I'm not sure if it was Adam or Omri Zeneva that made this change originally.
You can see the latest state of my changes at https://github.com/jagombar/ceph/tree/wip-agombar-old-otel-without-zipkin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, got it. in this case, probably best to create a new PR based on your branch (we will close this one, and reference the new PR).

static constexpr int HEAD_VERSION = 2;
static constexpr int COMPAT_VERSION = 1;
static constexpr int HEAD_VERSION = 3;
static constexpr int COMPAT_VERSION = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

static constexpr int HEAD_VERSION = 2;
static constexpr int COMPAT_VERSION = 1;
static constexpr int HEAD_VERSION = 3;
static constexpr int COMPAT_VERSION = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -52,6 +52,9 @@ class MOSDECSubOpWriteReply : public MOSDFastDispatchOp {
decode(map_epoch, p);
decode(op, p);
if (header.version >= 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (header.version >= 2) {
if (header.version >= 3) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

static constexpr int HEAD_VERSION = 2;
static constexpr int COMPAT_VERSION = 1;
static constexpr int HEAD_VERSION = 3;
static constexpr int COMPAT_VERSION = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

static constexpr int HEAD_VERSION = 8;
static constexpr int COMPAT_VERSION = 2;
static constexpr int HEAD_VERSION = 9;
static constexpr int COMPAT_VERSION = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

header.version = HEAD_VERSION;
encode(min_epoch, payload);
encode_trace(payload, features);
encode_otel_trace(payload, features);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this controlled by the features bitmap?
(i guess that otherwise, it will be breaking backward compatibility)

// the heartbeat sessions.
bool require_authorizer = true;

tracing::Tracer tracer{cct, "io.ceph.Messenger."};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why should the messanger have its own tracer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. The only span that it was tracing wasn't traced by blkin

decode(map_epoch, p);
decode(op, p);
if (header.version >= 2) {
if (header.version >= 4) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this should be >= 3. Fixed in latest set of commits

static constexpr int HEAD_VERSION = 3;
static constexpr int COMPAT_VERSION = 1;
static constexpr int HEAD_VERSION = 4;
static constexpr int COMPAT_VERSION = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted COMPAT_VERSION change

static constexpr int HEAD_VERSION = 2;
static constexpr int COMPAT_VERSION = 1;
static constexpr int HEAD_VERSION = 3;
static constexpr int COMPAT_VERSION = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted COMPAT_VERSION change here too

@jagombar
Copy link
Contributor

I have opened #67902 with an updated version of the changes in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants