Out with blkin, in with OpenTelemetry #67902

Draft
jagombar wants to merge 12 commits into ceph:main from jagombar:wip-agombar-old-otel-without-zipkin

@jagombar
Contributor

Remove all uses of the blkin library, replacing them with OpenTelemetry and preserving traces.

This PR is a partial reworking of #59365.


Omri Zeneva and others added 12 commits March 4, 2026 15:22
Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
We create a static default span context for functions
that do not have trace info but must pass it forward.
With this object, we do not create and copy the context
each time.

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
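
A minimal sketch of the shared default context described in the commit message above. `SpanContext` and `submit_op` are hypothetical stand-ins, not the OpenTelemetry or Ceph types, and the accessor-function shape is an assumption; only the name `noop_span_ctx` appears elsewhere in this PR.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a span context; not the real OpenTelemetry type.
struct SpanContext {
  std::array<std::uint8_t, 16> trace_id{};
  std::array<std::uint8_t, 8> span_id{};
  bool valid = false;

  bool IsValid() const { return valid; }
};

// One shared, immutable "no trace" context. Initialization of a
// function-local static is thread-safe since C++11.
inline const SpanContext& noop_span_ctx() {
  static const SpanContext ctx{};
  return ctx;
}

// Hypothetical API method that must forward a context even when the
// caller has none. Taking a const reference avoids a copy per call.
inline bool submit_op(const SpanContext& parent = noop_span_ctx()) {
  return parent.IsValid();
}
```

Callers without trace info call `submit_op()` and all share the same object; callers with a real context pass it explicitly.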
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
This commit replaces the blkin parameter in API methods with an
OpenTelemetry span context.

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>

Co-authored-by: Adam Emerson <aemerson@redhat.com>

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Messages now carry o-tel trace info instead of blkin trace info.
We must stay backward compatible, so we keep decoding
the old trace info, but new messages
carry the o-tel trace, so a version increase is mandatory.

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
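
The version-gated decode described above can be sketched roughly as follows. Everything here is an assumption for illustration: the version constant, the payload layout, and the names `OtelTrace`, `TraceResult`, and `decode_trace` are hypothetical and do not reflect Ceph's actual `Message` encoding.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical: first header version whose payload carries an o-tel trace.
constexpr std::uint8_t kOtelHeaderVersion = 2;
// Assumed legacy payload size: three 64-bit ids.
constexpr std::size_t kLegacyTraceSize = 3 * sizeof(std::int64_t);

struct OtelTrace {
  std::array<std::uint8_t, 16> trace_id{};
  std::array<std::uint8_t, 8> span_id{};
};

enum class TraceResult { kNone, kOtel, kMalformed };

inline TraceResult decode_trace(std::uint8_t header_version,
                                const std::vector<std::uint8_t>& payload,
                                OtelTrace& out) {
  if (header_version < kOtelHeaderVersion) {
    // Old sender: the legacy bytes must be present, but are discarded.
    return payload.size() >= kLegacyTraceSize ? TraceResult::kNone
                                              : TraceResult::kMalformed;
  }
  if (payload.size() < out.trace_id.size() + out.span_id.size()) {
    return TraceResult::kMalformed;
  }
  // New sender: copy the trace id, then the span id that follows it.
  std::memcpy(out.trace_id.data(), payload.data(), out.trace_id.size());
  std::memcpy(out.span_id.data(), payload.data() + out.trace_id.size(),
              out.span_id.size());
  return TraceResult::kOtel;
}
```

The point of the version bump is visible in the first branch: without it, a receiver could not tell which of the two layouts it is looking at.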
Since we removed the blkin submodule, we can no longer use it
in file/bluestore either.

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
This commit removes the blkin submodule and references to it. The only
part remaining is the dummy `blkin_trace_info` data structure and its
codec, now moved into `Message.cc`.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
When start_read_op() is called without a valid OpRequestRef (e.g., from
RMWPipeline::try_state_to_reads at line 726), the otel_trace was not
being initialized, leading to potential null pointer issues.

Changes:
- Initialize otel_trace in ReadOp constructor using noop_span_ctx when
  op is null, ensuring trace is always valid
- Add fallback trace creation in start_read_op() when _op is null
- Add comment noting that passing the original op through would be
  a better long-term solution
- Fix MOSDECSubOpWrite header version check (3 instead of 4) for
  decode_otel_trace compatibility
- Add necessary tracer.h includes to ECCommonL.h
- End the OpRequest span when the operation completes

This ensures OpenTelemetry tracing works correctly for all EC read
operations, including those initiated without an associated client
operation request.

Signed-off-by: John Agombar <agombar@uk.ibm.com>
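
The null-op fallback from the first change above might look roughly like this. `Span`, `SpanPtr`, `OpRequest`, and `noop_span` are simplified, hypothetical stand-ins; the real `ReadOp` and `OpRequestRef` in Ceph carry far more state.

```cpp
#include <memory>
#include <string>

// Hypothetical, simplified types for illustration only.
struct Span {
  std::string name;
  bool recording;
};
using SpanPtr = std::shared_ptr<Span>;

struct OpRequest {
  SpanPtr otel_trace;
};
using OpRequestRef = std::shared_ptr<OpRequest>;

// Shared non-recording span used whenever no client op is available.
inline const SpanPtr& noop_span() {
  static const SpanPtr s = std::make_shared<Span>(Span{"noop", false});
  return s;
}

struct ReadOp {
  SpanPtr otel_trace;

  // Always leave otel_trace non-null: inherit the op's span when there
  // is one, otherwise fall back to the shared no-op span.
  explicit ReadOp(const OpRequestRef& op)
      : otel_trace(op && op->otel_trace ? op->otel_trace : noop_span()) {}
};
```

With this shape, code that later touches `otel_trace` does not need its own null check, which is the failure mode the commit message describes.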
…ementation

- Add const qualifiers to jspan methods: AddEvent(), UpdateName(), IsRecording()
- Add missing End() method to jspan class
- Add default and parameterized constructors to Tracer struct

These changes ensure the no-op tracing implementation maintains API
consistency with the actual Jaeger tracing implementation when tracing
is disabled.

Signed-off-by: John Agombar <agombar@uk.ibm.com>
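
As a rough illustration of the no-op API described above, a sketch assuming `std::string_view` parameters. The method names `AddEvent`, `UpdateName`, `IsRecording`, and `End` come from the commit message; the parameter types, the `Tracer` constructor argument, and `start_trace` as written here are assumptions, not the actual Ceph header.

```cpp
#include <string_view>

// No-op span: every method compiles away but keeps call sites valid
// when tracing is disabled.
class jspan {
 public:
  void AddEvent(std::string_view) const noexcept {}
  void UpdateName(std::string_view) const noexcept {}
  bool IsRecording() const noexcept { return false; }
  void End() noexcept {}
};

// No-op tracer with both a default and a parameterized constructor, so
// the same construction code builds with or without tracing.
struct Tracer {
  Tracer() = default;
  explicit Tracer(std::string_view /*service_name*/) {}

  // Hypothetical simplified signature returning a span by value.
  jspan start_trace(std::string_view /*trace_name*/) const { return {}; }
};
```

The const qualifiers matter because callers that hold a span through a const reference can still call `AddEvent()` or `IsRecording()` in both build configurations.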
- Use thrift::libthrift target instead of plain thrift for system thrift
  (thrift >= 0.17) to properly reference the CMake imported target
- Add libarrow_bundled_dependencies.a to byproducts and link libraries
  when using bundled thrift (thrift < 0.17)

This ensures proper linking of Arrow's bundled thrift dependencies and
uses the correct CMake target for system thrift installations.

This change is required to build ceph if WITH_JAEGER is false.

Signed-off-by: John Agombar <agombar@uk.ibm.com>
…ders

Check if a real tracer provider is already configured before initializing
a new one. This prevents libraries (neorados, librbd when used as a library)
from overwriting the tracer provider set up by daemons (OSD, RGW, etc.).

The change detects if the current provider is the default no-op provider.
If it is, we initialize a new Jaeger exporter as before.
If a real provider already exists, we reuse it instead of replacing it.

This allows daemons to properly initialize tracing while libraries can
safely call init() without disrupting the daemon's tracing configuration.

Signed-off-by: John Agombar <agombar@uk.ibm.com>
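
The guard described above can be sketched as follows. The provider classes, `global_provider`, and `init_tracing` are hypothetical stand-ins rather than the OpenTelemetry C++ API, and detecting the no-op provider with `dynamic_cast` is an assumption about one possible approach.

```cpp
#include <memory>

// Hypothetical provider hierarchy for illustration only.
struct TracerProvider {
  virtual ~TracerProvider() = default;
};
struct NoopTracerProvider : TracerProvider {};
struct ExportingTracerProvider : TracerProvider {};

// Process-wide provider, defaulting to the no-op one.
inline std::shared_ptr<TracerProvider>& global_provider() {
  static std::shared_ptr<TracerProvider> p =
      std::make_shared<NoopTracerProvider>();
  return p;
}

// Returns true if this call installed a new provider, false if a real
// provider was already configured and is being reused.
inline bool init_tracing() {
  auto& current = global_provider();
  const bool is_noop =
      dynamic_cast<NoopTracerProvider*>(current.get()) != nullptr;
  if (!is_noop) {
    return false;  // e.g. the daemon already set one up; leave it alone
  }
  current = std::make_shared<ExportingTracerProvider>();
  return true;
}
```

This sketch is not thread-safe; concurrent callers would need a mutex or `std::call_once` around the check and the assignment.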
@github-actions

Config Diff Tool Output

+ added: rbd_otel_trace_all (rbd.yaml.in)
- removed: osdc_blkin_trace_all (global.yaml.in)
- removed: osd_blkin_trace_all (global.yaml.in)
- removed: rbd_blkin_trace_all (rbd.yaml.in)

The above configuration changes are found in the PR. Please update the relevant release documentation if necessary.
Ignore this comment if docs are already updated. To make the "Check ceph config changes" CI check pass, please comment /config check ok and re-run the test.

@jagombar
Contributor Author

I have addressed the comments in #59365, apart from the ones about multiple tracers, so this is still a WIP.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
