Skip to content

squid: tracer/osd/librados/build/rgw: rgw and osd end2end tracing using open…#55625

Merged
yuriw merged 3 commits intoceph:squidfrom
rzarzynski:wip-end2end-tracing-squid
Mar 26, 2024
Merged

squid: tracer/osd/librados/build/rgw: rgw and osd end2end tracing using open…#55625
yuriw merged 3 commits intoceph:squidfrom
rzarzynski:wip-end2end-tracing-squid

Conversation

@rzarzynski
Copy link
Contributor

…telemetry

  • build: add opentelemetry to cmake system crimson targets that uses Message.cc/h are built before opentelemetry (o-tel), so we need to build o-tel eralier so we also add the library to the include path earlier this shoud work for WITH_JAEGER flag both the ON/OFF cases, and for librados where the compilation flag is ignored

  • msg/tracer: add o-tel trace to Messages with decode/encode function in tracer.h some files that uses Message.cc/h just need the encode/decode functions and not all others functions. some crimson targets does not link with ceph_context (common) which is required for tracer.cc file. so we just need to include that functions

  • librados: Add opentelemtry trace param for aio_operate and operate methods in order to propagate the trace info I added the otel-trace as an extra param. in some places, there already was a blkin trace info, and since it is not used in other places we can safely change it to o-tel trace info. this will be done in another commit, so the cleanup of blkin trace will be in a dedicated commit

  • osd: use the o-tel trace of the msg as a parent span of the osd trace if there is a valid span in the msg, we will add this op to the request trace, otherwise it will start a new trace for the OSD op

  • rgw: pass put obj trace info to librados in order to make it possible, I saved the trace info inside the sal::Object, so we can use it later when writing the object to rados it could be used also later for read ops.
    note the trace field of req_state is initalized only in rgw_process, so it's also required in librgw request flow

  • prevent breaking channges to kSize. make sure that changes between components built with different versions of OTEL do not break message compatibility

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@rzarzynski
Copy link
Contributor Author

rzarzynski commented Feb 17, 2024

This is the squid backport of #52114.

@rzarzynski
Copy link
Contributor Author

Known make check failure:

The following tests FAILED:
	240 - unittest-seastar-socket (Subprocess aborted)

@rzarzynski
Copy link
Contributor Author

jenkins test make check

Omri Zeneva and others added 2 commits February 20, 2024 17:24
…telemetry

* build: add opentelemetry to cmake system
crimson targets that uses Message.cc/h are built before opentelemetry (o-tel), so we need to build o-tel eralier so we also add the library to the include path earlier
this shoud work for WITH_JAEGER flag both the ON/OFF cases, and for librados where the compilation flag is ignored

* msg/tracer: add o-tel trace to Messages with decode/encode function in tracer.h
some files that uses Message.cc/h just need the encode/decode functions  and not all others functions.
some crimson targets does not link with ceph_context (common) which is required for tracer.cc file. so we just need to include that functions

* librados: Add opentelemtry trace param for aio_operate and operate methods
in order to propagate the trace info I added the otel-trace as an extra param.
in some places, there already was a blkin trace info, and since it is not used in other places we can safely change it to o-tel trace info.
this will be done in another commit, so the cleanup of blkin trace will be in a dedicated commit

* osd: use the o-tel trace of the msg as a parent span of the osd trace
if there is a valid span in the msg, we will add this op to the request
trace, otherwise it will start a new trace for the OSD op

* rgw: pass put obj trace info to librados
in order to make it possible, I saved the trace info inside the sal::Object, so we can use it later when writing the object to rados
it could be used also later for read ops.
note the trace field of req_state is initalized only in rgw_process, so it's also required in librgw request flow

* prevent breaking channges to kSize. make sure that changes between components built with
different versions of OTEL do not break message compatibility

Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
(cherry picked from commit 320a217)
We aren't currently using jaeger tracing on Windows. The issue is
that Windows hosts (or any other host that doesn't use jaeger)
are experiencing message decoding failures after a recent change [1].

This change updates the tracer encoding so that messages from
non-jaeger hosts may be decoded by services that use jaeger.

[1] ceph#47457

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>

This commit rebrings 3701ffa which
got reverted due to an implicit dependency with other revert. Please
see ceph#52114 (comment).

Conflicts:
	src/common/tracer.h
	  formatting conflict with 7179ac0

(cherry picked from commit e145264)
@rzarzynski rzarzynski force-pushed the wip-end2end-tracing-squid branch from 71bd31e to c5c7d26 Compare February 20, 2024 16:26
@rzarzynski rzarzynski requested a review from yuvalif February 20, 2024 16:26
@rzarzynski
Copy link
Contributor Author

rzarzynski commented Feb 20, 2024

Linking the original PR: #52114.

@cbodley
Copy link
Contributor

cbodley commented Feb 22, 2024

@rzarzynski i really appreciate your help pushing this one through ❤️

i'm tracking an rgw regression from this pr on main in https://tracker.ceph.com/issues/64543. we'll need the fix from #55722 to get a clean rgw run for squid

a recent regression from 320a217 causes
aio_abstract() to recurse when given an empty optional_yield. this is
exposed by the librgw_file tests

Fixes: https://tracker.ceph.com/issues/64543

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 0e223fd)
@cbodley
Copy link
Contributor

cbodley commented Feb 23, 2024

@rzarzynski i've added that commit to this PR, i hope you don't mind

@cbodley
Copy link
Contributor

cbodley commented Feb 28, 2024

@cbodley
Copy link
Contributor

cbodley commented Feb 28, 2024

jenkins test api

@cbodley
Copy link
Contributor

cbodley commented Feb 28, 2024

@rzarzynski @ljflores does this need a rados suite run too?

@rzarzynski
Copy link
Contributor Author

@cbodley: yes, I would prefer to not skip the workflow.

ping @yuriw, @ljflores.

@cbodley
Copy link
Contributor

cbodley commented Mar 20, 2024

@yuriw please make sure this one gets a rados suite (see https://tracker.ceph.com/issues/64973#note-14)

@ljflores
Copy link
Member

@yuriw yuriw merged commit 32fe2b9 into ceph:squid Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants