Skip to content

common/tracer: fix decoding when jaeger tracing is disabled#51043

Merged
idryomov merged 1 commit intoceph:mainfrom
petrutlucian94:fix_jaeger_encoding
Apr 17, 2023
Merged

common/tracer: fix decoding when jaeger tracing is disabled#51043
idryomov merged 1 commit intoceph:mainfrom
petrutlucian94:fix_jaeger_encoding

Conversation

@petrutlucian94
Copy link
Contributor

@petrutlucian94 petrutlucian94 commented Apr 12, 2023

We aren't currently using jaeger tracing on Windows. The issue is that Windows hosts (or any other host that doesn't use jaeger) are experiencing message decoding failures after a recent change [1]:

2023-04-12T08:46:56.182+0000 7fce68a4f640 -1 failed to decode message of type 42 v9: void tracing::decode(jspan_context&, ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
2023-04-12T08:46:56.162+0000 7fce68a4f640  1 --2- [v2:13.13.13.13:6818/985688,v1:13.13.13.13:6819/985688] >> 13.13.13.10:0/721620687 conn(0x556e00704800 0x556dfdb4db80 crc :-1 s=THROTTLE_DONE pgs=284 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_message decode message failed

This change updates the tracer encoding so that messages from non-jaeger hosts may be decoded by services that use jaeger.

[1] #47457

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@petrutlucian94 petrutlucian94 changed the title commno/tracer: fix decoding when jaeger tracing is disabled common/tracer: fix decoding when jaeger tracing is disabled Apr 12, 2023
@petrutlucian94 petrutlucian94 mentioned this pull request Apr 12, 2023
14 tasks
@petrutlucian94
Copy link
Contributor Author

jenkins test api

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, assuming the array sizes are correct

@petrutlucian94
Copy link
Contributor Author

jenkins test windows

@petrutlucian94
Copy link
Contributor Author

jenkins test make check

@petrutlucian94 petrutlucian94 mentioned this pull request Apr 12, 2023
3 tasks
@idryomov
Copy link
Contributor

looks good to me, assuming the array sizes are correct

@cbodley This comment made me look at the "jaeger is present" branch and I would like to raise a concern regarding taking in on-wire format definitions from third party headers (TraceId::kSize and SpanId::kSize constants). It doesn't seem robust.

I think it would be better to define two constants in one of our namespaces to the corresponding integers (16 and 8), use these constants in encode and decode methods and employ static_assert under HAVE_JAEGER to ensure that they are equal to TraceId::kSize and SpanId::kSize. This way, if opentelemetry ever changes these, our encoding wouldn't break (or at least we would know about it).

@petrutlucian94
Copy link
Contributor Author

looks good to me, assuming the array sizes are correct

@cbodley This comment made me look at the "jaeger is present" branch and I would like to raise a concern regarding taking in on-wire format definitions from third party headers (TraceId::kSize and SpanId::kSize constants). It doesn't seem robust.

I think it would be better to define two constants in one of our namespaces to the corresponding integers (16 and 8), use these constants in encode and decode methods and employ static_assert under HAVE_JAEGER to ensure that they are equal to TraceId::kSize and SpanId::kSize. This way, if opentelemetry ever changes these, our encoding wouldn't break (or at least we would know about it).

Makes sense. I went with your previous suggestion, so we no longer need to use those constants:

inline void decode(jspan_context& span_ctx, bufferlist::const_iterator& bl) {
  DECODE_START(254, bl)
  // jaeger is missing, consume the buffer but do not decode it.
  DECODE_FINISH(bl);
  span_ctx = jspan_context();
}

@idryomov
Copy link
Contributor

I went with your previous suggestion, so we no longer need to use those constants

+1 -- they are still used in the "jaeger is present" branch but that is out of scope of this PR.

We aren't currently using jaeger tracing on Windows. The issue is
that Windows hosts (or any other host that doesn't use jaeger)
are experiencing message decoding failures after a recent change [1].

This change updates the tracer encoding so that messages from
non-jaeger hosts may be decoded by services that use jaeger.

[1] ceph#47457

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
@petrutlucian94
Copy link
Contributor Author

jenkins test make check

@petrutlucian94
Copy link
Contributor Author

jenkins test api

@petrutlucian94
Copy link
Contributor Author

jenkins test windows

@yuvalif
Copy link
Contributor

yuvalif commented Apr 13, 2023

SpanId::kSize

@cbodley, this makes sense. we should do that regrdless of this PR. we use these constants in the existing code

@idryomov
Copy link
Contributor

jenkins test api

1 similar comment
@idryomov
Copy link
Contributor

jenkins test api

@idryomov
Copy link
Contributor

I don't see a point in running any of the suites on this PR since we don't seem to have non-jaeger package builds, at least not on main. Merging after running some manual tests covering jaeger-enabled OSDs <-> non-jaeger client and vice versa.

@idryomov idryomov merged commit 7f6c674 into ceph:main Apr 17, 2023
@idryomov
Copy link
Contributor

SpanId::kSize

@cbodley, this makes sense. we should do that regrdless of this PR. we use these constants in the existing code

@yuvalif Not sure if you quoted what you meant to quote in that comment. I'm choosing to interpret it as you agreeing with my suggestion to not use constants from opentelemetry headers directly in encoding code ;)

yuvalif pushed a commit to yuvalif/ceph that referenced this pull request Apr 30, 2023
common/tracer: fix decoding when jaeger tracing is disabled

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>
@yuvalif yuvalif mentioned this pull request Apr 30, 2023
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants