revert PRs #47457 and #51031#51293
Conversation
src/rgw/rgw_aio.cc
Outdated
| @@ -58,7 +58,7 @@ Aio::OpFunc aio_abstract(librados::IoCtx ctx, Op&& op, jspan_context* trace_ctx | |||
| (void)trace_ctx; // suppress unused trace_ctx warning. until we will support the read op trace | |||
src/rgw/rgw_aio.cc
Outdated
| @@ -58,7 +58,7 @@ Aio::OpFunc aio_abstract(librados::IoCtx ctx, Op&& op, jspan_context* trace_ctx | |||
| (void)trace_ctx; // suppress unused trace_ctx warning. until we will support the read op trace | |||
There was a problem hiding this comment.
Why to revert the merge commit? Wouldn't be enough to simply:
$ git revert 9f160e4
[detached HEAD c3b26d99f2c] Revert "rgw/aio: fix recursion from tracing changes"
1 file changed, 2 insertions(+), 2 deletions(-)
$ git revert 5d4c5e6
Auto-merging src/rgw/rgw_op.cc
Auto-merging src/osd/OSD.cc
Auto-merging src/msg/Message.h
Auto-merging src/msg/Message.cc
Auto-merging src/messages/MOSDOp.h
Auto-merging src/common/tracer.h
[detached HEAD 147f743d2c0] Revert "tracer/osd/librados/build/rgw: rgw and osd end2end tracing using opentelemetry"
33 files changed, 139 insertions(+), 233 deletions(-)
?
Reverting regular, individual commits is straightforward and results in pretty readable history:
$ git log --oneline -n3
147f743d2c0 (HEAD) Revert "tracer/osd/librados/build/rgw: rgw and osd end2end tracing using opentelemetry"
c3b26d99f2c Revert "rgw/aio: fix recursion from tracing changes"
92ba1c08c83 (upstream/main) Merge pull request #50324 from pkalever/scrub_incomplete
Reverting a merge commit is, well, far more complex underneath and can be quirky: https://mirrors.edge.kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.txt
There was a problem hiding this comment.
agree. will revert the actual commit
There was a problem hiding this comment.
pushed a version reverting the actual commit and not the merge
| class MOSDOp final : public MOSDFastDispatchOp { | ||
| private: | ||
| static constexpr int HEAD_VERSION = 9; | ||
| static constexpr int HEAD_VERSION = 8; |
There was a problem hiding this comment.
This commit creates as short-living version 9 which resembles a similar situation in RocksDB. This seems fine to me as it's all about main but I want to be explicit here: we provide NO GUARANTEES on dencoding interoperability between two pure-main revisions, right?
CC: @gregsfortytwo, @athanatos.
There was a problem hiding this comment.
@rzarzynski I think the main issue is unblocking failing tests where we have older versions of OSDs
There was a problem hiding this comment.
Right @rzarzynski, no guarantees on versioning in main. I mean, we don’t love it, but we can’t guarantee it for precisely situations like this, where we discover an issue in integration tests.
8ebc0c5 to
e39ca6b
Compare
| auto is_valid = span_ctx.IsValid(); | ||
| encode(is_valid, bl); | ||
| if (is_valid) { | ||
| encode_nohead(std::string_view(reinterpret_cast<const char*>(span_ctx.trace_id().Id().data()), TraceId::kSize), bl); |
There was a problem hiding this comment.
After some drilling in the encoding stuff, I started worrying there is (and was; not a new thing) a rather minor problem with the tracker's format. By using encode_nohead an encoder enforces decoders to use their own, local instance of the TraceId::kSize and SpanId::kSize constants.
The current assumption is they match between e.g. revisions of jspan_context and architectures. If this invariant breaks out, we could run into:
- if decoder's constant
<encoder's constant, then decoder's compat-preserving machinery will silently eat the extra bytes (bl += struct_end - bl.get_off()) and what happen later is unknown:
/**
* finish decode block
*
* @param bl bufferlist::iterator we were decoding from
*/
#define DECODE_FINISH(bl) \
} while (false); \
if (struct_end) { \
if (bl.get_off() > struct_end) \
throw ::ceph::buffer::malformed_input(DECODE_ERR_PAST(__PRETTY_FUNCTION__)); \
if (bl.get_off() < struct_end) \
bl += struct_end - bl.get_off(); \
}
- if decoder's constant
>encoder's constant –end_of_bufferormalformed_inputwill be thrown.
I doubt it's a severe defect as it can happen only on the tracing (is_valid) path but if we consider a feature bit, can we fix this minor as well? Is it worth? Is a note in the doc an alternative?
There was a problem hiding this comment.
thanks @rzarzynski, this issue was raised by @idryomov here: #51043
will look at that once the end2end work is re-merged
There was a problem hiding this comment.
@rzarzynski What is worse is that these constants are taken directly from a third-party library header. My suggestion was to define them in Ceph, use build-time asserts to check them against third-party headers, and bump the encoding if they ever change. "No head" encoding isn't amenable to that, of course...
There was a problem hiding this comment.
Yeah, let's use the opportunity to clean this up.
Asserting + avoiding the same-size-everywhere assumption seems fine.
| bool is_valid; | ||
| decode(is_valid, bl); | ||
| if (is_valid) { | ||
| std::array<uint8_t, TraceId::kSize> trace_id; |
There was a problem hiding this comment.
Can TraceId::kSize and SpanId::kSize be different between an encoder and decoders?
|
Jenkins test this please |
|
I just rebased a branch onto main and I got the following build errors: https://paste.sh/s12G-Pcd#GF7QZnoU4Vum3QiQ3QCsqSzE @gregsfortytwo Perhaps you were right about needing to revert #51043 as well. |
@alimaredia, I will revert, but i'm not sure why you see it on your branch and it is not seen in main? |
Likely because Ali is building with |
|
this was performed using the following commands:
git revert -s -m 1 5609383git revert -s 9f160e4
git revert -s 5d4c5e6
the first revert is needed necause PR #51031 is a subsequent fix to PR #47457
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows