Skip to content

revert PRs #47457 and #51031#51293

Merged
gregsfortytwo merged 2 commits intoceph:mainfrom
yuvalif:wip-yuval-revert-47457
Apr 30, 2023
Merged

revert PRs #47457 and #51031#51293
gregsfortytwo merged 2 commits intoceph:mainfrom
yuvalif:wip-yuval-revert-47457

Conversation

@yuvalif
Copy link
Contributor

@yuvalif yuvalif commented Apr 30, 2023

this was performed using the following commands:
git revert -s -m 1 5609383
git revert -s 9f160e4
git revert -s 5d4c5e6

the first revert is needed necause PR #51031 is a subsequent fix to PR #47457

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@@ -58,7 +58,7 @@ Aio::OpFunc aio_abstract(librados::IoCtx ctx, Op&& op, jspan_context* trace_ctx
(void)trace_ctx; // suppress unused trace_ctx warning. until we will support the read op trace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reverts 9f160e4 has bee made on top of (depends on) 5d4c5e6.

NOTE for the future (please don't solve this comment): we can't forget abort bringing it back after fixing the encoding issue.

@@ -58,7 +58,7 @@ Aio::OpFunc aio_abstract(librados::IoCtx ctx, Op&& op, jspan_context* trace_ctx
(void)trace_ctx; // suppress unused trace_ctx warning. until we will support the read op trace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to revert the merge commit? Wouldn't be enough to simply:

$ git revert 9f160e4
[detached HEAD c3b26d99f2c] Revert "rgw/aio: fix recursion from tracing changes"
 1 file changed, 2 insertions(+), 2 deletions(-)
$ git revert 5d4c5e6
Auto-merging src/rgw/rgw_op.cc
Auto-merging src/osd/OSD.cc
Auto-merging src/msg/Message.h
Auto-merging src/msg/Message.cc
Auto-merging src/messages/MOSDOp.h
Auto-merging src/common/tracer.h
[detached HEAD 147f743d2c0] Revert "tracer/osd/librados/build/rgw: rgw and osd end2end tracing using opentelemetry"
 33 files changed, 139 insertions(+), 233 deletions(-)

?

Reverting regular, individual commits is straightforward and results in pretty readable history:

$ git log --oneline -n3
147f743d2c0 (HEAD) Revert "tracer/osd/librados/build/rgw: rgw and osd end2end tracing using opentelemetry"
c3b26d99f2c Revert "rgw/aio: fix recursion from tracing changes"
92ba1c08c83 (upstream/main) Merge pull request #50324 from pkalever/scrub_incomplete

Reverting a merge commit is, well, far more complex underneath and can be quirky: https://mirrors.edge.kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. will revert the actual commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed a version reverting the actual commit and not the merge

class MOSDOp final : public MOSDFastDispatchOp {
private:
static constexpr int HEAD_VERSION = 9;
static constexpr int HEAD_VERSION = 8;
Copy link
Contributor

@rzarzynski rzarzynski Apr 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit creates as short-living version 9 which resembles a similar situation in RocksDB. This seems fine to me as it's all about main but I want to be explicit here: we provide NO GUARANTEES on dencoding interoperability between two pure-main revisions, right?

CC: @gregsfortytwo, @athanatos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzarzynski I think the main issue is unblocking failing tests where we have older versions of OSDs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right @rzarzynski, no guarantees on versioning in main. I mean, we don’t love it, but we can’t guarantee it for precisely situations like this, where we discover an issue in integration tests.

yuvalif added 2 commits April 30, 2023 13:29
This reverts commit 9f160e4.

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
…ing opentelemetry"

This reverts commit 5d4c5e6.

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
@yuvalif yuvalif force-pushed the wip-yuval-revert-47457 branch from 8ebc0c5 to e39ca6b Compare April 30, 2023 13:37
auto is_valid = span_ctx.IsValid();
encode(is_valid, bl);
if (is_valid) {
encode_nohead(std::string_view(reinterpret_cast<const char*>(span_ctx.trace_id().Id().data()), TraceId::kSize), bl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some drilling in the encoding stuff, I started worrying there is (and was; not a new thing) a rather minor problem with the tracker's format. By using encode_nohead an encoder enforces decoders to use their own, local instance of the TraceId::kSize and SpanId::kSize constants.

The current assumption is they match between e.g. revisions of jspan_context and architectures. If this invariant breaks out, we could run into:

  1. if decoder's constant < encoder's constant, then decoder's compat-preserving machinery will silently eat the extra bytes (bl += struct_end - bl.get_off()) and what happen later is unknown:
/**
 * finish decode block
 *
 * @param bl bufferlist::iterator we were decoding from
 */
#define DECODE_FINISH(bl)                                               \
  } while (false);                                                      \
  if (struct_end) {                                                     \
    if (bl.get_off() > struct_end)                                      \
      throw ::ceph::buffer::malformed_input(DECODE_ERR_PAST(__PRETTY_FUNCTION__)); \
    if (bl.get_off() < struct_end)                                      \
      bl += struct_end - bl.get_off();                                  \
  }
  1. if decoder's constant > encoder's constant – end_of_buffer or malformed_input will be thrown.

I doubt it's a severe defect as it can happen only on the tracing (is_valid) path but if we consider a feature bit, can we fix this minor as well? Is it worth? Is a note in the doc an alternative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @rzarzynski, this issue was raised by @idryomov here: #51043
will look at that once the end2end work is re-merged

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzarzynski What is worse is that these constants are taken directly from a third-party library header. My suggestion was to define them in Ceph, use build-time asserts to check them against third-party headers, and bump the encoding if they ever change. "No head" encoding isn't amenable to that, of course...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's use the opportunity to clean this up.
Asserting + avoiding the same-size-everywhere assumption seems fine.

bool is_valid;
decode(is_valid, bl);
if (is_valid) {
std::array<uint8_t, TraceId::kSize> trace_id;
Copy link
Contributor

@rzarzynski rzarzynski Apr 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can TraceId::kSize and SpanId::kSize be different between an encoder and decoders?

@gregsfortytwo
Copy link
Member

Doesn’t this also need to revert #51043 then @yuvalif ?

@yuvalif
Copy link
Contributor Author

yuvalif commented Apr 30, 2023

Doesn’t this also need to revert #51043 then @yuvalif ?

not really. encoding/decoding of traces, as long as it does not happen in OSD messaging, is ok

Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s fix testing

@gregsfortytwo
Copy link
Member

Jenkins test this please

@gregsfortytwo gregsfortytwo merged commit 7179c0e into ceph:main Apr 30, 2023
@alimaredia
Copy link
Contributor

I just rebased a branch onto main and I got the following build errors: https://paste.sh/s12G-Pcd#GF7QZnoU4Vum3QiQ3QCsqSzE

@gregsfortytwo Perhaps you were right about needing to revert #51043 as well.

@yuvalif
Copy link
Contributor Author

yuvalif commented May 2, 2023

I just rebased a branch onto main and I got the following build errors: https://paste.sh/s12G-Pcd#GF7QZnoU4Vum3QiQ3QCsqSzE

@gregsfortytwo Perhaps you were right about needing to revert #51043 as well.

@alimaredia, I will revert, but i'm not sure why you see it on your branch and it is not seen in main?

@idryomov
Copy link
Contributor

idryomov commented May 2, 2023

@alimaredia, I will revert, but i'm not sure why you see it on your branch and it is not seen in main?

Likely because Ali is building with HAVE_JAEGER disabled.

@Matan-B
Copy link
Contributor

Matan-B commented May 3, 2023

I just rebased a branch onto main and I got the following build errors: https://paste.sh/s12G-Pcd#GF7QZnoU4Vum3QiQ3QCsqSzE

@gregsfortytwo Perhaps you were right about needing to revert #51043 as well.

#51331

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants