Skip to content

nautilus: New msgr2 crc and secure modes (msgr2.1)#35733

Merged
yuriw merged 22 commits intoceph:nautilusfrom
idryomov:wip-msgr21-nautilus
Jul 31, 2020
Merged

nautilus: New msgr2 crc and secure modes (msgr2.1)#35733
yuriw merged 22 commits intoceph:nautilusfrom
idryomov:wip-msgr21-nautilus

Conversation

@idryomov
Copy link
Contributor

Backport of #35078 (changes from #34927 which got merged via #35078 already in nautilus).

Moderate conflicts, all documented inline.

@idryomov idryomov added feature messenger Issues involving one of the Ceph messenger implementations labels Jun 23, 2020
@idryomov idryomov added this to the nautilus milestone Jun 23, 2020
@tchaikov tchaikov added nautilus-batch-1 nautilus point releases needs-qa labels Jun 24, 2020
@yuriw
Copy link
Contributor

yuriw commented Jul 1, 2020

@idryomov
Copy link
Contributor Author

idryomov commented Jul 1, 2020

@yuriw Please drop it for now, it suffers from the same issue as the octopus branch. I'll add the fix from #35816 and ping you.

@idryomov
Copy link
Contributor Author

idryomov commented Jul 8, 2020

@yuriw This is still DNM. There is a subtle issue with nautilus that I'm currently tracking down.

@yuriw
Copy link
Contributor

yuriw commented Jul 8, 2020

@yuriw
Copy link
Contributor

yuriw commented Jul 8, 2020

@idryomov we suspect this PR is failing builds
http://hastebin.com/relivoceci
@neha-ojha FYI

@idryomov
Copy link
Contributor Author

idryomov commented Jul 9, 2020

@yuriw Correct, I haven't bothered pushing the fix for the build failure as there is a deeper issue here that is being investigated.

Jianpeng Ma and others added 14 commits July 13, 2020 14:44
This will cause unbalance between workes.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit 5cf027d)
Provide an iterator-like interface as initializer lists cannot be
formed dynamically.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1fc5cc2)
Add asserts to avoid bugs like the one fixed in 1a975fb
("msg/async: fix unnecessary 4 kB allocation in secure mode.").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 4894e59)
It is unused and doesn't make much sense in TxHandler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b3e39b1)
OpenSSL supports in-place decryption so we can avoid
allocating potentially multi-megabyte and strictly aligned
buffer for each decryption operation.

ProtocolV2 actually gets the alignment wrong: after
read_frame_segment() allocates with cur_rx_desc.alignment,
handle_read_frame_segment() effectively replaces that with
segment_t::DEFAULT_ALIGNMENT.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit fe97a00)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit e1d1f61)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6b6d405)
Use it in ProtocolV2.h and later in unit tests.

While at it, drop the unused len struct.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c081f3c)

Conflicts:
	src/crimson/net/ProtocolV2.cc [ crimson doesn't support
	  msgr2 in nautilus ]
	src/crimson/net/ProtocolV2.h [ ditto ]
Start separating frame assembly and disassembly code from
frame sending, receiving and handling code, so that assembly
and disassembly pieces can be unit tested and hopefully also
shared between different messengers (e.g. crimson).

This commit factors out the assembly code from Frame.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 872b125)

Conflicts:
	src/crimson/CMakeLists.txt [ crimson doesn't support msgr2
	  in nautilus
	src/crimson/net/ProtocolV2.cc [ ditto ]
	src/crimson/net/ProtocolV2.h [ ditto ]
	src/msg/async/frames_v2.h [ commits f1cf408
	  ("msg/async/frames_v2.h: fix warning"), c70f779
	  ("headers: Make ceph_le member private"), 9908f0e
	  ("msg: Add optimizing move") and 1a975fb ("msg/async:
	  fix unnecessary 4 kB allocation in secure mode.") not in
	  nautilus; fmt include adjusted as in rgw_lc.cc ]
Factor out the disassembly code from ProtocolV2 and switch
ProtocolV2 to FrameAssembler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b9e0cfe)

Conflicts:
	src/msg/async/ProtocolV2.cc [ context: commit d3ec4c01d17
	  ("msg: Build target 'common' without using namespace in
	  headers") not in nautilus ]
l_msgr_recv_bytes calculation was never updated from msgr1.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit dcf30f5)
In preparation for msgr2,1, rename epilogue structs:
epilogue_plain_block_t to epilogue_crc_rev0_block_t and
epilogue_secure_block_t to epilogue_secure_rev0_block_t
(rev0 stands for revision 0).

Also, get rid of size constants that just disguise the
struct type.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 712915b)

Conflicts:
	src/crimson/net/ProtocolV2.cc [ crimson doesn't support
	  msgr2 in nautilus ]
Clarify that the frame can be aborted at any point after the
preamble and the first segment are put on the wire.  When that
happens, the remaining segments (including the data segment)
may be filled with zeros.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bc835b8)

Conflicts:
	src/crimson/net/ProtocolV2.cc [ crimson doesn't support
	  msgr2 in nautilus ]
	src/msg/async/frames_v2.h [ context: commit c70f779
	  ("headers: Make ceph_le member private") not in nautilus ]
Implement msgr2.1-crc and msgr2.1-secure modes.

Issues with existing msgr2.0-crc and msgr2.0-secure modes and
their resolution will be described in doc/dev/msgr2.rst.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2966b2a)

Conflicts:
	src/crimson/net/ProtocolV2.cc [ crimson doesn't support
	  msgr2 in nautilus ]
idryomov added 2 commits July 13, 2020 14:45
Currently it's a mix of hex and dec, making it hard to grep for.
Converge on hex to match client_cookie.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 779a8df)
reuse_connection() can be called on exproto in BANNER_CONNECTING
(i.e. without peer_supported_features and with tx/rx_frame_asm set to
msgr2.0), but this state isn't carried over.  If the donor connection
is msgr2.1, this leads to repeated connection faults on crc or auth tag
mismatches because we end up assembling 2.0 frames while the peer is
expecting 2.1 frames.

Fixes: https://tracker.ceph.com/issues/46180
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit baff20a)

Conflicts:
	src/test/msgr/test_msgr.cc [ commits 5e3aa5d
	  ("ceph_test_msgr: remove simple") and bfb8c74
	  ("test/msgr: s/Mutex/ceph::mutex/") not in nautilus ]
@idryomov idryomov force-pushed the wip-msgr21-nautilus branch from adaac38 to b9db4ab Compare July 13, 2020 12:48
@idryomov
Copy link
Contributor Author

-s rados/singleton-nomsgr --filter 'all/health-warnings rados' -N 20:

https://pulpito.ceph.com/dis-2020-07-13_14:08:26-rados:singleton-nomsgr-wip-msgr21-nautilus-distro-basic-smithi/

@idryomov
Copy link
Contributor Author

@yuriw @neha-ojha Taking DNM off.

@idryomov idryomov removed the DNM label Jul 13, 2020
@idryomov
Copy link
Contributor Author

idryomov commented Jul 13, 2020

@yuriw Correct, I haven't bothered pushing the fix for the build failure as there is a deeper issue here that is being
investigated.

I tracked it down to the interaction of INTERCEPT framework and NetworkStack::get_worker(). Pulled in commit 5cf027d ("msg/async: don't forget dec(Worker::references) when met error.") from #31929.

@yuriw
Copy link
Contributor

yuriw commented Jul 13, 2020

@idryomov ok testing this alone

@yuriw
Copy link
Contributor

yuriw commented Jul 13, 2020

@idryomov
Copy link
Contributor Author

-s rados/singleton-nomsgr --filter 'all/health-warnings rados' -N 200:

https://pulpito.ceph.com/dis-2020-07-13_16:40:12-rados:singleton-nomsgr-wip-msgr21-nautilus-distro-basic-smithi/

@neha-ojha One suspicious failure, but it doesn't appear to be related to the messenger. osd.1 couldn't come up because osd.9 kept telling it that it had died (MOSDPing with YOU_DIED). Not sure why it never moved to a newer osdmap...

@yuriw
Copy link
Contributor

yuriw commented Jul 15, 2020

@idryomov it passed tests and @neha-ojha reviewed/approved it https://trello.com/c/1RQihhoW
So pls merge is all is done here.

@neha-ojha
Copy link
Member

-s rados/singleton-nomsgr --filter 'all/health-warnings rados' -N 200:

https://pulpito.ceph.com/dis-2020-07-13_16:40:12-rados:singleton-nomsgr-wip-msgr21-nautilus-distro-basic-smithi/

@neha-ojha One suspicious failure, but it doesn't appear to be related to the messenger. osd.1 couldn't come up because osd.9 kept telling it that it had died (MOSDPing with YOU_DIED). Not sure why it never moved to a newer osdmap...

doesn't look related

@idryomov
Copy link
Contributor Author

@yuriw What about upgrade tests?

@yuriw
Copy link
Contributor

yuriw commented Jul 16, 2020

@yuriw
Copy link
Contributor

yuriw commented Jul 23, 2020

@idryomov @neha-ojha what's next ? do we want this in next release?

@idryomov
Copy link
Contributor Author

Yes, we do. I was waiting for Neha to look at the upgrade runs.

Radek should review shortly.

@neha-ojha
Copy link
Member

neha-ojha commented Jul 23, 2020

Yes, we do. I was waiting for Neha to look at the upgrade runs.

@yuriw @idryomov There seem to be a lot of failures in the nautilus baseline upgrade run, which can mask real bugs. I'll triage those and decide whether we need to redo upgrade tests for this PR.

Radek should review shortly.

@yuriw
Copy link
Contributor

yuriw commented Jul 27, 2020

@yuriw
Copy link
Contributor

yuriw commented Jul 31, 2020

@yuriw yuriw merged commit 408b1be into ceph:nautilus Jul 31, 2020
@idryomov idryomov deleted the wip-msgr21-nautilus branch August 10, 2020 10:15
@smithfarm
Copy link
Contributor

I find it odd that no PendingReleaseNote was added for this... would it make sense to write one and add it to the 14.2.11 blog post and the 14.2.11 release notes in master?

(If someone comes up with such a release note, I could take care of getting it into the right places.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature messenger Issues involving one of the Ceph messenger implementations nautilus-batch-1 nautilus point releases needs-qa TESTED wip-yuri-testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants