Skip to content

New msgr2 crc and secure modes (msgr2.1)#35078

Merged
tchaikov merged 24 commits intoceph:masterfrom
idryomov:wip-msgr21
Jun 20, 2020
Merged

New msgr2 crc and secure modes (msgr2.1)#35078
tchaikov merged 24 commits intoceph:masterfrom
idryomov:wip-msgr21

Conversation

@idryomov
Copy link
Contributor

This is based on @theanalyst's upstream-rgw-msgr-fixes branch for now (#34927). I'll rebase once it merges.

idryomov and others added 8 commits May 6, 2020 09:54
The secure mode uses AES-128-GCM with 96-bit nonces consisting of a
32-bit counter followed by a 64-bit salt.  The counter is incremented
after processing each frame, the salt is fixed for the duration of
the session.  Both are initialized from the session key generated
during session negotiation, so the counter starts with essentially
a random value.  It is allowed to wrap, and, after 2**32 frames, it
repeats, resulting in nonce reuse (the actual sequence numbers that
the messenger works with are 64-bit, so the session continues on).

Because of how GCM works, this completely breaks both confidentiality
and integrity aspects of the secure mode.  A single nonce reuse reveals
the XOR of two plaintexts and almost completely reveals the subkey
used for producing authentication tags.  After a few nonces get used
twice, all confidentiality and integrity goes out the window and the
attacker can potentially encrypt-authenticate plaintext of their
choice.

We can't easily change the nonce format to extend the counter to
64 bits (and possibly XOR it with a longer salt).  Instead, just
remember the initial nonce and cut the session before it repeats,
forcing renegotiation.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
As a AES-GCM IV, nonce_t is implicitly shared between server and
client.  Currently, if their endianness doesn't match, they are unable
to communicate in secure mode because each gets its own idea of what
the next nonce should be after the counter is incremented.

Several RFCs state that the nonce counter should be BE, but since we
use LE for everything on-disk and on-wire, make it LE.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit d8dd5e513c0c62bbd7d3044d7e2eddcd897bd400)
As per Robin's comments and S3 spec

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
S3 GetObject permits overriding response header values, but those inputs
need to be validated to insure only characters that are valid in an HTTP
header value are present.

Credit: Initial vulnerability discovery by William Bowling (@wcbowling)
Credit: Further vulnerability discovery by Robin H. Johnson <rjohnson@digitalocean.com>
Signed-off-by: Robin H. Johnson <rjohnson@digitalocean.com>
Provide an iterator-like interface as initializer lists cannot be
formed dynamically.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add asserts to avoid bugs like the one fixed in 1a975fb
("msg/async: fix unnecessary 4 kB allocation in secure mode.").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It is unused and doesn't make much sense in TxHandler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov idryomov requested a review from a team as a code owner May 14, 2020 21:35
@idryomov idryomov added feature messenger Issues involving one of the Ceph messenger implementations labels May 15, 2020
@tchaikov tchaikov requested a review from cyx1231st May 15, 2020 03:08
@tchaikov
Copy link
Contributor

@cyx1231st just in case.

@idryomov idryomov requested a review from rzarzynski May 15, 2020 08:26
@idryomov
Copy link
Contributor Author

cc @jdurgin @liewegas

@idryomov
Copy link
Contributor Author

jenkins test make check

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from 1st part of the review. So far looks good.

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In progress. Generally looks good so far!

FrameAssembler(const ceph::crypto::onwire::rxtx_t* crypto)
: m_crypto(crypto) {}
FrameAssembler(const ceph::crypto::onwire::rxtx_t* crypto, bool is_r1)
: m_crypto(crypto), m_is_r1(is_r1) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: one per line.

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes generally look good! Some nits and questions.
I'm moving forward to a brief performance comparison.

return bl;
}

bool disassemble_frame(FrameAssembler* frame_asm, bufferlist* frame_bl,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: one param per line.

@rzarzynski
Copy link
Contributor

I made a really brief performance evaluation. For small chunk we don't need to worry; for bigger ones there is a nice improvement in latency.

I think we can move forward.

idryomov added 4 commits June 12, 2020 14:37
OpenSSL supports in-place decryption so we can avoid
allocating potentially multi-megabyte and strictly aligned
buffer for each decryption operation.

ProtocolV2 actually gets the alignment wrong: after
read_frame_segment() allocates with cur_rx_desc.alignment,
handle_read_frame_segment() effectively replaces that with
segment_t::DEFAULT_ALIGNMENT.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Use it in ProtocolV2.h and later in unit tests.

While at it, drop the unused len struct.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor Author

@rzarzynski Implemented reserving before appending and addressed most of the style nits (including using mutable references, so the diff is quite large but mostly mechanical). reserve calls is pretty much the only functional change, any testing you have done should still be valid.

@idryomov
Copy link
Contributor Author

I think we can move forward.

Since I haven't received any other comments on the wire format itself, I'm taking off the RFC.

@idryomov idryomov changed the title [RFC] New msgr2 crc and secure modes (msgr2.1) New msgr2 crc and secure modes (msgr2.1) Jun 12, 2020
idryomov added 5 commits June 14, 2020 11:56
Start separating frame assembly and disassembly code from
frame sending, receiving and handling code, so that assembly
and disassembly pieces can be unit tested and hopefully also
shared between different messengers (e.g. crimson).

This commit factors out the assembly code from Frame.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Factor out the disassembly code from ProtocolV2 and switch
ProtocolV2 to FrameAssembler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
l_msgr_recv_bytes calculation was never updated from msgr1.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In preparation for msgr2,1, rename epilogue structs:
epilogue_plain_block_t to epilogue_crc_rev0_block_t and
epilogue_secure_block_t to epilogue_secure_rev0_block_t
(rev0 stands for revision 0).

Also, get rid of size constants that just disguise the
struct type.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Clarify that the frame can be aborted at any point after the
preamble and the first segment are put on the wire.  When that
happens, the remaining segments (including the data segment)
may be filled with zeros.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor Author

@rzarzynski One more fixup: moved the assert into get_buffer().

idryomov added 7 commits June 17, 2020 21:56
Implement msgr2.1-crc and msgr2.1-secure modes.

Issues with existing msgr2.0-crc and msgr2.0-secure modes and
their resolution will be described in doc/dev/msgr2.rst.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Move to a 64-bit counter to avoid wrapping and having to reset
the session before the counter repeats.  This is in line with NIST
Recommendation for GCM [1]:

  "... this Recommendation suggests, but does not require, that
  the leading (i.e., leftmost) 32 bits of the IV hold the fixed
  field; and that the trailing (i.e., rightmost) 64 bits hold the
  invocation field."

See commit bb61e6a ("msg/async/ProtocolV2: avoid AES-GCM nonce
reuse vulnerabilities").

[1] https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In both msgr2.0 and msgr2.1, segments can be empty.  In msgr2.1,
epilogue can be empty as well.

Handle both by calling the respective handler function directly
instead of allocating a buffer::ptr_node for an empty buffer and
passing that through READ[_RXBUF].

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
We aren't interested in peer_required_features anywhere outside
_handle_peer_banner_payload() -- once we know there is no mismatch,
it's all about peer_supported_features.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Use msgr2.1 if the peer supports it and fall back to msgr2.0
otherwise.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
}

preamble_block_t preamble;
fill_preamble(tag, preamble);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks this is the sole caller of fill_preamble(), so returning preamble_block_t seems pretty straightforward. Anyway, this is really a minor as the method is internal to FrameAssembler.

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

// more than one segment (i.e. at least one of second to fourth
// segments is not empty). In crc mode, it stores crcs for
// second to fourh segments; the preamble and the first segment
// are covered by their own crcs. In secure mode, the epilogue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the stuff got more complex. It might be worth adding here a single sentence about the krbd-driven rationale for having the first segment available (we have something like that in msgr2.rst but the comment would be much closer).
Still, not a blocker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is true that this change is driven by the kernel client, we may want to resurrect the option of receiving into preallocated buffers in userspace at some point too. Having the header crc on the header itself is what msgr1 did (i.e. this goes back to the dawn of ceph), so I don't think the rationale needs stressing here. It's really just a regression fix.

@idryomov
Copy link
Contributor Author

A critical follow-on fix for the issue exposed by msgr2.1: #35816.

@idryomov idryomov deleted the wip-msgr21 branch July 8, 2020 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature messenger Issues involving one of the Ceph messenger implementations needs-qa wip-kefu-testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants