common: extract buffer and thrift proxy refactors to a separate commit#4344
common: extract buffer and thrift proxy refactors to a separate commit#4344zuercher merged 4 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Jason Jian <jason.jian@airbnb.com>
zuercher
left a comment
There was a problem hiding this comment.
Cool. Thanks again for taking this on.
include/envoy/buffer/buffer.h
Outdated
| * Copy a little endian integer out of the buffer and drain the read data. | ||
| * @param Size how many bytes to read out of the buffer. | ||
| */ | ||
| template <typename T, size_t Size = sizeof(T)> T drainLEIntOut() { |
source/common/common/byte_order.h
Outdated
| } | ||
|
|
||
| // convenience function that converts a host byte-order integer to little endian byte-order | ||
| template <typename T> inline T toLittleEndian(T value) { |
There was a problem hiding this comment.
This doesn't seem to be used anywhere. (Same for fromLittleEndian/toBigEndian/fromBigEndian.)
There was a problem hiding this comment.
it was used in the initial versions of the MySQL health checker, prior to moving this functionality to the buffer implementation
it can still prove itself to be useful, but I have no strong opinion on whether to keep it or ditch it
your call entirely
There was a problem hiding this comment.
I think if you don't have a use for it in the mysql health checker, it should be removed.
| /** | ||
| * Copy an integer out of the buffer. | ||
| * @param start supplies the buffer index to start copying from. | ||
| * @param Size how many bytes to read out of the buffer. |
There was a problem hiding this comment.
If we're going to allow Size < sizeof(T), there should be a comment explaining what that does.
There was a problem hiding this comment.
great point. @Jason-Jian can you provide some context in the inline docs based on the review thread? I can make crude attempt here, but you'd need to edit it big time
--- BEGIN COMMENT SNIPPET ---
Some protocols have integer fields whose size in bytes won't match the size in bytes of C++'s integer types.
Take a 3-byte integer field for example, which we want to represent as a 32-bit (4 bytes) integer. One option to deal with that situation is to read 4 bytes from the buffer and ignore 1. There are a few problems with that solution, though.
The first problem is buffer underflow: there may not be more than Size bytes available (say, last field in the payload), so that's an edge case to take into consideration.
The second problem is draining the buffer after reading. With the above solution we cannot read and discard in one go. We'd need to peek 4 bytes, ignore 1 and then drain 3. That not only looks hacky since the sizes don't match, but also produces less terse code and requires the caller to propagate that logic to all call sites.
Things complicate even further when endianness is taken into consideration: should the most or least-significant bytes be discarded? Dealing with this situation requires a high level of care and attention to detail. Properly calculating which bytes to discard and how to displace the data is not only error prone, but also shifts to the caller a burden that could be solved in a much more generic, transparent and well tested manner.
To make matters easier, the optional Size parameter can be specified in those situations where there's a need to read less bytes than a C++'s integer size in bytes.
For the most common case when one needs to read exactly as many bytes as the size of C++'s integer, this parameter can simply be omitted and it will be automatically deduced from the size of the type T.
--- END COMMENT SNIPPET ---
There was a problem hiding this comment.
Ok. I still think it needs at least a brief comment. (Sorry. I thought the comment above was copied from the thread where I originally asked about this and didn't see the note at the top.)
Does the peekBEInt implementation need to be modified? As I read it, invoking peekBEInt<uint32_t, 3>() on a buffer containing (hex bytes) 01 02 03 04 will return 0x020304. If the goal is to support uncommon integer sizes, I think it should return 0x010203.
Also, if you call peekBEInt<uint32_t, 3>() on a buffer containing only 3 bytes, the length check will succeed, but it will try to read past the end of the buffer.
There was a problem hiding this comment.
I also wonder if this code should handle signed integers differently? If I invoke peekLEInt<int32_t, 3>() on a buffer containing FF FF FF do I get -1 or 0x00FFFFFF in my int32_t?
There was a problem hiding this comment.
You're right on all 3 remarks, including the sign extension. We need to add these as test cases.
There was a problem hiding this comment.
updated comment snippet reflecting sign extension:
--- BEGIN COMMENT SNIPPET ---
Some protocols have integer fields whose size in bytes won't match the size in bytes of C++'s integer types.
Take a 3-byte integer field for example, which we want to represent as a 32-bit (4 bytes) integer. One option to deal with that situation is to read 4 bytes from the buffer and ignore 1. There are a few problems with that solution, though.
The first problem is buffer underflow: there may not be more than Size bytes available (say, last field in the payload), so that's an edge case to take into consideration.
The second problem is draining the buffer after reading. With the above solution we cannot read and discard in one go. We'd need to peek 4 bytes, ignore 1 and then drain 3. That not only looks hacky since the sizes don't match, but also produces less terse code and requires the caller to propagate that logic to all call sites.
Things complicate even further when endianness is taken into consideration: should the most or least-significant bytes be padded? Dealing with this situation requires a high level of care and attention to detail. Properly calculating which bytes to discard and how to displace the data is not only error prone, but also shifts to the caller a burden that could be solved in a much more generic, transparent and well tested manner.
The last problem in the list is sign extension, which should be properly handled when reading signed types with negative values.
To make matters easier, the optional Size parameter can be specified in those situations where there's a need to read less bytes than a C++'s integer size in bytes.
For the most common case when one needs to read exactly as many bytes as the size of C++'s integer, this parameter can simply be omitted and it will be automatically deduced from the size of the type T.
--- END COMMENT SNIPPET ---
There was a problem hiding this comment.
Sorry, yes I misread how the displacement was used. It does take the correct bytes for big-endian peek.
For sign extension, I don't think you need the ternary expression. You want to all_bits_enabled << (Size * CHAR_BIT). Otherwise for little endian you'll be modifying the low-order bits.
Also, perhaps static_cast<T>(~static_cast<T>(0)) can be reduced to just ~static_cast<T>(0)?
There was a problem hiding this comment.
I had problems with type promotion and the complement operator in the past. C and C++ have this annoying tendency to turn bitwise results into an int sometimes. Honestly, I can never remember whether it's needed before or after ~ so I just choose to err on the safe side since static_cast are compile-time evaluated anyway.
With regards to the ternary, I'm not so sure, but you may be right.
My thinking was this (3 bytes read into a 32-bit signed integer from buffer with f1 f2 f3 f4):
Big Endian:
resultis initialized with0;- bytes are read at displacement
1,result = 00 f1 f2 f3; - we need to fill the leftmost bytes due to big-endianness (most significant byte is to the left), therefore
extensionshould beff 00 00 00, so shiftall_bits_enabledleft bySize - 1bytes; - merge bits and final
result = ff f1 f2 f3.
Little Endian:
resultis initialized with0;- bytes are read at displacement
0,result = f1 f2 f3 00; - we need to fill the righttmost bytes due to little-endianness (most significant byte is to the right), therefore
extensionshould be00 00 00 ff, so shiftall_bits_enabledright bySize - 1bytes; - merge bits and final
result = f1 f2 f3 ff.
The tests should surface that anyway, so I'm not too concerned.
There was a problem hiding this comment.
I wrote this quick test with regards to the casts: https://ideone.com/IyiRQp
Look at the result for char. For types narrower than int, without the static_cast type promotion would make everything an int or unsigned int, which is not quite what we want since we need to preserve the requested type.
| namespace Buffer { | ||
| namespace { | ||
|
|
||
| TEST(BufferHelperTest, PeekI8) { |
There was a problem hiding this comment.
Along with the comment in buffer.h, if buffer.peekLEInt<uint32_t, 2> is to be allowed, please add tests to exercise it.
test/common/buffer/utility.h
Outdated
| } | ||
| } | ||
|
|
||
| inline void addString(Buffer::Instance& buffer, const std::string& s) { buffer.add(s); } |
There was a problem hiding this comment.
Let's not move this one into common: people should just call buffer.add(s). If you want to leave it in the thrift code I'll remove it later.
I added it for conformity with the other utility methods, but because many of them are now just methods on Buffer there's no reason to provide this.
There was a problem hiding this comment.
removed and updated references
test/common/buffer/utility.h
Outdated
|
|
||
| inline void addString(Buffer::Instance& buffer, const std::string& s) { buffer.add(s); } | ||
|
|
||
| inline std::string bufferToString(Buffer::Instance& buffer) { |
There was a problem hiding this comment.
Likewise, I think people can just call buffer.toString().
Signed-off-by: Jason Jian <jason.jian@airbnb.com>
zuercher
left a comment
There was a problem hiding this comment.
Almost there. I think the implementation needs to be fixed for big-endian hosts, though.
include/envoy/buffer/buffer.h
Outdated
| << (Size * CHAR_BIT)) | ||
| : static_cast<T>(0); | ||
|
|
||
| return fromEndianness<Endianness>(static_cast<T>(result | extension)); |
There was a problem hiding this comment.
Doesn't this only work because the machines we're running tests on are little-endian?
On an LE host, peekBE<int32_t, 3> with a buffer containing F1 02 03 would produce result == 0x0302F100 (because BE to host conversion has not happened yet) and extension == 0xFF. This yields result | extension == 0x0302F1FF, which, after conversion is 0xFFF10203. That's correct.
On a BE host, peekBE<int32_t, 3> with a buffer containing F1 02 03 would produce result == 0x00F10203 (because the host is BE), but extension == 0xFF yields result | extension == 0x00F102FF.
I think the correct implementation would be to always compute extension without regard to endianness (as the correct number of high order bits) and apply it after result has been converted to the host byte order:
constexpr const auto extension =
std::is_signed<T>::value && Size < sizeof(T) && bytes[most_significant_read_byte] < 0
? static_cast<T>(
static_cast<typename std::make_unsigned<T>::type>(all_bits_enabled) << (Size * CHAR_BIT))
: static_cast<T>(0);
// this may require additional casting:
return fromEndianness<Endianness>(static_cast<T>(result)) | extension;
There was a problem hiding this comment.
you're absolutely correct. while helping Jason out yesterday debugging some tests I've realized my brain fart on the initial sign extension code
There was a problem hiding this comment.
ah, this was my bad, @juchem actually pointed it out yesterday but I ended up adding it back when working on big endian test cases, without realizing the host endianness factor. I'll correct it now. Thanks!
There was a problem hiding this comment.
actually, there's an additional detail that @zuercher pointed out which we missed yesterday
we must merge the extension bits after the endianness conversion, not before (move | extension outside of the call to fromEndianness, as specified in his code snippet)
There was a problem hiding this comment.
Yep yep, this is the part I failed to realize when debugging BE test failures.
Signed-off-by: Jason Jian <jason.jian@airbnb.com>
|
@zuercher PTAL |
|
We prefer to leave the history in PRs, since squashing sometimes destroys comment history. It'll get squashed when we merge. |
zuercher
left a comment
There was a problem hiding this comment.
Looks good to me. I think perhaps the detailed comment on peekInt is too detailed? Really just needs to be a brief explanation on why you'd ever want to set Size < sizeof(T) and that it will sign extend signed ints if you do.
|
@alyssawilk PTAL |
|
@dnoe do you have time to take a look at this (primarily the actual buffer changes -- the mods to thrift_proxy to use the new code less important). |
dnoe
left a comment
There was a problem hiding this comment.
Reviewed the buffer stuff - basically everything in this PR not under filters/network/thrift_proxy. I intentionally didn't review that to keep my eyes from glazing over - I'm not familiar with Thrift at all.
include/envoy/buffer/buffer.h
Outdated
| constexpr const auto displacement = Endianness == ByteOrder::BigEndian ? sizeof(T) - Size : 0; | ||
|
|
||
| auto result = static_cast<T>(0); | ||
| auto bytes = reinterpret_cast<char*>(std::addressof(result)); |
There was a problem hiding this comment.
This isn't typically done in Envoy in my experience, but I think this usage of auto calls out for explicitly setting the expectation about whether the result in a pointer or not, ie, auto* vs just auto. It will make things more obvious to the reader and also prevent inadvertent errors where T is some unexpected type.
There was a problem hiding this comment.
Might as well make it char*, since it always will be.
include/envoy/buffer/buffer.h
Outdated
| constexpr const auto most_significant_read_byte = | ||
| Endianness == ByteOrder::BigEndian ? displacement : Size - 1; | ||
|
|
||
| constexpr const auto all_bits_enabled = static_cast<T>(~static_cast<T>(0)); |
There was a problem hiding this comment.
What do you think about moving this declaration up to just below where result is defined? It is the negation of the starting value of result, so putting them together makes some sense.
include/envoy/buffer/buffer.h
Outdated
| */ | ||
| template <typename T, ByteOrder Endianness = ByteOrder::Host, size_t Size = sizeof(T)> | ||
| T drainInt() { | ||
| auto const result = peekInt<T, Endianness, Size>(); |
include/envoy/buffer/buffer.h
Outdated
| */ | ||
| template <ByteOrder Endianness = ByteOrder::Host, typename T, size_t Size = sizeof(T)> | ||
| void writeInt(T value) { | ||
| static_assert(Size <= sizeof(T), "requested size is bigger than integer being read"); |
There was a problem hiding this comment.
I find this error a bit confusing: should it say being written?
include/envoy/buffer/buffer.h
Outdated
| void writeInt(T value) { | ||
| static_assert(Size <= sizeof(T), "requested size is bigger than integer being read"); | ||
|
|
||
| auto const data = toEndianness<Endianness>(value); |
There was a problem hiding this comment.
converting, const auto looks more consistent with code base
include/envoy/buffer/buffer.h
Outdated
|
|
||
| constexpr const auto all_bits_enabled = static_cast<T>(~static_cast<T>(0)); | ||
|
|
||
| const auto extension = |
There was a problem hiding this comment.
To be extra clear, let's call this sign_extension_bits.
Signed-off-by: Jason Jian <jason.jian@airbnb.com>
Description:
Extract common buffer helper functions.
This is split out and rebased with upstream as per discussion in: #3502 (comment). After this is merged, I'll rebase mj's branch and reopen the PR.
Risk Level: Medium
Testing:
Unit tests added for buffer and byte_order changes
Docs Changes:
N/A
Release Notes:
Per discussion with @zuercher this change itself shouldn't need an entry in version_history.rst
PTAL: @mattklein123 @zuercher @juchem