Skip to content

common: extract buffer and thrift proxy refactors to a separate commit#4344

Merged
zuercher merged 4 commits intoenvoyproxy:masterfrom
airbnb:jj-separate-buffer-changes
Sep 17, 2018
Merged

common: extract buffer and thrift proxy refactors to a separate commit#4344
zuercher merged 4 commits intoenvoyproxy:masterfrom
airbnb:jj-separate-buffer-changes

Conversation

@jkemv
Copy link
Copy Markdown
Contributor

@jkemv jkemv commented Sep 5, 2018

Description:
Extract common buffer helper functions.

This is split out and rebased with upstream as per discussion in: #3502 (comment). After this is merged, I'll rebase mj's branch and reopen the PR.

Risk Level: Medium

Testing:
Unit tests added for buffer and byte_order changes

Docs Changes:
N/A

Release Notes:
Per discussion with @zuercher this change itself shouldn't need an entry in version_history.rst

PTAL: @mattklein123 @zuercher @juchem

Signed-off-by: Jason Jian <jason.jian@airbnb.com>
@jkemv jkemv requested a review from zuercher as a code owner September 5, 2018 14:06
Copy link
Copy Markdown
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Thanks again for taking this on.

* Copy a little endian integer out of the buffer and drain the read data.
* @param Size how many bytes to read out of the buffer.
*/
template <typename T, size_t Size = sizeof(T)> T drainLEIntOut() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not drainLEInt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

}

// convenience function that converts a host byte-order integer to little endian byte-order
template <typename T> inline T toLittleEndian(T value) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used anywhere. (Same for fromLittleEndian/toBigEndian/fromBigEndian.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was used in the initial versions of the MySQL health checker, prior to moving this functionality to the buffer implementation

it can still prove itself to be useful, but I have no strong opinion on whether to keep it or ditch it

your call entirely

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you don't have a use for it in the mysql health checker, it should be removed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds fair

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

/**
* Copy an integer out of the buffer.
* @param start supplies the buffer index to start copying from.
* @param Size how many bytes to read out of the buffer.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to allow Size < sizeof(T), there should be a comment explaining what that does.

Copy link
Copy Markdown
Contributor

@juchem juchem Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great point. @Jason-Jian can you provide some context in the inline docs based on the review thread? I can make crude attempt here, but you'd need to edit it big time

--- BEGIN COMMENT SNIPPET ---
Some protocols have integer fields whose size in bytes won't match the size in bytes of C++'s integer types.

Take a 3-byte integer field for example, which we want to represent as a 32-bit (4 bytes) integer. One option to deal with that situation is to read 4 bytes from the buffer and ignore 1. There are a few problems with that solution, though.

The first problem is buffer underflow: there may not be more than Size bytes available (say, last field in the payload), so that's an edge case to take into consideration.

The second problem is draining the buffer after reading. With the above solution we cannot read and discard in one go. We'd need to peek 4 bytes, ignore 1 and then drain 3. That not only looks hacky since the sizes don't match, but also produces less terse code and requires the caller to propagate that logic to all call sites.

Things complicate even further when endianness is taken into consideration: should the most or least-significant bytes be discarded? Dealing with this situation requires a high level of care and attention to detail. Properly calculating which bytes to discard and how to displace the data is not only error prone, but also shifts to the caller a burden that could be solved in a much more generic, transparent and well tested manner.

To make matters easier, the optional Size parameter can be specified in those situations where there's a need to read less bytes than a C++'s integer size in bytes.

For the most common case when one needs to read exactly as many bytes as the size of C++'s integer, this parameter can simply be omitted and it will be automatically deduced from the size of the type T.
--- END COMMENT SNIPPET ---

Copy link
Copy Markdown
Member

@zuercher zuercher Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I still think it needs at least a brief comment. (Sorry. I thought the comment above was copied from the thread where I originally asked about this and didn't see the note at the top.)

Does the peekBEInt implementation need to be modified? As I read it, invoking peekBEInt<uint32_t, 3>() on a buffer containing (hex bytes) 01 02 03 04 will return 0x020304. If the goal is to support uncommon integer sizes, I think it should return 0x010203.

Also, if you call peekBEInt<uint32_t, 3>() on a buffer containing only 3 bytes, the length check will succeed, but it will try to read past the end of the buffer.

Copy link
Copy Markdown
Member

@zuercher zuercher Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if this code should handle signed integers differently? If I invoke peekLEInt<int32_t, 3>() on a buffer containing FF FF FF do I get -1 or 0x00FFFFFF in my int32_t?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right on all 3 remarks, including the sign extension. We need to add these as test cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated comment snippet reflecting sign extension:

--- BEGIN COMMENT SNIPPET ---
Some protocols have integer fields whose size in bytes won't match the size in bytes of C++'s integer types.

Take a 3-byte integer field for example, which we want to represent as a 32-bit (4 bytes) integer. One option to deal with that situation is to read 4 bytes from the buffer and ignore 1. There are a few problems with that solution, though.

The first problem is buffer underflow: there may not be more than Size bytes available (say, last field in the payload), so that's an edge case to take into consideration.

The second problem is draining the buffer after reading. With the above solution we cannot read and discard in one go. We'd need to peek 4 bytes, ignore 1 and then drain 3. That not only looks hacky since the sizes don't match, but also produces less terse code and requires the caller to propagate that logic to all call sites.

Things complicate even further when endianness is taken into consideration: should the most or least-significant bytes be padded? Dealing with this situation requires a high level of care and attention to detail. Properly calculating which bytes to discard and how to displace the data is not only error prone, but also shifts to the caller a burden that could be solved in a much more generic, transparent and well tested manner.

The last problem in the list is sign extension, which should be properly handled when reading signed types with negative values.

To make matters easier, the optional Size parameter can be specified in those situations where there's a need to read less bytes than a C++'s integer size in bytes.

For the most common case when one needs to read exactly as many bytes as the size of C++'s integer, this parameter can simply be omitted and it will be automatically deduced from the size of the type T.
--- END COMMENT SNIPPET ---

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on this, thanks @juchem !

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes I misread how the displacement was used. It does take the correct bytes for big-endian peek.

For sign extension, I don't think you need the ternary expression. You want to all_bits_enabled << (Size * CHAR_BIT). Otherwise for little endian you'll be modifying the low-order bits.

Also, perhaps static_cast<T>(~static_cast<T>(0)) can be reduced to just ~static_cast<T>(0)?

Copy link
Copy Markdown
Contributor

@juchem juchem Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had problems with type promotion and the complement operator in the past. C and C++ have this annoying tendency to turn bitwise results into an int sometimes. Honestly, I can never remember whether it's needed before or after ~ so I just choose to err on the safe side since static_cast are compile-time evaluated anyway.

With regards to the ternary, I'm not so sure, but you may be right.

My thinking was this (3 bytes read into a 32-bit signed integer from buffer with f1 f2 f3 f4):

Big Endian:

  • result is initialized with 0;
  • bytes are read at displacement 1, result = 00 f1 f2 f3;
  • we need to fill the leftmost bytes due to big-endianness (most significant byte is to the left), therefore extension should be ff 00 00 00, so shift all_bits_enabled left by Size - 1 bytes;
  • merge bits and final result = ff f1 f2 f3.

Little Endian:

  • result is initialized with 0;
  • bytes are read at displacement 0, result = f1 f2 f3 00;
  • we need to fill the righttmost bytes due to little-endianness (most significant byte is to the right), therefore extension should be 00 00 00 ff, so shift all_bits_enabled right by Size - 1 bytes;
  • merge bits and final result = f1 f2 f3 ff.

The tests should surface that anyway, so I'm not too concerned.

Copy link
Copy Markdown
Contributor

@juchem juchem Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this quick test with regards to the casts: https://ideone.com/IyiRQp

Look at the result for char. For types narrower than int, without the static_cast type promotion would make everything an int or unsigned int, which is not quite what we want since we need to preserve the requested type.

namespace Buffer {
namespace {

TEST(BufferHelperTest, PeekI8) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with the comment in buffer.h, if buffer.peekLEInt<uint32_t, 2> is to be allowed, please add tests to exercise it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding

}
}

inline void addString(Buffer::Instance& buffer, const std::string& s) { buffer.add(s); }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not move this one into common: people should just call buffer.add(s). If you want to leave it in the thrift code I'll remove it later.

I added it for conformity with the other utility methods, but because many of them are now just methods on Buffer there's no reason to provide this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed and updated references


inline void addString(Buffer::Instance& buffer, const std::string& s) { buffer.add(s); }

inline std::string bufferToString(Buffer::Instance& buffer) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, I think people can just call buffer.toString().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Signed-off-by: Jason Jian <jason.jian@airbnb.com>
@jkemv
Copy link
Copy Markdown
Contributor Author

jkemv commented Sep 7, 2018

@zuercher I just pushed the commit addressing the feedback. Please take a look when you get a chance, thanks! cc: @juchem

Copy link
Copy Markdown
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. I think the implementation needs to be fixed for big-endian hosts, though.

<< (Size * CHAR_BIT))
: static_cast<T>(0);

return fromEndianness<Endianness>(static_cast<T>(result | extension));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this only work because the machines we're running tests on are little-endian?

On an LE host, peekBE<int32_t, 3> with a buffer containing F1 02 03 would produce result == 0x0302F100 (because BE to host conversion has not happened yet) and extension == 0xFF. This yields result | extension == 0x0302F1FF, which, after conversion is 0xFFF10203. That's correct.

On a BE host, peekBE<int32_t, 3> with a buffer containing F1 02 03 would produce result == 0x00F10203 (because the host is BE), but extension == 0xFF yields result | extension == 0x00F102FF.

I think the correct implementation would be to always compute extension without regard to endianness (as the correct number of high order bits) and apply it after result has been converted to the host byte order:

  constexpr const auto extension = 
      std::is_signed<T>::value && Size < sizeof(T) && bytes[most_significant_read_byte] < 0
          ? static_cast<T>(
                  static_cast<typename std::make_unsigned<T>::type>(all_bits_enabled) << (Size * CHAR_BIT))
          : static_cast<T>(0);

  // this may require additional casting:
  return fromEndianness<Endianness>(static_cast<T>(result)) | extension;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're absolutely correct. while helping Jason out yesterday debugging some tests I've realized my brain fart on the initial sign extension code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this was my bad, @juchem actually pointed it out yesterday but I ended up adding it back when working on big endian test cases, without realizing the host endianness factor. I'll correct it now. Thanks!

Copy link
Copy Markdown
Contributor

@juchem juchem Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, there's an additional detail that @zuercher pointed out which we missed yesterday
we must merge the extension bits after the endianness conversion, not before (move | extension outside of the call to fromEndianness, as specified in his code snippet)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep yep, this is the part I failed to realize when debugging BE test failures.

Signed-off-by: Jason Jian <jason.jian@airbnb.com>
@jkemv
Copy link
Copy Markdown
Contributor Author

jkemv commented Sep 8, 2018

@zuercher PTAL
Do you need me to rebase & squash the commits? Let me know. Thanks for all the feedback!

@zuercher
Copy link
Copy Markdown
Member

We prefer to leave the history in PRs, since squashing sometimes destroys comment history. It'll get squashed when we merge.

zuercher
zuercher previously approved these changes Sep 10, 2018
Copy link
Copy Markdown
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I think perhaps the detailed comment on peekInt is too detailed? Really just needs to be a brief explanation on why you'd ever want to set Size < sizeof(T) and that it will sign extend signed ints if you do.

@zuercher
Copy link
Copy Markdown
Member

@alyssawilk PTAL

@zuercher zuercher assigned alyssawilk and unassigned alyssawilk Sep 10, 2018
@zuercher
Copy link
Copy Markdown
Member

@dnoe do you have time to take a look at this (primarily the actual buffer changes -- the mods to thrift_proxy to use the new code less important).

Copy link
Copy Markdown
Contributor

@dnoe dnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the buffer stuff - basically everything in this PR not under filters/network/thrift_proxy. I intentionally didn't review that to keep my eyes from glazing over - I'm not familiar with Thrift at all.

constexpr const auto displacement = Endianness == ByteOrder::BigEndian ? sizeof(T) - Size : 0;

auto result = static_cast<T>(0);
auto bytes = reinterpret_cast<char*>(std::addressof(result));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't typically done in Envoy in my experience, but I think this usage of auto calls out for explicitly setting the expectation about whether the result in a pointer or not, ie, auto* vs just auto. It will make things more obvious to the reader and also prevent inadvertent errors where T is some unexpected type.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well make it char*, since it always will be.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to char *

constexpr const auto most_significant_read_byte =
Endianness == ByteOrder::BigEndian ? displacement : Size - 1;

constexpr const auto all_bits_enabled = static_cast<T>(~static_cast<T>(0));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about moving this declaration up to just below where result is defined? It is the negation of the starting value of result, so putting them together makes some sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

*/
template <typename T, ByteOrder Endianness = ByteOrder::Host, size_t Size = sizeof(T)>
T drainInt() {
auto const result = peekInt<T, Endianness, Size>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*/
template <ByteOrder Endianness = ByteOrder::Host, typename T, size_t Size = sizeof(T)>
void writeInt(T value) {
static_assert(Size <= sizeof(T), "requested size is bigger than integer being read");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this error a bit confusing: should it say being written?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right

void writeInt(T value) {
static_assert(Size <= sizeof(T), "requested size is bigger than integer being read");

auto const data = toEndianness<Endianness>(value);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converting, const auto looks more consistent with code base


constexpr const auto all_bits_enabled = static_cast<T>(~static_cast<T>(0));

const auto extension =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be extra clear, let's call this sign_extension_bits.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Signed-off-by: Jason Jian <jason.jian@airbnb.com>
@jkemv
Copy link
Copy Markdown
Contributor Author

jkemv commented Sep 14, 2018

@zuercher @dnoe PTAL. Sorry it took me a while to address the comments as I was on a trip earlier this week.

@zuercher zuercher merged commit 2ee64ff into envoyproxy:master Sep 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants