common: extract buffer and thrift proxy refactors to a separate commit by jkemv · Pull Request #4344 · envoyproxy/envoy

jkemv · 2018-09-05T14:06:32Z

Description:
Extract common buffer helper functions.

This is split out and rebased with upstream as per discussion in: #3502 (comment). After this is merged, I'll rebase mj's branch and reopen the PR.

Risk Level: Medium

Testing:
Unit tests added for buffer and byte_order changes

Docs Changes:
N/A

Release Notes:
Per discussion with @zuercher this change itself shouldn't need an entry in version_history.rst

PTAL: @mattklein123 @zuercher @juchem

Signed-off-by: Jason Jian <jason.jian@airbnb.com>

zuercher

Cool. Thanks again for taking this on.

zuercher · 2018-09-05T16:23:18Z

include/envoy/buffer/buffer.h

+   * Copy a little endian integer out of the buffer and drain the read data.
+   * @param Size how many bytes to read out of the buffer.
+   */
+  template <typename T, size_t Size = sizeof(T)> T drainLEIntOut() {


Why not drainLEInt?

zuercher · 2018-09-05T16:27:20Z

source/common/common/byte_order.h

+}
+
+// convenience function that converts a host byte-order integer to little endian byte-order
+template <typename T> inline T toLittleEndian(T value) {


This doesn't seem to be used anywhere. (Same for fromLittleEndian/toBigEndian/fromBigEndian.)

it was used in the initial versions of the MySQL health checker, prior to moving this functionality to the buffer implementation

it can still prove itself to be useful, but I have no strong opinion on whether to keep it or ditch it

your call entirely

I think if you don't have a use for it in the mysql health checker, it should be removed.

sounds fair

zuercher · 2018-09-05T16:35:00Z

include/envoy/buffer/buffer.h

+  /**
+   * Copy an integer out of the buffer.
+   * @param start supplies the buffer index to start copying from.
+   * @param Size how many bytes to read out of the buffer.


If we're going to allow Size < sizeof(T), there should be a comment explaining what that does.

great point. @Jason-Jian can you provide some context in the inline docs based on the review thread? I can make a crude attempt here, but you'd need to edit it heavily

--- BEGIN COMMENT SNIPPET ---
Some protocols have integer fields whose size in bytes won't match the size in bytes of C++'s integer types.

Take a 3-byte integer field for example, which we want to represent as a 32-bit (4 bytes) integer. One option to deal with that situation is to read 4 bytes from the buffer and ignore 1. There are a few problems with that solution, though.

The first problem is buffer underflow: there may not be more than Size bytes available (say, last field in the payload), so that's an edge case to take into consideration.

The second problem is draining the buffer after reading. With the above solution we cannot read and discard in one go. We'd need to peek 4 bytes, ignore 1 and then drain 3. That not only looks hacky since the sizes don't match, but also produces less terse code and requires the caller to propagate that logic to all call sites.

Things get even more complicated when endianness is taken into consideration: should the most or least-significant bytes be discarded? Dealing with this situation requires a high level of care and attention to detail. Properly calculating which bytes to discard and how to displace the data is not only error-prone, but also shifts to the caller a burden that could be handled in a much more generic, transparent and well-tested manner.

To make matters easier, the optional Size parameter can be specified in those situations where there's a need to read fewer bytes than the size of a C++ integer type.

For the most common case, when one needs to read exactly as many bytes as the size of the C++ integer type, this parameter can simply be omitted and it will default to sizeof(T).
--- END COMMENT SNIPPET ---

Ok. ~~I still think it needs at least a brief comment.~~ (Sorry. I thought the comment above was copied from the thread where I originally asked about this and didn't see the note at the top.)

Does the peekBEInt implementation need to be modified? As I read it, invoking peekBEInt<uint32_t, 3>() on a buffer containing (hex bytes) 01 02 03 04 will return 0x020304. If the goal is to support uncommon integer sizes, I think it should return 0x010203.

Also, if you call peekBEInt<uint32_t, 3>() on a buffer containing only 3 bytes, the length check will succeed, but it will try to read past the end of the buffer.

I also wonder if this code should handle signed integers differently? If I invoke peekLEInt<int32_t, 3>() on a buffer containing FF FF FF do I get -1 or 0x00FFFFFF in my int32_t?

You're right on all 3 remarks, including the sign extension. We need to add these as test cases.

updated comment snippet reflecting sign extension:

--- BEGIN COMMENT SNIPPET ---
Some protocols have integer fields whose size in bytes won't match the size in bytes of C++'s integer types.

Take a 3-byte integer field for example, which we want to represent as a 32-bit (4 bytes) integer. One option to deal with that situation is to read 4 bytes from the buffer and ignore 1. There are a few problems with that solution, though.

The first problem is buffer underflow: there may not be more than Size bytes available (say, last field in the payload), so that's an edge case to take into consideration.

The second problem is draining the buffer after reading. With the above solution we cannot read and discard in one go. We'd need to peek 4 bytes, ignore 1 and then drain 3. That not only looks hacky since the sizes don't match, but also produces less terse code and requires the caller to propagate that logic to all call sites.

Things get even more complicated when endianness is taken into consideration: should the most or least-significant bytes be padded? Dealing with this situation requires a high level of care and attention to detail. Properly calculating which bytes to discard and how to displace the data is not only error-prone, but also shifts to the caller a burden that could be handled in a much more generic, transparent and well-tested manner.

The last problem in the list is sign extension, which should be properly handled when reading signed types with negative values.

To make matters easier, the optional Size parameter can be specified in those situations where there's a need to read fewer bytes than the size of a C++ integer type.

For the most common case, when one needs to read exactly as many bytes as the size of the C++ integer type, this parameter can simply be omitted and it will default to sizeof(T).
--- END COMMENT SNIPPET ---
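To make the intended semantics of Size < sizeof(T) concrete, here is a minimal standalone sketch. It is not the code from this PR: `readInt` and `Order` are hypothetical names, the bytes come from a plain pointer rather than a Buffer::Instance, and it assembles the value with shifts instead of copying into the result's storage. Throwing on a short buffer is also just an assumption made for the sketch.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

enum class Order { Little, Big };

// Hypothetical standalone helper (not Envoy's Buffer API): decodes the first Size
// bytes of `data` as an integer of type T stored in byte order E.
template <typename T, Order E, size_t Size = sizeof(T)>
T readInt(const uint8_t* data, size_t length) {
  static_assert(std::is_integral<T>::value, "T must be an integer type");
  static_assert(sizeof(T) <= sizeof(uint64_t), "T must fit in the 64-bit accumulator");
  static_assert(Size > 0 && Size <= sizeof(T), "Size must fit in T");
  using U = typename std::make_unsigned<T>::type;

  // Only Size bytes are required, so a field at the very end of a payload is fine.
  if (length < Size) {
    throw std::out_of_range("buffer underflow");
  }

  // Accumulate from the most significant byte down; this is independent of the host.
  uint64_t value = 0;
  for (size_t i = 0; i < Size; ++i) {
    const size_t index = (E == Order::Big) ? i : Size - 1 - i;
    value = (value << CHAR_BIT) | data[index];
  }

  if (std::is_signed<T>::value) {
    // Sign-extend from bit (Size * CHAR_BIT - 1): flip the sign bit, then subtract it.
    const uint64_t sign_bit = uint64_t{1} << (Size * CHAR_BIT - 1);
    value = (value ^ sign_bit) - sign_bit;
  }

  // Narrowing keeps the low-order bits; the unsigned-to-signed step is modular on
  // mainstream compilers (and guaranteed from C++20).
  return static_cast<T>(static_cast<U>(value));
}
```

With this sketch, reading 3 big-endian bytes from 01 02 03 04 into a uint32_t gives 0x010203, reading FF FF FF into an int32_t gives -1, and reading the same bytes into a uint32_t gives 0x00FFFFFF.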

Working on this, thanks @juchem !

Sorry, yes I misread how the displacement was used. It does take the correct bytes for big-endian peek.

For sign extension, I don't think you need the ternary expression. You want all_bits_enabled << (Size * CHAR_BIT). Otherwise, for little endian, you'll be modifying the low-order bits.

Also, perhaps static_cast<T>(~static_cast<T>(0)) can be reduced to just ~static_cast<T>(0)?
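The shift suggested above can be checked in isolation. The sketch below is illustrative only: `signExtensionBits` is a hypothetical helper name, not something from the PR. It computes the mask in an unsigned type so the left shift is well defined, then converts back to T.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: the bits that must be set in a T when a negative value
// was read from only Size bytes, i.e. everything above the low Size bytes.
template <typename T, size_t Size>
constexpr T signExtensionBits() {
  static_assert(std::is_signed<T>::value, "only signed types are sign-extended");
  static_assert(Size > 0 && Size < sizeof(T), "nothing to extend when Size == sizeof(T)");
  using U = typename std::make_unsigned<T>::type;
  return static_cast<T>(
      static_cast<U>(static_cast<U>(~static_cast<U>(0)) << (Size * CHAR_BIT)));
}

// For a 3-byte read into int32_t the mask is 0xFF000000, whatever the wire byte order.
static_assert(signExtensionBits<int32_t, 3>() == -16777216, "mask is 0xFF000000");
static_assert(signExtensionBits<int16_t, 1>() == -256, "mask is 0xFF00");

// Merging the mask leaves the low-order (already read) bits untouched.
static_assert((static_cast<uint32_t>(signExtensionBits<int32_t, 3>()) | 0x00F10203u) ==
                  0xFFF10203u,
              "low-order bytes are preserved");
```

The mask depends only on T and Size because it is expressed in terms of value bits, not memory layout.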

I had problems with type promotion and the complement operator in the past. C and C++ have this annoying tendency to turn bitwise results into an int sometimes. Honestly, I can never remember whether it's needed before or after ~, so I just choose to err on the safe side since static_casts are evaluated at compile time anyway.

With regards to the ternary, I'm not so sure, but you may be right.

My thinking was this (3 bytes read into a 32-bit signed integer from buffer with f1 f2 f3 f4):

Big Endian:

result is initialized with 0;

bytes are read at displacement 1, result = 00 f1 f2 f3;

we need to fill the leftmost bytes due to big-endianness (most significant byte is to the left), therefore extension should be ff 00 00 00, so shift all_bits_enabled left by Size - 1 bytes;

merge bits and final result = ff f1 f2 f3.

Little Endian:

result is initialized with 0;

bytes are read at displacement 0, result = f1 f2 f3 00;

we need to fill the rightmost bytes due to little-endianness (most significant byte is to the right), therefore extension should be 00 00 00 ff, so shift all_bits_enabled right by Size - 1 bytes;

merge bits and final result = f1 f2 f3 ff.

The tests should surface that anyway, so I'm not too concerned.

I wrote this quick test with regards to the casts: https://ideone.com/IyiRQp

Look at the result for char. For types narrower than int, without the static_cast type promotion would make everything an int or unsigned int, which is not quite what we want since we need to preserve the requested type.
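The promotion behaviour described above can also be verified at compile time. This is a small independent sketch rather than a copy of the linked test; `allBitsEnabled` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// ~ applies the integral promotions to its operand first, so for a type
// narrower than int the result of the complement is an int, not the original type.
static_assert(std::is_same<decltype(~static_cast<uint8_t>(0)), int>::value,
              "complement of a uint8_t has type int");
static_assert(~static_cast<uint8_t>(0) == -1, "and its value is -1, not 0xFF");

// Types at least as wide as int are not promoted.
static_assert(std::is_same<decltype(~0ULL), unsigned long long>::value,
              "no promotion for unsigned long long");

// The outer cast brings the result back to the requested type.
template <typename T> constexpr T allBitsEnabled() {
  return static_cast<T>(~static_cast<T>(0));
}

static_assert(std::is_same<decltype(allBitsEnabled<uint8_t>()), uint8_t>::value,
              "type is preserved");
static_assert(allBitsEnabled<uint8_t>() == 0xFF, "all eight bits are set");
```

So the inner cast fixes the operand type and the outer cast undoes the promotion; dropping the outer one changes the type of the expression for char-sized and short-sized T.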

zuercher · 2018-09-05T17:07:39Z

test/common/buffer/buffer_test.cc

+namespace Buffer {
+namespace {
+
+TEST(BufferHelperTest, PeekI8) {


Along with the comment in buffer.h, if buffer.peekLEInt<uint32_t, 2> is to be allowed, please add tests to exercise it.

zuercher · 2018-09-05T17:13:37Z

test/common/buffer/utility.h

+  }
+}
+
+inline void addString(Buffer::Instance& buffer, const std::string& s) { buffer.add(s); }


Let's not move this one into common: people should just call buffer.add(s). If you want to leave it in the thrift code I'll remove it later.

I added it for conformity with the other utility methods, but because many of them are now just methods on Buffer there's no reason to provide this.

removed and updated references

zuercher · 2018-09-05T17:14:24Z

test/common/buffer/utility.h

+
+inline void addString(Buffer::Instance& buffer, const std::string& s) { buffer.add(s); }
+
+inline std::string bufferToString(Buffer::Instance& buffer) {


Likewise, I think people can just call buffer.toString().

same as above

Signed-off-by: Jason Jian <jason.jian@airbnb.com>

jkemv · 2018-09-07T06:24:38Z

@zuercher I just pushed the commit addressing the feedback. Please take a look when you get a chance, thanks! cc: @juchem

zuercher

Almost there. I think the implementation needs to be fixed for big-endian hosts, though.

zuercher · 2018-09-07T17:00:58Z

include/envoy/buffer/buffer.h

+                        << (Size * CHAR_BIT))
+            : static_cast<T>(0);
+
+    return fromEndianness<Endianness>(static_cast<T>(result | extension));


Doesn't this only work because the machines we're running tests on are little-endian?

On an LE host, peekBE<int32_t, 3> with a buffer containing F1 02 03 would produce result == 0x0302F100 (because BE to host conversion has not happened yet) and extension == 0xFF. This yields result | extension == 0x0302F1FF, which, after conversion is 0xFFF10203. That's correct.

On a BE host, peekBE<int32_t, 3> with a buffer containing F1 02 03 would produce result == 0x00F10203 (because the host is BE), but extension == 0xFF yields result | extension == 0x00F102FF.

I think the correct implementation would be to always compute extension without regard to endianness (as the correct number of high order bits) and apply it after result has been converted to the host byte order:

constexpr const auto extension =
    std::is_signed<T>::value && Size < sizeof(T) && bytes[most_significant_read_byte] < 0
        ? static_cast<T>(static_cast<typename std::make_unsigned<T>::type>(all_bits_enabled)
                         << (Size * CHAR_BIT))
        : static_cast<T>(0);

// this may require additional casting:
return fromEndianness<Endianness>(static_cast<T>(result)) | extension;

you're absolutely correct. while helping Jason out yesterday debugging some tests I've realized my brain fart on the initial sign extension code

ah, this was my bad, @juchem actually pointed it out yesterday but I ended up adding it back when working on big endian test cases, without realizing the host endianness factor. I'll correct it now. Thanks!

actually, there's an additional detail that @zuercher pointed out which we missed yesterday
we must merge the extension bits after the endianness conversion, not before (move | extension outside of the call to fromEndianness, as specified in his code snippet)

Yep yep, this is the part I failed to realize when debugging BE test failures.
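The arithmetic in this thread can be replayed without access to a big-endian machine by modelling how each kind of host interprets the same four bytes of memory. The sketch below is only a model under that assumption: `Order`, `load`, `fromBigEndian` and the two `extend...` functions are hypothetical names, and the 0xFF and 0xFF000000 masks are taken from the comments above, not from the actual implementation.

```cpp
#include <cassert>
#include <cstdint>

enum class Order { Little, Big };

// How a host with the given byte order interprets four bytes of memory as a uint32_t.
constexpr uint32_t load(Order host, uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
  return host == Order::Little
             ? (uint32_t{b0} | uint32_t{b1} << 8 | uint32_t{b2} << 16 | uint32_t{b3} << 24)
             : (uint32_t{b0} << 24 | uint32_t{b1} << 16 | uint32_t{b2} << 8 | uint32_t{b3});
}

constexpr uint32_t byteSwap(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Big-endian wire value to host order: a no-op on a BE host, a swap on an LE host.
constexpr uint32_t fromBigEndian(Order host, uint32_t v) {
  return host == Order::Big ? v : byteSwap(v);
}

// peekBE<int32_t, 3> on F1 02 03: the bytes land at displacement 1 in the result's
// storage, so memory holds 00 F1 02 03 on either host.

// Extension merged before the conversion, using the 0xFF mask from the review comment.
constexpr uint32_t extendBeforeConversion(Order host) {
  return fromBigEndian(host, load(host, 0x00, 0xF1, 0x02, 0x03) | 0x000000FFu);
}

// Extension computed as high-order value bits and merged after the conversion.
constexpr uint32_t extendAfterConversion(Order host) {
  return fromBigEndian(host, load(host, 0x00, 0xF1, 0x02, 0x03)) | 0xFF000000u;
}

static_assert(extendBeforeConversion(Order::Little) == 0xFFF10203u, "right only by luck");
static_assert(extendBeforeConversion(Order::Big) == 0x00F102FFu, "wrong on a BE host");
static_assert(extendAfterConversion(Order::Little) == 0xFFF10203u, "correct");
static_assert(extendAfterConversion(Order::Big) == 0xFFF10203u, "correct");
```

In this model, merging after the conversion gives 0xFFF10203 on both hosts, while merging before it only happens to work when the host is little-endian.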

Signed-off-by: Jason Jian <jason.jian@airbnb.com>

jkemv · 2018-09-08T00:01:00Z

@zuercher PTAL
Do you need me to rebase & squash the commits? Let me know. Thanks for all the feedback!

zuercher · 2018-09-10T16:29:52Z

We prefer to leave the history in PRs, since squashing sometimes destroys comment history. It'll get squashed when we merge.

zuercher

Looks good to me. I think perhaps the detailed comment on peekInt is too detailed? Really just needs to be a brief explanation on why you'd ever want to set Size < sizeof(T) and that it will sign extend signed ints if you do.

zuercher · 2018-09-10T17:12:42Z

@alyssawilk PTAL

zuercher · 2018-09-10T22:37:55Z

@dnoe do you have time to take a look at this (primarily the actual buffer changes -- the mods to thrift_proxy to use the new code are less important)?

dnoe

Reviewed the buffer stuff - basically everything in this PR not under filters/network/thrift_proxy. I intentionally didn't review that to keep my eyes from glazing over - I'm not familiar with Thrift at all.

dnoe · 2018-09-12T01:58:05Z

include/envoy/buffer/buffer.h

+    constexpr const auto displacement = Endianness == ByteOrder::BigEndian ? sizeof(T) - Size : 0;
+
+    auto result = static_cast<T>(0);
+    auto bytes = reinterpret_cast<char*>(std::addressof(result));


This isn't typically done in Envoy in my experience, but I think this usage of auto calls out for explicitly setting the expectation about whether the result is a pointer or not, i.e., auto* vs. just auto. It will make things more obvious to the reader and also prevent inadvertent errors where T is some unexpected type.

Might as well make it char*, since it always will be.

changed to char *

dnoe · 2018-09-12T02:12:23Z

include/envoy/buffer/buffer.h

+    constexpr const auto most_significant_read_byte =
+        Endianness == ByteOrder::BigEndian ? displacement : Size - 1;
+
+    constexpr const auto all_bits_enabled = static_cast<T>(~static_cast<T>(0));


What do you think about moving this declaration up to just below where result is defined? It is the negation of the starting value of result, so putting them together makes some sense.

dnoe · 2018-09-12T02:20:47Z

include/envoy/buffer/buffer.h

+   */
+  template <typename T, ByteOrder Endianness = ByteOrder::Host, size_t Size = sizeof(T)>
+  T drainInt() {
+    auto const result = peekInt<T, Endianness, Size>();


const auto?

dnoe · 2018-09-12T02:25:22Z

include/envoy/buffer/buffer.h

+   */
+  template <ByteOrder Endianness = ByteOrder::Host, typename T, size_t Size = sizeof(T)>
+  void writeInt(T value) {
+    static_assert(Size <= sizeof(T), "requested size is bigger than integer being read");


I find this error a bit confusing: should it say being written?

you're right

dnoe · 2018-09-12T02:25:52Z

include/envoy/buffer/buffer.h

+  void writeInt(T value) {
+    static_assert(Size <= sizeof(T), "requested size is bigger than integer being read");
+
+    auto const data = toEndianness<Endianness>(value);


const auto ?

converting, const auto looks more consistent with code base

source/common/common/byte_order.h

dnoe · 2018-09-12T02:52:29Z

include/envoy/buffer/buffer.h

+
+    constexpr const auto all_bits_enabled = static_cast<T>(~static_cast<T>(0));
+
+    const auto extension =


To be extra clear, let's call this sign_extension_bits.

Signed-off-by: Jason Jian <jason.jian@airbnb.com>

jkemv · 2018-09-14T23:07:01Z

@zuercher @dnoe PTAL. Sorry it took me a while to address the comments as I was on a trip earlier this week.

common: buffer and thrift proxy refactors to a separate commit

4554ad1

Signed-off-by: Jason Jian <jason.jian@airbnb.com>

jkemv requested a review from zuercher as a code owner September 5, 2018 14:06

dnoe assigned zuercher Sep 5, 2018

zuercher reviewed Sep 5, 2018

zuercher mentioned this pull request Sep 5, 2018

dubbo_proxy: implement dubbo proxy #4200

Closed

common: address buffer refactor feedback

68042d2

Signed-off-by: Jason Jian <jason.jian@airbnb.com>

zuercher requested changes Sep 7, 2018

common: buffer.h extension and casting fix

6f1cc56

Signed-off-by: Jason Jian <jason.jian@airbnb.com>

zuercher previously approved these changes Sep 10, 2018

zuercher assigned alyssawilk and unassigned alyssawilk Sep 10, 2018

zuercher assigned dnoe Sep 11, 2018

dnoe reviewed Sep 12, 2018

zuercher mentioned this pull request Sep 14, 2018

thrift: make test use Buffer::Instance::toString() #4419

Closed

common: addressing feedback

17bfbd9

Signed-off-by: Jason Jian <jason.jian@airbnb.com>

jkemv dismissed zuercher’s stale review via 17bfbd9 September 14, 2018 23:04

zuercher approved these changes Sep 17, 2018

dnoe approved these changes Sep 17, 2018

zuercher merged commit 2ee64ff into envoyproxy:master Sep 17, 2018

