Skip to content

perf: provide custom implementation for creating UUIDs.#1401

Merged
RomanDzhabarov merged 9 commits intoenvoyproxy:masterfrom
PiotrSikora:uuid
Aug 7, 2017
Merged

perf: provide custom implementation for creating UUIDs.#1401
RomanDzhabarov merged 9 commits intoenvoyproxy:masterfrom
PiotrSikora:uuid

Conversation

@PiotrSikora
Copy link
Copy Markdown
Contributor

@PiotrSikora PiotrSikora commented Aug 5, 2017

This change yield ~5x performance improvement for UUID generation,
which results in ~30% performance improvement when proxying HTTP/2
traffic on localhost:

Before:
UUIDUtilsTest.benchmark: 2003 ms
h2load -c 100 -n 100000: 26467 req/s

After:
UUIDUtilsTest.benchmark: 408 ms
h2load -c 100 -n 100000: 34099 req/s

As a bonus, this implementation is fully portable and depends only
on BoringSSL's crypto.

Partially fixes #249.

This change yield ~5x performance improvement for UUID generation,
which results in ~30% performance improvement when proxying HTTP/2
traffic on localhost:

Before:
UUIDUtilsTest.benchmark:  2003 ms
h2load -c 100 -n 100000: 26467 req/s

After:
UUIDUtilsTest.benchmark:   408 ms
h2load -c 100 -n 100000: 34099 req/s

As a bonus, this implementation is fully portable and depends only
on BoringSSL's crypto.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
@PiotrSikora
Copy link
Copy Markdown
Contributor Author

Similar results on CI:

Before (current master):

[ RUN      ] UUIDUtilsTest.checkDistribution
[       OK ] UUIDUtilsTest.checkDistribution (4500 ms)

After (this PR):

[ RUN      ] UUIDUtilsTest.benchmark
[       OK ] UUIDUtilsTest.benchmark (795 ms)
[ RUN      ] UUIDUtilsTest.checkDistribution
[       OK ] UUIDUtilsTest.checkDistribution (860 ms)

@mattklein123 @htuch PTAL. Please note that ASAN test failed to build 1/146 tests because of No space left on device, and that's why it's marked as failed, but all tests that built passed.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, this is a great improvement. When we do performance work, we've typically been disabling the UUID header due to this known issue.

// Create UUID from Truly Random or Pseudo-Random Numbers.
// See: https://tools.ietf.org/html/rfc4122#section-4.4
uint8_t rand[16];
RAND_bytes(rand, 16);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to use RAND_pseudo_bytes() here? Presumably it's faster and we don't need something cryptographically strong, UUIDs don't need to be unpredictable in the strong crypto sense (the RFC explicitly calls out pseudo random).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking on this is to keep using real random, but actually do the following:

  1. Store the random bytes in a static thread_local variable along with an iteration counter.
  2. Select 2 bytes that we will sequence through in order.
  3. Basically fill in 2 bytes with the sequence, meaning we will only need to refresh the random every ~64K UUIDs. (This is arbitrary, feel free to use less bits and make it like 16K or something).

This should further improve performance by several orders of magnitude and for the purposes that Envoy uses random UUIDs for I think will be just fine from a collision standpoint.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch: RAND_pseduo_bytes() in just an alias to RAND_bytes() in BoringSSL (see: crypto/fipsmodule/rand/rand.c#269).

@mattklein123: I agree with the prefetching idea (but perhaps we could do it in subsequent PR?). However, I'm afraid I don't follow the 2 bytes idea... you need 16 bytes of randomness (sans 6 constant bits) for UUID version 4.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me to make the optimizations a separate PR. I think the idea here is that since the original UUID is reasonably random, incrementing it is unlikely to produce a conflict with any other UUID in existence (and none produced locally) until you overflow, at which point you regenerate with fresh entropy.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's what I meant re: further optimization. Fine to do in a follow up PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll do that in a separate PR. I have a funny feeling about 2 vs 16 bytes, so I'll see if the extra gains are worth it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might be talking past each other a bit. I'm suggesting allocating ~14 bytes of entropy, and then filling in the last 2 bytes with an incrementing integer. (or some other combination lengths). Anyway we can discuss in a follow up.

// Create UUID from Truly Random or Pseudo-Random Numbers.
// See: https://tools.ietf.org/html/rfc4122#section-4.4
uint8_t rand[16];
RAND_bytes(rand, 16);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check the return code from this as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAND_bytes() in BoringSSL always returns 1 (see: crypto/fipsmodule/rand/rand.c#265), so there is no point in checking that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest an ASSERT or RELEASE_ASSERT on the value, since it feels like we're relying on implementation detail that might conceivably change. Up to you which one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason why RAND_bytes() returns anything and isn't declared as void is for backward compatibility with OpenSSL. If you look at other functions, i.e. new void RAND_bytes_with_additional_data() which RAND_bytes() calls internally (crypto/fipsmodule/rand/rand.c#264), it doesn't return anything, so adding check here will just waste CPU cycles, and it won't be optimized out by the compiler, because we don't do LTO.

I'd agree with you if we linked against system's OpenSSL library, but we're statically linking against known BoringSSL version, so this won't be checking anything that can fail.

Having said that, I can add ASSERT() if you feel strongly about it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use these new interfaces instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAND_bytes_with_additional_data() is not exported, and RAND_bytes() is the only non-deprecated interface available.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (per offline conversation).


uuid.push_back('-');

for (unsigned i = 8; i < 10; i++) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer concrete fixed size type, e.g. uint32_t.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (uint8_t).

uuid.push_back('-');

for (unsigned i = 8; i < 10; i++) {
uint8_t d = rand[i];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer const uint8_t d = ..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, thanks for doing this. Some comments. Also, this will fix #249.

// See: https://tools.ietf.org/html/rfc4122#section-4.4
uint8_t rand[16];
RAND_bytes(rand, 16);
rand[6] = (rand[6] & 0x0f) | 0x40;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here that this is a version 4,variant 1 UUID. Also, can you add tests for these constants in the distribution test or another test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

EXPECT_EQ(7295, result);
}

TEST(UUIDUtilsTest, benchmark) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I would make this disabled by default.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if it's disabled, then it's never run...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point mainly was that this "test" doesn't actually test anything. There are no assertions, etc. It will just burn CPU and always pass. (And from a perf perspective it will always return a different result depending on where it is run). So, typically for this type of thing I would make it a DISABLED test so that it can be run for local testing, but doesn't waste time in general.

// Create UUID from Truly Random or Pseudo-Random Numbers.
// See: https://tools.ietf.org/html/rfc4122#section-4.4
uint8_t rand[16];
RAND_bytes(rand, 16);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking on this is to keep using real random, but actually do the following:

  1. Store the random bytes in a static thread_local variable along with an iteration counter.
  2. Select 2 bytes that we will sequence through in order.
  3. Basically fill in 2 bytes with the sequence, meaning we will only need to refresh the random every ~64K UUIDs. (This is arbitrary, feel free to use less bits and make it like 16K or something).

This should further improve performance by several orders of magnitude and for the purposes that Envoy uses random UUIDs for I think will be just fine from a collision standpoint.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
This way benchmark uses already initialized PRNG.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the ASSERT, LGTM.

@htuch
Copy link
Copy Markdown
Member

htuch commented Aug 6, 2017

(Also, if you merge with master, you should be able to pickup the bazel.asan workaround).

@mattklein123
Copy link
Copy Markdown
Member

LGTM other than @htuch comments and my comment about disabling the benchmark test.

Copy link
Copy Markdown
Member

@RomanDzhabarov RomanDzhabarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice improvement!

for (uint8_t i = 0; i < 4; i++) {
const uint8_t d = rand[i];
uuid.push_back(hex[d >> 4]);
uuid.push_back(hex[d & 0xf]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: d & 0x0f, this might be slightly more clear (clear 4 most significant bits and keep last 4 bits in a byte).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Thank you!

@RomanDzhabarov RomanDzhabarov merged commit c8ed244 into envoyproxy:master Aug 7, 2017
@PiotrSikora
Copy link
Copy Markdown
Contributor Author

Bummer, the commit message was removed during merge.

Could we please not do that in the future (or more precisely, could the person merging PR make sure that the commit message is retained)? Thanks.

@mattklein123
Copy link
Copy Markdown
Member

@RomanDzhabarov please see the nodes on cleaning up the commit message in https://github.com/lyft/envoy/blob/master/CONTRIBUTING.md. The goal is to clean it up but not delete it. Thanks.

@RomanDzhabarov
Copy link
Copy Markdown
Member

@PiotrSikora yeah, sorry that's my bad.

jpsim pushed a commit that referenced this pull request Nov 28, 2022
This makes the test test/java/integration:android_engine_start_test to pass when running on Linux

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Risk Level: none
Testing: N/A
Docs Changes: N/A
Release Notes: N/A
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
This makes the test test/java/integration:android_engine_start_test to pass when running on Linux

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Risk Level: none
Testing: N/A
Docs Changes: N/A
Release Notes: N/A
Signed-off-by: JP Simard <jp@jpsim.com>
nezdolik pushed a commit to nezdolik/envoy that referenced this pull request May 4, 2024
Some compiler warnings are caused by this lack of information to
compiler.

Refers to envoyproxy#1401
nezdolik pushed a commit to nezdolik/envoy that referenced this pull request May 4, 2024
This fixes couple gcc warnings (array bounds) when accessing page map
radix tree.

Refers to envoyproxy#1401
nezdolik pushed a commit to nezdolik/envoy that referenced this pull request May 4, 2024
In heap checker we actually use test_str address after deletion, to
verify at runtime that heap checker is indeed tracking deletions. But
gcc barks. So lets hide allocation from it by calling
tc_newarray/tc_deletearray.

Refers to envoyproxy#1401
nezdolik pushed a commit to nezdolik/envoy that referenced this pull request May 4, 2024
Our tests do explicitly trigger certain "bad" cases. And compilers are
getting increasingly smarter, so they start giving us warnings. People
dislike warnings, so lets try to disable them specifically for unit
tests.

Refers to issue envoyproxy#1401
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: faster PRNG

4 participants