Skip to content

Fix jitter behavior for large values.#552

Merged
mum4k merged 3 commits intoenvoyproxy:masterfrom
oschaaf:jitter-testing
Sep 28, 2020
Merged

Fix jitter behavior for large values.#552
mum4k merged 3 commits intoenvoyproxy:masterfrom
oschaaf:jitter-testing

Conversation

@oschaaf
Copy link
Copy Markdown
Member

@oschaaf oschaaf commented Sep 25, 2020

By chance weird behaviour was observed for when specifying large jitter
values. This warranted analysis, and upon investigation the root problem seems
to be that when configuring --uniform-jitter to a value that would allow the
adjusted value(s) to overlap with the pacing of the underlying rate limiter,
the new overlapping value could overwrite the old one that was scheduled.

This modifies behaviour and tests to fix and consolidate, by modifying
DistributionSamplingRateLimiter so that it can queue adjusted release
timings instead of storing a single one. This will allow it to accumulate multiple
release timings that are adjusted for jitter.
This ensures that the pacing of the rate limiter that modifies timings to add jitter
gets properly disconnected from the pacing of the underlying rate limiter.

NOTE: There are no known consumers running into problems / inaccuracies
because of this. To tease the problem out, either one needs to:

  • configure --jitter-uniform in a way that it would adjust timings to overlap
    with the frequency imposed by --rps, or
  • configure any combination of --jitter-uniform > 0s and --burst-size > 0.

Signed-off-by: Otto van der Schaaf oschaaf@we-amp.com

By chance weird behavior was observed for when specifying large
jitter values. This fixes behavior for that case by allowing
DistributionSamplingRateLimiterTest to schedule multiple jittered
values into the future.

This also improves testing around this rate limiter behavior.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf oschaaf added waiting-for-review A PR waiting for a review. bug Something isn't working labels Sep 25, 2020
@mum4k mum4k requested a review from eric846 September 25, 2020 19:15
@mum4k
Copy link
Copy Markdown
Collaborator

mum4k commented Sep 25, 2020

@eric846 please review and assign back to me once done.

class DistributionSamplingRateLimiterTest : public RateLimiterTest {
public:
DistributionSamplingRateLimiterTest()
: mock_rate_limiter_(std::make_unique<NiceMock<MockRateLimiter>>()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should be called something like inner_mock_rate_limiter_.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public:
DistributionSamplingRateLimiterTest()
: mock_rate_limiter_(std::make_unique<NiceMock<MockRateLimiter>>()),
unsafe_mock_rate_limiter_(*mock_rate_limiter_),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused by how many rate limiters there are -- can we get by without this field and just refer to *mock_rate_limiter_ in the test code?

.WillOnce(Return(true))
.WillOnce(Return(false))
.WillOnce(Return(false));
EXPECT_CALL(*unsafe_discrete_numeric_distribution_sampler_, getValue).WillOnce(Return(1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment on all these Return(1) to clarify that they mean delay releasing the request for 1 ns?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

sanity_check_pending_release_ = false;
if (rate_limiter_->tryAcquireOne()) {
const Envoy::MonotonicTime adjusted = now + random_distribution_generator_();
distributed_timings_.insert(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment about the invariant we maintain on distributed_timings_? It looks like it's simply ascending order. Also, is it easy to add unit tests that insert multiple distribution values other than 1ns in sorted and nonsorted order to make sure each acquisition always happens at the expected nanosecond?

Also, is distributed_timings_ ever anticipated to become large? If so, would a tree be better than a vector?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is distributed_timings_ ever anticipated to become large? If so, would a tree be better than a vector?

Yes I think you are right, but I feel it would become more prudent to optimise this if we find use cases that need to support specifying large ranges of values to distribute timings over. In that case, this could list could grow large. Then we should optimise, but for now I'm happy with correctness, and I'd rather add a documentation note recommending to not configure large values at this point in time.

(For the known use case where people add white noise for desynchronisation purposes, I wouldn't expect this to be a factor, as the distribution range should usually be smaller then the request-release interval).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is it easy to add unit tests that insert multiple distribution values other than 1ns in sorted and nonsorted order to make sure each acquisition always happens at the expected nanosecond?

Done.

std::unique_ptr<NiceMock<MockRateLimiter>> mock_rate_limiter_;
MockRateLimiter& unsafe_mock_rate_limiter_;
Envoy::Event::SimulatedTimeSystem time_system_;
MockDiscreteNumericDistributionSampler* unsafe_discrete_numeric_distribution_sampler_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment what operations are unsafe on this field?

DistributionSamplingRateLimiterTest()
: mock_rate_limiter_(std::make_unique<NiceMock<MockRateLimiter>>()),
unsafe_mock_rate_limiter_(*mock_rate_limiter_),
unsafe_discrete_numeric_distribution_sampler_(new MockDiscreteNumericDistributionSampler()),
Copy link
Copy Markdown
Contributor

@eric846 eric846 Sep 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a special need for bare new, or would making the field std::unique_ptr<MockDiscreteNumericDistributionSampler> discrete_numeric_distribution_sampler_ and referring to *discrete_numeric_distribution_sampler_ in the tests work just the same? This seems like it would get flagged by asan.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up being owned by the rate limiter, so there shouldn't be an ASAN report about it.
Annoyingly, we need temporary member variables to grab references to the mock before we transfer ownership, as the constructor requires a unique_ptr&&. This works nice in prod code by making the path of least resistance a good one, but it is annoying here as we need to go through this hoop. I renamed some things, and made this code use std::move like we do with the rate limiter mocks, to hopefully clean this up and make it more easy to grok.

@eric846 eric846 added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. labels Sep 28, 2020
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Sep 28, 2020

@eric846 thanks for the review, I pushed 131d2e0 to address everything modulo optimisation of the release-timings buffering via a tree. I file #555 to track that.

@oschaaf oschaaf added waiting-for-review A PR waiting for a review. and removed waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. labels Sep 28, 2020
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Sep 28, 2020

@eric846 one last thing -- while I didn't implement a tree, the buffered timings are now tracked via a std::list instead of the std::vector. I haven't measured it yet, but I think that the list may be a better match here, with the sorting/insert and pop from the front that is being done.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@eric846
Copy link
Copy Markdown
Contributor

eric846 commented Sep 28, 2020

LGTM, thanks!

@mum4k mum4k merged commit e21ef4b into envoyproxy:master Sep 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working waiting-for-review A PR waiting for a review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants