Skip to content

quiche: use event loop prepare time in QuicClock (#7177)#8494

Merged
mattklein123 merged 21 commits intoenvoyproxy:masterfrom
bencebeky:now
Oct 28, 2019
Merged

quiche: use event loop prepare time in QuicClock (#7177)#8494
mattklein123 merged 21 commits intoenvoyproxy:masterfrom
bencebeky:now

Conversation

@bencebeky
Copy link
Copy Markdown
Contributor

Signed-off-by: Bence Béky bnc@google.com

Description: Modify QuicClock::ApproximateNow() to make it less precise but less costly.
Risk Level: Low.
Testing: No testing, this change only affects QUICHE.
Docs Changes: N/A
Release Notes: N/A
Fixes #7177

Signed-off-by: Bence Béky <bnc@google.com>
@bencebeky
Copy link
Copy Markdown
Contributor Author

/assign danzh2010

@bencebeky
Copy link
Copy Markdown
Contributor Author

Not ready for review.

/unreview mattklein123
/unreview alyssawilk

@bencebeky
Copy link
Copy Markdown
Contributor Author

CI fails. I need to sort out my local environment, reproduce failures locally, and fix them.

In the meanwhile:
/unassign danzh2010

Signed-off-by: Bence Béky <bnc@google.com>
Signed-off-by: Bence Béky <bnc@google.com>
@bencebeky
Copy link
Copy Markdown
Contributor Author

/assign danzh2010

Dan: please help.

EnvoyQuicAlarmTest.PostponeDeadline fails at line 148:
https://cs.corp.google.com/piper///depot/google3/third_party/envoy/src/test/extensions/quic_listeners/quiche/envoy_quic_alarm_test.cc?l=148

EnvoyQuicServerSessionTest.NewStream crashes with
CHECK failed: new_deadline.IsInitialized()
I see no filename or line number printed, but I suspect it's at
https://cs.corp.google.com/piper///depot/google3/third_party/quic/core/quic_alarm.cc?l=16

I can reproduce both failures locally but do not know how to fix them.

…by DispatcherImpl.

Signed-off-by: Bence Béky <bnc@google.com>
Signed-off-by: Bence Béky <bnc@google.com>
Signed-off-by: Bence Béky <bnc@google.com>
@bencebeky
Copy link
Copy Markdown
Contributor Author

Dan: please help. I implemented the new approach that we discussed. Since DispatcherImpl::approximateMonotonicTime() seems to be called on a thread different from the one on which libevent is calling the onPrepare callback, I need a Thread::LockGuard. Without this lock DispatcherImplTest.ApproximateMonotonicTime fails, because time3 is the same as time1. With this lock the test passes.

Of course this is not a great solution, because locks are expensive. Maybe instead a task to update approximate_monotonic_time_ should be posted from one thread to another.

Also, EnvoyQuicAlarmTest.PostponeDeadline still fails. Any ideas?

Signed-off-by: Bence Béky <bnc@google.com>
@bencebeky
Copy link
Copy Markdown
Contributor Author

As far as the failing EnvoyQuicAlarmTests go, even though EnvoyQuicAlarmTest::advanceMsAndLoop() calls dispatcher_->run(), it does not seem like LibeventScheduler::onPrepare() is called at all. Not sure if this is an issue with production code or how the test is written, and how to fix it.

@danzh2010
Copy link
Copy Markdown
Contributor

As far as the failing EnvoyQuicAlarmTests go, even though EnvoyQuicAlarmTest::advanceMsAndLoop() calls dispatcher_->run(), it does not seem like LibeventScheduler::onPrepare() is called at all. Not sure if this is an issue with production code or how the test is written, and how to fix it.

onPrepare() is registered in LibeventScheduler::initializeStats() which is not called in this test I believe. In production initializeStats() is only called when enable_dispatcher_stats is configured. If we need to update approximate time in onPrepare(), it's better not to make the call depends on stats config. You may want to register a new evwatch using evwatch_prepare_new() when the scheduler is constructed instead of re-using onPrepare().

Sorry, this change seems to be more complicated than I would expect.

@bencebeky
Copy link
Copy Markdown
Contributor Author

As far as the failing EnvoyQuicAlarmTests go, even though EnvoyQuicAlarmTest::advanceMsAndLoop() calls dispatcher_->run(), it does not seem like LibeventScheduler::onPrepare() is called at all. Not sure if this is an issue with production code or how the test is written, and how to fix it.

onPrepare() is registered in LibeventScheduler::initializeStats() which is not called in this test I believe. In production initializeStats() is only called when enable_dispatcher_stats is configured. If we need to update approximate time in onPrepare(), it's better not to make the call depends on stats config. You may want to register a new evwatch using evwatch_prepare_new() when the scheduler is constructed instead of re-using onPrepare().

Sorry, this change seems to be more complicated than I would expect.

LibeventScheduler::registerOnPrepareCallback() calls evwatch_prepare_new(), so this should not be a problem even if initializeStats() is not called. I locally verified that LibeventScheduler::registerOnPrepareCallback() is called when running any of the three failing EnvoyQuicAlarmTests.

@danzh2010
Copy link
Copy Markdown
Contributor

As far as the failing EnvoyQuicAlarmTests go, even though EnvoyQuicAlarmTest::advanceMsAndLoop() calls dispatcher_->run(), it does not seem like LibeventScheduler::onPrepare() is called at all. Not sure if this is an issue with production code or how the test is written, and how to fix it.

onPrepare() is registered in LibeventScheduler::initializeStats() which is not called in this test I believe. In production initializeStats() is only called when enable_dispatcher_stats is configured. If we need to update approximate time in onPrepare(), it's better not to make the call depends on stats config. You may want to register a new evwatch using evwatch_prepare_new() when the scheduler is constructed instead of re-using onPrepare().
Sorry, this change seems to be more complicated than I would expect.

LibeventScheduler::registerOnPrepareCallback() calls evwatch_prepare_new(), so this should not be a problem even if initializeStats() is not called. I locally verified that LibeventScheduler::registerOnPrepareCallback() is called when running any of the three failing EnvoyQuicAlarmTests.

I can't find registerOnPrepareCallback() in LibeventScheduler. The only place register onPrepare() is

evwatch_prepare_new(libevent_.get(), &onPrepare, this);
, this is only called in initializeStats()

@danzh2010
Copy link
Copy Markdown
Contributor

As far as the failing EnvoyQuicAlarmTests go, even though EnvoyQuicAlarmTest::advanceMsAndLoop() calls dispatcher_->run(), it does not seem like LibeventScheduler::onPrepare() is called at all. Not sure if this is an issue with production code or how the test is written, and how to fix it.

onPrepare() is registered in LibeventScheduler::initializeStats() which is not called in this test I believe. In production initializeStats() is only called when enable_dispatcher_stats is configured. If we need to update approximate time in onPrepare(), it's better not to make the call depends on stats config. You may want to register a new evwatch using evwatch_prepare_new() when the scheduler is constructed instead of re-using onPrepare().
Sorry, this change seems to be more complicated than I would expect.

LibeventScheduler::registerOnPrepareCallback() calls evwatch_prepare_new(), so this should not be a problem even if initializeStats() is not called. I locally verified that LibeventScheduler::registerOnPrepareCallback() is called when running any of the three failing EnvoyQuicAlarmTests.

I can't find registerOnPrepareCallback() in LibeventScheduler. The only place register onPrepare() is

evwatch_prepare_new(libevent_.get(), &onPrepare, this);

, this is only called in initializeStats()

Sorry, I didn't notice it was the interface you added in this PR. The alarm test shouldn't have anything to do with onPrepare(), the fact that it's failing makes me suspect that the change in dispatcher or scheduler causes dispatcher.run() didn't really run the event loop. Can you add some log to check that?

@danzh2010
Copy link
Copy Markdown
Contributor

As far as the failing EnvoyQuicAlarmTests go, even though EnvoyQuicAlarmTest::advanceMsAndLoop() calls dispatcher_->run(), it does not seem like LibeventScheduler::onPrepare() is called at all. Not sure if this is an issue with production code or how the test is written, and how to fix it.

I patched this PR and debugged locally. In the test the postponed alarm was not fired because it was postponed to a wrong time. SetImpl() was overridden to calculate the duration based on new deadline and ApproximateNow(). Because libevent timer is set based on the duration from current time but QuicAlarm is set based on absolute time. However, approximate time is not updated because previous event_base_loop() early returned as a result of lack of queued events. This is also why onPrepare() is not called. If you look into the function definition of event_base_loop(), the flag our tests pass in will lead to early return if no events to be process without calling prepare watcher. So EnvoyQuicAlarm::SetImpl() calculates the duration to be larger than expected.

This shouldn't be a case in production because we pass a different flag to event_base_loop() so onPrepare() should always be triggered.

To fix the test, and make the ApproximateNow() more robust, I think we can further propagate the onPrepare() callback to EnvoyQuicClock and latch the approximateMonotonicTime() there. And also update the latched value whenever we call EnvoyQuicClock::Now(). And ApprosimateNow() returns the latched value. To do so you may need to add an interface to Dispatcher, i.e. DispatcherCallback with a method onApproximateTimeUpdated(), and make EnvoyQuicClock inherit from the callback.

@bencebeky
Copy link
Copy Markdown
Contributor Author

Dan: PTAL. Thank you.

Signed-off-by: Bence Béky <bnc@google.com>
Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just request one more test!

@danzh2010
Copy link
Copy Markdown
Contributor

/assign @mattklein123

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Some high level questions. Thank you!

/wait

Signed-off-by: Bence Béky <bnc@google.com>
…rvalue reference.

Signed-off-by: Bence Béky <bnc@google.com>
Signed-off-by: Bence Béky <bnc@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great modulo a few nits.

/wait

danzh2010
danzh2010 previously approved these changes Oct 24, 2019
Signed-off-by: Bence Béky <bnc@google.com>
This reverts commit f8e6676.

Signed-off-by: Bence Béky <bnc@google.com>
Signed-off-by: Bence Béky <bnc@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor

envoy_quic_server_session_test needs to initialize dispatcher's approximate time in ASAN and TSAN. Please fix.

Signed-off-by: Bence Béky <bnc@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 80657f8 into envoyproxy:master Oct 28, 2019
@bencebeky bencebeky deleted the now branch April 26, 2022 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

quiche: add support for QuicClock::ApproximateNow

4 participants