core: SynchronizationContext exposed by LoadBalancer.Helper#4971
zhangkun83 merged 9 commits into grpc:master from
Conversation
```java
 * submitted.
 */
public final ScheduledContext scheduleNow(Runnable task) {
  return schedule(task, 0, TimeUnit.NANOSECONDS);
```
I'm not excited about making an easy way to make a zero-delay task. This just abuses the scheduled executor and is a strong code smell to me.
I can instead abstract it and make schedule() call scheduleNow() when delay <= 0. Is it better?
No. It would be surprising for a task to suddenly run in the current thread when the delay is 0. They are fundamentally different.
It's the same as the current runSerialized(). I still don't understand the issue.
I'm not sure what you're saying is the same. Today runSerialized() runs on the current thread:
https://github.com/grpc/grpc-java/blob/v1.15.0/core/src/main/java/io/grpc/internal/ManagedChannelImpl.java#L1236-L1238
And any schedule() would run on a separate thread. I'm against having schedule() turn into running on the current thread based on the timeout.
Fair enough. I have decoupled schedule() from scheduleNow().
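The distinction the reviewer is insisting on can be illustrated with a simplified sketch (this is not the grpc-java source; names and structure are invented for illustration): execute-style submission serializes tasks and may run them inline on the calling thread, while schedule() always hands the task to an executor thread, even for a zero delay, so threading never depends on the timeout value.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Simplified sketch of the inline-serialized entry point. A real
// schedule() would instead submit to a ScheduledExecutorService and
// never run the task on the calling thread.
final class MiniSyncContext {
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
  private final AtomicReference<Thread> drainer = new AtomicReference<>();

  /** Enqueues the task and drains; the task may run inline. */
  void execute(Runnable task) {
    queue.add(task);
    drain();
  }

  private void drain() {
    while (!queue.isEmpty()) {
      // Only one thread drains at a time; a reentrant call from inside a
      // running task fails this CAS and returns immediately, which is
      // what gives the non-reentrancy guarantee.
      if (!drainer.compareAndSet(null, Thread.currentThread())) {
        return;
      }
      try {
        Runnable task;
        while ((task = queue.poll()) != null) {
          task.run();
        }
      } finally {
        drainer.set(null);
      }
    }
  }
}
```

A task submitted from inside another task is queued rather than run reentrantly, so the outer task always finishes first.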
```java
/**
 * Returns the current time in nanos from the same clock that {@link #schedule} uses.
 */
public abstract long currentTimeNanos();
```
This is a weird API to expose, since it will agree with neither currentTimeMillis nor nanoTime. Based on the documentation it would appear to be similar to nanoTime(), but the actual implementation uses an epoch of 1970 like currentTimeMillis, except when currentTimeMillis and nanoTime get out of sync. It seems this should just be nanoTime().
(I don't care really if it has a different offset than nanoTime(), but aligning it to 1970 seems like a bad idea since it can't be guaranteed to align with 1970.)
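The mismatch the reviewer describes can be shown concretely (this snippet is an illustration, not part of the patch): System.nanoTime() has an arbitrary, JVM-chosen origin, so a nanosecond-precision value aligned to the 1970 epoch cannot be derived from it reliably, and only differences between nanoTime() readings are meaningful.

```java
import java.util.concurrent.TimeUnit;

// Two clocks that cannot be forced to agree: a 1970-epoch clock with
// millisecond resolution, and a monotonic clock with an arbitrary origin.
final class ClockComparison {
  /** Millis-derived "epoch nanos": 1970 origin, millisecond resolution. */
  static long epochNanos() {
    return TimeUnit.MILLISECONDS.toNanos(System.currentTimeMillis());
  }

  /** Monotonic elapsed time; only differences of nanoTime() are meaningful. */
  static long elapsedNanos(long startNanoTime) {
    return System.nanoTime() - startNanoTime;
  }
}
```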
```java
/**
 * Schedules a task to run as soon as poassible.
 *
 * <p>Non-reentrency is guaranteed. Although task may run inline, but if this method is called
```
s/Although task may run inline, but if/If/ . That seems more clear.
```java
public abstract ScheduledContext scheduleNow(Runnable task);
```

```java
/**
 * Schedules a task to be run after a delay. Unlike {@link #scheduleNow}, the task will typically
```
The "typically" is hard to reason about. Could we just say, "Unlike {@link #scheduleNow}, will never be run inline."? (Or maybe even, just "Will never be run inline.")
Also, make it a semi-concrete class, and have it absorb ChannelExecutor. TODO: unit tests for the new methods on SynchronizationContext.
```diff
-  private final PriorityBlockingQueue<ScheduledTask> tasks =
-      new PriorityBlockingQueue<ScheduledTask>();
+  // Must keep the ordering of tasks as they are required by ControlPlaneScheduler.scheduleNow().
+  private final LinkedBlockingQueue<ScheduledTask> tasks = new LinkedBlockingQueue<>();
```
Can we simply use two queues instead? One for pending (ready to be executed) tasks and one for scheduled (for a future time) tasks? We'd keep the previous PriorityBlockingQueue and then just add a LinkedBlockingQueue for execute(). That more closely matches what would happen in practice and makes the code more clear.
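The two-queue suggestion could look roughly like this (a simplified sketch of a fake-clock scheduler; the class and field names are illustrative, not the actual FakeClock code): due tasks keep FIFO order in a LinkedBlockingQueue, while future tasks stay time-ordered in a PriorityBlockingQueue until their deadline passes.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

final class TwoQueueScheduler {
  static final class ScheduledTask implements Comparable<ScheduledTask> {
    final long dueTimeNanos;
    final Runnable command;

    ScheduledTask(long dueTimeNanos, Runnable command) {
      this.dueTimeNanos = dueTimeNanos;
      this.command = command;
    }

    @Override public int compareTo(ScheduledTask o) {
      return Long.compare(dueTimeNanos, o.dueTimeNanos);
    }
  }

  // Ready-to-run tasks in submission order.
  private final LinkedBlockingQueue<ScheduledTask> dueTasks = new LinkedBlockingQueue<>();
  // Future tasks ordered by due time.
  private final PriorityBlockingQueue<ScheduledTask> scheduledTasks = new PriorityBlockingQueue<>();
  private long currentTimeNanos;

  /** Zero-delay work goes straight to the FIFO queue. */
  void execute(Runnable command) {
    dueTasks.add(new ScheduledTask(currentTimeNanos, command));
  }

  void schedule(Runnable command, long delayNanos) {
    scheduledTasks.add(new ScheduledTask(currentTimeNanos + delayNanos, command));
  }

  /** Advances the fake clock and runs everything that has become due. */
  void forwardNanos(long nanos) {
    currentTimeNanos += nanos;
    ScheduledTask head;
    while ((head = scheduledTasks.peek()) != null && head.dueTimeNanos <= currentTimeNanos) {
      dueTasks.add(scheduledTasks.poll());
    }
    ScheduledTask due;
    while ((due = dueTasks.poll()) != null) {
      due.command.run();
    }
  }
}
```

Keeping the queues separate means zero-delay tasks never compete on due time with one another, so their FIFO order is preserved by construction.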
```java
 *
 * <p>The default implementation logs a warning.
 */
protected void handleUncaughtThrowable(Throwable t) {
```
Nit: the class could be made final and be passed a Thread.UncaughtExceptionHandler (with a note that the thread will not die after executing the handler, which is different from its documentation).
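A minimal sketch of the suggested shape (simplified and single-threaded; not the grpc-java source): a final class that delegates task failures to a caller-supplied Thread.UncaughtExceptionHandler instead of an overridable method. Note how, unlike the handler's usual contract, the draining thread keeps running after the handler returns.

```java
import java.util.ArrayDeque;

final class HandlerDrivenContext {
  private final Thread.UncaughtExceptionHandler handler;
  private final ArrayDeque<Runnable> queue = new ArrayDeque<>();

  HandlerDrivenContext(Thread.UncaughtExceptionHandler handler) {
    this.handler = handler;
  }

  void executeLater(Runnable task) {
    queue.add(task);
  }

  void drain() {
    Runnable task;
    while ((task = queue.poll()) != null) {
      try {
        task.run();
      } catch (Throwable t) {
        // Report and continue: the current thread survives, which is why
        // the behavior difference from UncaughtExceptionHandler's own
        // documentation deserves a note.
        handler.uncaughtException(Thread.currentThread(), t);
      }
    }
  }
}
```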
```java
/**
 * Enqueues a task that will be run when {@link #drain} is called.
 */
public final void executeLater(Runnable runnable) {
```
It might be good to point out this is useful for adding things from within a lock and then calling drain outside the lock.
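The lock-then-drain pattern being described could be sketched like this (an illustrative, self-contained stand-in; the class and method names are invented for the example): enqueue the notification while holding the lock, so ordering matches the state change, but run it only after the lock is released, so listener code never executes with the lock held.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class StateHolder {
  private final Object lock = new Object();
  private final Queue<Runnable> syncQueue = new ArrayDeque<>();
  private int state;

  void executeLater(Runnable r) {
    syncQueue.add(r);
  }

  void drain() {
    Runnable r;
    while ((r = syncQueue.poll()) != null) {
      r.run();
    }
  }

  void setState(int newState, Runnable listener) {
    synchronized (lock) {
      state = newState;
      // Queue the notification inside the lock so it observes the
      // ordering of state changes...
      executeLater(listener);
    }
    // ...but run it outside the lock, avoiding deadlocks and
    // lock-ordering problems if the listener calls back in.
    drain();
  }

  int getState() {
    synchronized (lock) {
      return state;
    }
  }
}
```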
zhangkun83 left a comment
Thanks @ejona86. All comments are addressed.
```java
 * what's documented on {@link UncaughtExceptionHandler#uncaughtException}, the thread is
 * not terminated when the handler is called.
 */
public SynchronizationContext(UncaughtExceptionHandler uncaughtExceptionHandler) {
```
We could provide a zero-arg version that just logs by default. But we can do that at any time. This seems fine for now.
Provides a `SynchronizationContext` for scheduling tasks, with and without delay, from LoadBalancer implementations. This absorbs and extends the internal utility `ChannelExecutor`. It supersedes `Helper.runSerialized()`, which is now deprecated.

Motivation
I see multiple cases that schedule tasks with a delay while requiring the task to run in the "Channel Executor". There has been repeated work to wrap scheduled tasks and handle races between cancellation and task run (see the diff in `GrpclbState.java` for example). The LoadBalancer implementation (e.g., GrpclbLoadBalancer) also has to acquire the `ScheduledExecutorService` from somewhere and release it upon shutdown.

The upcoming HealthCheckLoadBalancer (#4932), which would use a back-off policy to retry health-checking streams, would have to do all of the above. At this point I think we need to provide something that combines `runSerialized()` with a scheduled executor, with the same synchronization guarantees.

Design details
`SynchronizationContext` is similar to `ScheduledExecutorService` but tailored for use in `LoadBalancer` and potentially other cases outside of `LoadBalancer`. It offers task queuing, serialization, and delayed scheduling. It guarantees non-reentrancy and happens-before among tasks. It owns no thread, but runs tasks on the caller's or caller-provided threads.

All channel-level state mutations and callback methods on `LoadBalancer` are done in a SynchronizationContext, which was previously referred to as the "Channel Executor".

`SynchronizationContext.schedule()` returns a `ScheduledHandle` for status checking and cancellation. `ScheduledFuture` from `ScheduledExecutorService.schedule()` is too broad for our use cases (e.g., the blocking `get()` should never be used).

`SynchronizationContext.schedule()` requires a `ScheduledExecutorService`, which is now available through `Helper.getScheduledExecutorService()`. LoadBalancers don't need to worry about where to get a `ScheduledExecutorService` any more.

Alternatives
Alternatively, we could keep `Helper.runSerialized()` and add something like `Helper.runSerializedWithDelay()`, but having them on their own interface allows a clean fake implementation by `FakeClock` for tests, and allows other components (potentially `InternalSubchannel` for reconnection backoff) to use it too.

Instead of asking the caller of `schedule()` to provide the `ScheduledExecutorService`, we considered having SynchronizationContext take a `ScheduledExecutorService` at construction. That would be inconvenient for LoadBalancer implementations that don't use `schedule()`, as they would be forced to provide a fake `ScheduledExecutorService` (which is cumbersome).

Instead of making `SynchronizationContext` a (semi-)concrete class, we considered making it a pure abstract class. However, we found it nontrivial to implement `execute()` correctly with the non-reentrancy guarantee.
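The ScheduledHandle idea from the design section, a narrow handle that supports only cancellation and a pending check while hiding the blocking `get()`, can be sketched roughly as follows (a simplified stand-in, not the grpc-java source; names are illustrative):

```java
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Wraps the executor's Future behind a minimal surface. Callers can
// cancel or ask whether the task is still pending, but cannot block on
// the result.
final class MiniScheduledHandle {
  private final Future<?> future;

  private MiniScheduledHandle(Future<?> future) {
    this.future = future;
  }

  static MiniScheduledHandle schedule(
      ScheduledExecutorService timer, Runnable task, long delay, TimeUnit unit) {
    return new MiniScheduledHandle(timer.schedule(task, delay, unit));
  }

  /** Cancels without blocking; a task already running is not interrupted. */
  void cancel() {
    future.cancel(false);
  }

  /** True if the task has neither run to completion nor been cancelled. */
  boolean isPending() {
    return !future.isDone() && !future.isCancelled();
  }
}
```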