
lb: gentle failover across load balancing priority levels.#2290

Merged
alyssawilk merged 16 commits into envoyproxy:master from alyssawilk:gentle_failover
Jan 4, 2018

Conversation

@alyssawilk
Contributor

Changes Envoy load balancing across priority levels from a hard failover to trickling data based on the health percentage of each priority level.

Risk Level: Medium

Testing:
Added thorough unit testing for the lb failover code, as well as fixing up ring hash failover testing and adding ring-hash specific tests.

Docs Changes:
envoyproxy/data-plane-api#359

Release Notes: n/a: falls under existing note

Fixes #1929

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Member

@mattklein123 mattklein123 left a comment


Very cool stuff. Some comments to get started. I haven't looked at the tests yet.

if (!host_set->healthyHosts().empty()) {
return host_set.get();
}
uint32_t sumEntries(std::vector<uint32_t>& vector) {
Member


nit: const std::vector<uint32_t>&

}
}
// The percentages should always add up to 100 but we have to have a return for the compiler.
ASSERT(0);
Member


nit: I would use NOT_REACHED here.


// Determine the health of the newly modified priority level.
// Health ranges from 0-100, and is the ratio of healthy hosts to total hosts, modified by the
// overprovision factor of 1.4
Member


Can you add some more detail on where 1.4 comes from? Also, nit, period end of sentence.

Contributor Author


From a decade working somewhere where we over-provision roughly 20%. I can make it configurable from the get go but I figured we could leave it until someone actually cared.

I'll comment both that it's arbitrary and can eventually be configurable unless you'd prefer I make it a config option now.
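
For context, the calculation under discussion can be sketched as a standalone function (a hypothetical version for illustration, not Envoy's actual code; `priorityHealth` is an invented name, and 1.4 is the arbitrary overprovision constant discussed above):

```cpp
#include <algorithm>
#include <cstdint>

// Health of a priority level: the ratio of healthy hosts to total hosts,
// scaled by the (arbitrary) 1.4 overprovision factor and capped at 100.
// 140 == 100 * 1.4; integer math avoids floating point.
uint32_t priorityHealth(uint32_t healthy_hosts, uint32_t total_hosts) {
  if (total_hosts == 0) {
    return 0;
  }
  return std::min<uint32_t>(100, healthy_hosts * 140 / total_hosts);
}
```

With this scaling, a level at roughly 72% healthy (72 * 1.4 ≈ 100) is still treated as fully healthy, which is what lets modest degradation avoid spilling traffic to lower priority levels.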

// First, determine if the load needs to be scaled relative to health. For example if there are
// 3 host sets with 20% / 20% / 10% health they will get 40% / 40% / 20% load to ensure total load
// adds up to 100.
uint32_t total_health = std::min<uint32_t>(sumEntries(per_priority_health_), 100);
Member


nit: const
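
The scaling rule described in the snippet above can be sketched as a standalone function (illustrative names, not Envoy's implementation; the greedy branch for the total-health >= 100 case is an assumption about the surrounding code):

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Given per-priority health percentages (each 0-100), produce per-priority
// load percentages that always sum to 100.
std::vector<uint32_t> scaleLoad(const std::vector<uint32_t>& health) {
  std::vector<uint32_t> load(health.size(), 0);
  const uint32_t sum = std::accumulate(health.begin(), health.end(), 0u);
  if (sum == 0) {
    return load; // Everything unhealthy; the caller would enter panic mode.
  }
  if (sum >= 100) {
    // Healthy enough overall: assign load greedily in priority order.
    uint32_t remaining = 100;
    for (size_t i = 0; i < health.size(); ++i) {
      load[i] = std::min(health[i], remaining);
      remaining -= load[i];
    }
    return load;
  }
  // Under-provisioned: scale each level up so the loads total 100, e.g.
  // 20% / 20% / 10% health becomes 40% / 40% / 20% load.
  uint32_t assigned = 0;
  for (size_t i = 0; i < health.size(); ++i) {
    load[i] = health[i] * 100 / sum;
    assigned += load[i];
  }
  // Integer division can leave a few points unassigned; give them to the
  // first non-empty level so the total is exactly 100.
  for (size_t i = 0; i < load.size() && assigned < 100; ++i) {
    if (load[i] > 0) {
      load[i] += 100 - assigned;
      assigned = 100;
    }
  }
  return load;
}
```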

}

const HostSet& LoadBalancerBase::chooseHostSet() {
uint32_t priority = choosePriority(random_.random(), per_priority_load_);
Member


nit: const

}
const uint64_t h = hash.valid() ? hash.value() : random_.random();

uint32_t priority = LoadBalancerBase::choosePriority(h, per_priority_load_);
Member


nit: const

stats_.lb_healthy_panic_.inc();
}
return ring_->chooseHost(context, random_);
RingConstSharedPtr ring = per_priority_state_[priority]->current_ring_;
Member


the compiler will probably optimize this, but I would avoid the local variable so we don't inc/decref the shared pointer.

RingConstSharedPtr ring_to_use;
bool global_panic_to_use;
{
std::shared_lock<std::shared_timed_mutex> lock(mutex_);
Member


I might be missing something here but I don't think we can lose the locking?

std::unique_lock<std::shared_timed_mutex> lock(factory_->mutex_);
factory_->current_ring_ = new_ring;
factory_->global_panic_ = new_global_panic;
std::unique_lock<std::shared_timed_mutex> lock(factory_->mutex_);
Member


This happens on the main thread. You don't need the read lock here I don't think, you just need the write lock below when you swap it all in. You still need the read lock on the factory create call though.

Contributor Author


Above was a clear error, but this one confuses me.

You were locking factory_->mutex before changing factory->current_ring and factory->global_panic. Why then don't I need to snag the same lock when changing factory->per_priority_state?

Member


Sorry I read this too quickly. The issue here I think is that you might have inconsistent state since you release the lock several times during this code. I think the correct way to do this is:

  1. Do the first loop to generate state, etc.
  2. Acquire lock
  3. Do 2nd loop to store state.

Runtime::RandomGenerator& random_;
const RingConstSharedPtr ring_;
const bool global_panic_;
std::vector<PerPriorityStatePtr> per_priority_state_;
Member


Instead of copying the vectors can we just use shared_ptr to constant vector? This will make lock hold time smaller also. (I think per other comments the locking is not quite right unless I am missing something).

Contributor Author


I can but it's a perf hit for everything else since they then can't do in-place edits.
If we're worried about lock time how about a shared pointer for the ring hash where we refresh, which can be shared by all the threads?

Member


nit: I would either do shared_ptr for both per_priority_state_ and per_priority_load_, or for neither. It's fine with me either way (doing it out of lock with swap is fine). Any reason not to be consistent?

    sum += entry;
  }
  return sum;


std::shared_lock<std::shared_timed_mutex> lock(mutex_);
for (size_t i = 0; i < per_priority_state_.size(); ++i) {
lb->per_priority_state_.push_back(PerPriorityStatePtr{new PerPriorityState});
Member


Can you build an allocate the vector outside of the lock? Then you can just acquire the lock and copy the data values in.

Member

@mattklein123 mattklein123 left a comment


Looks solid. Thanks for all the locking changes. A few more random comments. Also I think the OSX build failure is legit, can you take a look?

return priority_set->hostSetsPerPriority()[0].get();
// The percentages should always add up to 100 but we have to have a return for the compiler.
NOT_REACHED;
return 0;
Member


nit: I don't think the compiler will require the return if you have NOT_REACHED.


Member

@mattklein123 mattklein123 left a comment


LGTM other than small nit and OSX breakage.

}
const uint64_t h = hash.valid() ? hash.value() : random_.random();

uint32_t priority = LoadBalancerBase::choosePriority(h, *per_priority_load_);
Member


nit: const

Member

@mattklein123 mattklein123 left a comment


Nice!

@alyssawilk alyssawilk merged commit 8717773 into envoyproxy:master Jan 4, 2018
@alyssawilk alyssawilk deleted the gentle_failover branch January 30, 2018 18:47


Development

Successfully merging this pull request may close these issues.

Add support for N levels of failover endpoints

3 participants