
lb: gentle failover across load balancing priority levels.#2290

Merged
alyssawilk merged 16 commits into envoyproxy:master from alyssawilk:gentle_failover
Jan 4, 2018

Conversation

@alyssawilk
Contributor

Changes Envoy load balancing across priority levels from a hard failover to trickling data based on the health percentage of each priority level.

Risk Level: Medium

Testing:
Added thorough unit testing for the lb failover code, as well as fixing up ring hash failover testing and adding ring-hash specific tests.

Docs Changes:
envoyproxy/data-plane-api#359

Release Notes: n/a: falls under existing note

Fixes #1929

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Member

@mattklein123 mattklein123 left a comment


Very cool stuff. Some comments to get started. I haven't looked at the tests yet.

if (!host_set->healthyHosts().empty()) {
return host_set.get();
}
uint32_t sumEntries(std::vector<uint32_t>& vector) {
Member


nit: const std::vector<uint32_t>&

}
}
// The percentages should always add up to 100 but we have to have a return for the compiler.
ASSERT(0);
Member


nit: I would use NOT_REACHED here.


// Determine the health of the newly modified priority level.
// Health ranges from 0-100, and is the ratio of healthy hosts to total hosts, modified by the
// overprovision factor of 1.4
Member


Can you add some more detail on where 1.4 comes from? Also, nit, period end of sentence.

Contributor Author


From a decade working somewhere where we over-provision roughly 20%. I can make it configurable from the get go but I figured we could leave it until someone actually cared.

I'll comment both that it's arbitrary and can eventually be configurable unless you'd prefer I make it a config option now.
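
For context, the calculation under discussion can be sketched as a standalone function (a hypothetical version for illustration, not Envoy's actual code; `priorityHealth` is an invented name, and 1.4 is the arbitrary overprovision constant discussed above):

```cpp
#include <algorithm>
#include <cstdint>

// Health of a priority level: the ratio of healthy hosts to total hosts,
// scaled by the (arbitrary) 1.4 overprovision factor and capped at 100.
// 140 == 100 * 1.4; integer math avoids floating point.
uint32_t priorityHealth(uint32_t healthy_hosts, uint32_t total_hosts) {
  if (total_hosts == 0) {
    return 0;
  }
  return std::min<uint32_t>(100, healthy_hosts * 140 / total_hosts);
}
```

With this scaling, a level at roughly 72% healthy (72 * 1.4 ≈ 100) is still treated as fully healthy, which is what lets modest degradation avoid spilling traffic to lower priority levels.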

// First, determine if the load needs to be scaled relative to health. For example if there are
// 3 host sets with 20% / 20% / 10% health they will get 40% / 40% / 20% load to ensure total load
// adds up to 100.
uint32_t total_health = std::min<uint32_t>(sumEntries(per_priority_health_), 100);
Member


nit: const
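
The scaling rule described in the snippet above can be sketched as a standalone function (illustrative names, not Envoy's implementation; the greedy branch for the total-health >= 100 case is an assumption about the surrounding code):

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Given per-priority health percentages (each 0-100), produce per-priority
// load percentages that always sum to 100.
std::vector<uint32_t> scaleLoad(const std::vector<uint32_t>& health) {
  std::vector<uint32_t> load(health.size(), 0);
  const uint32_t sum = std::accumulate(health.begin(), health.end(), 0u);
  if (sum == 0) {
    return load; // Everything unhealthy; the caller would enter panic mode.
  }
  if (sum >= 100) {
    // Healthy enough overall: assign load greedily in priority order.
    uint32_t remaining = 100;
    for (size_t i = 0; i < health.size(); ++i) {
      load[i] = std::min(health[i], remaining);
      remaining -= load[i];
    }
    return load;
  }
  // Under-provisioned: scale each level up so the loads total 100, e.g.
  // 20% / 20% / 10% health becomes 40% / 40% / 20% load.
  uint32_t assigned = 0;
  for (size_t i = 0; i < health.size(); ++i) {
    load[i] = health[i] * 100 / sum;
    assigned += load[i];
  }
  // Integer division can leave a few points unassigned; give them to the
  // first non-empty level so the total is exactly 100.
  for (size_t i = 0; i < load.size() && assigned < 100; ++i) {
    if (load[i] > 0) {
      load[i] += 100 - assigned;
      assigned = 100;
    }
  }
  return load;
}
```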

}

const HostSet& LoadBalancerBase::chooseHostSet() {
uint32_t priority = choosePriority(random_.random(), per_priority_load_);
Member


nit: const

}
const uint64_t h = hash.valid() ? hash.value() : random_.random();

uint32_t priority = LoadBalancerBase::choosePriority(h, per_priority_load_);
Member


nit: const

stats_.lb_healthy_panic_.inc();
}
return ring_->chooseHost(context, random_);
RingConstSharedPtr ring = per_priority_state_[priority]->current_ring_;
Member


the compiler will probably optimize this, but I would avoid the local variable so we don't inc/decref the shared pointer.

RingConstSharedPtr ring_to_use;
bool global_panic_to_use;
{
std::shared_lock<std::shared_timed_mutex> lock(mutex_);
Member


I might be missing something here but I don't think we can lose the locking?

std::unique_lock<std::shared_timed_mutex> lock(factory_->mutex_);
factory_->current_ring_ = new_ring;
factory_->global_panic_ = new_global_panic;
std::unique_lock<std::shared_timed_mutex> lock(factory_->mutex_);
Member


This happens on the main thread. You don't need the read lock here I don't think, you just need the write lock below when you swap it all in. You still need the read lock on the factory create call though.

Contributor Author


Above was a clear error, but this one confuses me.

You were locking factory_->mutex before changing factory->current_ring and factory->global_panic. Why then don't I need to snag the same lock when changing factory->per_priority_state?

Member


Sorry I read this too quickly. The issue here I think is that you might have inconsistent state since you release the lock several times during this code. I think the correct way to do this is:

  1. Do the first loop to generate state, etc.
  2. Acquire lock
  3. Do 2nd loop to store state.

Runtime::RandomGenerator& random_;
const RingConstSharedPtr ring_;
const bool global_panic_;
std::vector<PerPriorityStatePtr> per_priority_state_;
Member


Instead of copying the vectors can we just use shared_ptr to constant vector? This will make lock hold time smaller also. (I think per other comments the locking is not quite right unless I am missing something).

Contributor Author


I can but it's a perf hit for everything else since they then can't do in-place edits.
If we're worried about lock time how about a shared pointer for the ring hash where we refresh, which can be shared by all the threads?

Member


nit: I would either do shared_ptr for both per_priority_state_ and per_priority_load_, or for neither. It's fine with me either way (doing it out of lock with swap is fine). Any reason not to be consistent?

    sum += entry;
  }
  return sum;


std::shared_lock<std::shared_timed_mutex> lock(mutex_);
for (size_t i = 0; i < per_priority_state_.size(); ++i) {
lb->per_priority_state_.push_back(PerPriorityStatePtr{new PerPriorityState});
Member


Can you build an allocate the vector outside of the lock? Then you can just acquire the lock and copy the data values in.

Member

@mattklein123 mattklein123 left a comment


Looks solid. Thanks for all the locking changes. A few more random comments. Also I think the OSX build failure is legit, can you take a look?

return priority_set->hostSetsPerPriority()[0].get();
// The percentages should always add up to 100 but we have to have a return for the compiler.
NOT_REACHED;
return 0;
Member


nit: I don't think the compiler will require the return if you have NOT_REACHED.


Member

@mattklein123 mattklein123 left a comment


LGTM other than small nit and OSX breakage.

}
const uint64_t h = hash.valid() ? hash.value() : random_.random();

uint32_t priority = LoadBalancerBase::choosePriority(h, *per_priority_load_);
Member


nit: const

Member

@mattklein123 mattklein123 left a comment


Nice!

@alyssawilk alyssawilk merged commit 8717773 into envoyproxy:master Jan 4, 2018
@alyssawilk alyssawilk deleted the gentle_failover branch January 30, 2018 18:47


Development

Successfully merging this pull request may close these issues.

Add support for N levels of failover endpoints

3 participants