Fix bug in HostSetImpl::chooseLocality()#4061
Conversation
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
htuch
left a comment
There was a problem hiding this comment.
I think this should be done at construction time, rather than at pick time; this causes us to spin through the entire queue on each pick.
+1 |
Signed-off-by: James Buckland <jbuckland@google.com>
| locality_scheduler_->add(effective_weight, locality_entries_.back()); | ||
| } | ||
| } | ||
| if (locality_scheduler_->empty()) { |
There was a problem hiding this comment.
I'm still a bit confused here; in the above loop, we don't add entries to locality_schduler_ if they have zero weight. How do we get into the situation where locality_scheduler_ is populated with only entries with zero weight?
There was a problem hiding this comment.
I think there's a scenario where locality_scheduler_ is created, but we don't populate it at all. I don't know the mechanism but I suspect there's a path where effectiveLocalityWeight returns 0 for all hosts, and no entries get written. This might not get caught by the preconditions in that outer if clause.
There was a problem hiding this comment.
We should be able to do a lighter weight check for locality_scheduler_.empty() that doesn't require us to cycle through all hosts, just looking at an empty queue?
There was a problem hiding this comment.
What if the queue contains localities but they are expired? Checking to see if the queue is empty doesn't guarantee that .pick() won't return a nullptr, I think.
There was a problem hiding this comment.
locality_scheduler_ should be brand new in this function, as should any entries, so there should be no change for something to expire. This does raise an interesting bug though; we should probably perform a locality_scheduler_.reset() above the if statement, to ensure it is empty unless we explicitly initialize it. updateHosts is a "here's the entire new state of the world" operation, we should rebuild everything.
…construction. Signed-off-by: James Buckland <jbuckland@google.com>
htuch
left a comment
There was a problem hiding this comment.
Yep, this looks good! Just one nit.
| * Implements empty() on the internal queue. Does not attempt to discard expired elements. | ||
| * @return bool whether or not the internal queue is empty. | ||
| */ | ||
| bool empty() { return queue_.empty(); } |
There was a problem hiding this comment.
Nit: this should be bool empty() const.
Signed-off-by: James Buckland <jbuckland@google.com>
Description: There is a bug in
HostSetImpl::chooseLocality()where the pick()d locality could be nullptr. (and Envoy will crash on nullptr->access() as a result).This is because we won't build a schedule if there are no weighted localities, but we will build a schedule (albiet an empty one) if there are localities with weight zero. This PR introduces an
EdfScheduler<C>::empty()fn which essentially walks thru the same logic asEdfScheduler<C>::pick(), except it does not pop theCat the end. This has the added benefit(?) of pruning expired weakptrs along the way, although it does not promise to prune all of them.Risk Level: Low
Testing: Wrote new now-passing test under
//test/common/upstream:upstream_impl_test.