Skip to content

upstream: fix bug causing crashes during priority host moves#6396

Merged
snowp merged 3 commits intoenvoyproxy:masterfrom
snowp:fix-eds
Mar 27, 2019
Merged

upstream: fix bug causing crashes during priority host moves#6396
snowp merged 3 commits intoenvoyproxy:masterfrom
snowp:fix-eds

Conversation

@snowp
Copy link
Copy Markdown
Contributor

@snowp snowp commented Mar 27, 2019

This fixes a bug where hosts that were moved between priorities would
not be included in the hosts_added vector, resulting in crashes if the
same host was moved multiple times when used with active health
checking: if a host was moved between priorities twice, it would first
get removed from the health checker, then on the second move the health
checker would crash as it would attempt to remove a host it didn't know
about.

We fix this by explicitly adding the existing host to the list of added
hosts iff the host existed previously in a different priority.

Uncovering this bug lead to the discovery of a bug in the batch updating
done during EDS: std::set_difference assumes that the provided ranges
are both sorted, which is not generally true during this update flow.
This meant that the filtering of hosts that were added/removed did not
work correctly, and would produce inconsistent result dependent on the
ordering of the host pointers in the unordered_map.

We fix this by using a standard for loop instead of std::set_difference.
Not only is this more correct, it should also be faster for large sets
as it performs the filtering in O(n) instead of O(n^2).

Signed-off-by: Snow Pettersen snowp@squareup.com

Risk Level: Medium
Testing: Extended EDS UT to prove that hosts aren't added/removed
Docs Changes: n/a
Release Notes: n/a
Fixes #6395

This fixes a bug where hosts that were moved between priorities would
not be included in the hosts_added vector, resulting in crashes if the
same host was moved multiple times when used with active health
checking: if a host was moved between priorities twice, it would first
get removed from the health checker, then on the second move the health
checker would crash as it would attempt to remove a host it didn't know
about.

We fix this by explicitly adding the existing host to the list of added
hosts iff the host was previously in a different priority.

Uncovering this bug lead to the discovery of a bug in the batch updating
done during EDS: std::set_difference assumes that the provided ranges
are both *sorted*, which is not generally true during this update flow.
This meant that the filtering of hosts that were added/removed did not
work correctly, and would produce inconsistent result dependent on the
ordering of the host pointers in the unordered_map.

We fix this by using a standard for loop instead of std::set_difference.
Not only is this more correct, it should also be faster for large sets
as it performs the filtering in O(n) instead of O(n^2).

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Snow Pettersen added 2 commits March 27, 2019 06:04
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for fixing this. Can you add "Fixes " for both issues that this PR fixes into the description so they get closed and cross linked? Thank you!

@snowp
Copy link
Copy Markdown
Contributor Author

snowp commented Mar 27, 2019

Added it for #6395. Unfortunately I don't think this fixes #6282 since the crash seems to be elsewhere in the config update

@snowp snowp merged commit 9a82dee into envoyproxy:master Mar 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segmentation fault on EDS endpoint priority updates with active health checking

2 participants