Skip to content

upstream: avoid quadratic time complexity in server initialization#15237

Merged
mattklein123 merged 1 commit intoenvoyproxy:mainfrom
rojkov:cds-std-list-v2
Mar 2, 2021
Merged

upstream: avoid quadratic time complexity in server initialization#15237
mattklein123 merged 1 commit intoenvoyproxy:mainfrom
rojkov:cds-std-list-v2

Conversation

@rojkov
Copy link
Copy Markdown
Member

@rojkov rojkov commented Mar 1, 2021

Commit Message: upstream: avoid quadratic time complexity in server initialization

Additional Description:
Currently ClusterManagerInitHelper.secondary_init_clusters_ is a list. Upon every insert the list gets searched for an already added cluster with the same name. Since in normal circumstances all new clusters have unique names the worst case scenario is triggered and all elements of the list are checked sequentially.
Replace the list with a hash map.

Risk Level: low
Testing: unit tests, manual tests
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Contributes to #12138

For the case of 30k clusters added the difference in CdsApiImpl::onConfigUpdate() execution time is about 68 sec on i9-7920X.

Before

Duration(us)  # Calls     Mean(ns)   StdDev(ns)      Min(ns)      Max(ns)  Category     Description
    84242133        1  84242133561          nan  84242133561  84242133561  done         CdsApiImpl::onConfigUpdate
    67135851    30002      2237712  1.80539e+06          188      9198395  add_cluster  ClusterManagerInitHelper::addCluster

After

Duration(us)  # Calls     Mean(ns)   StdDev(ns)      Min(ns)      Max(ns)  Category     Description
    15398710        1  15398710271          nan  15398710271  15398710271  done         CdsApiImpl::onConfigUpdate
       12918    30002          430      21087.1          100      3104004  add_cluster  ClusterManagerInitHelper::addCluster

Currently ClusterManagerInitHelper.secondary_init_clusters_ is a list.
Upon every insert the list gets searched for an already added cluster
with the same name. Since in normal circumstances all new clusters
have unique names the worst case scenario is triggered and all elements
of the list are checked sequentially upon every cluster added.

Replace the list with a hash map.

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Are we concerned about increased per cluster memory with this or are we thinking that this is fine since presumably the extra cost of the string keys would only be present during init? cc @jmarantz

cluster_list->remove(&cluster);
// present in the initializer map. If so, this is fine.
absl::string_view cluster_name = cluster.cluster().info()->name();
if (cluster_map->at(cluster_name) == &cluster) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we now make this check? Was this not necessary previously?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need to be an ASSERT

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is equivalent to the original: the remove() method of std::list checks if the argument is identical to any element and removes all of them for which the check is true.

As the comment above suggests it's possible that the cluster we are removing has already been initialized and is not present in the map. Without the check we'd remove a wrong cluster: the one updated once again after it had been initialized. There is a test case that tests this: bazel test //test/common/upstream:cluster_manager_impl_test --gtest_filter=*UpdateAlreadyInitialized*. It starts to fail without the check.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for the explanation!

Copy link
Copy Markdown
Contributor

@tbarrella tbarrella Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this can lead to a crash when cluster_name isn't in the map - see the crash log here

I guess the possibility of this is what the comment in line 106 refers to. Maybe @rojkov would you be able to fix?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbarrella if this is crashing can you do a revert PR and we can reapply later?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the comment was about the case when a cluster gets removed, but another one with the same name still exists.

Anyway I should have checked the code was exception safe. Will submit a new fix shortly with a test checking for removal of an unknown cluster.

@yanavlasov yanavlasov self-assigned this Mar 2, 2021
@rojkov
Copy link
Copy Markdown
Member Author

rojkov commented Mar 2, 2021

Initially I used absl::string_view for the map keys. Basically all tests pass except ads_intergation_test which crashes presumably due to objects' life time issues. So I resorted to std::string.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@mattklein123 mattklein123 merged commit b5e47bf into envoyproxy:main Mar 2, 2021
tbarrella added a commit to tbarrella/envoy that referenced this pull request Mar 6, 2021
…ation (envoyproxy#15237)"

This reverts commit b5e47bf.

Signed-off-by: Taylor Barrella <tabarr@google.com>
mattklein123 pushed a commit that referenced this pull request Mar 6, 2021
…ation (#15237)" (#15340)

This reverts commit b5e47bf.

Signed-off-by: Taylor Barrella <tabarr@google.com>
rojkov added a commit to rojkov/envoy that referenced this pull request Mar 6, 2021
…nvoyproxy#15237)

Currently ClusterManagerInitHelper.secondary_init_clusters_ is a list.
Upon every insert the list gets searched for an already added cluster
with the same name. Since in normal circumstances all new clusters
have unique names the worst case scenario is triggered and all elements
of the list are checked sequentially upon every cluster added.

Replace the list with a hash map.

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants