Skip to content

ecds: fix a flake in the integration test#12268

Merged
mattklein123 merged 5 commits intoenvoyproxy:masterfrom
kyessenov:fix_ecds_flake
Jul 24, 2020
Merged

ecds: fix a flake in the integration test#12268
mattklein123 merged 5 commits intoenvoyproxy:masterfrom
kyessenov:fix_ecds_flake

Conversation

@kyessenov
Copy link
Copy Markdown
Contributor

Signed-off-by: Kuat Yessenov kuat@google.com

Commit Message: gRPC mux is trying to do some work during destruction (drain the queue), so it needs the parent to stay alive.
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]

Signed-off-by: Kuat Yessenov <kuat@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for the quick fix. A few comments.

/wait

absl::flat_hash_set<DynamicFilterConfigProviderImpl*> filter_config_providers_;
friend class DynamicFilterConfigProviderImpl;

std::unique_ptr<Config::Subscription> subscription_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix here. As this is a real bug and it's not immediately clear to me what the problem is, can you please do two things:

  1. Add some comments here about ordering dependencies.
  2. Can you add a unit test that would crash without this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the stats scope. gRPC stream posts some stats on drainage, and it drains on destruction. Will do.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's another race. gRPC stream needs to stay alive on the server while the client closes it, to enable draining and the crash. Trying to find a way to emulate server closure while holding test alive.

Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com>
EXPECT_EQ("200", response->headers().getStatusValue());
}

TEST_P(ExtensionDiscoveryIntegrationTest, DestroyDuringInit) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more comments in here? It's not clear to me how rate limiting is part of this and what is going on here exactly.

Also, like I said before, a unit test is fine if it's easier to test this case deterministically. Up to you.

Thanks a ton for working on this.

/wait

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea why rate limiting is important, but it allows me to get through to the offending line here: https://github.com/envoyproxy/envoy/blob/master/source/common/config/grpc_mux_impl.cc#L258. The block prior to that is guarded by rate limiting check.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK this feels pretty fragile to me, but I would like to merge this to unflake things. If we can figure out a better way to test this vs. this as a follow up that would be good. cc @lizan who might have additional thoughts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think stat update should be guarded by an exceptional event (rate limit kicking in). It creates a really tough edge case to hit.

Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mattklein123 mattklein123 merged commit 44f4399 into envoyproxy:master Jul 24, 2020
kyessenov added a commit to istio/envoy that referenced this pull request Jul 27, 2020
Signed-off-by: Kuat Yessenov <kuat@google.com>
istio-testing pushed a commit to istio/envoy that referenced this pull request Jul 28, 2020
* format fix

Signed-off-by: Kuat Yessenov <kuat@google.com>

* ecds: fix a flake in the integration test (envoyproxy#12268)

Signed-off-by: Kuat Yessenov <kuat@google.com>

* format fix

Signed-off-by: Kuat Yessenov <kuat@google.com>
KBaichoo pushed a commit to KBaichoo/envoy that referenced this pull request Jul 30, 2020
Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
chaoqin-li1123 pushed a commit to chaoqin-li1123/envoy that referenced this pull request Aug 7, 2020
Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants