ecds: fix a flake in the integration test#12268
ecds: fix a flake in the integration test#12268mattklein123 merged 5 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Kuat Yessenov <kuat@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Awesome, thanks for the quick fix. A few comments.
/wait
| absl::flat_hash_set<DynamicFilterConfigProviderImpl*> filter_config_providers_; | ||
| friend class DynamicFilterConfigProviderImpl; | ||
|
|
||
| std::unique_ptr<Config::Subscription> subscription_; |
There was a problem hiding this comment.
Thanks for the quick fix here. As this is a real bug and it's not immediately clear to me what the problem is, can you please do two things:
- Add some comments here about ordering dependencies.
- Can you add a unit test that would crash without this change?
There was a problem hiding this comment.
It's the stats scope. gRPC stream posts some stats on drainage, and it drains on destruction. Will do.
There was a problem hiding this comment.
It's another race. gRPC stream needs to stay alive on the server while the client closes it, to enable draining and the crash. Trying to find a way to emulate server closure while holding test alive.
Signed-off-by: Kuat Yessenov <kuat@google.com>
| EXPECT_EQ("200", response->headers().getStatusValue()); | ||
| } | ||
|
|
||
| TEST_P(ExtensionDiscoveryIntegrationTest, DestroyDuringInit) { |
There was a problem hiding this comment.
Can you add more comments in here? It's not clear to me how rate limiting is part of this and what is going on here exactly.
Also, like I said before, a unit test is fine if it's easier to test this case deterministically. Up to you.
Thanks a ton for working on this.
/wait
There was a problem hiding this comment.
I have no idea why rate limiting is important, but it allows me to get through to the offending line here: https://github.com/envoyproxy/envoy/blob/master/source/common/config/grpc_mux_impl.cc#L258. The block prior to that is guarded by rate limiting check.
There was a problem hiding this comment.
OK this feels pretty fragile to me, but I would like to merge this to unflake things. If we can figure out a better way to test this vs. this as a follow up that would be good. cc @lizan who might have additional thoughts.
There was a problem hiding this comment.
Yeah, I don't think stat update should be guarded by an exceptional event (rate limit kicking in). It creates a really tough edge case to hit.
Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com>
* format fix Signed-off-by: Kuat Yessenov <kuat@google.com> * ecds: fix a flake in the integration test (envoyproxy#12268) Signed-off-by: Kuat Yessenov <kuat@google.com> * format fix Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com> Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com> Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: Kuat Yessenov kuat@google.com
Commit Message: gRPC mux is trying to do some work during destruction (drain the queue), so it needs the parent to stay alive.
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]