Skip to content

xDS fault injection e2e test: fix flakes caused by processing queued calls in parallel#32429

Merged
markdroth merged 1 commit intogrpc:masterfrom
markdroth:xds_fault_injection_test_flake
Feb 17, 2023
Merged

xDS fault injection e2e test: fix flakes caused by processing queued calls in parallel#32429
markdroth merged 1 commit intogrpc:masterfrom
markdroth:xds_fault_injection_test_flake

Conversation

@markdroth
Copy link
Copy Markdown
Member

The XdsFaultInjectionMaxFault test has seen a few flakes since #32326 was merged. I believe the flakiness is caused by the fact that when a large number of RPCs are queued up before the resolver result comes in, those RPCs are now re-processed in parallel instead of sequentially, which can cause us to delay more RPCs than we should due to the max_faults setting. To fix this, we change the test to ensure that the channel is connected (i.e., the resolver result has already been returned) before we start sending a large number of concurrent RPCs.

Although this is the only test that I've seen flakes in, I've made this same change consistently to all fault injection tests that are creating a large number of concurrent RPCs, since the same flake could affect any of them.

Copy link
Copy Markdown
Member

@yijiem yijiem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark!

@markdroth markdroth merged commit e022a3d into grpc:master Feb 17, 2023
@markdroth markdroth deleted the xds_fault_injection_test_flake branch February 17, 2023 23:50
@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Feb 18, 2023
XuanWang-Amos pushed a commit to XuanWang-Amos/grpc that referenced this pull request May 1, 2023
…calls in parallel (grpc#32429)

The `XdsFaultInjectionMaxFault` test has seen a few flakes since grpc#32326
was merged. I believe the flakiness is caused by the fact that when a
large number of RPCs are queued up before the resolver result comes in,
those RPCs are now re-processed in parallel instead of sequentially,
which can cause us to delay more RPCs than we should due to the
`max_faults` setting. To fix this, we change the test to ensure that the
channel is connected (i.e., the resolver result has already been
returned) before we start sending a large number of concurrent RPCs.

Although this is the only test that I've seen flakes in, I've made this
same change consistently to all fault injection tests that are creating
a large number of concurrent RPCs, since the same flake could affect any
of them.
wanlin31 pushed a commit that referenced this pull request May 18, 2023
…calls in parallel (#32429)

The `XdsFaultInjectionMaxFault` test has seen a few flakes since #32326
was merged. I believe the flakiness is caused by the fact that when a
large number of RPCs are queued up before the resolver result comes in,
those RPCs are now re-processed in parallel instead of sequentially,
which can cause us to delay more RPCs than we should due to the
`max_faults` setting. To fix this, we change the test to ensure that the
channel is connected (i.e., the resolver result has already been
returned) before we start sending a large number of concurrent RPCs.

Although this is the only test that I've seen flakes in, I've made this
same change consistently to all fault injection tests that are creating
a large number of concurrent RPCs, since the same flake could affect any
of them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bloat/none imported Specifies if the PR has been imported to the internal repository lang/c++ per-call-memory/neutral per-channel-memory/neutral release notes: no Indicates if PR should not be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants