Skip to content

Fix intermittent failures#557

Merged
arkodg merged 6 commits intoenvoyproxy:mainfrom
datawire:lukeshu/flakes
Oct 14, 2022
Merged

Fix intermittent failures#557
arkodg merged 6 commits intoenvoyproxy:mainfrom
datawire:lukeshu/flakes

Conversation

@LukeShu
Copy link
Copy Markdown
Contributor

@LukeShu LukeShu commented Oct 13, 2022

Fixes #504

The XDS translator's startup is racy; it needs to be changed to handle anything that's already in the watchable.Map when it calls .Subscribe. Since this turned out to be a common pattern I created a utility function, and moved everything over to use it.

Also, a different test is broken with any -count > 1, because controller-runtime.SetupSignalHandler() is designed to panic if it is called more than once.

I used go test -v -count=300 ./... to validate that these changes make the intermittent test failures to go away.

Also, I added vendor/ to .gitignore so that I could run go mod vendor and then add log statements to 3rd-party code while debugging.

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
If the watchable.Map has content in it already when .Subscribe() is called
on it, then those initial entries won't have a snapshot.Updates entry in
that first snapshot.  For the first snapshot we just need to iterate over
snapshot.State.

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
controller-runtime.SetupSignalHandler() panics if called more than once in
a process.  So running the test multiple times (`go test -count=2`)
reliably causes the test to panic.

So don't use ctrl.SetupSignalHandler() in unit tests.

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
@LukeShu LukeShu requested a review from a team as a code owner October 13, 2022 13:50
Copy link
Copy Markdown
Member

@Xunzhuo Xunzhuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, we can fix it with this way, but I wonder if we can make the first time creation, to be a update? The create/update/delete, I think they should be notified.

@LukeShu LukeShu changed the title Fix intermitent test failures Fix intermittent test failures Oct 13, 2022
@danehans danehans added the area/translator Issues related to Gateway's translation service, e.g. translating Gateway APIs into the IR. label Oct 13, 2022
@danehans danehans added this to the 0.2.0 milestone Oct 13, 2022
As the added godoc comment says, "This is better than iterating over
snapshot.Updates because it handles the case where the the watchable.Map
already contains entries before .Subscribe is called."

The generalizes the fix that I made in the XDS translator.

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
@LukeShu
Copy link
Copy Markdown
Contributor Author

LukeShu commented Oct 13, 2022

I've added 2 new commits based on discussion from today's call.

@LukeShu LukeShu changed the title Fix intermittent test failures Fix intermittent failures Oct 13, 2022
I was going to do a type alias for `watchable.Update`, but:

    internal/message/watchutil.go:7:6: generic type cannot be alias

So I just defined a new child type, which is fine because there aren't any
methods on Update.

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
@LukeShu LukeShu requested a review from arkodg October 14, 2022 17:08
Copy link
Copy Markdown
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for catching the race and refactoring how snapshot updates are consumed !

@arkodg arkodg merged commit 319bcb9 into envoyproxy:main Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/translator Issues related to Gateway's translation service, e.g. translating Gateway APIs into the IR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix flaky xds Translator TestRunner

4 participants