
test(proxy::config): deflake TestInitialSync #94564

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from knight42:fix/TestInitialSync
Jun 12, 2021

Conversation

@knight42
Member

@knight42 knight42 commented Sep 5, 2020

Signed-off-by: knight42 anonymousknight96@gmail.com

What type of PR is this?

/kind flake

What this PR does / why we need it:

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/94556/pull-kubernetes-bazel-test/1302309891905425412

=== RUN   TestInitialSync
I0905 18:24:19.320048      23 config.go:315] Starting service config controller
I0905 18:24:19.320662      23 shared_informer.go:240] Waiting for caches to sync for service config
I0905 18:24:19.324410      23 shared_informer.go:247] Caches are synced for service config 
I0905 18:24:19.320602      23 config.go:133] Starting endpoints config controller
I0905 18:24:19.325287      23 shared_informer.go:240] Waiting for caches to sync for endpoints config
I0905 18:24:19.325457      23 shared_informer.go:247] Caches are synced for endpoints config 
    api_test.go:171: Unexpected endpoints: []*v1.Endpoints{(*v1.Endpoints)(0xc0000dc500)}, expected: []*v1.Endpoints{(*v1.Endpoints)(0xc0003e5cc0), (*v1.Endpoints)(0xc0003e5b80)}
--- FAIL: TestInitialSync (0.01s)

The error can be reproduced with the following patch:

diff --git pkg/proxy/config/config_test.go pkg/proxy/config/config_test.go
index 531b3b1a089..bec2aca56d9 100644
--- pkg/proxy/config/config_test.go
+++ pkg/proxy/config/config_test.go
@@ -23,7 +23,7 @@ import (
 	"testing"
 	"time"
 
-	"k8s.io/api/core/v1"
+	v1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/types"
 	"k8s.io/apimachinery/pkg/util/wait"
@@ -161,6 +161,7 @@ func NewEndpointsHandlerMock() *EndpointsHandlerMock {
 }
 
 func (h *EndpointsHandlerMock) OnEndpointsAdd(endpoints *v1.Endpoints) {
+	time.Sleep(time.Second)
 	h.lock.Lock()
 	defer h.lock.Unlock()
 	namespacedName := types.NamespacedName{Namespace: endpoints.Namespace, Name: endpoints.Name}

I think the root cause is that even after the cache is synced, the event handlers, such as OnEndpointsAdd, might not have finished executing, so the function in the process field received unexpected endpoints.

Which issue(s) this PR fixes:

Part of #94528

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 5, 2020
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 5, 2020
@knight42 knight42 force-pushed the fix/TestInitialSync branch from d0394f0 to b148677 Compare September 5, 2020 19:38
@knight42
Member Author

knight42 commented Sep 5, 2020

/cc @liggitt

@liggitt
Member

liggitt commented Sep 6, 2020

/cc @dcbw

@k8s-ci-robot k8s-ci-robot requested a review from dcbw September 6, 2020 02:30
@knight42 knight42 force-pushed the fix/TestInitialSync branch from b148677 to 3e43e62 Compare September 6, 2020 05:54
@knight42
Member Author

knight42 commented Sep 9, 2020

/test pull-kubernetes-e2e-gce-ubuntu-containerd


@cmluciano cmluciano left a comment


General question: error verbosity


Do you have any objections to returning the more verbose error with details on the current versus expected not being equal? Is the more concise error suggesting that a timeout would be the only reason this would fail?

Member Author


Do you have any objections to returning the more verbose error with details on the current versus expected not being equal?

I think we could add some logging when what we got differs from what is expected, but returning an error immediately might not be desirable, since it would interrupt the poll.

Is the more concise error suggesting that a timeout would be the only reason this would fail?

Yes.

@knight42
Member Author

knight42 commented Dec 8, 2020

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 8, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 7, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.


In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knight42
Member Author

/remove-lifecycle rotten
/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jun 11, 2021
@k8s-ci-robot
Contributor

@knight42: Reopened this PR.


In response to this:

/remove-lifecycle rotten
/reopen


@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jun 11, 2021
@k8s-ci-robot
Contributor

@knight42: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.



@knight42 knight42 force-pushed the fix/TestInitialSync branch from 3e43e62 to aa5563b Compare June 11, 2021 13:59
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2021
Signed-off-by: Jian Zeng <zengjian.zj@bytedance.com>
@knight42 knight42 force-pushed the fix/TestInitialSync branch from aa5563b to 9109d92 Compare June 11, 2021 15:01
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 11, 2021
@knight42
Member Author

/cc @aojea

@k8s-ci-robot k8s-ci-robot requested a review from aojea June 11, 2021 15:05
@aojea
Member

aojea commented Jun 11, 2021

/retest

[aojea@aojea-laptop kubernetes]$ go test -v -race ./pkg/proxy/config/ -run TestInitialSync
=== RUN   TestInitialSync
I0612 00:57:32.582516  145359 config.go:315] Starting service config controller
I0612 00:57:32.582718  145359 shared_informer.go:240] Waiting for caches to sync for service config
I0612 00:57:32.582809  145359 shared_informer.go:247] Caches are synced for service config 
I0612 00:57:32.582843  145359 config.go:133] Starting endpoints config controller
I0612 00:57:32.582861  145359 shared_informer.go:240] Waiting for caches to sync for endpoints config
I0612 00:57:32.582938  145359 shared_informer.go:247] Caches are synced for endpoints config 
--- PASS: TestInitialSync (0.01s)
PASS
ok      k8s.io/kubernetes/pkg/proxy/config      0.066s
[aojea@aojea-laptop kubernetes]$ stress ./config.test -test.run TestInitialSync -test.v
5s: 644 runs so far, 0 failures
10s: 1302 runs so far, 0 failures
15s: 1955 runs so far, 0 failures
20s: 2597 runs so far, 0 failures
25s: 3228 runs so far, 0 failures
30s: 3882 runs so far, 0 failures
35s: 4538 runs so far, 0 failures
40s: 5182 runs so far, 0 failures
45s: 5823 runs so far, 0 failures
50s: 6464 runs so far, 0 failures
55s: 7095 runs so far, 0 failures
1m0s: 7738 runs so far, 0 failures
1m5s: 8375 runs so far, 0 failures
1m10s: 9015 runs so far, 0 failures

@aojea
Member

aojea commented Jun 11, 2021

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, knight42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2021
@k8s-ci-robot k8s-ci-robot merged commit 496f16c into kubernetes:master Jun 12, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 12, 2021
@knight42 knight42 deleted the fix/TestInitialSync branch June 12, 2021 02:09
@BenTheElder
Member

thank you!
