Skip to content

Bumped the number of times a node tries to lookup itself#81880

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
MikeSpreitzer:fix81879
Nov 27, 2019
Merged

Bumped the number of times a node tries to lookup itself#81880
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
MikeSpreitzer:fix81879

Conversation

@MikeSpreitzer
Copy link
Copy Markdown
Member

@MikeSpreitzer MikeSpreitzer commented Aug 24, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change

/kind bug

/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
Increased the number of tries in pkg/util/node/node.go::GetNodeIP by 1, because the kube-proxy was giving up too early.

Based on early feedback, this PR went further (but no longer does):

The second commit goes much further, introducing a parameter and a new exponential backoff function in the wait utility package.

Which issue(s) this PR fixes:

Fixes #81879

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Increased the number of tries in pkg/util/node/node.go::GetNodeIP by
1, because the kube-proxy was giving up too early.

This is meant to address kubernetes#81879
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 24, 2019
@MikeSpreitzer
Copy link
Copy Markdown
Member Author

@kubernetes/sig-node-bugs

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 24, 2019
@MikeSpreitzer
Copy link
Copy Markdown
Member Author

@MikeSpreitzer
@yue9944882

Copy link
Copy Markdown
Contributor

@mattjmcnaughton mattjmcnaughton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your pr here :)

I'm not entirely opposed to this solution. However, it does feel like we're implementing a bit of a stop-gap solution. I wonder, did you explore how difficult it would be to make this value a parameter that the user can specify? I think a parameter is justifiable here, acceptable wait time does seem like it would very based on different use cases and different environments in which the node is running.

@MikeSpreitzer
Copy link
Copy Markdown
Member Author

MikeSpreitzer commented Aug 24, 2019

No, I did not consider adding a parameter. Before asking about the difficulty of implementing it I wonder about the advisability of doing it. Do we really want to increase the operational complexity of the relevant process(es)? Would it be sufficient to observe that the revised fixed time limit causes no problems?

@mattjmcnaughton
Copy link
Copy Markdown
Contributor

mattjmcnaughton commented Aug 25, 2019 via email

@MikeSpreitzer
Copy link
Copy Markdown
Member Author

@mattjmcnaughton : that's what I thought you meant in the first place. But I see your point that if a problem is discovered then it takes a long time to fix. I will sadly draft a revision that adds a parameter.

@mattjmcnaughton
Copy link
Copy Markdown
Contributor

mattjmcnaughton commented Aug 25, 2019 via email

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/network Categorizes an issue or PR as relevant to SIG Network. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 25, 2019
@MikeSpreitzer MikeSpreitzer force-pushed the fix81879 branch 3 times, most recently from 2386327 to 9dfe281 Compare August 25, 2019 22:13
@fejta-bot
Copy link
Copy Markdown

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@MikeSpreitzer
Copy link
Copy Markdown
Member Author

/test pull-kubernetes-bazel-build

@roycaihw
Copy link
Copy Markdown
Member

/cc @yliaog

@k8s-ci-robot k8s-ci-robot requested a review from yliaog August 26, 2019 21:10
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 6, 2019
@MikeSpreitzer
Copy link
Copy Markdown
Member Author

OK, I changed the branch in this PR back to pointing at simply bumping the count.
The more extensive revision is preserved as https://github.com/MikeSpreitzer/kubernetes/tree/fix81879%2Bparam

Copy link
Copy Markdown
Contributor

@mattjmcnaughton mattjmcnaughton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

thanks for your work here @MikeSpreitzer !

/assign @smarterclayton

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 7, 2019
@mattjmcnaughton
Copy link
Copy Markdown
Contributor

/remove-kind api-change

@k8s-ci-robot k8s-ci-robot removed the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Sep 7, 2019
@MikeSpreitzer
Copy link
Copy Markdown
Member Author

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 9, 2019
@thockin
Copy link
Copy Markdown
Member

thockin commented Sep 9, 2019

I am +1 on a simple fix. Why 6 and not 10, though? What is the practical downside of just waiting longer?

It looks like this is just used by kube-proxy, so adding a retries param to GetNodeIP() seems viable, too (if we want to keep this function really generic), but I don't see why we NEED to do that absent a real use case.

@MikeSpreitzer
Copy link
Copy Markdown
Member Author

MikeSpreitzer commented Sep 16, 2019

@thockin : I was nervous about waiting longer because when I first started working on a related issue I heard about higher level management logic with timeouts that are longer than the current timeouts but not like 500 seconds, and did not want to change the way this logic interacts with that.

@danwinship
Copy link
Copy Markdown
Contributor

"6" (which translates to a total wait of 31 seconds) seems to work in practice. If we really needed to bump beyond that then we should probably rework the way that local-up-cluster.sh startup works anyway.

Copy link
Copy Markdown
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2019
@thockin
Copy link
Copy Markdown
Member

thockin commented Nov 27, 2019

/retest

@thockin
Copy link
Copy Markdown
Member

thockin commented Nov 27, 2019

Thanks for popping this back onto my radar. Not sure why this never completed. @BenTheElder can you enlighten?

We'll need to cherry pick this to 1.17

@dims
Copy link
Copy Markdown
Member

dims commented Nov 27, 2019

@thockin please approve #85663 for the cherry-pick

@BenTheElder
Copy link
Copy Markdown
Member

I think this PR was opened before this presubmit was running or required, had no code changes since then, so it was never started. I did enable it running and reporting without blocking merges for O(months) before we made it blocking.

Prow handles this poorly. IIRC there actually is support for triggering newly required jobs but that isn't enabled for some reason.

@MikeSpreitzer
Copy link
Copy Markdown
Member Author

/retest

@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 7ed5eb6 into kubernetes:master Nov 27, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Nov 27, 2019
k8s-ci-robot added a commit that referenced this pull request Nov 28, 2019
…pstream-release-1.17

Automated cherry pick of #81880: Bumped the number of times a node tries to lookup itself
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kube-proxy gives up too soon waiting for node registration in hack/local-up-cluster.sh