WIP: kubelet: Prioritize certain pod status updates#107897
WIP: kubelet: Prioritize certain pod status updates#107897smarterclayton wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
This construct is no longer necessary in the new code, because syncPod no longer calls needUpdate to check this value and we directly invoke syncPod only if needed.
There was a problem hiding this comment.
This value was previously always calculated, but could be invoked twice. Calculate it only once per pod.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Performed under the lock in case we decide to parallelize the syncPod method.
There was a problem hiding this comment.
This comment was duplicated and is calculated above now.
There was a problem hiding this comment.
This method holds the read lock so it can check apiStatusVersions in case we parallelize this method.
8e4d42d to
4cb4ed6
Compare
|
This is the simplest possible prioritization but i want to contrast a few different strategies before this would be a serious candidate. |
6ff507d to
f20ab45
Compare
Track how long it takes for pod updates to propagate from detection to successful change on API server. Will guide future improvements in pod start and shutdown latency.
Streamline the pod status manager to track the set of updated pods instead of using a buffered channel. Remove the time the pod status lock is held by moving other expensive checks out of the loop, which also opens the door for parallelizing the status queue later. Avoid making some checks twice now that syncPod is only called from syncBatch. Protect apiStatusVersions under the pod status lock as well to prevent accidents.
Some pod status transitions directly impact end-to-end user latency in the Kubelet, such as pods going ready, going unready, or becoming Succeeded or Failed. Prioritize the order that pods are updated in to minimize that latency.
f20ab45 to
e220a4c
Compare
|
@smarterclayton: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/triage accepted |
|
/assign @logicalhan |
|
@smarterclayton: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
|
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
|
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
|
@k8s-triage-robot: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Some pod status transitions directly impact end-to-end user latency in the Kubelet, such as pods going ready, going unready, or becoming Succeeded or Failed. Prioritize the order that pods are updated in to minimize that latency.
To make this easier, the status manager reporting mechanism is streamlined to track the set of updated pods instead of using a buffered channel. Remove the time the pod status lock is held by moving other expensive checks out of the loop, which also opens the door for parallelizing the status queue later. Avoid making some checks twice now that syncPod is only called from syncBatch. Protect apiStatusVersions under the pod status lock as well to prevent accidents.
This should prevent head of line blocking where lots of status updates build up in the queue as seen in flakes like:
but have not confirmed that these are HOL.
Builds on #107896
/kind bug