enable cgroups tiers and node allocatable enforcement on pods by default.#42350
Conversation
enable node allocatable enforcement on pods by default. Signed-off-by: Vishnu kannan <vishnuk@google.com>
|
[APPROVALNOTIFIER] This PR is APPROVED The following people have approved this PR: vishh Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
|
Opening to facilitate https://groups.google.com/d/topic/kubernetes-milestone-burndown/NgpKrBm2gcA/discussion |
|
I am not sure why this needs the exception process since it made code freeze criteria earlier before being reverted to allow time for hollow kube stuff to get fixed, but this is /lgtm |
|
the dependent pr has been rebased, so we can remove the do-not-merge tag once that merges. |
|
Dependencies merged. This is good to go. |
|
Holding off until merging until code freeze is lifted - if this can be classified as a bug please feel free to remove. /cc @kubernetes/kubernetes-release-managers |
|
This I'm lifting the merge freeze based on this rationale. |
|
I think your definition of bug fix may be a little looser than mine @vishh. Is there any reason that we can't merge this as a patch release to 1.6? When were these defaults enabled in the e2e testing? How disruptive is the change expected to be on existing clusters? Given that the upgrade tests have not been running until very recently it really would have been nice to have some more soak time for these disruptive features. Also your roll out doc suggests that this should have a release note to inform operators of action required on their part. cc: @ethernetdan, @kubernetes/kubernetes-release-managers |
|
Also @vishh, will you be able to attend the burndown meeting on Friday to discuss the impact of the change? I'd rather leave the PR unmerged until then. |
|
I agree with @calebamiles that unless there is a pressing reason to merge now the safe move is to hold off until Friday when we can make a decision as part of burndown. I think we should keep the label in place until then unless again there are exigent circumstances. |
|
@ethernetdan -- enabling pod level cgroups feature was originally done in PR: but was reverted because it exposed a bug in kubemark, the revert was done to just allow the merge queue to continue pending kubemark being fixed. I am confused on why we are changing the meaning of feature freeze when dependent prs were posted but stuck in rebase loops? I think its a mistake to hold on this PR as it limits our ability to address any resolution to issues that will be encountered when this feature is turned on in the field. |
|
@ethernetdan -- to put this another way, i think everyone will turn this flag on even if we have it off by default currently. i think its a huge mistake to hold on this until friday as we need the slush time to soak out any bugs. the whole point is to have this feature on by default as it is a disruptive upgrade that we want to coincide with the release of the CRI. patch releases should not be disruptive upgrades. |
|
The pressing need is to get more soak time for this feature to weed out bugs. From a sig node standpoint this is a necessity for v1.6 (cc @dchen1107 @derekwaynecarr )
Yes. This feature requires node drains and reboots and so it is better to combine it with CRI which also required drains. Staging them across patch releases would complicate roll out for users.
#42350 (comment) hopefully answers this problem.
What do you mean by
#41349 already has a release note. Happy to add here as well if its necessary. User facing docs will be added. kubernetes/website#2649 is already out for node allocatable. The final release note has to be curated to some extent to highlight disruptive features and so I don't think per PR release notes alone will help. FYI!
That is the reason for enabling this feature/bug-fix sooner than later. Bottom line is that Node Reliability is critical for the system and it does not make sense to treat this as a feature. What is the point in running a cluster if a simple misconfigured pod could bring it down? |
|
We talked about this at Tuesday's sig-node. I don't think this pr requires an exception since it only enables a feature which is planned for 1.6 and the feature implementation were merged. |
|
@calebamiles I just saw you removed the do-not-merge label. Thanks! |
|
Automatic merge from submit-queue (batch tested with PRs 41919, 41149, 42350, 42351, 42285) |
Depends on #41753
cc @derekwaynecarr