Skip to content

enable cgroups tiers and node allocatable enforcement on pods by default.#42350

Merged
k8s-github-robot merged 1 commit into
kubernetes:masterfrom
vishh:enable-qos-cgroups
Mar 4, 2017
Merged

enable cgroups tiers and node allocatable enforcement on pods by default.#42350
k8s-github-robot merged 1 commit into
kubernetes:masterfrom
vishh:enable-qos-cgroups

Conversation

@vishh

@vishh vishh commented Mar 1, 2017

Copy link
Copy Markdown
Contributor
Pods are launched in a separate cgroup hierarchy than system services.

Depends on #41753

cc @derekwaynecarr

enable node allocatable enforcement on pods by default.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 1, 2017
@k8s-github-robot

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: vishh

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 1, 2017
@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Mar 1, 2017
@vishh

vishh commented Mar 1, 2017

Copy link
Copy Markdown
Contributor Author

@derekwaynecarr derekwaynecarr added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed release-note-label-needed labels Mar 1, 2017
@derekwaynecarr

Copy link
Copy Markdown
Member

I am not sure why this needs the exception process since it made code freeze criteria earlier before being reverted to allow time for hollow kube stuff to get fixed, but this is /lgtm

@derekwaynecarr derekwaynecarr added this to the v1.6 milestone Mar 1, 2017
@derekwaynecarr

Copy link
Copy Markdown
Member

the dependent pr has been rebased, so we can remove the do-not-merge tag once that merges.

@vishh

vishh commented Mar 1, 2017

Copy link
Copy Markdown
Contributor Author

@vishh vishh removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Mar 1, 2017
@vishh

vishh commented Mar 1, 2017

Copy link
Copy Markdown
Contributor Author

Dependencies merged. This is good to go.

@ethernetdan ethernetdan added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Mar 1, 2017
@ethernetdan

Copy link
Copy Markdown
Contributor

Holding off until merging until code freeze is lifted - if this can be classified as a bug please feel free to remove.

/cc @kubernetes/kubernetes-release-managers

@vishh

vishh commented Mar 1, 2017

Copy link
Copy Markdown
Contributor Author

This feature is attempting to fix isolation bugs in the system. So from a user standpoint, this is a bug fix @ethernetdan. I filed an exception request just to keep the release managers in the loop (@calebamiles suggested that)

I'm lifting the merge freeze based on this rationale.

@vishh vishh removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Mar 1, 2017
@k8s-reviewable

Copy link
Copy Markdown

This change is Reviewable

@calebamiles

Copy link
Copy Markdown
Contributor

I think your definition of bug fix may be a little looser than mine @vishh. Is there any reason that we can't merge this as a patch release to 1.6? When were these defaults enabled in the e2e testing? How disruptive is the change expected to be on existing clusters? Given that the upgrade tests have not been running until very recently it really would have been nice to have some more soak time for these disruptive features. Also your roll out doc suggests that this should have a release note to inform operators of action required on their part.

cc: @ethernetdan, @kubernetes/kubernetes-release-managers

@calebamiles

Copy link
Copy Markdown
Contributor

Also @vishh, will you be able to attend the burndown meeting on Friday to discuss the impact of the change? I'd rather leave the PR unmerged until then.

@ethernetdan

ethernetdan commented Mar 2, 2017

Copy link
Copy Markdown
Contributor

I agree with @calebamiles that unless there is a pressing reason to merge now the safe move is to hold off until Friday when we can make a decision as part of burndown.

I think we should keep the label in place until then unless again there are exigent circumstances.

@ethernetdan ethernetdan added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Mar 2, 2017
@derekwaynecarr

Copy link
Copy Markdown
Member

@ethernetdan -- enabling pod level cgroups feature was originally done in PR:

#41349

but was reverted because it exposed a bug in kubemark, the revert was done to just allow the merge queue to continue pending kubemark being fixed.

#42052

I am confused on why we are changing the meaning of feature freeze when dependent prs were posted but stuck in rebase loops? I think its a mistake to hold on this PR as it limits our ability to address any resolution to issues that will be encountered when this feature is turned on in the field.

@derekwaynecarr

Copy link
Copy Markdown
Member

@ethernetdan -- to put this another way, i think everyone will turn this flag on even if we have it off by default currently. i think its a huge mistake to hold on this until friday as we need the slush time to soak out any bugs. the whole point is to have this feature on by default as it is a disruptive upgrade that we want to coincide with the release of the CRI. patch releases should not be disruptive upgrades.

@vishh

vishh commented Mar 2, 2017

Copy link
Copy Markdown
Contributor Author

The pressing need is to get more soak time for this feature to weed out bugs. From a sig node standpoint this is a necessity for v1.6 (cc @dchen1107 @derekwaynecarr )

Is there any reason that we can't merge this as a patch release to 1.6

Yes. This feature requires node drains and reboots and so it is better to combine it with CRI which also required drains. Staging them across patch releases would complicate roll out for users.

When were these defaults enabled in the e2e testing?

#42350 (comment) hopefully answers this problem.

How disruptive is the change expected to be on existing clusters?

What do you mean by existing clusters? The feature will require drains. Why does this matter? Without this, nodes will fail in many different ways.

Also your roll out doc suggests that this should have a release note to inform operators of action required on their part.

#41349 already has a release note. Happy to add here as well if its necessary. User facing docs will be added. kubernetes/website#2649 is already out for node allocatable. The final release note has to be curated to some extent to highlight disruptive features and so I don't think per PR release notes alone will help. FYI!

Given that the upgrade tests have not been running until very recently it really would have been nice to have some more soak time for these disruptive features.

That is the reason for enabling this feature/bug-fix sooner than later.

Bottom line is that Node Reliability is critical for the system and it does not make sense to treat this as a feature. What is the point in running a cluster if a simple misconfigured pod could bring it down?

@derekwaynecarr derekwaynecarr added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Mar 2, 2017
@calebamiles calebamiles removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Mar 2, 2017
@dchen1107

Copy link
Copy Markdown
Member

We talked about this at Tuesday's sig-node. I don't think this pr requires an exception since it only enables a feature which is planned for 1.6 and the feature implementation were merged.

@dchen1107

Copy link
Copy Markdown
Member

@calebamiles I just saw you removed the do-not-merge label. Thanks!

@k8s-github-robot

Copy link
Copy Markdown

Automatic merge from submit-queue (batch tested with PRs 41919, 41149, 42350, 42351, 42285)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants