
vendor: bump runc to rc94#101888

Merged
k8s-ci-robot merged 4 commits intokubernetes:masterfrom
kolyshkin:update-runc-rc94
May 17, 2021

Conversation

@kolyshkin
Contributor

@kolyshkin kolyshkin commented May 11, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

runc v1.0.0-rc94 release notes: https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc94

This also picks up a couple of commits from #100631.

Does this PR introduce a user-facing change?

NONE

giuseppe added 2 commits May 10, 2021 17:34
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 11, 2021
@k8s-ci-robot
Contributor

Hi @kolyshkin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver area/cloudprovider area/code-generation area/dependency Issues or PRs related to dependency changes area/kubectl area/kubelet area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 11, 2021
@k8s-ci-robot k8s-ci-robot requested review from a team, andrewsykim and caesarxuchao May 11, 2021 02:18
@liggitt
Member

liggitt commented May 17, 2021

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, giuseppe, kolyshkin, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 17, 2021
@SergeyKanzhelev
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit b1b06fe into kubernetes:master May 17, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 17, 2021
Member

@odinuge odinuge left a comment


Sorry to be late to the game, but I didn't see this PR before it marked some of my PRs as needs rebase... :/

With this version of runc, we need to do some refactoring to properly enforce cgroup values. Since runc changed the way they enforce limits (by no longer setting cgroup resources in Apply), this breaks our use of the systemd cgroup driver. As discussed in opencontainers/runc#2813, a fix is needed in order to support it properly.

I have a PR to properly deal with systemd here: #98374.

I am, however, quite worried this PR (as in this PR, not mine) might cause regressions, since both https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial and https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-conformance have been failing for quite some time. Those test suites are the only real way to verify whether these changes cause regressions. Did you run some of the tests locally (or on GCE) @kolyshkin?

I think we should try to merge the systemd-fix PR as soon as possible, and find some Googlers who can debug the node-kubelet-serial tests so we can start running them again. I guess reverting this until we have working tests isn't an option, or is it?

  "RSSBytes": bounded(1*e2evolume.Kb, memoryLimit),
  "PageFaults": bounded(0, 1000000),
- "MajorPageFaults": bounded(0, 10),
+ "MajorPageFaults": bounded(0, 1000),
Member


We should revisit this, since it changes a conformance test and results in different behavior between cgroup v1 and v2. I don't have it off the top of my head, but I think this will always be 0 on cgroup v1, since the metric only counts page faults on the pod cgroup itself.
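For readers unfamiliar with the e2e framework, the bounded(...) calls in the excerpt above construct matchers asserting a sampled stat falls within a [lower, upper] range; the PR widens the MajorPageFaults bound from (0, 10) to (0, 1000). A toy stand-in for the matcher (not the actual framework code, just an illustration of the check) might look like:

```go
package main

import "fmt"

// bounded returns a check that a sampled stat value lies in [lower, upper].
// Toy stand-in for the e2e framework's matcher; for illustration only.
func bounded(lower, upper uint64) func(uint64) bool {
	return func(v uint64) bool { return lower <= v && v <= upper }
}

func main() {
	majorPageFaults := bounded(0, 1000) // widened from (0, 10) in this PR
	fmt.Println(majorPageFaults(42))    // in range under the new bound
	fmt.Println(majorPageFaults(5000))  // still out of range
}
```

The concern above is that on cgroup v1 the sampled value may always be 0, so the widened upper bound would never actually be exercised there.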

@liggitt
Member

liggitt commented May 18, 2021

The runc bump was referenced in a comment about a performance regression in 1.21 (#101989), so I expect there will be a desire to take this version back to the 1.21 release branch.

However, changing conformant behavior is a problem, so all the implications of this PR have to be understood and resolved before any backports to a release branch can be done.

@SergeyKanzhelev @bobbypage what would be your expectation here?

  1. revert this PR, resolve the cgroup implications of the version bump, then bring it all in as a single update, then pick to 1.21?
  2. add follow-up fixes, then pick all to 1.21?

Since this caused CI to turn red, my inclination is option 1.

@liggitt
Member

liggitt commented May 18, 2021

revert open in #102094 to get CI running

@odinuge
Member

odinuge commented May 18, 2021

Thanks for jumping in @liggitt !

The runc bump was referenced in a comment about a performance regression in 1.21 (#101989), so I expect there will be a desire to take this version back to the 1.21 release branch.

Yeah, I think that was the plan. This is, however, quite a big dependency change, containing way more code and fixes than just the issue referenced here.

However, changing conformant behavior is a problem, so all the implications of this PR have to be understood and resolved before any backports to a release branch can be done.

Yeah, that is a big issue. As far as I remember from working with these systemd things last time, this change effectively disables all cgroup resource enforcement at the pod (kubepod-xyz) and node (kubepods.slice) level (if I remember correctly, but it looks like that is the case).

The change in the conformance test should also be done in a separate PR imo, so that all involved SIGs understand how and why.

  1. revert this PR, resolve the cgroup implications of the version bump, then bring it all in as a single update, then pick to 1.21?

I vote for this in the short term, in order to avoid other devs having to debug these issues while working on other unrelated things. I do think we should get https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial up and running again, and get proper signal from that test suite in order to properly see that things work as expected.

Also, the fixes required to mitigate these issues are quite complex, so I think it is a bit dangerous to rush them.

edit1: added the last paragraph.

@liggitt
Member

liggitt commented May 18, 2021

Also, the fixes required to mitigate these issues are quite complex, so I think it is a bit dangerous to rush them.

agreed. I asked in the issue about the performance regression how that can be resolved without taking rc94 back to release-1.21 (#101989 (comment))

@smarterclayton
Contributor

smarterclayton commented May 18, 2021

Implication from

Since runc changed the way they enforce limits (by no longer setting cgroup resources in Apply), this breaks our use of the systemd cgroup driver.

So roughly we're saying that nothing in presubmit e2e is verifying the systemd-cgroup driver? And that https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial is the way to get this going again but is currently broken?

If we can't guarantee that merges of runc don't break "resource management in kube", we definitely need that test (and others?) up and running ASAP, and potentially better ability to run those tests presubmit (as needed).

@odinuge
Member

odinuge commented May 18, 2021

So roughly we're saying that nothing in presubmit e2e is verifying the systemd-cgroup driver? And that https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial is the way to get this going again but is currently broken?

Yeah, that's right as of now... But I 100% agree that it is a bad thing. node-kubelet-serial has gotten way too little love over the last few years, and imo it is the most important test suite we have (ofc with my cgroup and sig-node bias).

I have been working for the last two years fixing failing tests in node-kubelet-serial, but since people usually don't look at it, it tends to fail, and issues and regressions are not properly detected. I have been busy with my master's thesis for the last 7-8 months, and more and more tests have started failing. When I have made runc bumps previously, I have always run those tests myself, either locally or on my own GCE account. Early in May, node-kubelet-serial started to go dark (as it is now), and I have not had time to dig more into the issue. Since it usually takes a few hours to run, it isn't suited for blocking presubmit. We could (and should) create an optional presubmit if we can get it to go green.

If we can't guarantee that merges of runc don't break "resource management in kube", we definitely need that test (and others?) up and running ASAP, and potentially better ability to run those tests presubmit (as needed).

Yup! I have spent countless hours over the last 2 years trying to keep it somewhat healthy (and this is not my full-time job, just a side gig while at university), but I think some people with access to the GCE account need to debug it as it is now.

We should also run the same test suite for cgroup v2; otherwise I really don't see how we can graduate it to beta or GA with confidence that it actually works as we expect it to.

So roughly we're saying that nothing in presubmit e2e is verifying the systemd-cgroup driver?

The tests that verify low-level stuff like cgroup values etc. are all part of node-kubelet-serial. This is because they are slow and keep restarting kubelet etc. to verify all kinds of edge cases. But yeah, I guess most test suites are using the cgroupfs driver.

edit: Updated with a comment to address the updates from your comment.

@harche
Contributor

harche commented May 18, 2021

@smarterclayton #102092. We have node e2e jobs that verify that, but they aren't blocking yet. We want to make them blocking, but we are still discussing whether we should merge those crio jobs with the existing docker job or make an independent blocking job, kubernetes/test-infra#21278 (comment)

@odinuge
Member

odinuge commented May 18, 2021

@harche:
@smarterclayton #102092. We have node e2e jobs that verify that, but they aren't blocking yet. We want to make them blocking, but we are still discussing whether we should merge those crio jobs with the existing docker job or make an independent blocking job, kubernetes/test-infra#21278 (comment)

Yeah, that test suite has some tests (a few) that will verify metrics etc. All tests that deal with cgroups and low-level kubelet stuff have to be executed in serial, since they keep restarting kubelet etc., and those tests are all part of node-kubelet-serial.

k8s-ci-robot added a commit that referenced this pull request May 18, 2021
Revert "Merge pull request #101888 from kolyshkin/update-runc-rc94"
@bobbypage
Member

bobbypage commented May 18, 2021

Thanks @odinuge for raising concerns with the runc bump, and @liggitt for the quick revert. It sounds like two issues have been found so far with the runc update:

  1. Breaking the [sig-node] Summary API [NodeConformance] test as described in crio cgroup v1 node conformance test failing after runc update #102092
    1a. It looks like this is passing in containerd conformance test https://testgrid.k8s.io/sig-node-containerd#containerd-node-conformance&width=5 , but failing in crio conformance test https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-node-e2e-conformance
  2. Possible breaking change when applying limits when using systemd cgroup driver?

I'm not super clear on the second case. It looks like the current approach is that we call Apply, followed by calling set on each subsystem with setSupportedSubsystemsV1 to set the settings directly (i.e. not using the libcontainer cgroup manager). Thus it appears to me that kubelet does not depend on the cgroup manager's Set, whose behavior was changed in runc rc94. Can you please clarify what regression is caused by runc rc94 when combined with the systemd cgroup driver? Is there some issue with that approach that breaks when upgrading to runc rc94?

/cc @mrunalp @kolyshkin
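The flow described above can be sketched as follows. This is a toy model with illustrative types, not kubelet's actual code (the real logic lives in kubelet's container manager); only the function name setSupportedSubsystemsV1 comes from the discussion, everything else is a stand-in:

```go
package main

import "fmt"

// resources is a toy stand-in for the cgroup resource configuration.
type resources struct {
	memoryLimitBytes uint64
	cpuShares        uint64
}

// subsystem models a v1 cgroup controller whose files are written directly.
type subsystem interface {
	name() string
	set(r resources) error
}

type memorySubsystem struct{}

func (memorySubsystem) name() string { return "memory" }
func (memorySubsystem) set(r resources) error {
	// In real code this would write memory.limit_in_bytes directly.
	fmt.Printf("memory: limit=%d\n", r.memoryLimitBytes)
	return nil
}

type cpuSubsystem struct{}

func (cpuSubsystem) name() string { return "cpu" }
func (cpuSubsystem) set(r resources) error {
	fmt.Printf("cpu: shares=%d\n", r.cpuShares)
	return nil
}

// apply stands in for the cgroup manager's Apply: create/join the cgroup.
// The thread's question is whether Apply also enforces limits (pre-rc94
// systemd driver) or no longer does (rc94).
func apply(pid int) error {
	fmt.Printf("apply: joined pid %d\n", pid)
	return nil
}

// setSupportedSubsystemsV1 mirrors the described pattern: each subsystem's
// values are written directly, bypassing the libcontainer manager's Set.
func setSupportedSubsystemsV1(subs []subsystem, r resources) error {
	for _, s := range subs {
		if err := s.set(r); err != nil {
			return fmt.Errorf("failed to set %s: %w", s.name(), err)
		}
	}
	return nil
}

func main() {
	_ = apply(1234)
	_ = setSupportedSubsystemsV1(
		[]subsystem{memorySubsystem{}, cpuSubsystem{}},
		resources{memoryLimitBytes: 1 << 30, cpuShares: 1024},
	)
}
```

The point of the question: if kubelet writes the files itself after Apply, the rc94 change to the manager's Set would not obviously matter, which is why clarification is requested below.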

@liggitt
Member

liggitt commented May 18, 2021

Can you please clarify what is the regression caused by runc rc94 when combined with the systemd cgroup driver? Is there some issue with that approach that breaks when upgrading to runc rc94?

I'll defer to @odinuge (xref #101888 (review))

@odinuge
Member

odinuge commented May 18, 2021

I'll defer to @odinuge (xref #101888 (review))

Thanks! Yeah, I think that comment describes it, but this systemd stuff is pretty complicated. This is a good resource: https://systemd.io/CGROUP_DELEGATION/

@bobbypage: Possible breaking change when applying limits when using systemd cgroup driver?

With this PR, we never set systemd properties for resources like cpu and memory; we only write the cgroup files. Unless we use delegated subtrees (ref. the systemd docs), we should not write to those files, since systemd will overwrite them.

Without this PR, we set systemd resources only at creation of the systemd slice/scope, which the PR I mentioned, #98374, fixes. Look at the issue that PR marks as fixed in order to understand the issue in current master. It essentially works as long as we don't change the values.

To show this in real life, one can play with systemd like this:

$ sudo systemd-run -p MemoryAccounting=true -- sleep inf   
Running as unit: run-r6d7644f4a4054efe8a79bd296090fefc.service
$ cat /sys/fs/cgroup/system.slice/run-r6d7644f4a4054efe8a79bd296090fefc.service/memory.max 
max
$ sudo tee /sys/fs/cgroup/system.slice/run-r6d7644f4a4054efe8a79bd296090fefc.service/memory.max <<< 11111111111
11111111111
$ cat /sys/fs/cgroup/system.slice/run-r6d7644f4a4054efe8a79bd296090fefc.service/memory.max                     
11111108608
$ systemctl set-property run-r6d7644f4a4054efe8a79bd296090fefc.service MemoryAccounting=true # or some other random systemd thingy
$ cat /sys/fs/cgroup/system.slice/run-r6d7644f4a4054efe8a79bd296090fefc.service/memory.max  
max

This is certainly behavior we would not like to happen...

I guess I could write a book about this, but does this comment make it easier to understand the issue, @liggitt @bobbypage @kolyshkin?

edit: sorry for the typos, it is getting late here

@kolyshkin
Contributor Author

I'm not super clear on the second case. It looks like the current approach is we call Apply followed by calling set on each subsystem with setSupportedSubsystemsV1 to set the settings directly (i.e. not using libcontainer cgroup manager). Thus appears to me that kubelet does not depend on cgroup manager's Set which behavior was changed in runc rc94. Can you please clarify what is the regression caused by runc rc94 when combined with the systemd cgroup driver? Is there some issue with that approach that breaks when upgrading to runc rc94?

Set behaviour has not changed, it's Apply behavior that was changed (in case systemd cgroup manager is used), details are at opencontainers/runc#2814. I am still looking into the kubernetes e2e-node failures -- reproduced it locally but can't figure out why exactly it is failing. Going to check if #98374 helps.
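The distinction being drawn can be modeled in a few lines. This is a simplified illustration of the change described in opencontainers/runc#2814, not runc's real API; all names here are hypothetical:

```go
package main

import "fmt"

// unit models a systemd unit's effective resource properties.
// memoryMax of 0 stands for "max" (no limit set).
type unit struct {
	memoryMax uint64
}

// applyPreRC94 models the old systemd driver: creating the unit also
// passed resource properties (e.g. MemoryMax) to systemd, so limits
// were enforced as a side effect of Apply.
func applyPreRC94(limit uint64) unit {
	return unit{memoryMax: limit}
}

// applyRC94 models the new behavior: Apply only creates the unit;
// resource properties stay unset until Set is called.
func applyRC94() unit {
	return unit{}
}

// set models the manager's Set, which now carries all resource updates.
func set(u *unit, limit uint64) {
	u.memoryMax = limit
}

func main() {
	old := applyPreRC94(1 << 30)
	fmt.Println("pre-rc94 after Apply:", old.memoryMax) // limit already enforced

	cur := applyRC94()
	fmt.Println("rc94 after Apply:", cur.memoryMax) // 0: nothing enforced yet
	set(&cur, 1<<30)
	fmt.Println("rc94 after Set:", cur.memoryMax)
}
```

In this model, a caller that relied on Apply alone to enforce limits (as the systemd-driver discussion above suggests) would see no limits set under the rc94 behavior until it also calls Set.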

@kolyshkin
Contributor Author

Breaking the [sig-node] Summary API [NodeConformance] test as described in #102092
1a. It looks like this is passing in containerd conformance test https://testgrid.k8s.io/sig-node-containerd#containerd-node-conformance&width=5 , but failing in crio conformance test https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-node-e2e-conformance

Spent a day on it, not sure what is causing the issue, will continue tomorrow.

@odinuge
Member

odinuge commented May 19, 2021

Breaking the [sig-node] Summary API [NodeConformance] test as described in #102092
1a. It looks like this is passing in containerd conformance test https://testgrid.k8s.io/sig-node-containerd#containerd-node-conformance&width=5 , but failing in crio conformance test https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-node-e2e-conformance

Spent a day on it, not sure what is causing the issue, will continue tomorrow.

Are we talking about the same things, or are you looking at some other issues @kolyshkin?

I did some testing yesterday on my PR that fixes the systemd stuff I mentioned in #101888 (comment), and it does fix the failing test @bobbypage mentions.

See #102092 (comment) for more info.

Also see @harche's comment here: #98374 (comment)

Here is a test run as pre submit: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/98374/pull-kubernetes-node-crio-e2e/1394592396901093376/

@kolyshkin
Contributor Author

@odinuge you're probably right; I have created #102147 that incorporates the change from #98374 to see what CI will show.

