
vendor: bump runc to rc94#101888

Merged
k8s-ci-robot merged 4 commits intokubernetes:masterfrom
kolyshkin:update-runc-rc94
May 17, 2021

Conversation

@kolyshkin
Contributor

@kolyshkin kolyshkin commented May 11, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

runc v1.0.0-rc94 release notes: https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc94

This also picks up a couple of commits from #100631.

Does this PR introduce a user-facing change?

NONE

giuseppe added 2 commits May 10, 2021 17:34
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 11, 2021
@k8s-ci-robot
Contributor

Hi @kolyshkin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver area/cloudprovider area/code-generation area/dependency Issues or PRs related to dependency changes area/kubectl area/kubelet area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 11, 2021
@k8s-ci-robot k8s-ci-robot requested review from a team, andrewsykim and caesarxuchao May 11, 2021 02:18
@liggitt
Member

liggitt commented May 17, 2021

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, giuseppe, kolyshkin, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 17, 2021
@SergeyKanzhelev
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit b1b06fe into kubernetes:master May 17, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 17, 2021
Member

@odinuge odinuge left a comment


Sorry to be late to the game, but I didn't see this PR before it marked some of my PRs as needs rebase... :/

With this version of runc, we need to do some refactoring to properly enforce cgroup values. Since runc changed the way they enforce limits (by no longer setting cgroup resources in Apply), this breaks our use of the systemd cgroup driver. As discussed in opencontainers/runc#2813, a fix is needed in order to support it properly.

I have a PR to properly deal with systemd here: #98374.

I am, however, quite worried this PR (as in this PR, not mine) might cause regressions, since both https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial and https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-conformance have been failing for quite some time. Those test suites are the only real way to verify whether these changes cause regressions. Did you run some of the tests locally (or on GCE) @kolyshkin?

I think we should try to merge the systemd-fix PR as soon as possible, and find some Googlers who can debug the node-kubelet-serial tests so we can start running them again. I guess reverting this until we have working tests isn't an option, or is it?

  "RSSBytes": bounded(1*e2evolume.Kb, memoryLimit),
  "PageFaults": bounded(0, 1000000),
- "MajorPageFaults": bounded(0, 10),
+ "MajorPageFaults": bounded(0, 1000),
Member


We should revisit this, since it changes a conformance test and results in different behavior between cgroup v1 and v2. I don't have it off the top of my head, but I think this will always be 0 on cgroup v1, since the metric only counts page faults on the pod cgroup itself.
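For readers unfamiliar with the e2e framework, the bounded(...) calls in the excerpt above construct matchers asserting a sampled stat falls within a [lower, upper] range; the PR widens the MajorPageFaults bound from (0, 10) to (0, 1000). A toy stand-in for the matcher (not the actual framework code, just an illustration of the check) might look like:

```go
package main

import "fmt"

// bounded returns a check that a sampled stat value lies in [lower, upper].
// Toy stand-in for the e2e framework's matcher; for illustration only.
func bounded(lower, upper uint64) func(uint64) bool {
	return func(v uint64) bool { return lower <= v && v <= upper }
}

func main() {
	majorPageFaults := bounded(0, 1000) // widened from (0, 10) in this PR
	fmt.Println(majorPageFaults(42))    // in range under the new bound
	fmt.Println(majorPageFaults(5000))  // still out of range
}
```

The concern above is that on cgroup v1 the sampled value may always be 0, so the widened upper bound would never actually be exercised there.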

@liggitt
Member

liggitt commented May 18, 2021

The runc bump was referenced in a comment about a performance regression in 1.21 (#101989), so I expect there will be a desire to take this version back to the 1.21 release branch.

However, changing conformant behavior is a problem, so all the implications of this PR have to be understood and resolved before any backports to a release branch can be done.

@SergeyKanzhelev @bobbypage what would be your expectation here?

  1. revert this PR, resolve the cgroup implications of the version bump, then bring it all in as a single update, then pick to 1.21?
  2. add follow-up fixes, then pick all to 1.21?

Since this caused CI to turn red, my inclination is option 1.

@liggitt
Member

liggitt commented May 18, 2021

revert open in #102094 to get CI running

@odinuge
Member

odinuge commented May 18, 2021

Thanks for jumping in @liggitt !

The runc bump was referenced in a comment about a performance regression in 1.21 (#101989), so I expect there will be a desire to take this version back to the 1.21 release branch.

Yeah, I think that was the plan. This is, however, quite a big dependency change, containing way more code and fixes than just the issue referenced here.

However, changing conformant behavior is a problem, so all the implications of this PR have to be understood and resolved before any backports to a release branch can be done.

Yeah, that is a big issue. As far as I remember from working with these systemd things last time, this change effectively disables all cgroup resource enforcement at the pod (kubepod-xyz) and node (kubepods.slice) level (if I remember correctly, but it looks like that is the case).

The change in the conformance test should also be done in a separate PR imo, so that all involved SIGs understand how and why.

  1. revert this PR, resolve the cgroup implications of the version bump, then bring it all in as a single update, then pick to 1.21?

I vote for this in the short term, in order to avoid other devs having to debug these issues while working on other unrelated things. I do think we should get https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial up and running again, and get proper signal from that test suite in order to properly see that things work as expected.

Also, the fixes required to mitigate these issues are quite complex, so I think it is a bit dangerous to rush them.

edit1: added the last paragraph.

@liggitt
Member

liggitt commented May 18, 2021

Also, the fixes required to mitigate these issues are quite complex, so I think it is a bit dangerous to rush them.

agreed. I asked in the issue about the performance regression how that can be resolved without taking rc94 back to release-1.21 (#101989 (comment))

@smarterclayton
Contributor

smarterclayton commented May 18, 2021

Implication from

Since runc changed the way they enforce limits (by no longer setting cgroup resources in Apply), this breaks our use of the systemd cgroup driver.

So roughly we're saying that nothing in presubmit e2e is verifying the systemd-cgroup driver? And that https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial is the way to get this going again but is currently broken?

If we can't guarantee that merges of runc don't break "resource management in kube", we definitely need that test (and others?) up and running ASAP, and potentially better ability to run those tests presubmit (as needed).

@odinuge
Member

odinuge commented May 18, 2021

So roughly we're saying that nothing in presubmit e2e is verifying the systemd-cgroup driver? And that https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial is the way to get this going again but is currently broken?

Yeah, that's right as of now... But I 100% agree that it is a bad thing. node-kubelet-serial has gotten way too little love over the last few years, and imo it is the most important test suite we have (ofc with my cgroup and sig-node bias).

I have been working for the last two years fixing failing tests in node-kubelet-serial, but since people usually don't look at it, it tends to fail, and issues and regressions are not properly detected. I have been busy with my master's thesis for the last 7-8 months, and more and more tests have started failing. When I have made runc bumps previously, I have always run those tests myself, either locally or on my own GCE account. Early in May, node-kubelet-serial started to go dark (as it is now), and I have not had time to dig more into the issue. Since it usually takes a few hours to run, it isn't suited for blocking presubmit. We could (and should) create an optional presubmit if we can get it to go green.

If we can't guarantee that merges of runc don't break "resource management in kube", we definitely need that test (and others?) up and running ASAP, and potentially better ability to run those tests presubmit (as needed).

Yup! I have spent countless hours over the last 2 years trying to keep it somewhat healthy (and this is not my full-time job, just a side gig while at university), but I think some people with access to the GCE account need to debug it as it is now.

We should also run the same test suite for cgroup v2; otherwise I really don't see how we can graduate it to beta or GA with confidence that it actually works as we expect it to.

So roughly we're saying that nothing in presubmit e2e is verifying the systemd-cgroup driver?

The tests that verify low-level stuff like cgroup values etc. are all part of node-kubelet-serial. This is because they are slow and keep restarting kubelet etc. to verify all kinds of edge cases. But yeah, I guess most test suites are using the cgroupfs driver.

edit: Updated with a comment to address the updates from your comment.

@harche
Contributor

harche commented May 18, 2021

@smarterclayton #102092. We have node e2e jobs that verify that, but they aren't blocking yet. We want to make them blocking, but we are still discussing whether we should merge those crio jobs with the existing docker job or make an independent blocking job, kubernetes/test-infra#21278 (comment)

@odinuge
Member

odinuge commented May 18, 2021

@harche:
@smarterclayton #102092. We have node e2e jobs that verify that, but they aren't blocking yet. We want to make them blocking, but we are still discussing whether we should merge those crio jobs with the existing docker job or make an independent blocking job, kubernetes/test-infra#21278 (comment)

Yeah, that test suite has some tests (a few) that will verify metrics etc. All tests that deal with cgroups and low-level kubelet stuff have to be executed in serial, since they keep restarting kubelet etc., and those tests are all part of node-kubelet-serial.

k8s-ci-robot added a commit that referenced this pull request May 18, 2021
Revert "Merge pull request #101888 from kolyshkin/update-runc-rc94"
@bobbypage
Member

bobbypage commented May 18, 2021

Thanks @odinuge for raising concerns with the runc bump, and @liggitt for the quick revert. It sounds like two issues have been found so far with the runc update:

  1. Breaking the [sig-node] Summary API [NodeConformance] test as described in crio cgroup v1 node conformance test failing after runc update #102092
    1a. It looks like this is passing in containerd conformance test https://testgrid.k8s.io/sig-node-containerd#containerd-node-conformance&width=5 , but failing in crio conformance test https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-node-e2e-conformance
  2. Possible breaking change when applying limits when using systemd cgroup driver?

I'm not super clear on the second case. It looks like the current approach is that we call Apply, followed by calling set on each subsystem with setSupportedSubsystemsV1 to set the settings directly (i.e. not using the libcontainer cgroup manager). Thus it appears to me that kubelet does not depend on the cgroup manager's Set, whose behavior was changed in runc rc94. Can you please clarify what regression is caused by runc rc94 when combined with the systemd cgroup driver? Is there some issue with that approach that breaks when upgrading to runc rc94?

/cc @mrunalp @kolyshkin
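The flow described above can be sketched as follows. This is a toy model with illustrative types, not kubelet's actual code (the real logic lives in kubelet's container manager); only the function name setSupportedSubsystemsV1 comes from the discussion, everything else is a stand-in:

```go
package main

import "fmt"

// resources is a toy stand-in for the cgroup resource configuration.
type resources struct {
	memoryLimitBytes uint64
	cpuShares        uint64
}

// subsystem models a v1 cgroup controller whose files are written directly.
type subsystem interface {
	name() string
	set(r resources) error
}

type memorySubsystem struct{}

func (memorySubsystem) name() string { return "memory" }
func (memorySubsystem) set(r resources) error {
	// In real code this would write memory.limit_in_bytes directly.
	fmt.Printf("memory: limit=%d\n", r.memoryLimitBytes)
	return nil
}

type cpuSubsystem struct{}

func (cpuSubsystem) name() string { return "cpu" }
func (cpuSubsystem) set(r resources) error {
	fmt.Printf("cpu: shares=%d\n", r.cpuShares)
	return nil
}

// apply stands in for the cgroup manager's Apply: create/join the cgroup.
// The thread's question is whether Apply also enforces limits (pre-rc94
// systemd driver) or no longer does (rc94).
func apply(pid int) error {
	fmt.Printf("apply: joined pid %d\n", pid)
	return nil
}

// setSupportedSubsystemsV1 mirrors the described pattern: each subsystem's
// values are written directly, bypassing the libcontainer manager's Set.
func setSupportedSubsystemsV1(subs []subsystem, r resources) error {
	for _, s := range subs {
		if err := s.set(r); err != nil {
			return fmt.Errorf("failed to set %s: %w", s.name(), err)
		}
	}
	return nil
}

func main() {
	_ = apply(1234)
	_ = setSupportedSubsystemsV1(
		[]subsystem{memorySubsystem{}, cpuSubsystem{}},
		resources{memoryLimitBytes: 1 << 30, cpuShares: 1024},
	)
}
```

The point of the question: if kubelet writes the files itself after Apply, the rc94 change to the manager's Set would not obviously matter, which is why clarification is requested below.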

@liggitt
Member

liggitt commented May 18, 2021

Can you please clarify what is the regression caused by runc rc94 when combined with the systemd cgroup driver? Is there some issue with that approach that breaks when upgrading to runc rc94?

I'll defer to @odinuge (xref #101888 (review))

@odinuge
Member

odinuge commented May 18, 2021

I'll defer to @odinuge (xref #101888 (review))

Thanks! Yeah, I think that comment describes it, but this systemd stuff is pretty complicated. This is a good resource: https://systemd.io/CGROUP_DELEGATION/

@bobbypage: Possible breaking change when applying limits when using systemd cgroup driver?

With this PR, we never set systemd properties for resources like cpu and memory; we only write the cgroup files. Unless we use delegated subtrees (ref. the systemd docs), we should not write to those files, since systemd will overwrite them.

Without this PR, we set systemd resources only at creation of the systemd slice/scope, which the PR I mentioned, #98374, fixes. Look at the issue that PR marks as fixed in order to understand the issue in current master. It essentially works as long as we don't change the values.

To show this in real life, one can play with systemd like this:

$ sudo systemd-run -p MemoryAccounting=true -- sleep inf   
Running as unit: run-r6d7644f4a4054efe8a79bd296090fefc.service
$ cat /sys/fs/cgroup/system.slice/run-r6d7644f4a4054efe8a79bd296090fefc.service/memory.max 
max
$ sudo tee /sys/fs/cgroup/system.slice/run-r6d7644f4a4054efe8a79bd296090fefc.service/memory.max <<< 11111111111
11111111111
$ cat /sys/fs/cgroup/system.slice/run-r6d7644f4a4054efe8a79bd296090fefc.service/memory.max                     
11111108608
$ systemctl set-property run-r6d7644f4a4054efe8a79bd296090fefc.service MemoryAccounting=true # or some other random systemd thingy
$ cat /sys/fs/cgroup/system.slice/run-r6d7644f4a4054efe8a79bd296090fefc.service/memory.max  
max

This is certainly behavior we would not like to happen...

I guess I could write a book about this, but does this comment make it easier to understand the issue, @liggitt @bobbypage @kolyshkin?

edit: sorry for the typos, it is getting late here

@kolyshkin
Contributor Author

I'm not super clear on the second case. It looks like the current approach is we call Apply followed by calling set on each subsystem with setSupportedSubsystemsV1 to set the settings directly (i.e. not using libcontainer cgroup manager). Thus appears to me that kubelet does not depend on cgroup manager's Set which behavior was changed in runc rc94. Can you please clarify what is the regression caused by runc rc94 when combined with the systemd cgroup driver? Is there some issue with that approach that breaks when upgrading to runc rc94?

Set behaviour has not changed, it's Apply behavior that was changed (in case systemd cgroup manager is used), details are at opencontainers/runc#2814. I am still looking into the kubernetes e2e-node failures -- reproduced it locally but can't figure out why exactly it is failing. Going to check if #98374 helps.
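The distinction being drawn can be modeled in a few lines. This is a simplified illustration of the change described in opencontainers/runc#2814, not runc's real API; all names here are hypothetical:

```go
package main

import "fmt"

// unit models a systemd unit's effective resource properties.
// memoryMax of 0 stands for "max" (no limit set).
type unit struct {
	memoryMax uint64
}

// applyPreRC94 models the old systemd driver: creating the unit also
// passed resource properties (e.g. MemoryMax) to systemd, so limits
// were enforced as a side effect of Apply.
func applyPreRC94(limit uint64) unit {
	return unit{memoryMax: limit}
}

// applyRC94 models the new behavior: Apply only creates the unit;
// resource properties stay unset until Set is called.
func applyRC94() unit {
	return unit{}
}

// set models the manager's Set, which now carries all resource updates.
func set(u *unit, limit uint64) {
	u.memoryMax = limit
}

func main() {
	old := applyPreRC94(1 << 30)
	fmt.Println("pre-rc94 after Apply:", old.memoryMax) // limit already enforced

	cur := applyRC94()
	fmt.Println("rc94 after Apply:", cur.memoryMax) // 0: nothing enforced yet
	set(&cur, 1<<30)
	fmt.Println("rc94 after Set:", cur.memoryMax)
}
```

In this model, a caller that relied on Apply alone to enforce limits (as the systemd-driver discussion above suggests) would see no limits set under the rc94 behavior until it also calls Set.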

@kolyshkin
Contributor Author

Breaking the [sig-node] Summary API [NodeConformance] test as described in #102092
1a. It looks like this is passing in containerd conformance test https://testgrid.k8s.io/sig-node-containerd#containerd-node-conformance&width=5 , but failing in crio conformance test https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-node-e2e-conformance

Spent a day on it, not sure what is causing the issue, will continue tomorrow.

@odinuge
Member

odinuge commented May 19, 2021

Breaking the [sig-node] Summary API [NodeConformance] test as described in #102092
1a. It looks like this is passing in containerd conformance test https://testgrid.k8s.io/sig-node-containerd#containerd-node-conformance&width=5 , but failing in crio conformance test https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-node-e2e-conformance

Spent a day on it, not sure what is causing the issue, will continue tomorrow.

Are we talking about the same things, or are you looking at some other issues @kolyshkin?

I did some testing yesterday on my PR that fixes the systemd stuff I mentioned in #101888 (comment), and it does fix the failing test @bobbypage mentions.

See #102092 (comment) for more info.

Also see @harche's comment here: #98374 (comment)

Here is a test run as pre submit: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/98374/pull-kubernetes-node-crio-e2e/1394592396901093376/

@kolyshkin
Contributor Author

@odinuge you're probably right; I have created #102147 that incorporates the change from #98374 to see what CI will show.

