Skip to content

In centos 7.4/7.5/7.6, runc may return EINVAL when an other process read the runc status file.#3705

Closed
zzzzzzzzzy9 wants to merge 1 commit intoopencontainers:mainfrom
zzzzzzzzzy9:main
Closed

In centos 7.4/7.5/7.6, runc may return EINVAL when an other process read the runc status file.#3705
zzzzzzzzzy9 wants to merge 1 commit intoopencontainers:mainfrom
zzzzzzzzzy9:main

Conversation

@zzzzzzzzzy9
Copy link

In centos 7.4/7.5/7.6, when other processes read the /proc/pid/status file of the runc: [1: CHILD] process, and if the runc: [1: CHILD] process happens to be in the unshare stage, the unshare syscall will report an error EINVAL。This is because when other processes read the status file, they will call the kernel function get_ task_ mm(), which sets the decision condition task->mm->mm_users +1. In the unshare syscall, mm does not meet the condition of mm<=1, and the kernel throws an EINVAL error. For this, I think we can try again.

We can use this python script to reproduce this problem.

import os
import time

def clean_fd(fd):
	for i in range(len(fd)):
		try:
			os.close(fd[i])
		except Exception as e:
			print("os.close err:", e)

while(True):
	docker_runc = 'ps -ef | grep "docker-runc create" | grep -v grep'
	ps_docker_runc = os.popen(docker_runc).read().strip("\n").split("\n")
	if ps_docker_runc[0] == '':
		continue
	print("ps docker-r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unc: ", ps_docker_runc)
	runc_pid = docker_runc_pid = ps_docker_runc[0].split()[1]
	command = 'ps -ef | grep ' + docker_runc_pid + ' | grep -v docker-runc | grep -v grep'
	ps_origin = os.popen(command).read().strip("\n")
	ps_origin = ps_origin.split("\n")
	if ps_origin[0] == '':
		continue
	fd = []
	for i in range(len(ps_origin)):
		start_time = time.time()
		ps = ps_origin[i].split()import os
import time

def clean_fd(fd):
	for i in range(len(fd)):
		try:
			os.close(fd[i])
		except Exception as e:
			print("os.close err:", e)

while(True):
	docker_runc = 'ps -ef | g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rep "docker-runc create" | grep -v grep'
	ps_docker_runc = os.popen(docker_runc).read().strip("\n").split("\n")
	if ps_docker_runc[0] == '':
		continue
	print("ps docker-runc: ", ps_docker_runc)
	runc_pid = docker_runc_pid = ps_docker_runc[0].split()[1]
	command = 'ps -ef | grep ' + docker_runc_pid + ' | grep -v docker-runc | grep -v grep'
	ps_origin = os.popen(command).read().strip("\n")
	ps_origin = ps_origin.split("\n")
	if ps_origin[0] == '':
		continue
	fd = []
	for i in range(len(ps_origin)):
		start_time = time.time()
		ps = ps_origin[i].split()
		print("ps is", ps_origin[i])
		pid = ps[1]
		ppid = ps[2]
		print(pid)
		file_name = "/proc/" + pid + "/status"
		try:
			fd.append(os.open(file_name, os.O_RDWR))
		except Exception as e:
			print("os.open err:", e)
			continue
	while(True):
		try:
			_ = os.read(fd[0], 1)
			os.lseek(fd[i], 0, 0)
			_ = os.read(fd[1], 1)
			os.lseek(fd[i], 0, 0)
		except Exception as e:
			print("os.read err:", e)
			clean_fd(fd)
			break
		try:
			os.stat(file_name)
		except Exception as e:
			break

		print("ps is", ps_origin[i])
		pid = ps[1]
		ppid = ps[2]
		print(pid)
		file_name = "/proc/" + pid + "/status"
		try:
			fd.append(os.open(file_name, os.O_RDWR))
		except Exception as e:
			print("os.open err:", e)
			continue
	while(True):
		try:
			_ = os.read(fd[0], 1)
			os.lseek(fd[i], 0, 0)
			_ = os.read(fd[1], 1)
			os.lseek(fd[i], 0, 0)
		except Exception as e:
			print("os.read err:", e)
			clean_fd(fd)
			break
		try:
			os.stat(file_name)
		except Exception as e:
			break

In addition, due to the slow running speed of python, it may be necessary to add a delay in the runc code like this:

                       time.sleep(1);
                       if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0)
                                bail("failed to unshare remaining namespaces (except cgroupns)");

Comment on lines +1132 to +1133
int i;
const int retry_times = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just do

int retries = 5;
...
for (; retries > 0; retries--) {
...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thank you for your suggestion.

@zzzzzzzzzy9
Copy link
Author

@AkihiroSuda

write_log(DEBUG, "unshare remaining namespace (except cgroupns)");
if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0)
bail("failed to unshare remaining namespaces (except cgroupns)");
for (; retries > 0; retries--) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment line to explain why this retry loop is needed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thank you for your suggestion.

bail("failed to unshare remaining namespaces (except cgroupns)");
if (retries == 1)
bail("failed to unshare remaining namespaces (except cgroupns), please retry");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to add delay ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't nead to add delay. If we add a delay, the recurrence of the bug will affect the runc running speed. So I think we just need to try again quickly.

@zzzzzzzzzy9 zzzzzzzzzy9 force-pushed the main branch 2 times, most recently from dccd897 to de91c5e Compare February 5, 2023 14:28
@kolyshkin kolyshkin added the backport/1.1-todo A PR in main branch which needs to be backported to release-1.1 label Feb 9, 2023
@kolyshkin
Copy link
Contributor

I think we need to backport it to release-1.1 once merged.

AkihiroSuda
AkihiroSuda previously approved these changes Feb 9, 2023
* kernel throws an EINVAL error. For this, I think we can try again.
*/
for (; retries > 0; retries--) {
if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) >= 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter much, but this should be == 0 (the function returns 0 on success and -1 on error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for your suggestion. Why I use >=0 is the origin codes using if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0). The Rigorous usage should be==0.

man 2 unshare:
RETURN VALUE
On success, zero returned. On failure, -1 is returned and errno is set
to indicate the error.

kolyshkin
kolyshkin previously approved these changes Feb 9, 2023
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin
Copy link
Contributor

@zzyyzte can you please fix the subject of the commit? It should say something like

libct/nsenter: retry unshare on EINVAL

Comment on lines +1229 to +1253
/*
* In centos 7.4/7.5/7.6, when other processes read the /proc/pid/status
* file of the runc: [1: CHILD] process, and if the runc: [1: CHILD]
* process happens to be in the unshare stage, the unshare syscall will
* report an error EINVAL. This is because when other processes read
* the status file, they will call the kernel function get_ task_ mm(),
* which sets the decision condition task->mm->mm_users +1. In the
* unshare syscall, mm does not meet the condition of mm<=1, and the
* kernel throws an EINVAL error. For this, I think we can try again.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this affect the rest of the unshare() calls here too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry mechanism only occurs when unshare fails, and the unshare successfully executed only once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, but my question is why don't we need the retry mechanism on the other unshare() calls we do here. For example, just a few lines above this, still in the runc CHILD, we unshare the userns: https://github.com/opencontainers/runc/pull/3705/files#diff-6383238247e090d88ade6343c0ef59dd09b3c10634bf0584e78445b843c55ab0R1175

There are other unshare calls too. Don't we want to cover with this retry mechanism all of them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zzyyzte btw, IIRC there are more calls to unshare than these two

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also write a wrapper for unshare that will retry and use it from all places that call unshare now. That should not create any measurable overhead I guess.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestions. I create a new function to try unshare syscall.

@zzzzzzzzzy9
Copy link
Author

@zzyyzte can you please fix the subject of the commit? It should say something like

libct/nsenter: retry unshare on EINVAL

Done @kolyshkin

if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0)
bail("failed to unshare remaining namespaces (except cgroupns)");
/*
* In centos 7.4/7.5/7.6, when other processes read the /proc/pid/status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that only affect old centos or is it something that still happens on modern (non-centos) kernels? I guess the latter?

If it is the former, I'd clarify this is a workaround for those kernels, if it is the latter I'd not tie the comment so much to these specific centos versions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just CentOS that has this bug, it's a kernel bug that was fixed in torvalds/linux@12c641a. The issue exists in older kernels, such as the v3.10.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay, can you then make the comment not centos specific and mention that it was fixed there with kernel XX?

So we can drop it in the future if we want/need.

In centos 7.4/7.5/7.6, runc may return EINVAL when an other process read the runc status file.

Signed-off-by: zzyyzte <zhang.yu58@zte.com.cn>
@kolyshkin
Copy link
Contributor

I don't like the current patch either and it was easier for me to write it from scratch rather than to explain what needs to be fixed here. In the process I've also found the kernel commit that fixes things, and updated the comment accordingly.

Let's continue in #3772.

@kolyshkin kolyshkin closed this Mar 16, 2023
@kolyshkin kolyshkin removed the backport/1.1-todo A PR in main branch which needs to be backported to release-1.1 label Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants