In centos 7.4/7.5/7.6, runc may return EINVAL when an other process read the runc status file.#3705
In centos 7.4/7.5/7.6, runc may return EINVAL when an other process read the runc status file.#3705zzzzzzzzzy9 wants to merge 1 commit intoopencontainers:mainfrom
Conversation
libcontainer/nsenter/nsexec.c
Outdated
| int i; | ||
| const int retry_times = 5; |
There was a problem hiding this comment.
You can just do
int retries = 5;
...
for (; retries > 0; retries--) {
...There was a problem hiding this comment.
Done. Thank you for your suggestion.
libcontainer/nsenter/nsexec.c
Outdated
| write_log(DEBUG, "unshare remaining namespace (except cgroupns)"); | ||
| if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0) | ||
| bail("failed to unshare remaining namespaces (except cgroupns)"); | ||
| for (; retries > 0; retries--) { |
There was a problem hiding this comment.
Please add a comment line to explain why this retry loop is needed
There was a problem hiding this comment.
Done. Thank you for your suggestion.
| bail("failed to unshare remaining namespaces (except cgroupns)"); | ||
| if (retries == 1) | ||
| bail("failed to unshare remaining namespaces (except cgroupns), please retry"); | ||
| } |
There was a problem hiding this comment.
Don't we need to add delay ?
There was a problem hiding this comment.
I think we don't nead to add delay. If we add a delay, the recurrence of the bug will affect the runc running speed. So I think we just need to try again quickly.
dccd897 to
de91c5e
Compare
|
I think we need to backport it to |
libcontainer/nsenter/nsexec.c
Outdated
| * kernel throws an EINVAL error. For this, I think we can try again. | ||
| */ | ||
| for (; retries > 0; retries--) { | ||
| if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) >= 0) |
There was a problem hiding this comment.
Doesn't matter much, but this should be == 0 (the function returns 0 on success and -1 on error.
There was a problem hiding this comment.
Done. Thanks for your suggestion. Why I use >=0 is the origin codes using if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0). The Rigorous usage should be==0.
man 2 unshare:
RETURN VALUE
On success, zero returned. On failure, -1 is returned and errno is set
to indicate the error.
|
@zzyyzte can you please fix the subject of the commit? It should say something like |
| /* | ||
| * In centos 7.4/7.5/7.6, when other processes read the /proc/pid/status | ||
| * file of the runc: [1: CHILD] process, and if the runc: [1: CHILD] | ||
| * process happens to be in the unshare stage, the unshare syscall will | ||
| * report an error EINVAL. This is because when other processes read | ||
| * the status file, they will call the kernel function get_ task_ mm(), | ||
| * which sets the decision condition task->mm->mm_users +1. In the | ||
| * unshare syscall, mm does not meet the condition of mm<=1, and the | ||
| * kernel throws an EINVAL error. For this, I think we can try again. | ||
| */ |
There was a problem hiding this comment.
Doesn't this affect the rest of the unshare() calls here too?
There was a problem hiding this comment.
The retry mechanism only occurs when unshare fails, and the unshare successfully executed only once.
There was a problem hiding this comment.
Exactly, but my question is why don't we need the retry mechanism on the other unshare() calls we do here. For example, just a few lines above this, still in the runc CHILD, we unshare the userns: https://github.com/opencontainers/runc/pull/3705/files#diff-6383238247e090d88ade6343c0ef59dd09b3c10634bf0584e78445b843c55ab0R1175
There are other unshare calls too. Don't we want to cover with this retry mechanism all of them?
There was a problem hiding this comment.
@zzyyzte btw, IIRC there are more calls to unshare than these two
There was a problem hiding this comment.
I would also write a wrapper for unshare that will retry and use it from all places that call unshare now. That should not create any measurable overhead I guess.
There was a problem hiding this comment.
Thanks for suggestions. I create a new function to try unshare syscall.
fa21f04 to
c63dc96
Compare
Done @kolyshkin |
| if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0) | ||
| bail("failed to unshare remaining namespaces (except cgroupns)"); | ||
| /* | ||
| * In centos 7.4/7.5/7.6, when other processes read the /proc/pid/status |
There was a problem hiding this comment.
Is this something that only affect old centos or is it something that still happens on modern (non-centos) kernels? I guess the latter?
If it is the former, I'd clarify this is a workaround for those kernels, if it is the latter I'd not tie the comment so much to these specific centos versions
There was a problem hiding this comment.
It's not just CentOS that has this bug, it's a kernel bug that was fixed in torvalds/linux@12c641a. The issue exists in older kernels, such as the v3.10.
There was a problem hiding this comment.
Oh, okay, can you then make the comment not centos specific and mention that it was fixed there with kernel XX?
So we can drop it in the future if we want/need.
In centos 7.4/7.5/7.6, runc may return EINVAL when an other process read the runc status file. Signed-off-by: zzyyzte <zhang.yu58@zte.com.cn>
|
I don't like the current patch either and it was easier for me to write it from scratch rather than to explain what needs to be fixed here. In the process I've also found the kernel commit that fixes things, and updated the comment accordingly. Let's continue in #3772. |
In centos 7.4/7.5/7.6, when other processes read the /proc/pid/status file of the runc: [1: CHILD] process, and if the runc: [1: CHILD] process happens to be in the unshare stage, the unshare syscall will report an error EINVAL。This is because when other processes read the status file, they will call the kernel function get_ task_ mm(), which sets the decision condition task->mm->mm_users +1. In the unshare syscall, mm does not meet the condition of mm<=1, and the kernel throws an EINVAL error. For this, I think we can try again.
We can use this python script to reproduce this problem.
In addition, due to the slow running speed of python, it may be necessary to add a delay in the runc code like this: