-
Notifications
You must be signed in to change notification settings - Fork 256
newuidmap/newgidmap: do not require CAP_SYS_ADMIN in the parent user namespace #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ba1c18b to
336cead
Compare
|
seems related to torvalds/linux@41c21e3 cc @amluto |
Applies shadow-maint/shadow#132 so that we don't need to have CAP_SYS_ADMIN. See also genuinetools#170 . Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
* use Giuseppe's forked newuidmap/newgidmap Applies shadow-maint/shadow#132 so that we don't need to have CAP_SYS_ADMIN. See also #170 . Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> * shut up codacy Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
336cead to
7f1100e
Compare
|
@giuseppe Great find. Does this for CAP_SETPCAP to be added to newuidmap and newgidmap? Could these utils be run with filecaps rather then having to be setuid? |
|
@t8m PTAL |
|
@vbatts FYI |
|
Wait. I'm already having an updated shadow build that has a newxidmap. We just talked about this |
|
@rhatdan I've not tried but I think filecaps should work as well |
7f1100e to
65f73ce
Compare
|
oh oh good. I was seeing the forked shadow, but that is a narrow bypass for this CAP issue. |
65f73ce to
186b211
Compare
186b211 to
72f8b2e
Compare
Apply shadow-maint/shadow#132 so that newuidmap/newgidmap doesn't require CAP_SYS_ADMIN Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Apply shadow-maint/shadow#132 so that newuidmap/newgidmap doesn't require CAP_SYS_ADMIN Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Apply shadow-maint/shadow#132 so that newuidmap/newgidmap doesn't require CAP_SYS_ADMIN Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Apply shadow-maint/shadow#132 so that newuidmap/newgidmap doesn't require CAP_SYS_ADMIN Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Apply shadow-maint/shadow#132 so that newuidmap/newgidmap doesn't require CAP_SYS_ADMIN Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
|
Hi, can you give a concise example of a way to trigger this? The patch is mostly fine, except build should absolutely not fail if libcap is not available. |
|
Hey @brauner , just wondering, is this something you've already thought through? |
|
So, I would really like to see reproducer for this scenario to better reason about it. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the euid!=owner of the userns, the kernel returns EPERM when trying
to write the uidmap and there is no CAP_SYS_ADMIN in the parent
namespace.
So that analysis is partially wrong and for what it's worth my first one [1] was too.
What actually fails is the CAP_SYS_ADMIN in the target user namespace in the first check with file_ns_capable() fails in the setuid case. Here is my full analysis of this:
Here is how I think this goes.
newuidmapissetuid, only caps that are kept areCAP_SETUIDandCAP_SETGID
unshare -U sleep infinity &
newuidmap $? 0 100000 65536
You first hit file_ns_capable(file, ns, CAP_SYS_ADMIN). This calls into cap_capable() and hits the loop
for (;;) {
/* Do we have the necessary capabilities? */
if (ns == cred->user_ns)
return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
/*
* If we're already at a lower level than we're looking for,
* we're done searching.
*/
if (ns->level <= cred->user_ns->level)
return -EPERM;
/*
* The owner of the user namespace in the parent of the
* user namespace has all caps.
*/
if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
return 0;
/*
* If you have a capability in a parent user ns, then you have
* it over all children user namespaces as well.
*/
ns = ns->parent;
}The first check fails and falls through to the end of the loop and retrieves the parent user namespace and checks whether CAP_SYS_ADMIN is available there which isn't.
newuidmaphasCAP_SETUIDas fscaps setCAP_SETUIDand all other caps other thanCAP_SETUIDhave been dropped.
unshare -U sleep infinity &
newuidmap $? 0 100000 65536
You pass the first file_ns_capable() check for CAP_SYS_ADMIN since the euid hasn't been changed and thus pass:
if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
return 0;Now this hits new_idmap_permitted() and ns_capable(ns->parent, CAP_SETUID) which is passed since CAP_SETUID is available in the parent user namespace. Now the right hand side of the conjunct file_ns_capable(file, ns->parent, CAP_SETUID) is hit and the cap_capable() loop (see above) is entered again. This passes
if (ns == cred->user_ns)
return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;since CAP_SETUID is available in the parent user namespace. Hence you can write the mapping and are done.
[1]:
The caller doesn't need to have CAP_SYS_ADMIN in the parent user namespace it needs to have CAP_SYS_ADMIN in the target user namespace. That is if you call open() on /proc/<pid>/{g,u}id_map it will hit proc_id_map_open() which will retrieve (via a few more intermediate steps) the user namespace based on get_proc_task() which takes the pid associated with the inode of the proc file into account aka the user namespace you're seeing is not your own but that of the target task you're trying to operate on. You should always have CAP_SYS_ADMIN in there when you have unshared and or cloned it before you have written a mapping and before you have explicitly dropped any caps. If that weren't the case you would never get past the initial unconditional:
if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
goto out;
check. You would never be able to write any idmappings as an unprivileged user because you don't have CAP_SYS_ADMIN as uid != 0 in the initial user namespace.
So the problem you're hitting in the issue you've linked in is a specific corner case:
- you're uid != 0 (e.g. 1000)
- you're euid != uid
~~So you're hitting:
if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && uid_eq(ns->owner, cred->euid))
since you don't fulfill the latter requirement you're falling into the case were you now need
CAP_SETUIDorCAP_SETGIDin your user namespaceCAP_SETUIDorCAP_SETGIDin the target's parent user namespace wrt to the/proc/<pid>/{g,u}id_mapfile~~
So to handle the corner case you want to handle all you should need is to do the seteuid() (It's late and I'm jotting down the analysis so please do test.). Also please note that this won't help you in the general case where you want to write more than a single mapping of a uid pair that is always allowed. To write more complex mappings you will need privileges that's why I don't see the need for the cap dance here.
Last thing that I find odd. The PRs of @AkihiroSuda that are linked here that switch to @giuseppe's vendored new{g,u}idmap do not show that this actually works and are misleading because the setuid is set on them. So it's trivial that they work.~~
setuid is set but CAP_SYS_ADMIN is dropped by default in Docker/Kubernetes. |
8cc6457 to
a03ebe4
Compare
|
Gah - sorry about that the repeated comments. github was wonky lastnight. |
configure.ac
Outdated
| LIBCAP=-lcap | ||
| AC_DEFINE(HAVE_LIBCAP, 1, [Defined if you have libcap.])], | ||
| [LIBCAP=""])], | ||
| [LIBCAP=""]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you checking for libcap? You're only using the raw syscalls cap{g,s}et() so there's no use in linking new{g,u}idmap against libcap, I think.
libmisc/idmapping.c
Outdated
| exit(EXIT_FAILURE); | ||
| } | ||
|
|
||
| data[0].permitted |= CAP_TO_MASK(uid_map ? CAP_SETUID : CAP_SETGID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clear all other capabilities. You really only need CAP_SET*ID and set it in effective too otherwise this will not work. :)
a03ebe4 to
f339235
Compare
if the euid!=owner of the userns, the kernel returns EPERM when trying to write the uidmap and there is no CAP_SYS_ADMIN in the parent namespace. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
f339235 to
1ecca84
Compare
|
I am trying to undetstand what is being proposed and what is going on here.
So I see a great argument for making newuidmap use file capabilities to gain SETUID and newgidmap to use file capabilities to gain SETGID. Beyond that everything appears to be the callers problem. I don't see a justification for unsharing a user namespace as one euid and calling newuidmap with another euid. As the caller is doing both then the caller can take care of whatever needs to be take care of. |
|
newuidmap/newgidmap are not going to be called with a different set of credentials, we only need to set the euid to the caller of the tools. If it makes things clearer I can use File caps work fine in this case, the euid is not changed to root and we don't need the UID->suid()->root->seteuid()->UID dance when root doesn't have CAP_SYS_ADMIN as it usually happens in a container. We also avoid gaining other capabilities that are not needed. |
|
@giuseppe You are saying that this is needed for when you run podman inside of a container that does not have access to SYS_ADMIN privs, But does have SETUID/SETGID. Would this also happen if a user account dropped SYS_ADMIN from the bounding set and then attempted to run podman? |
|
@rhatdan without CAP_SYS_ADMIN podman won't work, both when running as root or as rootless with CAP_SYS_ADMIN gained in a new user namespace |
|
@giuseppe Why does it make sense to change the code of newuidmap and newgidmap to support this environment where setuid does not grant CAP_SYS_ADMIN? Why not change the spec file or Makefile to install newuidmap and newgidmap with file capabilities? It seems like changing how permissions are granted would be the simpler and much less risky change. |
I am completely fine with using file capabilities. I've used the seteuid approach as all other programs that need capabilities are installed as suid binaries and changing newuidmap/newgidmap to use file caps seemed to me like a bigger change than a slightly different code path that is used only when CAP_SYS_ADMIN is not available (so in an already "safe" environment). I've opened another PR where file capabilities are used so we can more easily compare the two options: #136 |
|
If make the goal here to always make the code in newuidmap and newgidmap always run with the minimal set of capabilities and uids. As if the utilities were exclusively using file capabilities I think there are security improvements to be had. Something like this (untested): 0001-newuidmap-newgidmap-Only-run-with-the-minimal-necess.txt |
|
@ebiederm part of me feels like "the minimal amount of work we do, the better", but please feel free to post that as a PR. note that it can't be literally what you posted, since we cannot assume that libcap is available, but I can do the autoconf tweaking if you prefer. I'm closing this request, we'll go with one of the filecaps PRs instead. Thanks, everyone. |
|
That may be, but remember all this hubub is abuot a pretty silly context - where you do not have cap_sys_admin but have cap_setuid. So now for nsfscaps to come into play, we're talking about a case where you are in a user namespace, run docker in there, and now want to run unprivileged builds inside that? |
|
@hallyn @AkihiroSuda I am going to pass at opening a PR at this time. I don't currently care enough to take the code I have posted any farther. I just wanted to point out that it is possible and without conditionals, and with just the capset system call that glibc supports to lockdown newuidmap and newgidmap and in so doing happen to fix this issue. |
|
@hallyn since file caps are not going to be the default anyway for a while, could we please reconsider this patch for now? |
|
What about verifying, along with
that ruid is either 0 or uid ? Otherwise if two users somehow share a uid delegation, I think this would let them mess with each other's namespaces. |
|
No, I guess we already check for that. |
|
(... well, the title was still wrong, but too late now) |
|
Uhm... I was about to send a proper patch for this. Let me send an improved version. |
was any comment left unaddressed code wise? I don't see any functional difference in the other PR, in facts the new code block seems to duplicate what is already done below it and doesn't build when |
|
Forgot, to push over my missing rebase. The functional changes should be pretty obvious though:
Further simplifcations:
|
More details about the issue: genuinetools/img#170