The following syscalls were added since kernel v5.16:
- v5.17 (libseccomp v2.5.4): set_mempolicy_home_node
- v6.5 (libseccomp v2.5.5): cachestat
- v6.6 (libseccomp v2.5.5): fchmodat2, map_shadow_stack
- v6.7 (libseccomp v2.5.5): futex_wake, futex_wait, futex_requeue

[Not covered in this commit]
- v6.8-rc1: statmount, listmount, lsm_get_self_attr, lsm_set_self_attr, lsm_list_modules

ref:
- `syscalls: update the syscall list for Linux v5.17` (libseccomp v2.5.4) seccomp/libseccomp@d83cb7a
- `all: update the syscall table for Linux v6.7-rc3` (libseccomp v2.5.5) seccomp/libseccomp@53267af

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
"get_mempolicy",
"mbind",
"set_mempolicy",
"set_mempolicy_home_node", // kernel v5.17, libseccomp v2.5.4
mm/mempolicy: add set_mempolicy_home_node syscall
This syscall can be used to set a home node for the MPOL_BIND and
MPOL_PREFERRED_MANY memory policy. Users should use this syscall after
setting up a memory policy for the specified range as shown below.
mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
new_nodes->size + 1, 0);
sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size,
home_node, 0);
The syscall allows specifying a home node/preferred node from which
kernel will fulfill memory allocation requests first.
...
"alarm",
"bind",
"brk",
"cachestat", // kernel v6.5, libseccomp v2.5.5
NAME
cachestat - query the page cache statistics of a file.
SYNOPSIS
#include <sys/mman.h>
struct cachestat_range {
__u64 off;
__u64 len;
};
struct cachestat {
__u64 nr_cache;
__u64 nr_dirty;
__u64 nr_writeback;
__u64 nr_evicted;
__u64 nr_recently_evicted;
};
int cachestat(unsigned int fd, struct cachestat_range *cstat_range,
struct cachestat *cstat, unsigned int flags);
DESCRIPTION
cachestat() queries the number of cached pages, number of dirty
pages, number of pages marked for writeback, number of evicted
pages, number of recently evicted pages, in the bytes range given by
`off` and `len`.
...
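As a rough illustration of the interface quoted above, here is a minimal sketch that issues the raw syscall for a whole file. The helper name `query_page_cache`, the local struct names, and the fallback syscall number 451 (believed to be the value on x86_64 and the generic syscall table) are assumptions made for this example and are not part of the PR. A `len` of 0 is understood to mean "from `off` to the end of the file". On kernels older than v6.5, or under a seccomp profile that does not allow `cachestat`, the helper is expected to return `-ENOSYS` or `-EPERM`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 451 is the cachestat number on x86_64 and asm-generic arches. */
#ifndef SYS_cachestat
#define SYS_cachestat 451
#endif

/* Local mirrors of the structs quoted above (hypothetical names). */
struct cs_range {
    uint64_t off;
    uint64_t len;
};

struct cs_result {
    uint64_t nr_cache;
    uint64_t nr_dirty;
    uint64_t nr_writeback;
    uint64_t nr_evicted;
    uint64_t nr_recently_evicted;
};

/* Query page cache statistics for the whole file at `path`.
 * Returns 0 on success or a negative errno value on failure. */
static int query_page_cache(const char *path, struct cs_result *out)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* off = 0, len = 0: query from the start through the end of the file. */
    struct cs_range range = { .off = 0, .len = 0 };
    long rc = syscall(SYS_cachestat, (long)fd, &range, out, 0L);
    int saved = (rc < 0) ? errno : 0;

    close(fd);
    return -saved;
}
```

A caller would typically treat `-ENOSYS` and `-EPERM` as "not available here" and fall back to something else, which is also a quick way to check whether a given seccomp profile lets the call through.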
"fchdir",
"fchmod",
"fchmodat",
"fchmodat2", // kernel v6.6, libseccomp v2.5.5
fs: Add fchmodat2()
On the userspace side fchmodat(3) is implemented as a wrapper
function which implements the POSIX-specified interface. This
interface differs from the underlying kernel system call, which does not
have a flags argument. Most implementations require procfs [1][2].
There doesn't appear to be a good userspace workaround for this issue
but the implementation in the kernel is pretty straight-forward.
The new fchmodat2() syscall allows passing the AT_SYMLINK_NOFOLLOW flag,
unlike the existing fchmodat.
...
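To make the difference concrete, the following sketch calls the raw syscall with a flags argument. The helper name `chmod_nofollow` and the fallback number 452 (believed to be the value on x86_64 and the generic syscall table) are assumptions for illustration only. If the syscall is missing or filtered, the helper is expected to return `-ENOSYS` or `-EPERM`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 452 is the fchmodat2 number on x86_64 and asm-generic arches. */
#ifndef SYS_fchmodat2
#define SYS_fchmodat2 452
#endif

/* Change the mode of `path` (relative to `dirfd`) without following a
 * trailing symlink. Returns 0 on success or a negative errno value. */
static int chmod_nofollow(int dirfd, const char *path, mode_t mode)
{
    assert(path != NULL);

    long rc = syscall(SYS_fchmodat2, (long)dirfd, path, (long)mode,
                      (long)AT_SYMLINK_NOFOLLOW);
    return (rc < 0) ? -errno : 0;
}
```

For example, `chmod_nofollow(AT_FDCWD, "some-file", 0640)` on a regular file should behave like `chmod`. When the final path component is a symlink, many filesystems are expected to reject the call rather than change the link target.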
"mlock",
"mlock2",
"mlockall",
"map_shadow_stack", // kernel v6.6, libseccomp v2.5.5
x86/shstk: Introduce map_shadow_stack syscall
When operating with shadow stacks enabled, the kernel will automatically
allocate shadow stacks for new threads, however in some cases userspace
will need additional shadow stacks. The main example of this is the
ucontext family of functions, which require userspace allocating and
pivoting to userspace managed stacks.
Unlike most other user memory permissions, shadow stacks need to be
provisioned with special data in order to be useful. They need to be setup
with a restore token so that userspace can pivot to them via the RSTORSSP
instruction. But, the security design of shadow stacks is that they
should not be written to except in limited circumstances. This presents a
problem for userspace, as to how userspace can provision this special
data, without allowing for the shadow stack to be generally writable.
...
"futex_time64",
"futex_wait", // kernel v6.7, libseccomp v2.5.5
"futex_waitv",
"futex_wake", // kernel v6.7, libseccomp v2.5.5
futex: Add sys_futex_wake()
To complement sys_futex_waitv() add sys_futex_wake(). This syscall
implements what was previously known as FUTEX_WAKE_BITSET except it
uses 'unsigned long' for the bitmask and takes FUTEX2 flags.
The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.
"futex",
"futex_requeue", // kernel v6.7, libseccomp v2.5.5
"futex_time64",
"futex_wait", // kernel v6.7, libseccomp v2.5.5
futex: Add sys_futex_wait()
To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This
syscall implements what was previously known as FUTEX_WAIT_BITSET
except it uses 'unsigned long' for the value and bitmask arguments,
takes timespec and clockid_t arguments for the absolute timeout and
uses FUTEX2 flags.
The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.
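The wake and wait calls described in the two comments above can be sketched with raw syscalls as follows. The helper names, the `SK_`-prefixed constants (copied by hand from what the kernel uapi header is believed to define: `FUTEX2_SIZE_U32` as 0x02, `FUTEX2_PRIVATE` as 128, match-any bitset as 0xffffffff), and the fallback syscall numbers 454 and 455 are assumptions for this example. The sketch also assumes a 64-bit platform, where `struct timespec` matches the layout the kernel expects.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Assumption: numbers from the x86_64 / asm-generic syscall tables. */
#ifndef SYS_futex_wake
#define SYS_futex_wake 454
#endif
#ifndef SYS_futex_wait
#define SYS_futex_wait 455
#endif

/* Assumption: values mirrored by hand from the kernel uapi futex header. */
#define SK_FUTEX2_SIZE_U32 0x02L
#define SK_FUTEX2_PRIVATE  128L
#define SK_MATCH_ANY       0xffffffffUL

/* Wake up to `nr` waiters on a 32-bit private futex word.
 * Returns the number of woken waiters or a negative errno value. */
static int futex2_wake(uint32_t *word, int nr)
{
    assert(word != NULL);
    long rc = syscall(SYS_futex_wake, word, SK_MATCH_ANY, (long)nr,
                      SK_FUTEX2_SIZE_U32 | SK_FUTEX2_PRIVATE);
    return (rc < 0) ? -errno : (int)rc;
}

/* Block while *word == expected, until woken or until the absolute
 * CLOCK_MONOTONIC timeout passes (NULL means no timeout).
 * Returns 0 when woken or a negative errno value (-EAGAIN if the word
 * no longer holds `expected`). */
static int futex2_wait(uint32_t *word, uint32_t expected,
                       const struct timespec *abs_timeout)
{
    assert(word != NULL);
    long rc = syscall(SYS_futex_wait, word, (unsigned long)expected,
                      SK_MATCH_ANY, SK_FUTEX2_SIZE_U32 | SK_FUTEX2_PRIVATE,
                      abs_timeout, (long)CLOCK_MONOTONIC);
    return (rc < 0) ? -errno : 0;
}
```

Calling `futex2_wait` with an `expected` value that differs from the current word should return `-EAGAIN` immediately, which makes it a cheap probe for whether the syscall is reachable under a given seccomp profile.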
"ftruncate",
"ftruncate64",
"futex",
"futex_requeue", // kernel v6.7, libseccomp v2.5.5
futex: Add sys_futex_requeue()
Finish off the 'simple' futex2 syscall group by adding
sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are
too numerous to fit into a regular syscall. As such, use struct
futex_waitv to pass the 'source' and 'destination' futexes to the
syscall.
This syscall implements what was previously known as FUTEX_CMP_REQUEUE
and uses {val, uaddr, flags} for source and {uaddr, flags} for
destination.
This design explicitly allows requeueing between different types of
futex by having a different flags word per uaddr.
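The source/destination pairing described above could look roughly like this when called directly. The struct `sk_futex_waitv` is a hand-written mirror of what `struct futex_waitv` is believed to contain, and the helper name, the `SK_` flag value, and the fallback syscall number 456 are likewise assumptions for illustration. Both futexes here use the same flags word for brevity, although the design permits a different one per address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 456 is the number on x86_64 and asm-generic arches. */
#ifndef SYS_futex_requeue
#define SYS_futex_requeue 456
#endif

/* Assumption: FUTEX2_SIZE_U32 (0x02) | FUTEX2_PRIVATE (128). */
#define SK_FUTEX2_U32_PRIVATE (0x02u | 128u)

/* Hand-written mirror of struct futex_waitv (hypothetical name). */
struct sk_futex_waitv {
    uint64_t val;       /* expected value (used for the source only) */
    uint64_t uaddr;     /* address of the futex word */
    uint32_t flags;     /* FUTEX2 flags for this particular futex */
    uint32_t reserved;  /* must be zero */
};

/* If *src == expected, wake up to nr_wake waiters on src and move up to
 * nr_requeue further waiters from src to dst.
 * Returns the number of affected waiters or a negative errno value. */
static int futex2_requeue(uint32_t *src, uint32_t expected, uint32_t *dst,
                          int nr_wake, int nr_requeue)
{
    assert(src != NULL && dst != NULL);

    struct sk_futex_waitv pair[2] = {
        { .val = expected, .uaddr = (uintptr_t)src,
          .flags = SK_FUTEX2_U32_PRIVATE, .reserved = 0 },  /* source */
        { .val = 0, .uaddr = (uintptr_t)dst,
          .flags = SK_FUTEX2_U32_PRIVATE, .reserved = 0 },  /* destination */
    };

    long rc = syscall(SYS_futex_requeue, pair, 0L, (long)nr_wake,
                      (long)nr_requeue);
    return (rc < 0) ? -errno : (int)rc;
}
```

With no threads waiting on `src`, a call whose `expected` matches the current value should return 0, and a mismatching `expected` should return `-EAGAIN`.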
/cherry-pick release/1.7

@AkihiroSuda: new pull request created: #9693

/cherry-pick release/1.6

@AkihiroSuda: new pull request created: #9694
The following syscalls were added since kernel v5.16 (34f7173):
[Not covered in this commit]
ref:
- `syscalls: update the syscall list for Linux v5.17` (libseccomp v2.5.4) seccomp/libseccomp@d83cb7a
- `all: update the syscall table for Linux v6.7-rc3` (libseccomp v2.5.5) seccomp/libseccomp@53267af