Implement vfork syscall #3123
Description
Part of #1987
The vfork syscall (or clone or clone3 with the CLONE_VFORK flag) is a way of saving some overhead when spawning a new process. Unlike fork, the child process shares memory with the parent (hence saving the overhead of copying page tables to make memory copy-on-write in the child). The parent process is suspended until the child process exits or execs.
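As a rough illustration of the semantics described above, here is a minimal C sketch of spawning a program with vfork. The function name `spawn_with_vfork` is hypothetical and is not part of this project. The child only calls `execv` or `_exit`, since it runs on the parent's memory and stack while the parent is suspended.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` via vfork.
 * Returns the child's exit status, or -1 on failure. */
int spawn_with_vfork(const char *path, char *const argv[]) {
    pid_t pid = vfork();
    if (pid < 0) {
        return -1; /* vfork itself failed */
    }
    if (pid == 0) {
        /* Child: shares memory and stack with the suspended parent,
         * so it does nothing except exec or _exit. */
        execv(path, argv);
        _exit(127); /* only reached if execv failed */
    }
    /* Parent: resumes only once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with `"/bin/sh"` and the arguments `sh -c "exit 3"` should return 3 on a system where `/bin/sh` exists.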
Importance
Use-cases verified to not use vfork:
- `arti` uses `std::process::Command` to spawn pluggable transport processes, which currently uses `fork`.
- More generally, `vfork` will probably not be used much in Rust until "vfork can cause memory corruption due to the lack of #[ffi_returns_twice]" (rust-lang/libc#1596) is fixed.
- `tor` also uses `fork`, not `vfork`, to spawn processes.
- Using `strace` on a simple `bash` script on my machine shows it using `fork`-like `clone` invocations (not `vfork`).
Use-cases that do use vfork:
- The `posix_spawn` libc function is specified as using `vfork`. https://www.man7.org/linux/man-pages/man3/posix_spawn.3.html
- python3's `subprocess` module
- `dash` (which is what `/bin/sh` resolves to on many systems)
- Rust's `std::process::Command::spawn`. It's also unusual in that it uses `clone` with `vfork` and a new stack; e.g. from strace: `clone3({flags=CLONE_VM|CLONE_VFORK, exit_signal=SIGCHLD, stack=0x7efc570b3000, stack_size=0x9000}, 88)`
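To make the "vfork with a new stack" pattern from the strace output more concrete, here is a small C sketch using the glibc `clone` wrapper with `CLONE_VM | CLONE_VFORK`. It is not how Rust's standard library or glibc's `posix_spawn` is actually written. The names `spawn_with_clone_vfork` and `child_main` and the stack size are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024) /* arbitrary size for this sketch */

/* Runs in the child, on its own stack but in the parent's address space. */
static int child_main(void *arg) {
    char *const *argv = arg;
    execv(argv[0], argv);
    _exit(127); /* only reached if execv failed */
}

/* Hypothetical helper: spawn argv[0] with vfork-like semantics on a
 * separate stack. Returns the exit status, or -1 on failure. */
int spawn_with_clone_vfork(char *const argv[]) {
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL) {
        return -1;
    }
    /* Pass the top of the buffer, assuming the stack grows downward
     * (true on x86-64). */
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, (void *)argv);
    /* With CLONE_VFORK the parent resumes only after the child has
     * exec'd or exited, so the child no longer uses this stack. */
    free(stack);
    if (pid < 0) {
        return -1;
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child gets its own stack, it cannot clobber the parent's stack frames, but it still shares all other memory with the parent until it execs.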
Feasibility
Implementing the shim-side code for vfork is tricky. Unlike spawning a new thread, the child process continues running on the same stack. Unlike fork, modifications to that stack are seen in the parent as well. Therefore we can't return from our syscall handling functions, since this would corrupt the stack in the parent. We also can't long jump to the point where the syscall was made (as we do when spawning a new thread) and have the parent return normally, since this would also corrupt the stack in the parent.
We might be able to return normally in the child process, and later long-jump in the parent process when it gets to run again. This seems pretty tricky, though.
One possibility is to just treat vfork exactly like fork (and treat the CLONE_VFORK flag as a no-op). In principle this would break code that relies on implementation details of vfork under Linux, e.g. by intentionally writing to parent memory from the child, but relying on such implementation details is already fragile and non-portable. For example, the vfork man page states that, per POSIX.1, the
"behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit(2) or one of the exec(3) family of functions."
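The kind of code that could notice the difference can be sketched as follows. The function `child_write_seen_by_parent` is a hypothetical example that deliberately does what POSIX.1 leaves undefined: the child writes to a variable on the shared stack. With real vfork on Linux the parent would be expected to see that write, whereas with fork (or vfork emulated as fork) it would not.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: returns the value the parent observes after the
 * child writes 1 to a local variable, or -1 on failure.
 * Writing from a vfork child is undefined behavior under POSIX.1;
 * this only illustrates what fragile code might rely on. */
int child_write_seen_by_parent(int use_vfork) {
    volatile int flag = 0; /* volatile so the parent re-reads memory */
    pid_t pid = use_vfork ? vfork() : fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        flag = 1; /* lands in parent memory only if memory is shared */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0) {
        return -1;
    }
    return flag;
}
```

Calling it with `use_vfork = 0` returns 0, since a forked child writes to its own copy-on-write copy. Calling it with `use_vfork = 1` would return 0 rather than 1 under an implementation that treats vfork as fork, which is exactly the behavior change such code would observe.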