Implement vfork syscall #3123

@sporksmith

Part of #1987

The vfork syscall (or clone or clone3 with the CLONE_VFORK flag) reduces the overhead of spawning a new process. Unlike fork, the child process shares memory with the parent, which avoids the cost of copying page tables to make memory copy-on-write in the child. The calling thread in the parent is suspended until the child process exits or execs.
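As a minimal sketch of the semantics described above, the following C function calls vfork and has the child exit immediately. The helper name vfork_and_exit is made up for illustration and is not part of Shadow; the example assumes Linux with glibc.

```c
/* Assumption: glibc, where _DEFAULT_SOURCE exposes the vfork() declaration. */
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: spawn a child with vfork() that immediately exits
 * with `code`. Returns the child's exit status, or -1 on error. */
int vfork_and_exit(int code) {
    pid_t pid = vfork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: runs on the parent's stack and in the parent's memory,
         * so it does nothing except call _exit (or an exec function). */
        _exit(code);
    }
    /* Parent: the calling thread resumes only after the child has
     * exited or exec'd. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, vfork_and_exit(7) returns 7 once the child has exited and been reaped.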

Importance

Use-cases verified to not use vfork:

Use-cases that do use vfork:

  • The posix_spawn libc function is documented as using vfork (or clone with the CLONE_VM and CLONE_VFORK flags, depending on the glibc version). https://www.man7.org/linux/man-pages/man3/posix_spawn.3.html
  • python3's subprocess module
  • dash (which is what /bin/sh resolves to on many systems)
  • Rust's std::process::Command::spawn. It's also unusual in that it uses clone with vfork and a new stack; e.g. from strace: clone3({flags=CLONE_VM|CLONE_VFORK, exit_signal=SIGCHLD, stack=0x7efc570b3000, stack_size=0x9000}, 88)
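To make the posix_spawn use-case concrete, here is a hedged sketch of caller-side code that would reach the vfork path inside libc. The helper name spawn_shell and the use of /bin/sh are assumptions for illustration; whether libc actually issues vfork, clone, or clone3 underneath depends on the libc and its version.

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run `/bin/sh -c <command>` via posix_spawn and
 * return the shell's exit status, or -1 on error. */
int spawn_shell(const char *command) {
    pid_t pid;
    char *const argv[] = {"sh", "-c", (char *)command, NULL};

    /* No file actions and no spawn attributes: plain spawn with the
     * caller's environment. posix_spawn returns an error number, not -1. */
    int err = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ);
    if (err != 0) {
        return -1;
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running a caller like this under strace is one way to check which syscall and flags a given libc uses.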

Feasibility

Implementing the shim-side code for vfork is tricky. Unlike spawning a new thread, the child process continues running on the same stack. Unlike fork, modifications to that stack are seen in the parent as well. Therefore we can't return from our syscall handling functions, since this would corrupt the stack in the parent. We also can't long jump to the point where the syscall was made (as we do when spawning a new thread) and have the parent return normally, since this would also corrupt the stack in the parent.

We might be able to return normally in the child process, and later long-jump in the parent process when it gets to run again. This seems pretty tricky, though.

One possibility is to treat vfork exactly like fork (and treat the CLONE_VFORK flag as a no-op). In principle this would break code that relies on implementation details of vfork under Linux, e.g. by intentionally writing to parent memory from the child, but relying on such implementation details is already fragile and non-portable. For example, the vfork man page states that POSIX.1 specifies that

behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit(2) or one of the exec(3) family of functions.
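To illustrate what programs would observe if vfork were treated like fork, the sketch below uses a real fork and has the child write to a variable before exiting. The function name parent_value_after_child_write is hypothetical. Under fork semantics the write lands in the child's copy-on-write copy, so the parent still sees the original value. Code that depends on seeing the child's write is exactly the code that relies on the behavior POSIX leaves undefined.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes 1 to `flag`, then exits.
 * Returns the value of `flag` as seen by the parent afterwards,
 * or -1 on error. */
int parent_value_after_child_write(void) {
    volatile int flag = 0;

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: with fork, this modifies only the child's private copy. */
        flag = 1;
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    /* Parent: unaffected by the child's write under fork semantics. */
    return flag;
}
```

With fork, the function returns 0. A child that follows the POSIX rules quoted above (only _exit or exec after vfork) cannot tell the two implementations apart through memory.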

Labels: Type: Enhancement (New functionality or improved design)
