Glibc and the kernel user-space API
We are accustomed to thinking of a system call as being a direct service request to the kernel. However, in reality, most system call invocations are mediated by wrapper functions in the GNU C library (glibc). These wrapper functions eliminate work that the programmer would otherwise need to do in order to employ a system call. But it turns out that glibc does not provide wrapper functions for all system calls, including a few that see somewhat frequent use. The question of what (if anything) to do about this situation has arisen a few times in the last few months on the libc-alpha mailing list, and has recently surfaced once more.
A system call allows a program to request a service—for example, open a file or create a new process—from the kernel. At the assembler level, making a system call requires the caller to assign the unique system call number and the argument values to particular registers, and then execute a special instruction (e.g., SYSENTER on modern x86 architectures) that switches the processor to kernel mode to execute the system-call handling code. Upon return, the kernel places the system call's result status into a particular register and executes a special instruction (e.g., SYSEXIT on x86) that returns the processor to user mode. The usual convention for the result status is that a non-negative value means success, while a negative value means failure. A negative result status is the negated error number (errno) that indicates the cause of the failure.
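As a rough illustration of the register convention just described, the sketch below issues a system call by hand and returns the kernel's raw result status. It assumes x86-64 Linux, where the instruction involved is SYSCALL rather than SYSENTER, and falls back to syscall() on other platforms. The name raw_syscall3() is a hypothetical helper, not part of any library.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>   /* SYS_* system call numbers */
#include <unistd.h>        /* syscall(), used only in the fallback path */

/* Hypothetical helper: invoke a system call that takes up to three
   arguments and return the raw result status, that is, a non-negative
   value on success or a negated error number on failure. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
#if defined(__linux__) && defined(__x86_64__)
    /* x86-64 convention: number in rax, arguments in rdi, rsi, rdx;
       the result comes back in rax; rcx and r11 are clobbered. */
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (nr), "D" (a1), "S" (a2), "d" (a3)
                      : "rcx", "r11", "memory");
#else
    /* Elsewhere, recover the raw status from the library's -1/errno form. */
    ret = syscall(nr, a1, a2, a3);
    if (ret == -1)
        ret = -errno;
#endif
    return ret;
}
```

Calling raw_syscall3(SYS_close, -1, 0, 0), for instance, should yield -EBADF directly rather than -1 with errno set.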
All of the details of making a system call are normally hidden from the user by the C library, which provides a corresponding wrapper function and header file definitions for most system calls. The wrapper function accepts the system call arguments as function arguments on the stack, initializes registers using those arguments, and executes the assembler instruction that switches to kernel mode. When the kernel returns control to user mode, the wrapper function examines the result status, assigns the (negated) error number to errno in the case of a negative result, and returns either -1 to indicate an error or the non-negative result status as the return value of the wrapper function. In many cases, the wrapper function is quite simple, performing only the steps just described. (In those cases, the wrapper is actually autogenerated from syscalls.list files in the glibc source that tabulate the types of each system call's return value and arguments.) However, in a few cases the wrapper function may do some extra work such as repackaging arguments or maintaining some state information inside the C library.
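The last step performed by a simple wrapper, translating the result status into the familiar -1 and errno convention, can be sketched as follows. This is a simplification that follows the convention stated above (any negative status is an error); status_to_result() is a hypothetical name, and the code is not taken from glibc.

```c
#include <errno.h>

/* Hypothetical helper: convert a raw kernel result status into the
   return value a wrapper function hands back to its caller. */
static long status_to_result(long status)
{
    if (status < 0) {
        errno = (int) -status;   /* the status is the negated error number */
        return -1;
    }
    return status;               /* success: pass the value through unchanged */
}
```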
The C library thus acts as a kind of gatekeeper on the API that the kernel presents to user space. Until the C library provides a wrapper function, along with suitable header files that define the calling signature and any constant and structure definitions used by the system call, users must do some manual work to make a system call.
That manual work includes defining the structures and constants needed by the system call and then invoking the syscall() library function, which handles the details of making the system call—copying arguments to registers, switching to kernel mode, and then setting errno once the kernel returns control to user space. Any system call can be invoked in this manner, including those for which the C library already provides a wrapper. Thus for example, one can bypass the wrapper function for read() and invoke the system call directly by writing:
nread = syscall(SYS_read, fd, buf, len);
The first argument to syscall() is the number of the system call to be invoked; SYS_read is a constant whose definition is provided by including <sys/syscall.h>, while syscall() itself is declared in <unistd.h>.
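A slightly fuller version of that call, with the headers it needs, might look like the sketch below. The function name read_via_syscall() is hypothetical; it is meant to behave like read(), including the setting of errno on failure, while bypassing the glibc wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>         /* errno, set by syscall() on failure */
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_read */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: read from fd without using the read() wrapper. */
static ssize_t read_via_syscall(int fd, void *buf, size_t len)
{
    /* syscall() returns long; on error it returns -1 and sets errno. */
    return (ssize_t) syscall(SYS_read, fd, buf, len);
}
```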
The C library used by most Linux developers is of course the GNU C library. Normally, glibc tracks kernel system call changes quite closely, adding wrapper functions and suitable header file definitions to the library as new system calls are added to the kernel. Thus, manually coding system calls is normally only needed when trying to use the latest system calls that have not yet appeared in the most recent iteration of glibc's six-month release cycle or when using a recent kernel on a system that has a significantly older version of glibc.
However, for some system calls, glibc support never appears. The
question of how the decision is made on whether to support a particular
system call in glibc has once again become a topic of discussion on the
libc-alpha mailing list. The most recent discussion started when Kees Cook,
the implementer of the recently added
finit_module() system call, submitted a rudimentary patch to add glibc
support for the system call. In response, Joseph Myers and Mike Frysinger
noted various pieces that were missing from the patch, with Joseph
adding that "in the
kexec_load discussion last May / June, doubts were expressed about whether
some existing module-related syscalls really should have had functions in
glibc."
The module-related system calls—init_module(), delete_module(), and so on—are among those for which glibc does not provide support. The situation is in fact slightly more complex in the case of these system calls: glibc does not provide any header file support for these system calls but does, through an accident of history, export a wrapper function ABI for the calls.
The earlier discussion that Joseph referred to took place when
Maximilian Attems attempted to add a header file to glibc to provide
support for the kexec_load() system call, stating that his aim was "to axe the
syscall maze in kexec-tools itself and have this syscall supported in
glibc." One of the primary glibc maintainers, Roland McGrath, had a rather different take on the
necessity of such a change, stating "I'm not really convinced this
is worthwhile. Calling 'syscall' seems quite sufficient for such arcane
and rarely-used calls." In other words, adding support for these
system calls clutters the glibc ABI and requires (a small amount of) extra
code in order to satisfy the needs of a handful of users who could just use
the syscall() mechanism.
Andreas Jaeger, who had reviewed earlier versions of Maximilian's
patch, noted that
"linux/syscalls.list already [has] similar esoteric syscalls like
create_module without any header support. I wouldn't object to do this for
kexec_load as well". Roland agreed
that the kexec_load() system call is a similar case, but felt that
this point wasn't quite germane, since adding the module system calls to
the glibc ABI was a "dubious" historical step that can't be reversed for
compatibility reasons.
But in the recent discussion of finit_module(), Mike Frysinger spoke in favor of adding full glibc support for module-related system calls such as init_module(). Dave Miller made a similar argument even more succinctly:
In other words, employing syscall() can be error prone: there is no checking of argument types nor even checking that sufficient arguments have been passed.
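One way an application can limit that risk is to confine each syscall() invocation to a single typed function, so that the compiler checks every other call site. The sketch below does this for ioprio_get(). The names my_ioprio_get() and MY_IOPRIO_WHO_PROCESS are hypothetical; the constant is defined by hand, with the value the kernel uses for IOPRIO_WHO_PROCESS, on the assumption that no C library header supplies it.

```c
#define _GNU_SOURCE
#include <errno.h>         /* callers inspect errno when -1 is returned */
#include <sys/syscall.h>   /* SYS_ioprio_get */
#include <unistd.h>        /* syscall() */

/* Hand-written stand-in for the kernel's IOPRIO_WHO_PROCESS constant. */
#define MY_IOPRIO_WHO_PROCESS 1

/* Hypothetical typed wrapper: the variadic syscall() call appears exactly
   once, and callers get normal argument type and count checking. */
static inline int my_ioprio_get(int which, int who)
{
    return (int) syscall(SYS_ioprio_get, which, who);
}
```

A call such as my_ioprio_get(MY_IOPRIO_WHO_PROCESS, 0) queries the calling thread. Of course, every application that does this still carries its own copy of the signature, which is the duplication the argument above objects to.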
Joseph Myers felt that the earlier kexec_load() discussions hadn't fully settled the issue, and was interested in having some concrete data on how many system calls don't have glibc wrappers. Your editor subsequently donned his man-pages maintainer hat and grepped the man pages in section 2 to determine which system calls do not have full glibc support in the form of a wrapper function and header files. The resulting list turns out to be quite long, running to nearly 40 Linux system calls. However, the story is not quite so simple, since some of those system calls are obsolete (e.g., tkill(), sysctl(), and query_module()) and others are intended for use only by the kernel or glibc (e.g., restart_syscall()). Yet others have wrappers in the C library, although the wrappers have significantly different names and provide some piece of extra functionality on top of the system call (e.g., rt_sigqueueinfo() has a wrapper in the form of the sigqueue() library function). Clearly, no wrapper is required for those system calls, and once they are excluded there remain perhaps 15 to 20 system calls that might be candidates to have glibc support added.
Motohiro Kosaki considered that the remaining system calls could be separated into two categories: those used by only one or a few applications and those that seemed to him to have more widespread application use. Motohiro was agnostic about whether the former category (which includes the module-related system calls, kcmp(), and kexec_load()) required a wrapper. However, in his opinion the system calls in the latter category (which includes system calls such as ioprio_set(), ioprio_get(), and gettid()) clearly merited having full glibc support.
The lack of glibc support for gettid(), which returns the caller's kernel thread ID, is an especially noteworthy case. A long-standing glibc bug report requesting that glibc add support for this system call gained little traction with the previous glibc maintainer. However, excluding that system call is rather anomalous: it is quite frequently used, the kernel exposes thread IDs via various /proc interfaces, and glibc exposes various kernel APIs that can employ kernel thread IDs (for example, sched_setaffinity(), fcntl(), and the SIGEV_THREAD_ID notification mode for POSIX timers).
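In the absence of a wrapper, programs that need the thread ID typically define a small helper along the following lines. The name my_gettid() is hypothetical, chosen so as not to clash with any gettid() declaration that a C library might provide.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_gettid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: return the caller's kernel thread ID. */
static pid_t my_gettid(void)
{
    return (pid_t) syscall(SYS_gettid);
}
```

In the initial thread of a process, the value returned is the same as that returned by getpid(); other threads in the process each get a distinct ID.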
The discussion has petered out in the last few days, despite Mike
Frysinger's attempt to further push the debate along by reading and
summarizing the various pro and contra arguments in a single email. As noted by various
participants in the discussion, adding glibc wrappers for some currently
unsupported system calls would seem to have some worthwhile benefits. It
would also help to avoid the confusing situation where programmers
sometimes end up searching for a glibc wrapper function and header file
definitions that don't exist. It remains to be seen whether these arguments
will be sufficient to persuade Roland in the face of his concerns about
cluttering the glibc ABI and adding extra code to the library for the
benefit of what he believes is a relatively small number of users.
