Landlock (finally) sets sail
Please consider subscribing to LWNKernel development is not for people who lack persistence; changes can take a number of revisions and a lot of time to make it into a mainline release. Even so, the story of the Landlock security module, developed by Mickaël Salaün, seems like an extreme case; this code was merged for 5.13 after more than five years of development and 34 versions of the patch set. This sandboxing mechanism has evolved considerably since LWN covered version 3 of the patch set in 2016, so a look at what Landlock has become is warranted.Subscriptions are the lifeblood of LWN.net. If you appreciate this content and would like to see more of it, your subscription will help to ensure that LWN continues to thrive. Please visit this page to join up and keep LWN on the net.
Like seccomp(), Landlock is an unprivileged sandboxing mechanism; it allows a process to confine itself. The long-term vision has always included adding controls for a wide range of possible actions, but those in the actual patches have been limited to filesystem access. In the early days, Landlock worked by allowing a process to attach BPF programs to various security hooks in the kernel; those programs would then make access-control decisions when asked. BPF maps would be used to associate specific programs with portions of the filesystem, and a special seccomp() mode was used to control the whole thing.
The goals behind Landlock have never been particularly controversial, but the implementation is a different story. The use of BPF was questioned even before making BPF available to unprivileged users in any context fell out of favor. It was also felt that seccomp(), which controls access to system calls, was a poor fit for Landlock, which does not work at the system-call level. For some time, Salaün was encouraged by reviewers to add a set of dedicated system calls instead; it took him a while to give that approach a try.
In the end, though, dedicated system calls turned out to be the winning formula. Version 14 of the patch set, posted in February 2020, dropped BPF in favor of a mechanism for defining access-control rules and added a multiplexing landlock() system call to put those rules into force. The 20th version split the multiplexer into four separate system calls, but one of those was dropped in the next revision. So Landlock, as it will appear in 5.13, will bring three system calls with it.
The 5.13 Landlock API
The first of those system calls creates a rule set that will be used for access-control decisions. Each rule set must be given a set of access types that it will handle. To define a rule set that can handle all action types, one would start like this:
struct landlock_ruleset_attr ruleset_attr = {
.handled_access_fs =
LANDLOCK_ACCESS_FS_EXECUTE |
LANDLOCK_ACCESS_FS_WRITE_FILE |
LANDLOCK_ACCESS_FS_READ_FILE |
LANDLOCK_ACCESS_FS_READ_DIR |
LANDLOCK_ACCESS_FS_REMOVE_DIR |
LANDLOCK_ACCESS_FS_REMOVE_FILE |
LANDLOCK_ACCESS_FS_MAKE_CHAR |
LANDLOCK_ACCESS_FS_MAKE_DIR |
LANDLOCK_ACCESS_FS_MAKE_REG |
LANDLOCK_ACCESS_FS_MAKE_SOCK |
LANDLOCK_ACCESS_FS_MAKE_FIFO |
LANDLOCK_ACCESS_FS_MAKE_BLOCK |
LANDLOCK_ACCESS_FS_MAKE_SYM,
};
(This example and those that follow were all taken from the Landlock documentation).
Once that structure is defined, it can be used to create the rule set itself:
int landlock_create_ruleset(struct landlock_ruleset_attr *attr,
size_t attr_size, unsigned int flags);
The attr_size parameter must be the size of the landlock_ruleset_attr structure (which allows for future expansion in a compatible manner); flags must be zero (with one exception, described below). If all goes well, the return value will be a file descriptor representing the newly created rule set.
That set does not actually contain any rules, yet, so it is of limited utility. The 5.13 version of Landlock only supports a single type of rule, controlling access to everything contained within (and below) a given directory. The first step is to define a structure describing what accesses will be allowed for a given subtree; to limit access to reading and executing, one could use something like this:
struct landlock_path_beneath_attr path_beneath = {
.allowed_access =
LANDLOCK_ACCESS_FS_EXECUTE |
LANDLOCK_ACCESS_FS_READ_FILE |
LANDLOCK_ACCESS_FS_READ_DIR,
};
The landlock_path_beneath_attr structure also contains a field called parent_fd that should be set to a file descriptor for the directory where the rule is to be applied. So, for example, to limit access to /usr to the above operations, a process could open /usr as an O_PATH file descriptor, assigning the result to path_beneath.parent_fd. Finally, this rule should be added to the rule set with:
int landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
void *rule_attr, unsigned int flags);
Where ruleset_fd is the file descriptor representing the rule set, rule_type is LANDLOCK_RULE_PATH_BENEATH (the only supported value, currently), rule_attr is a pointer to the structure created above, and flags is zero. The return value will be zero if all goes well. Multiple rules can be added to a single rule set.
The rule set has now been defined, but is not yet active. To bind itself to a given set, a process will call:
int landlock_restrict_self(int ruleset_fd, unsigned int flags);
Once again, flags must be zero. This operation will fail unless the process has previously called prctl() with the PR_SET_NO_NEW_PRIVS operation to prevent the acquisition of capabilities through setuid programs. Multiple calls may be made to landlock_restrict_self(), each of which will increase the number of restrictions in force. Once a rule set has been made active, it cannot be removed for the life of the process. Rules enforced by Landlock will be applied to any child processes or threads as well.
For the curious, there is a sample sandboxing program using Landlock that was added in this commit.
After 5.13
Landlock is useful in its current form, but it can be expected to gain a number of new features in future kernel releases now that the core infrastructure is in place. That could present a problem for sandboxing programs, which would like to use those newer features but must be prepared to cope with older kernels that lack them. To help future application developers, Salaün added a mechanism to help determine which features are available. If landlock_create_ruleset() is called with flags set to LANDLOCK_CREATE_RULESET_VERSION, it will return an integer value indicating which version of the Landlock API is supported; currently that value will always be one. When new features are added, the version number will be increased; developers will thus be able to use the version to know which features are supported on any given system.
Landlock has clearly reached an important milestone after more than five
years of work, but it seems just as clear that this story is not yet done.
After perhaps taking a well-deserved break, Salaün can be expected to start
fleshing out the set of Landlock features; with luck, these will not take
as long to find acceptance in the kernel community. There may come a time
when Landlock can do much of what seccomp() can do, but perhaps in
a way that is easier for application developers to use.
| Index entries for this article | |
|---|---|
| Kernel | Releases/5.13 |
| Kernel | Security/Security modules |
| Security | Linux Security Modules (LSM) |
