
somewhat confusing bind,remount behaviour #2433

@cyphar

There are a few aspects of mount(8)'s behaviour when dealing with bind-mounts and remounting that are quite strange. I started looking at writing some patches to fix them, but given that these changes would affect every libmount user, it seemed prudent to ask for the maintainer's opinion before writing patches.

In short, the issue is that, because of the way the old mount(2) API is implemented and the understandable-but-somewhat-naive way libmount handles mount flags, mount --bind -o ..., mount --bind -o remount,..., and mount -o remount,... all have confusing behaviour. I haven't yet tested whether the fsconfig-based libmount hooks behave the same way, but given that that implementation was designed to be compatible, it seems pretty likely that they do.

  • mount --bind -o rw on a read-only mount will silently produce a read-only bind-mount. The same applies to every other clearing flag (exec, dev, suid, etc.): a plain MS_BIND copies the flags of the source mount, and libmount only does the follow-up MS_BIND|MS_REMOUNT reconfigure if at least one flag is set, without accounting for the fact that the user might have requested an MNT_INVERT (clearing) flag. It seems this should be handled like mount --bind -o remount,..., where the current flags are read and the requested options are then applied on top of them. Or, if the intended behaviour is "I only want the requested flags, ignore the old flags", then MS_BIND|MS_REMOUNT needs to be done unconditionally.
    • This behaviour also means that mount --bind -o ro on a nosuid mount will silently clear the nosuid bit, because the MS_BIND|MS_REMOUNT|MS_RDONLY reconfigure sets exactly the requested flags.
  • For mount --bind -o remount,..., the way libmount handles the atime options (treating them as though they were ordinary mount flags) leads to confusing behaviour. The current code handles the most trivial case (mount --bind -o remount,atime on a noatime mount works), but the following cases do not work:
    • mount --bind -o remount,relatime on a noatime mount doesn't work (it silently produces a noatime mount) because libmount will call mount(MS_NOATIME|MS_RELATIME), and the kernel ignores MS_RELATIME when MS_NOATIME is set.
    • mount --bind -o remount,nodiratime,norelatime on a strictatime mount will produce a nodiratime,relatime mount. In this case norelatime should be replaced with strictatime, because an MS_NODIRATIME mount ends up implying MS_RELATIME internally.
    • mount --bind -o remount,ro of a nodiratime,strictatime mount produces a nodiratime,relatime mount because MS_STRICTATIME is not an actual mount flag shown in statfs or /proc/self/mountinfo, so libmount ends up passing just MS_NODIRATIME as the "previous" mount flags, which the kernel then turns into MS_NODIRATIME|MS_RELATIME.
    • mount --bind -o remount,strictatime on a nodiratime mount will produce a nodiratime mount because MS_STRICTATIME doesn't clear MS_NODIRATIME. To be fair, this is arguably "expected" behaviour depending on what semantics you want.
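
To make these interactions concrete, here is a simplified C model of how the legacy mount(2) path turns the MS_* atime flags into per-mount atime state. It is a sketch based on a reading of the kernel's fs/namespace.c and may not match every kernel version exactly; the SIM_MNT_* constants and sim_kernel_atime() are hypothetical names, not kernel or libmount API.

```c
#include <sys/mount.h>

/* Hypothetical stand-ins for the kernel's internal per-mount atime bits. */
#define SIM_MNT_NOATIME    0x1u
#define SIM_MNT_NODIRATIME 0x2u
#define SIM_MNT_RELATIME   0x4u
#define SIM_MNT_ATIME_MASK (SIM_MNT_NOATIME | SIM_MNT_NODIRATIME | SIM_MNT_RELATIME)

/*
 * Simplified model of the legacy mount(2) atime handling.
 *   flags:     MS_* flags passed to mount(2)
 *   old_atime: the mount's current SIM_MNT_* bits (only used for remounts)
 */
static unsigned int sim_kernel_atime(unsigned long flags, unsigned int old_atime)
{
	unsigned int mnt = 0;

	/* relatime is the default unless MS_NOATIME is passed. MS_RELATIME only
	 * matters for the remount check below, so MS_NOATIME|MS_RELATIME still
	 * yields a noatime mount. */
	if (flags & MS_NOATIME)
		mnt |= SIM_MNT_NOATIME;
	else
		mnt |= SIM_MNT_RELATIME;
	if (flags & MS_NODIRATIME)
		mnt |= SIM_MNT_NODIRATIME;
	/* MS_STRICTATIME drops relatime/noatime but leaves nodiratime alone. */
	if (flags & MS_STRICTATIME)
		mnt &= ~(SIM_MNT_RELATIME | SIM_MNT_NOATIME);

	/* A remount that names no atime flag at all keeps the old atime bits. */
	if ((flags & MS_REMOUNT) &&
	    !(flags & (MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_STRICTATIME)))
		mnt = old_atime & SIM_MNT_ATIME_MASK;

	return mnt;
}
```

In this model, sim_kernel_atime(MS_NOATIME | MS_RELATIME, 0) returns SIM_MNT_NOATIME (noatime wins), and a bind-remount passing only MS_RDONLY|MS_NODIRATIME returns nodiratime plus relatime, which is how a strictatime mount silently loses its strictatime setting.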

I don't know what the expected semantics of mount --bind -o are supposed to be, but if the idea is that you specify all of the flags (and the old mount flags are completely ignored), then you need to call mount(MS_BIND|MS_REMOUNT) unconditionally to ensure that flags are cleared when requested. The downside is that mount --bind and mount --bind -o might then behave differently (presumably people rely on plain mount --bind retaining the old mount flags). In addition, you probably want to pass MS_RELATIME by default (if no atime flags are specified) to force the default kernel setting: a remount without any atime flag preserves the old atime setting, whereas the other mount flags are not inherited.
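
Under the "the requested flags are the complete set" reading, the two-step sequence might look roughly like the following C sketch. bind_with_exact_flags() and exact_bind_remount_flags() are hypothetical helpers (not libmount API), the requested flags are assumed to be already parsed into MS_* bits, and the actual mount(2) calls need the usual privileges.

```c
#include <stddef.h>
#include <sys/mount.h>

/* Atime-related MS_* flags; if none is requested, force the kernel default. */
#define ATIME_FLAGS (MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_STRICTATIME)

/* Hypothetical helper: flags for the second (reconfigure) mount(2) call. */
static unsigned long exact_bind_remount_flags(unsigned long requested)
{
	/* A remount without an atime flag would preserve the old atime
	 * setting, so pass MS_RELATIME to get the kernel default instead. */
	if (!(requested & ATIME_FLAGS))
		requested |= MS_RELATIME;
	return MS_BIND | MS_REMOUNT | requested;
}

/* Hypothetical wrapper: bind src onto dst, then always reconfigure it. */
static int bind_with_exact_flags(const char *src, const char *dst,
				 unsigned long requested)
{
	/* A plain MS_BIND copies the source mount's flags and ignores the
	 * other mountflags bits (apart from MS_REC). */
	if (mount(src, dst, NULL, MS_BIND, NULL) < 0)
		return -1;
	/* Done unconditionally, so "-o rw" (requested == 0) still clears an
	 * inherited MS_RDONLY, MS_NOSUID, and so on. */
	return mount(NULL, dst, NULL, exact_bind_remount_flags(requested), NULL);
}
```

Splitting the flag computation out of the wrapper keeps the policy testable without privileges; the trade-off mentioned above remains, since a plain mount --bind (no -o) would presumably still skip the second call.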

For mount --bind -o remount,... the atime semantics should probably be something like:

  • If neither MS_RELATIME nor MS_NOATIME is present in /proc/self/mountinfo or statfs, then add MS_STRICTATIME to correctly handle the strictatime cases.
  • If an atime flag (strictatime, relatime, noatime) is requested, clear all other atime flags from the "old" set. Currently the old atime flags are kept but this results in weird behaviour because some of the atime flags are technically an enum and so passing multiple values produces incorrect behaviour (most notably in the MS_RELATIME|MS_NOATIME case).
  • norelatime should probably be converted to strictatime in some cases.
  • The interaction of atime, diratime, norelatime and nostrictatime needs to be reconsidered. Does atime just mean "no MS_NOATIME" or should it also clear MS_NODIRATIME? What is nostrictatime supposed to mean? You cannot clear the MS_STRICTATIME flag as it is not a real flag -- should it instead mean MS_RELATIME (the default atime since 2009)?
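
The first three rules could be sketched in C as a pure function over MS_* flags. This is only one possible reading: remount_flags() is a hypothetical helper, it assumes the caller has already translated statfs()/mountinfo output into MS_* bits and split the user's options into flags to set and flags to clear, it interprets "norelatime in some cases" as "norelatime without another atime mode", and it leaves the atime/diratime and nostrictatime questions open.

```c
#include <sys/mount.h>

/* The mutually exclusive atime "modes" (nodiratime is orthogonal). */
#define ATIME_ENUM_FLAGS (MS_NOATIME | MS_RELATIME | MS_STRICTATIME)

/*
 * Hypothetical helper sketching the proposed remount atime rules.
 *   old:   flags of the existing mount, already converted to MS_* bits
 *   set:   flags the user asked to set (e.g. "relatime", "ro")
 *   clear: flags the user asked to clear (e.g. "norelatime", "rw")
 * Returns the MS_* flags to pass alongside MS_REMOUNT (and MS_BIND).
 */
static unsigned long remount_flags(unsigned long old, unsigned long set,
				   unsigned long clear)
{
	/* Rule 1: neither relatime nor noatime reported => strictatime mount. */
	if (!(old & (MS_RELATIME | MS_NOATIME)))
		old |= MS_STRICTATIME;

	/* Rule 3 (one reading): "norelatime" with no other mode => strictatime. */
	if ((clear & MS_RELATIME) && !(set & ATIME_ENUM_FLAGS))
		set |= MS_STRICTATIME;

	/* Rule 2: the modes form an enum, so a requested mode replaces the old one. */
	if (set & ATIME_ENUM_FLAGS)
		old &= ~(unsigned long)ATIME_ENUM_FLAGS;

	return (old & ~clear) | set;
}
```

With these rules, remount,relatime on a noatime mount yields plain MS_RELATIME, and remount,ro on a nodiratime,strictatime mount keeps MS_NODIRATIME|MS_STRICTATIME instead of degrading to relatime.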

I discovered this while working on runc's mount handling code. The OCI runtime-spec requires us to mirror mount(8) semantics, and I then discovered that several bugs present in runc are also present in libmount, in slightly different forms.

I will write up a simple test script to help make it easier to understand the multitude of issues.
