
Add initial support for rsvd accounting hugetlb cgroup  #1050

@odinuge


The existing, non-rsvd max/limit_in_bytes limit does not account for reserved
huge page memory. This makes it possible for a process to reserve all of the
huge page memory without being able to actually allocate it (due to cgroup
restrictions).
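To make the two kinds of limit concrete, here is a small sketch that builds the cgroup file names for one page size. It assumes the kernel interface from the reservation-accounting series (merged in Linux 5.7): the plain files only charge pages when they are faulted in, and the rsvd files also charge reservations made at mmap time. hugetlbLimitFiles is a hypothetical helper, not a runc or runtime-spec API.

```go
package main

import "fmt"

// hugetlbLimitFiles is a HYPOTHETICAL helper returning the cgroup file
// names for one huge page size (e.g. "2MB"):
//   - fault: the old limit, charged only when pages are faulted in
//   - rsvd:  the reservation limit, charged when memory is reserved (mmap)
func hugetlbLimitFiles(pageSize string, cgroupV2 bool) (fault, rsvd string) {
	prefix := "hugetlb." + pageSize
	if cgroupV2 {
		return prefix + ".max", prefix + ".rsvd.max"
	}
	return prefix + ".limit_in_bytes", prefix + ".rsvd.limit_in_bytes"
}

func main() {
	for _, v2 := range []bool{false, true} {
		fault, rsvd := hugetlbLimitFiles("2MB", v2)
		fmt.Printf("v2=%v fault=%s rsvd=%s\n", v2, fault, rsvd)
	}
	// v2=false fault=hugetlb.2MB.limit_in_bytes rsvd=hugetlb.2MB.rsvd.limit_in_bytes
	// v2=true fault=hugetlb.2MB.max rsvd=hugetlb.2MB.rsvd.max
}
```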

In practice, a process can successfully mmap more huge page memory than
the cgroup settings allow, but it gets a SIGBUS and crashes once it
starts using that memory. This is bad for applications that mmap their
huge page memory at startup: the mmap succeeds, but the program crashes
later when it touches the memory. E.g. postgres does this by default.

This has led to strange segfaults like these: patroni/patroni#1393

More info can be found here: https://lkml.org/lkml/2020/2/3/1153

I think there are two main ways to solve this:

  • When supported, also write the existing hugepageLimits values to the corresponding rsvd files. Silently ignore rsvd when it is not supported.
  • Add a separate element, called something like hugepageLimitsRsvd, to set the rsvd value, and either silently ignore it or return an error when rsvd is not supported.

I lean toward the first approach, since adding a new item makes the spec harder to understand and may lead to "bad" implementations, but I am not sure at all. The advantage of the second approach, having a separate entity, is that it leaves the decision to the user of the runtime, giving them full choice, even though I see no real reason to enforce the "old" value and not the reserved one. The current behavior lets a cgroup-limited process reserve all the huge page memory available on a node, making it inaccessible to others.

Whatever we decide, we should update the config-linux.md docs to clarify how it should work.

Any thoughts?

A simple WIP in runc that adds support for enforcing it via hugepageLimits is here: https://github.com/opencontainers/runc/pull/2360/files
