libct/cg: support hugetlb rsvd #4073
This adds support for hugetlb.<pagesize>.rsvd limiting and accounting.

The previous non-rsvd max/limit_in_bytes does not account for reserved huge page memory, making it possible for a process to reserve all the huge page memory without being able to allocate it (due to cgroup restrictions).

In practice this makes it possible to successfully mmap more huge page memory than allowed via the cgroup settings, but when using the memory the process will get a SIGBUS and crash. This is bad for applications that mmap at startup: the call succeeds, but the program crashes once it starts using the memory. For example, postgres does this by default.

This also keeps writing to the old max/limit_in_bytes, for backward compatibility.

More info can be found here: https://lkml.org/lkml/2020/2/3/1153

(commit message mostly written by Odin Ugedal)

Co-authored-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
I'm on the verge of backporting this to 1.1 -- there is not too much code, and this helps a lot to fix issues with e.g. postgres (see #3859 (comment))
    if err := cgroups.WriteFile(path, prefix+".rsvd"+suffix, val); err != nil {
        if errors.Is(err, os.ErrNotExist) {
            skipRsvd = true
The idea here is that the .rsvd. files either all exist or none do, so if the first such file doesn't exist, we set skipRsvd = true and do not try to write .rsvd. files any more.
    if rsvd != "" && errors.Is(err, os.ErrNotExist) {
        rsvd = ""
        goto again
    }
For getting the stats, we prefer .rsvd. files, if they exist.
        return err
    }
    if skipRsvd {
        continue
This can be a break instead, so the loop stops.
We don't want to stop the loop here. We want to write all hugetlb.XXX.limit_in_bytes and, if available, all hugetlb.XXX.rsvd.limit_in_bytes as well (and we're looping over XXX).
Thanks for pushing this @kolyshkin! Life and work happened, and I haven't had much time to spend on runc and kernel stuff after graduating from university. Very nice seeing you pick this up and push it over the edge!
Fixes: #3859.