Add initial support for rsvd accounting hugetlb cgroup #2360
odinuge wants to merge 1 commit into opencontainers:master
The previous non-rsvd max/limit_in_bytes files do not account for reserved huge page memory. This makes it possible for a process to reserve all the huge page memory without being able to allocate it (due to cgroup restrictions).

In practice, a process can successfully mmap more huge page memory than the cgroup settings allow, but when it starts using that memory it gets a SIGBUS and crashes. This is bad for applications that mmap at startup: the mmap succeeds, but the program crashes once it starts touching the memory. postgres, for example, does this by default.

This also keeps writing to the old max/limit_in_bytes files, so that applications reading them do not see a wrong value.

More info can be found here: https://lkml.org/lkml/2020/2/3/1153

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Updated from d8fe1b1 to 5c84b1a.
```go
}

func (s *HugetlbGroup) Set(path string, cgroup *configs.Cgroup) error {
	supportsReservationAccounting := s.HasReservationAccountingSupport(path)
```

Not sure if this is the best way to check, or whether we should try to "cache" the value like we do with HugePageSizes?
```go
for _, pagesize := range hugePageSizes {
	usage := strings.Join([]string{"hugetlb", pagesize, "current"}, ".")
	filenamePrefix := strings.Join([]string{"hugetlb", pagesize}, ".")
```

nit: maybe it would be better to have it as `filenamePrefix := "hugetlb." + pagesize` (for readability).
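For illustration, the same suggestion applied to every file name this code builds: a hypothetical `hugetlbFile` helper that uses plain concatenation and inserts `.rsvd` only when asked. The suffixes (`current`, `max`) are the cgroup v2 ones from the diff; this is a sketch, not code from the PR.

```go
package main

// hugetlbFile is a hypothetical helper that builds a cgroup v2 hugetlb file
// name such as "hugetlb.2MB.current" or "hugetlb.1GB.rsvd.max" with plain
// string concatenation instead of strings.Join or fmt.Sprintf.
func hugetlbFile(pagesize string, rsvd bool, suffix string) string {
	prefix := "hugetlb." + pagesize
	if rsvd {
		prefix += ".rsvd"
	}
	return prefix + "." + suffix
}
```

Building every name in one place also makes it easy for error messages to report the file that was actually read.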
```go
usage := fmt.Sprintf("%s.current", filenamePrefix)
value, err := fscommon.GetCgroupParamUint(dirPath, usage)
if err != nil {
	return errors.Wrapf(err, "failed to parse hugetlb.%s.current file", pagesize)
```

The error message from GetCgroupParamUint already contains the file name, so you can return the error as-is; there is no need to wrap it.

Also, the error now reports the wrong file name when supportsReservationAccounting is set.

I think fixing this should be done as a separate, first patch.
```go
hugetlbStats.Usage = value

fileName := strings.Join([]string{"hugetlb", pagesize, "events"}, ".")
fileName := fmt.Sprintf("%s.events", filenamePrefix)
```

nit: using `fileName := filenamePrefix + ".events"` would be faster, but either way is fine.
```go
// is supported. This is supported from linux 5.7
// https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/hugetlb.html
func HasReservationAccountingSupport(dirPath string) bool {
	hugePageSizes, err := cgroups.GetHugePageSize()
```

Yes, I think it makes sense to do this check once, using sync.Once.

Or not... since different cgroups can have different controls, I guess.
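A sketch of the trade-off discussed above, under the assumption that the list of huge page sizes is host-wide (it comes from the kernel, not from a particular cgroup), while rsvd support, per the concern that cgroups may differ, is checked per directory and therefore not cached. `list` is a hypothetical stand-in for `cgroups.GetHugePageSize`.

```go
package main

import "sync"

var (
	pageSizesOnce sync.Once
	pageSizes     []string
	pageSizesErr  error
)

// cachedPageSizes calls list at most once per process and caches its
// result, including any error. list is a hypothetical stand-in for
// cgroups.GetHugePageSize.
func cachedPageSizes(list func() ([]string, error)) ([]string, error) {
	pageSizesOnce.Do(func() {
		pageSizes, pageSizesErr = list()
	})
	return pageSizes, pageSizesErr
}
```

Caching the error too means a transient failure is never retried; whether that is acceptable is a design choice the review did not settle.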
```go
}

func TestHugetlbSetHugetlbWithReservedAccounting(t *testing.T) {
	helper := NewCgroupTestUtil("hugetlb", t)
```

Shouldn't this test be skipped if `!HasReservationAccountingSupport()`?
```go
if len(HugePageSizes) == 0 {
	return false
}
_, err := fscommon.ReadFile(path, strings.Join([]string{"hugetlb", HugePageSizes[0], "rsvd", "limit_in_bytes"}, "."))
```

Use `cgroups.PathExists` here.
```go
if err != nil || len(hugePageSizes) == 0 {
	return false
}
_, err = fscommon.ReadFile(dirPath, strings.Join([]string{"hugetlb", hugePageSizes[0], "rsvd", "max"}, "."))
```

Use `cgroups.PathExists()`.
I'm afraid yes. Reservation and use are two different properties, and we should not mix them together.

So, @odinuge, I think this should start with a PR to https://github.com/opencontainers/runtime-spec. Once that is merged, we can open a PR here (and most of the comments I left while reviewing this are still valid).

I'm working on reviving this PR now that the spec is merged.
Do we have to edit the runtime-spec in order to do this?
Also, this will fix patroni/patroni#1393 (see the postgres part of the description at the top).