Commit 7ddb37b

fix: make OOM expression a bit less sensitive
In addition to the derivative of full PSI for the affected cgroups, also look at the avg10 value to provide some hysteresis against small spikes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 38e280c)
1 parent e438ec2 commit 7ddb37b

3 files changed: +6 -3 lines changed

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 apiVersion: v1alpha1
 kind: OOMConfig
 triggerExpression: |-
-    multiply_qos_vectors(d_qos_memory_full_total, {System: 8.0, Podruntime: 4.0}) > 3000.0 ||
+    multiply_qos_vectors(d_qos_memory_full_total, {System: 8.0, Podruntime: 4.0}) > 3000.0 &&
+    multiply_qos_vectors(qos_memory_full_avg10, {System: 1.0, Podruntime: 1.0}) > 5.0 ||
     memory_full_avg10 > 75.0 && time_since_trigger > duration("10s")
 cgroupRankingExpression: 'memory_max.hasValue() ? 0.0 : ({Besteffort: 1.0, Burstable: 0.5, Guaranteed: 0.0, Podruntime: 0.0, System: 0.0}[class] * double(memory_current.orValue(0u)))'
 sampleInterval: 100ms

pkg/machinery/constants/constants.go

Lines changed: 2 additions & 1 deletion
@@ -1320,7 +1320,8 @@ const (
 	ContainerMarkerFilePath = "/usr/etc/in-container"
 
 	// DefaultOOMTriggerExpression is the default CEL expression used to determine whether to trigger OOM.
-	DefaultOOMTriggerExpression = `(multiply_qos_vectors(d_qos_memory_full_total, {System: 8.0, Podruntime: 4.0}) > 3000.0) ||
+	DefaultOOMTriggerExpression = `(multiply_qos_vectors(d_qos_memory_full_total, {System: 8.0, Podruntime: 4.0}) > 3000.0 &&
+	multiply_qos_vectors(qos_memory_full_avg10, {System: 1.0, Podruntime: 1.0}) > 5.0) ||
 	(memory_full_avg10 > 75.0 && time_since_trigger > duration("10s"))`
 
 	// DefaultOOMCgroupRankingExpression is the default CEL expression used to rank cgroups for OOM killer.

website/content/v1.12/reference/configuration/runtime/oomconfig.md

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,8 @@ title: OOMConfig
 apiVersion: v1alpha1
 kind: OOMConfig
 triggerExpression: |- # This expression defines when to trigger OOM action.
-    multiply_qos_vectors(d_qos_memory_full_total, {System: 8.0, Podruntime: 4.0}) > 3000.0 ||
+    multiply_qos_vectors(d_qos_memory_full_total, {System: 8.0, Podruntime: 4.0}) > 3000.0 &&
+    multiply_qos_vectors(qos_memory_full_avg10, {System: 1.0, Podruntime: 1.0}) > 5.0 ||
     memory_full_avg10 > 75.0 && time_since_trigger > duration("10s")
 cgroupRankingExpression: 'memory_max.hasValue() ? 0.0 : ({Besteffort: 1.0, Burstable: 0.5, Guaranteed: 0.0, Podruntime: 0.0, System: 0.0}[class] * double(memory_current.orValue(0u)))' # This expression defines how to rank cgroups for OOM handler.
 sampleInterval: 100ms # How often should the trigger expression be evaluated.
