
[Monitoring] Report pressure stall information in metrics (Linux) #41054

@andrewkroh


Describe the enhancement:

Report Linux Pressure Stall Information (PSI) metrics in the beat metrics. Include PSI info when

Describe a specific use case for the enhancement or feature:

When reviewing diagnostic snapshots, this information would be used to detect whether CPU, memory, or I/O pressure could be causing processing to stall. If stalls occur frequently (e.g. a high percentage in avg300), that is a strong signal that some resource on the host is overloaded, which might be a cause of the reported issue.
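A minimal Go sketch of the avg300 check described above, assuming the "some" avg300 value has already been parsed for each resource. The function name stalledResources and the threshold are hypothetical; neither Beats nor the kernel defines a cutoff for "high".

```go
package main

import (
	"fmt"
	"sort"
)

// stalledResources returns the resources whose "some" avg300 value (percent
// of the last 300 seconds in which at least one task was stalled on that
// resource) is at or above threshold. threshold is a hypothetical tuning
// knob, not a value defined by Beats or the kernel.
func stalledResources(someAvg300 map[string]float64, threshold float64) []string {
	var over []string
	for name, pct := range someAvg300 {
		if pct >= threshold {
			over = append(over, name)
		}
	}
	sort.Strings(over) // map iteration order is random; sort for stable output
	return over
}

func main() {
	// avg300 values from the system-level example in this issue.
	sample := map[string]float64{"cpu": 0.34, "io": 0.00, "memory": 0.00}
	fmt.Println(stalledResources(sample, 10))  // prints []
	fmt.Println(stalledResources(sample, 0.1)) // prints [cpu]
}
```

Using "some" rather than "full" flags partial stalls too; a reporter could apply the same check to "full" values to catch cases where all non-idle tasks were stalled at once.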


Here are some examples of reading PSI info from the CLI. Each file contains a "some" line and a "full" line, except irq, which only has a "full" line.

System-level PSI

cat /proc/pressure/{cpu,io,irq,memory}
some avg10=0.09 avg60=0.23 avg300=0.34 total=29210385599
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
some avg10=0.00 avg60=0.00 avg300=0.00 total=2022562678
full avg10=0.00 avg60=0.00 avg300=0.00 total=1843728046
full avg10=0.00 avg60=0.00 avg300=0.00 total=16487083525
some avg10=0.00 avg60=0.00 avg300=0.00 total=2107930
full avg10=0.00 avg60=0.00 avg300=0.00 total=1986722
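A minimal Go sketch of parsing one of these files into structured values, as a metrics reporter might. PSILine and parsePSI are hypothetical names, not existing Beats APIs. Per the kernel PSI documentation, avg10/avg60/avg300 are the percentage of time stalled over the trailing 10/60/300 second windows, and total is the cumulative stall time in microseconds. Because irq has only a "full" line, callers should not assume both keys are present.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// PSILine holds one "some" or "full" line of a PSI file.
type PSILine struct {
	Avg10, Avg60, Avg300 float64 // percent of time stalled over 10s/60s/300s windows
	Total                uint64  // cumulative stall time in microseconds
}

// parsePSI parses the contents of a PSI file such as /proc/pressure/cpu or a
// cgroup v2 <resource>.pressure file. The result is keyed by "some"/"full".
func parsePSI(content string) (map[string]PSILine, error) {
	out := make(map[string]PSILine)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		kind := fields[0]
		if kind != "some" && kind != "full" {
			return nil, fmt.Errorf("unexpected line prefix %q", kind)
		}
		var line PSILine
		for _, kv := range fields[1:] {
			key, val, ok := strings.Cut(kv, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", kv)
			}
			var err error
			switch key {
			case "avg10":
				line.Avg10, err = strconv.ParseFloat(val, 64)
			case "avg60":
				line.Avg60, err = strconv.ParseFloat(val, 64)
			case "avg300":
				line.Avg300, err = strconv.ParseFloat(val, 64)
			case "total":
				line.Total, err = strconv.ParseUint(val, 10, 64)
			}
			// Unknown keys fall through the switch and are ignored.
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", kv, err)
			}
		}
		out[kind] = line
	}
	return out, sc.Err()
}

func main() {
	// The CPU lines from the system-level example above.
	cpu := "some avg10=0.09 avg60=0.23 avg300=0.34 total=29210385599\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
	psi, err := parsePSI(cpu)
	if err != nil {
		panic(err)
	}
	fmt.Println(psi["some"].Avg300, psi["some"].Total) // prints 0.34 29210385599
}
```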

Cgroup v2 PSI

If the host is using cgroup v2 and the process is a member of a cgroup, then we can get PSI information scoped to the tasks in that group. The pressure files are in the cgroup's directory under the cgroup2 mount point.

cat /etc/mtab | grep cgroup
cgroup /sys/fs/cgroup cgroup2 ro,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
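Rather than grepping /etc/mtab, a process could locate the cgroup2 mount point by scanning the mount table (/etc/mtab or /proc/self/mounts) for a filesystem of type cgroup2. A minimal Go sketch; cgroup2Mount is a hypothetical helper, and it does not decode the octal escapes (such as \040 for a space) that the mount table uses in paths.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroup2Mount returns the mount point of the first cgroup2 filesystem in
// mtab-formatted content. Field 2 is the mount point, field 3 the fs type.
// Octal escapes in paths (e.g. \040) are not decoded in this sketch.
func cgroup2Mount(mtab string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(mtab))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[2] == "cgroup2" {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	// The mtab line from the example above.
	line := "cgroup /sys/fs/cgroup cgroup2 ro,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0\n"
	mnt, ok := cgroup2Mount(line)
	fmt.Println(mnt, ok) // prints /sys/fs/cgroup true
}
```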

cat /proc/self/cgroup
0::/
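On cgroup v2, /proc/self/cgroup contains a "0::<path>" entry (hierarchy ID 0, empty controller list); joining that path to the cgroup2 mount point gives the directory holding the group's pressure files. A minimal Go sketch with hypothetical helpers cgroupV2Path and pressureFiles. With the "0::/" output above, the process sits at the root of its cgroup namespace, so the files resolve directly under /sys/fs/cgroup.

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"strings"
)

// cgroupV2Path extracts the cgroup v2 path from /proc/self/cgroup content.
// The v2 entry has hierarchy ID 0 and an empty controller list: "0::<path>".
func cgroupV2Path(procCgroup string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(procCgroup))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::"), true
		}
	}
	return "", false
}

// pressureFiles maps each resource to its PSI file inside the cgroup's
// directory under the cgroup2 mount point. irq.pressure may be absent on
// older kernels or some configurations, so callers should tolerate it missing.
func pressureFiles(mount, cgroupPath string) map[string]string {
	files := make(map[string]string)
	for _, res := range []string{"cpu", "io", "irq", "memory"} {
		files[res] = filepath.Join(mount, cgroupPath, res+".pressure")
	}
	return files
}

func main() {
	// "0::/" is the /proc/self/cgroup output shown in this issue.
	path, ok := cgroupV2Path("0::/\n")
	if !ok {
		panic("no cgroup v2 entry")
	}
	fmt.Println(pressureFiles("/sys/fs/cgroup", path)["cpu"]) // prints /sys/fs/cgroup/cpu.pressure
}
```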

cat /sys/fs/cgroup/{cpu,io,irq,memory}.pressure 
some avg10=0.00 avg60=0.00 avg300=0.00 total=41743102
full avg10=0.00 avg60=0.00 avg300=0.00 total=40889871
some avg10=0.01 avg60=0.09 avg300=0.03 total=63542837
full avg10=0.01 avg60=0.09 avg300=0.03 total=63432991
full avg10=0.00 avg60=0.00 avg300=0.00 total=64963754
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
