
[metricbeat/system module] should differentiate kernel and user-space threads #40537

@AndersonQ

Description


Describe the enhancement:

The system module collects information about the PIDs on a given host. However, not all PIDs map to a "running binary": some are kernel threads and some are user-space threads. Kernel threads expose less information than user-space ones. In particular, /proc/PID/exe does not exist for kernel threads, so Metricbeat fails to report any information about them.

These metrics are collected by elastic-agent-system-metrics; the Linux implementation is here:

https://github.com/elastic/elastic-agent-system-metrics/blob/e622b665e0661c6616b89c0f5d95962ee9dd916f/metric/system/process/process_linux_common.go#L238-L243

This is the error Metricbeat reports when running with debug logs enabled:

Error fetching PID info for 472325, skipping: FillPidMetrics: error getting metadata for pid 472325: error fetching exe from pid 472325: readlink /hostfs/proc/472325/exe: no such file or directory

{"@timestamp":"2024-08-15T08:22:08.169Z","metricbeat":"daemonset","agent":{"version":"8.15.0","ephemeral_id":"275ce895-3d08-43f4-8e6b-42a589b96500","id":"7117a0e0-b023-47df-b271-70fd15361e17","name":"crc-vlf7c-master-0","type":"filebeat"},"log.level":"debug","log.origin":{"function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidIter","file.name":"process/process.go","file.line":173},"log":{"offset":32871196,"file":{"inode":"69209548","fingerprint":"3a4f6d52dcc5163f2fbbf73cba16b6d8c37040afec62bfe866228eaf1f738933","path":"/var/log/containers/metricbeat-daemonset-vrzcc_kube-system_metricbeat-a2159eed71d09d88c79ab6b3b83c696d09d278189ed2ffa0d9c42b9c512f9d49.log","device_id":"64516"}},"input":{"type":"filestream"},"message":"Error fetching PID info for 472325, skipping: FillPidMetrics: error getting metadata for pid 472325: error fetching exe from pid 472325: readlink /hostfs/proc/472325/exe: no such file or directory","ecs":{"version":"8.0.0"},"host":{"mac":["02-6B-03-49-D1-CC","FA-AC-46-2C-DB-D4"],"hostname":"crc-vlf7c-master-0","architecture":"x86_64","os":{"codename":"focal","type":"linux","platform":"ubuntu","version":"20.04.6 LTS (Focal Fossa)","family":"debian","name":"Ubuntu","kernel":"5.14.0-284.52.1.el9_2.x86_64"},"containerized":true,"name":"crc-vlf7c-master-0","ip":["192.168.130.11","fe80::68cf:9e8e:13a3:51bb","192.168.126.11","fe80::10ee:f0ff:fe9e:4b64","10.217.0.1"]},"stream":"stderr","service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"processes"}

This is not a permission issue; it is simply a matter of what information the kernel exposes for each PID.

ps differentiates between them well:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     1733172  0.0  0.0      0     0 ?        I    15:50   0:00 [kworker/8:0]
myuser  1734838  1.8  0.3 1212381208 218784 ?   Sl   15:50   0:11 /opt/brave.com/brave/brave --type=renderer --crashpad-handler-pid=1352615 --enable-crash-reporter=4817c36d-6e2e-46c1-8112
myuser  1736448  0.0  0.0  27376 11088 pts/0    SNs  15:51   0:00 /usr/bin/zsh
From the ps(1) man page:

PROCESS STATE CODES
       Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:

               I    Idle kernel thread
               S    interruptible sleep (waiting for an event to complete)

       For BSD formats and when the stat keyword is used, additional characters may be displayed:

               N    low-priority (nice to other users)
               s    is a session leader
               l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
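For reference, the state letters from the ps(1) excerpt can be decoded in Go. This is a minimal sketch: `stateDescriptions` and `describeState` are hypothetical names, and the table copies most of the state codes that ps(1) documents, including ones not quoted above. Note that `I` only marks *idle* kernel threads; a kernel thread that is running or sleeping in another way reports a different letter, so the state alone is not a reliable way to tell kernel threads from user-space ones.

```go
package main

import "fmt"

// stateDescriptions (hypothetical) maps the one-letter state from
// /proc/PID/stat, which is also the first character of the ps STAT column,
// to its description in ps(1).
var stateDescriptions = map[byte]string{
	'D': "uninterruptible sleep (usually IO)",
	'I': "Idle kernel thread",
	'R': "running or runnable (on run queue)",
	'S': "interruptible sleep (waiting for an event to complete)",
	'T': "stopped by job control signal",
	't': "stopped by debugger during the tracing",
	'X': "dead (should never be seen)",
	'Z': "defunct (\"zombie\") process, terminated but not reaped by its parent",
}

// describeState (hypothetical helper) returns the ps(1) description for a
// state letter, or "unknown state" for letters not in the table.
func describeState(state byte) string {
	if d, ok := stateDescriptions[state]; ok {
		return d
	}
	return "unknown state"
}

func main() {
	// The two states shown by the kworker and zsh examples.
	for _, s := range []byte{'I', 'S'} {
		fmt.Printf("%c: %s\n", s, describeState(s))
	}
}
```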

Reading /proc/PID/exe without first checking which type each process is reproduces the same error for the kernel thread:

root@vm-elastic:~# readlink -v /proc/1733172/exe
readlink: /proc/1733172/exe: No such file or directory

root@vm-elastic:~# readlink -v /proc/1734838/exe
/opt/brave.com/brave/brave (deleted)

root@vm-elastic:~# readlink -v /proc/1736448/exe
/usr/bin/zsh

Checking /proc/PID/stat first might be the solution. Observe the I and S states, exactly as ps shows them:

root@vm-elastic:~# cat /proc/1733172/stat
1733172 (kworker/8:0) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 20 0 1 0 61771418 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 8 0 0 0 0 0 0 0 0 0 0 0 0 0

root@vm-elastic:~# cat /proc/1736448/stat
1736448 (zsh) S 1383728 1736448 1736448 34816 1739969 4194304 10330 53068 0 3 17 6 21 28 26 6 1 0 61777492 28033024 2770 18446744073709551615 97814444871680 97814445646726 140736763118816 0 0 0 2 3686400 134295555 1 0 0 17 10 0 0 0 0 0 97814445763296 97814445792492 97815484731392 140736763123242 140736763123255 140736763123255 140736763125739 0
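Besides the state letter, /proc/PID/stat carries a flags field (field 9 in proc(5)), which may be a more robust signal than the state because it does not depend on whether the kernel thread is idle. This sketch assumes the `PF_KTHREAD` value `0x00200000` from the Linux kernel's `include/linux/sched.h`; in the two samples above, the kworker's flags (69238880) have this bit set and zsh's (4194304) do not. `procStat`, `parseStat` and `IsKernelThread` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// pfKthread is assumed to match PF_KTHREAD in include/linux/sched.h.
const pfKthread = 0x00200000

// procStat (hypothetical) holds the /proc/PID/stat fields used here.
type procStat struct {
	PID   int
	Comm  string
	State byte
	PPID  int
	Flags uint64
}

// parseStat (hypothetical helper) parses one /proc/PID/stat line. comm
// (field 2) may contain spaces and parentheses, so it is taken between the
// first '(' and the last ')'.
func parseStat(line string) (procStat, error) {
	open := strings.IndexByte(line, '(')
	closeIdx := strings.LastIndexByte(line, ')')
	if open < 0 || closeIdx < open {
		return procStat{}, fmt.Errorf("malformed stat line")
	}
	pid, err := strconv.Atoi(strings.TrimSpace(line[:open]))
	if err != nil {
		return procStat{}, fmt.Errorf("pid: %w", err)
	}
	// rest[0] is field 3 (state), rest[1] field 4 (ppid), rest[6] field 9 (flags).
	rest := strings.Fields(line[closeIdx+1:])
	if len(rest) < 7 {
		return procStat{}, fmt.Errorf("too few fields")
	}
	if len(rest[0]) != 1 {
		return procStat{}, fmt.Errorf("unexpected state field %q", rest[0])
	}
	ppid, err := strconv.Atoi(rest[1])
	if err != nil {
		return procStat{}, fmt.Errorf("ppid: %w", err)
	}
	flags, err := strconv.ParseUint(rest[6], 10, 64)
	if err != nil {
		return procStat{}, fmt.Errorf("flags: %w", err)
	}
	return procStat{PID: pid, Comm: line[open+1 : closeIdx], State: rest[0][0], PPID: ppid, Flags: flags}, nil
}

// IsKernelThread reports whether the (assumed) PF_KTHREAD bit is set.
func (s procStat) IsKernelThread() bool { return s.Flags&pfKthread != 0 }

func main() {
	raw, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	s, err := parseStat(string(raw))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("pid=%d comm=%q state=%c kernel=%t\n", s.PID, s.Comm, s.State, s.IsKernelThread())
}
```

Splitting on the last `)` matters because `comm` is chosen by the process and can itself contain `)` or spaces.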

Some resources on this:

Describe a specific use case for the enhancement or feature:

Metricbeat would correctly report information for all running processes, just like ps does.

Metadata


Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
