Describe the enhancement:
The system module collects information about the PIDs on a given host. However, not all PIDs map to a "running binary". Some are kernel threads and some are user-space processes. The former do not have as much information available as the latter. Metricbeat fails to report information about kernel threads because /proc/PID/exe does not exist for them.
Those metrics are collected by elastic-agent-system-metrics; the Linux implementation is here:
https://github.com/elastic/elastic-agent-system-metrics/blob/e622b665e0661c6616b89c0f5d95962ee9dd916f/metric/system/process/process_linux_common.go#L238-L243
This is the error reported by Metricbeat when running with debug logs:
Error fetching PID info for 472325, skipping: FillPidMetrics: error getting metadata for pid 472325: error fetching exe from pid 472325: readlink /hostfs/proc/472325/exe: no such file or directory
{"@timestamp":"2024-08-15T08:22:08.169Z","metricbeat":"daemonset","agent":{"version":"8.15.0","ephemeral_id":"275ce895-3d08-43f4-8e6b-42a589b96500","id":"7117a0e0-b023-47df-b271-70fd15361e17","name":"crc-vlf7c-master-0","type":"filebeat"},"log.level":"debug","log.origin":{"function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidIter","file.name":"process/process.go","file.line":173},"log":{"offset":32871196,"file":{"inode":"69209548","fingerprint":"3a4f6d52dcc5163f2fbbf73cba16b6d8c37040afec62bfe866228eaf1f738933","path":"/var/log/containers/metricbeat-daemonset-vrzcc_kube-system_metricbeat-a2159eed71d09d88c79ab6b3b83c696d09d278189ed2ffa0d9c42b9c512f9d49.log","device_id":"64516"}},"input":{"type":"filestream"},"message":"Error fetching PID info for 472325, skipping: FillPidMetrics: error getting metadata for pid 472325: error fetching exe from pid 472325: readlink /hostfs/proc/472325/exe: no such file or directory","ecs":{"version":"8.0.0"},"host":{"mac":["02-6B-03-49-D1-CC","FA-AC-46-2C-DB-D4"],"hostname":"crc-vlf7c-master-0","architecture":"x86_64","os":{"codename":"focal","type":"linux","platform":"ubuntu","version":"20.04.6 LTS (Focal Fossa)","family":"debian","name":"Ubuntu","kernel":"5.14.0-284.52.1.el9_2.x86_64"},"containerized":true,"name":"crc-vlf7c-master-0","ip":["192.168.130.11","fe80::68cf:9e8e:13a3:51bb","192.168.126.11","fe80::10ee:f0ff:fe9e:4b64","10.217.0.1"]},"stream":"stderr","service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"processes"}
This is not a permission issue; it is just which information is reported for each kind of PID.
ps differentiates well between them:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1733172 0.0 0.0 0 0 ? I 15:50 0:00 [kworker/8:0]
myuser 1734838 1.8 0.3 1212381208 218784 ? Sl 15:50 0:11 /opt/brave.com/brave/brave --type=renderer --crashpad-handler-pid=1352615 --enable-crash-reporter=4817c36d-6e2e-46c1-8112
myuser 1736448 0.0 0.0 27376 11088 pts/0 SNs 15:51 0:00 /usr/bin/zsh
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:
I Idle kernel thread
S interruptible sleep (waiting for an event to complete)
For BSD formats and when the stat keyword is used, additional characters may be displayed:
N low-priority (nice to other users)
s is a session leader
l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
Without first checking which type each process is, the same error is found when reading the exe link of the kernel thread:
root@vm-elastic:~# readlink -v /proc/1733172/exe
readlink: /proc/1733172/exe: No such file or directory
root@vm-elastic:~# readlink -v /proc/1734838/exe
/opt/brave.com/brave/brave (deleted)
root@vm-elastic:~# readlink -v /proc/1736448/exe
/usr/bin/zsh
Checking /proc/PID/stat first might be the solution. Observe the I and S states, presented just as ps shows:
root@vm-elastic:~# cat /proc/1733172/stat
1733172 (kworker/8:0) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 20 0 1 0 61771418 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 8 0 0 0 0 0 0 0 0 0 0 0 0 0
root@vm-elastic:~# cat /proc/1736448/stat
1736448 (zsh) S 1383728 1736448 1736448 34816 1739969 4194304 10330 53068 0 3 17 6 21 28 26 6 1 0 61777492 28033024 2770 18446744073709551615 97814444871680 97814445646726 140736763118816 0 0 0 2 3686400 134295555 1 0 0 17 10 0 0 0 0 0 97814445763296 97814445792492 97815484731392 140736763123242 140736763123255 140736763123255 140736763125739 0
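The check on /proc/PID/stat could be sketched in Go as below. This is a rough illustration under stated assumptions, not the library's implementation: parseStat and statInfo are hypothetical names. Besides the state (field 3), the sketch also looks at the flags value (field 9) and tests the kernel's PF_KTHREAD bit (0x00200000), which is an addition not mentioned in this issue; the state alone may not identify every kernel thread, since a kernel thread is not always idle.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pfKthread mirrors PF_KTHREAD from the Linux kernel's include/linux/sched.h.
const pfKthread = 0x00200000

// statInfo is a hypothetical type holding the few fields this sketch needs.
type statInfo struct {
	Comm         string
	State        string
	KernelThread bool
}

// parseStat is a hypothetical helper that parses one /proc/PID/stat line.
func parseStat(line string) (statInfo, error) {
	// comm is wrapped in parentheses and may itself contain spaces or
	// parentheses, so locate the first '(' and the last ')'.
	open := strings.IndexByte(line, '(')
	end := strings.LastIndexByte(line, ')')
	if open < 0 || end < open {
		return statInfo{}, fmt.Errorf("malformed stat line")
	}
	// After comm: state, ppid, pgrp, session, tty_nr, tpgid, flags, ...
	fields := strings.Fields(line[end+1:])
	if len(fields) < 7 {
		return statInfo{}, fmt.Errorf("stat line has too few fields")
	}
	flags, err := strconv.ParseUint(fields[6], 10, 64)
	if err != nil {
		return statInfo{}, fmt.Errorf("parsing flags: %w", err)
	}
	return statInfo{
		Comm:         line[open+1 : end],
		State:        fields[0],
		KernelThread: flags&pfKthread != 0,
	}, nil
}

func main() {
	// Truncated versions of the two stat lines shown above.
	for _, l := range []string{
		"1733172 (kworker/8:0) I 2 0 0 0 -1 69238880 0 0 0 0",
		"1736448 (zsh) S 1383728 1736448 1736448 34816 1739969 4194304 10330 53068",
	} {
		info, err := parseStat(l)
		fmt.Printf("%+v %v\n", info, err)
	}
}
```

A caller could use the KernelThread result to skip the exe lookup for that PID while still reporting the remaining metrics.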
Some resources on this:
Describe a specific use case for the enhancement or feature:
Metricbeat would correctly show process information for all running processes, just like ps does.