Closed
Description
Host operating system
Linux brian-kit 4.15.0-99-generic #100-Ubuntu SMP Wed Apr 22 20:32:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Hardware is an Intel Skull Canyon NUC
node_exporter version
node_exporter, version 1.0.0-rc.1 (branch: HEAD, revision: 3cedd344fd4ea8c1e6e1fb0854e824d5d8b2f24a)
build user: root@1e01740d5299
build date: 20200514-15:02:31
go version: go1.14.2
node_exporter command line flags
--collector.textfile.directory=/var/lib/node_exporter --collector.systemd
Are you running node_exporter in Docker?
No
What did you do that produced an error?
Scrape node_exporter looking for thermal metrics
What did you expect to see?
Thermal metrics such as:
node_thermal_zone_temp{type="XXX",zone="YYY"} 43
node_cooling_device_cur_state{name="0",type="Processor"} 0
What did you see instead?
No thermal metrics, and node_scrape_collector_success is 0 for the thermal_zone collector.
# curl -s localhost:9100/metrics | egrep 'therm|cool'
node_scrape_collector_duration_seconds{collector="thermal_zone"} 0.001691836
node_scrape_collector_success{collector="thermal_zone"} 0
node_systemd_unit_state{name="thermald.service",state="activating",type="dbus"} 0
node_systemd_unit_state{name="thermald.service",state="active",type="dbus"} 1
node_systemd_unit_state{name="thermald.service",state="deactivating",type="dbus"} 0
node_systemd_unit_state{name="thermald.service",state="failed",type="dbus"} 0
node_systemd_unit_state{name="thermald.service",state="inactive",type="dbus"} 0
Debug information
Each scrape causes node_exporter to emit a single error-level log line:
May 14 17:32:13 brian-kit node_exporter[6434]: level=error ts=2020-05-14T16:32:13.754Z caller=collector.go:161 msg="collector failed" name=thermal_zone duration_seconds=0.001691836 err="read /sys/class/thermal/thermal_zone4/temp: no data available"
Now, this device does have thermal info, but there is an error when reading the final zone (zone4):
# ls /sys/class/thermal/
cooling_device0 cooling_device11 cooling_device14 cooling_device4 cooling_device7 thermal_zone0 thermal_zone3
cooling_device1 cooling_device12 cooling_device2 cooling_device5 cooling_device8 thermal_zone1 thermal_zone4
cooling_device10 cooling_device13 cooling_device3 cooling_device6 cooling_device9 thermal_zone2
# cat /sys/class/thermal/thermal_zone*/temp
27800
29800
41000
53000
cat: /sys/class/thermal/thermal_zone4/temp: No data available
# cat /sys/class/thermal/thermal_zone4/type
iwlwifi_1
# cat /sys/class/thermal/thermal_zone4/policy
step_wise
# cat /sys/class/thermal/thermal_zone4/temp
cat: /sys/class/thermal/thermal_zone4/temp: No data available
strace of node_exporter shows:
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone4/temp", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] <... openat resumed> ) = 18
[pid 6469] epoll_ctl(4, EPOLL_CTL_ADD, 18, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1146761160, u64=139935476232136}} <unfinished ...>
[pid 6469] <... epoll_ctl resumed> ) = 0
[pid 6469] fcntl(18, F_GETFL <unfinished ...>
[pid 6469] <... fcntl resumed> ) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
[pid 6469] fcntl(18, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE <unfinished ...>
[pid 6469] <... fcntl resumed> ) = 0
[pid 6469] fstat(18, <unfinished ...>
[pid 6469] <... fstat resumed> {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
[pid 6469] read(18, <unfinished ...>
[pid 6469] <... read resumed> 0xc000420600, 4608) = -1 ENODATA (No data available)
[pid 6469] epoll_ctl(4, EPOLL_CTL_DEL, 18, 0xc00027a75c <unfinished ...>
[pid 6469] <... epoll_ctl resumed> ) = 0
[pid 6469] close(18 <unfinished ...>
[pid 6469] <... close resumed> ) = 0
[pid 6469] write(2, "level=error ts=2020-05-14T16:46:08.184Z caller=collector.go:161 msg=\"collector failed\" name=thermal_zone duration_seconds=0.0080"..., 202 <unfinished ...>
The openat of thermal_zone4/temp succeeds and returns a valid fd, but the subsequent read fails with ENODATA.
It's therefore my working hypothesis that:
- This is a wifi card that exposes a thermal zone (type iwlwifi_1) but has no temperature reading available.
- This single read error causes the entire collector to give up and not return the results it had already collected before that point.
Aside: there's cooling info available too, and no errors:
# cat /sys/class/thermal/cooling_device*/cur_state
0
0
0
0
-1
0
0
0
0
0
0
0
0
0
0
However, strace shows no attempt to access those. Grepping the strace output for /sys/class/thermal shows only:
[pid 6469] newfstatat(AT_FDCWD, "/sys/class/thermal", <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone0/type", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone0/policy", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone0/temp", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone0/mode", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone0/passive", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone1/type", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone1/policy", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone1/temp", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone1/mode", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone1/passive", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone2/type", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone2/policy", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone2/temp", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone2/mode", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone2/passive", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone3/type", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone3/policy", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone3/temp", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone3/mode", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone3/passive", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone4/type", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone4/policy", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 6469] openat(AT_FDCWD, "/sys/class/thermal/thermal_zone4/temp", O_RDONLY|O_CLOEXEC <unfinished ...>