Skip to content

Bug: collector.arp.netlink is super slow to retrieve metrics #3075

@maxime1907

Description

@maxime1907

Host operating system: output of uname -a

Linux node 6.1.96-flatcar #1 SMP PREEMPT_DYNAMIC Mon Jul  1 23:29:55 -00 2024 x86_64 AMD EPYC 9654 96-Core Processor AuthenticAMD GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.8.1 (branch: HEAD, revision: 400c3979931613db930ea035f39ce7b377cdbb5b)
  build user:       root@7afbff271a3f
  build date:       20240521-18:36:22
  go version:       go1.22.3
  platform:         linux/amd64
  tags:             unknown

node_exporter command line flags

            - '--path.procfs=/host/proc'
            - '--path.sysfs=/host/sys'
            - '--path.rootfs=/host/root'
            - '--path.udev.data=/host/root/run/udev/data'
            - '--web.listen-address=[$(HOST_IP)]:9100'
            - '--collector.textfile.directory=/var/lib/node_exporter'
            - '--no-collector.ipvs'
            - '--collector.netclass.netlink'

node_exporter log output

ts=2024-07-17T14:32:56.130Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.1, branch=HEAD, revision=400c3979931613db930ea035f39ce7b377cdbb5b)"
ts=2024-07-17T14:32:56.131Z caller=node_exporter.go:194 level=info msg="Build context" build_context="(go=go1.22.3, platform=linux/amd64, user=root@7afbff271a3f, date=20240521-18:36:22, tags=unknown)"
ts=2024-07-17T14:32:56.133Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)
ts=2024-07-17T14:32:56.133Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2024-07-17T14:32:56.135Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:111 level=info msg="Enabled collectors"
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=arp
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=bcache
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=bonding
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=btrfs
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=conntrack
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=cpu
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=cpufreq
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=diskstats
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=dmi
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=edac
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=entropy
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=fibrechannel
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=filefd
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=filesystem
ts=2024-07-17T14:32:56.138Z caller=node_exporter.go:118 level=info collector=hwmon
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=infiniband
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=loadavg
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=mdadm
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=meminfo
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=netclass
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=netdev
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=netstat
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=nfs
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=nfsd
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=nvme
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=os
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=powersupplyclass
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=pressure
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=rapl
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=schedstat
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=selinux
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=sockstat
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=softnet
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=stat
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=tapestats
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=textfile
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=thermal_zone
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=time
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=timex
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=udp_queues
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=uname
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=vmstat
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=watchdog
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=xfs
ts=2024-07-17T14:32:56.139Z caller=node_exporter.go:118 level=info collector=zfs
ts=2024-07-17T14:32:56.182Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9100
ts=2024-07-17T14:32:56.182Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9100

Are you running node_exporter in Docker?

No, we use containerd directly

What did you do that produced an error?

Run node exporter which uses by default the collector collector.arp.netlink

What did you expect to see?

Normal scrape times like 300ms

What did you see instead?

Scrape times of more than 20 seconds

Here is a graph of before and after the deactivation of the arp netlink collector with this flag:

            - '--no-collector.arp.netlink'

image

Here is the output of pprof with:

go tool pprof -svg http://localhost:9100/debug/pprof/profile > node_exporter.svg

node_exporter

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions