Infiniband metrics not collected if irdma kernel module is loaded #2769

@evandhoffman

Host operating system: output of uname -a

Linux xxxxx 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.6.1 (branch: HEAD, revision: 4a1b77600c1873a8233f3ffb55afcedbb63b8d84)
  build user:       root@586879db11e5
  build date:       20230717-12:10:52
  go version:       go1.20.6
  platform:         linux/amd64
  tags:             netgo osusergo static_build

node_exporter command line flags

/usr/local/bin/node_exporter --web.listen-address=0.0.0.0:9100 --collector.textfile.directory=/var/lib/prometheus/textfile --collector.buddyinfo --collector.processes --collector.ntp --collector.network_route

node_exporter log output

Are you running node_exporter in Docker?

No

What did you do that produced an error?

n/a

What did you expect to see?

Infiniband metrics are emitted, and node_scrape_collector_success returns 1.

What did you see instead?

node_scrape_collector_success returns 0, and no Infiniband metrics are emitted.
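
To confirm which collector is failing, the exporter's /metrics output can be filtered for node_scrape_collector_success samples equal to 0. The following is a minimal sketch, assuming the text output has already been fetched into a string. `failed_collectors` is a hypothetical helper name, and the sample input is illustrative rather than captured from the affected host.

```python
import re

# Matches lines such as: node_scrape_collector_success{collector="infiniband"} 0
_SAMPLE = re.compile(
    r'^node_scrape_collector_success\{collector="([^"]+)"\}\s+(\S+)'
)


def failed_collectors(metrics_text):
    """Return the names of collectors whose success sample equals 0."""
    failed = []
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match and float(match.group(2)) == 0.0:
            failed.append(match.group(1))
    return failed


# Illustrative input only (not output from the affected host).
example = """\
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="cpu"} 1
node_scrape_collector_success{collector="infiniband"} 0
"""

print(failed_collectors(example))
```

On a host showing this problem, the expectation would be that `infiniband` appears in the returned list while irdma is loaded and disappears from it after the module is removed.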

More info

This may be a bug in the procfs package: with irdma loaded, irdmaX devices appear in /sys/class/infiniband alongside the mlx5 devices:

# ls -l /sys/class/infiniband/
total 0
lrwxrwxrwx 1 root root 0 Jul 25 01:25 irdma0 -> ../../devices/pci0000:80/0000:80:05.0/0000:82:00.0/infiniband/irdma0
lrwxrwxrwx 1 root root 0 Jul 25 01:25 irdma1 -> ../../devices/pci0000:80/0000:80:05.0/0000:82:00.1/infiniband/irdma1
lrwxrwxrwx 1 root root 0 Jul 25 01:25 mlx5_0 -> ../../devices/pci0000:15/0000:15:01.0/0000:16:00.0/0000:17:00.0/0000:18:00.0/infiniband/mlx5_0
lrwxrwxrwx 1 root root 0 Jul 25 01:25 mlx5_1 -> ../../devices/pci0000:24/0000:24:01.0/0000:25:00.0/0000:26:00.0/0000:27:00.0/0000:28:00.0/0000:29:00.0/infiniband/mlx5_1

Once I run rmmod irdma, these devices (irdma0, irdma1) no longer appear in /sys/class/infiniband, node_exporter begins emitting Infiniband metrics, and node_scrape_collector_success returns 1 as expected.
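
One way to narrow this down further might be to compare what the irdma and mlx5 devices expose under sysfs. The sketch below lists, per device and port, which of the standard RDMA counter directories (ports/<N>/counters and ports/<N>/hw_counters) are present. That a difference in these directories is what trips the collector is an assumption, not something confirmed in this report. `summarize_devices` is a hypothetical helper name.

```python
import os


def summarize_devices(root="/sys/class/infiniband"):
    """Map each device name to {port: [counter directories present]}."""
    summary = {}
    if not os.path.isdir(root):
        return summary
    for dev in sorted(os.listdir(root)):
        ports_dir = os.path.join(root, dev, "ports")
        ports = {}
        if os.path.isdir(ports_dir):
            for port in sorted(os.listdir(ports_dir)):
                # Assumption: these two directories are the ones worth comparing.
                ports[port] = [
                    name
                    for name in ("counters", "hw_counters")
                    if os.path.isdir(os.path.join(ports_dir, port, name))
                ]
        summary[dev] = ports
    return summary


# Prints one line per device; prints nothing if the root does not exist.
for dev, ports in summarize_devices().items():
    print(dev, ports)
```

If the irdmaX entries show a different set of directories than the mlx5_X entries, that difference would be a reasonable starting point for checking how the sysfs parsing code handles those devices.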
