Description
What did you do?
I need to monitor traffic on the InfiniBand ports of a node. However, the value returned by the PromQL rate function differs from the value I calculate from the counter files under /sys.
The two values (system data and PromQL data) differ by a factor of about 5.
I believe the system data is correct.
Could you please help me understand the reason for this? Or am I perhaps using the wrong formula?
What did you expect to see?
The data obtained via PromQL should not differ much from the system data.
What did you see instead? Under which circumstances?
First, let's look at their current values:
# cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_data
5996930595687
5996930595687 * 4 ≈ 23987722386204, so they are very close.
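The comparison above can be reproduced with a few lines of shell. This is only a minimal sketch: `words_to_bytes` is a hypothetical helper name, and it assumes, as the script further down does, that the sysfs counter is reported in 4-byte words and that the second number quoted above is the byte value being compared against.

```shell
#!/usr/bin/env bash
# Convert an InfiniBand data counter (assumed to be in 4-byte words) to bytes.
words_to_bytes() {
    echo $(( $1 * 4 ))
}

raw=5996930595687        # value read from port_rcv_data above
other=23987722386204     # the second value quoted above

bytes=$(words_to_bytes "$raw")
echo "bytes from sysfs: $bytes"                 # 23987722382748
echo "difference:       $(( other - bytes ))"   # 3456
```

The two values differ by only 3456 bytes, which is consistent with the counter advancing between the two reads.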
But when I use Prometheus functions, they are very different.
Data obtained using the rate function in the Prometheus web UI
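The exact PromQL expression is not shown in this report, so the following is only a sketch of what such a query might look like. It assumes that the metric is node_exporter's `node_infiniband_port_data_received_bytes_total` with `device` and `port` labels, that a 1m rate window is used, and that Prometheus is reachable at `http://localhost:9090`. All of these are placeholders rather than values taken from this setup, and `build_rx_query` and `query_prometheus` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Build a PromQL expression for the receive rate in Gbit/s.
# Metric and label names are ASSUMED (node_exporter infiniband collector).
build_rx_query() {
    local device=$1 port=$2 window=$3
    printf 'rate(node_infiniband_port_data_received_bytes_total{device="%s",port="%s"}[%s]) * 8 / 1e9\n' \
        "$device" "$port" "$window"
}

# Send an instant query to the Prometheus HTTP API.
# base_url is a placeholder; adjust it to the actual server.
query_prometheus() {
    local base_url=$1 expr=$2
    curl -sG "${base_url}/api/v1/query" --data-urlencode "query=${expr}"
}

# Print the expression so it can be pasted into the web UI.
build_rx_query mlx5_0 1 1m

# To run it against a server (not executed here):
# query_prometheus http://localhost:9090 "$(build_rx_query mlx5_0 1 1m)"
```

Since `rate()` on a byte counter yields bytes per second, the `* 8 / 1e9` factor is what makes the result comparable to a figure in Gbit/s.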
Script used to fetch system data:
Running the script below on my node 10.10.1.84:
while true; do
    # First sample of the receive/transmit counters (units of 4-byte words)
    RX1=$(cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_data)
    TX1=$(cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_xmit_data)
    sleep 1
    # Second sample, roughly one second later
    RX2=$(cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_data)
    TX2=$(cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_xmit_data)
    RX_RATE=$(( (RX2 - RX1) * 4 )) # 1 word = 4 bytes
    TX_RATE=$(( (TX2 - TX1) * 4 ))
    # Bytes/s to bits/s, then divide by 1024^3 (integer division truncates)
    echo "RX: $((RX_RATE * 8 / (1024*1024*1024))) Gbps, TX: $((TX_RATE * 8 / (1024*1024*1024))) Gbps"
done
RX: 18 Gbps, TX: 18 Gbps
RX: 18 Gbps, TX: 18 Gbps
RX: 20 Gbps, TX: 20 Gbps
RX: 22 Gbps, TX: 22 Gbps
RX: 20 Gbps, TX: 20 Gbps
The two values (system data and PromQL) differ by a factor of about 5.
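When comparing the script's figures with a per-second byte rate from Prometheus, the unit conversion matters. The script divides by 1024^3 with integer arithmetic, so its result is in Gibit/s and is truncated. The sketch below does the same conversion with fractional output in both decimal and binary units. The helper names and the sample delta of 625000000 words are hypothetical, chosen only to give a round number.

```shell
#!/usr/bin/env bash
# rate_gbps WORDS_START WORDS_END SECONDS
# Counter delta (4-byte words) to decimal gigabits per second (10^9 bits).
rate_gbps() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (b - a) * 4 * 8 / s / 1e9 }'
}

# Same delta using the 1024^3 divisor from the script (Gibit/s).
rate_gibps() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (b - a) * 4 * 8 / s / (1024 * 1024 * 1024) }'
}

rate_gbps  0 625000000 1   # prints 20.00
rate_gibps 0 625000000 1   # prints 18.63
```

The two conventions differ by about 7%, which on its own is far too small to account for a factor of 5.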
System information
PRETTY_NAME="Ubuntu 22.04.4 LTS"
Prometheus version
I use prometheus-operator and got the version info from the Prometheus web UI:
version 3.0.1
revision 1f56e8492c31a558ccea833027db4bd7f8b6d0e9
branch HEAD
buildUser root@9c13055ffc3c
buildDate 20241128-17:20:55
goVersion go1.23.3
platform: linux/amd64
Prometheus configuration file
open-telemetry-collector config file:
...
scrape_configs:
  - job_name: node-exporter
    scrape_interval: 2s
    static_configs:
      - targets: [ 10.10.1.84:9100 ] # modified here
        labels:
          host_ip: '10.10.1.84'
...
ServiceMonitor:
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: open-telemetry-collector
  namespace: monitoring
  labels:
    app: open-telemetry-collector
    release: prometheus
spec:
  selector:
    matchLabels: # Service selector
      app: open-telemetry-collector
  namespaceSelector: # Namespace selector
    matchNames:
      - monitoring
  endpoints:
    - port: metrics # scrape port (defined in the Service)
      interval: 5s # scrape interval, set as needed; Prometheus defaults to 15s
      path: /metrics # default path is /metrics