Feature request: offline CPU handling #873

@mjtrangoni

Host operating system: output of uname -a

Linux xxxx 3.10.0-693.2.2.el7.ppc64le #1 SMP Sat Sep 9 03:58:38 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.16.0-rc.0 (branch: build, revision: 8ec35dfcd0aaa05b6039fc3c4bef7a675d419f6b)
  go version:       go1.10

node_exporter command line flags

default

Are you running node_exporter in Docker?

no

What did you do that produced an error?

none

What did you expect to see?

This PPC server currently runs with SMT=2 (simultaneous multithreading), which can be changed on the fly up to SMT=8.

# ppc64_cpu --smt
SMT=2
# ppc64_cpu --info
Core   0:    0*    1*    2     3     4     5     6     7
Core   1:    8*    9*   10    11    12    13    14    15
Core   2:   16*   17*   18    19    20    21    22    23
Core   3:   24*   25*   26    27    28    29    30    31
Core   4:   32*   33*   34    35    36    37    38    39
Core   5:   40*   41*   42    43    44    45    46    47
Core   6:   48*   49*   50    51    52    53    54    55
Core   7:   56*   57*   58    59    60    61    62    63
Core   8:   64*   65*   66    67    68    69    70    71
Core   9:   72*   73*   74    75    76    77    78    79
Core  10:   80*   81*   82    83    84    85    86    87
Core  11:   88*   89*   90    91    92    93    94    95
Core  12:   96*   97*   98    99   100   101   102   103
Core  13:  104*  105*  106   107   108   109   110   111
Core  14:  112*  113*  114   115   116   117   118   119
Core  15:  120*  121*  122   123   124   125   126   127
Core  16:  128*  129*  130   131   132   133   134   135
Core  17:  136*  137*  138   139   140   141   142   143
Core  18:  144*  145*  146   147   148   149   150   151
Core  19:  152*  153*  154   155   156   157   158   159

# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                160
On-line CPU(s) list:   0,1,8,9,16,17,24,25,32,33,40,41,48,49,56,57,64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121,128,129,136,137,144,145,152,153
Off-line CPU(s) list:  2-7,10-15,18-23,26-31,34-39,42-47,50-55,58-63,66-71,74-79,82-87,90-95,98-103,106-111,114-119,122-127,130-135,138-143,146-151,154-159
Thread(s) per core:    2
Core(s) per socket:    5
Socket(s):             4
NUMA node(s):          4
Model:                 2.1 (pvr 004b 0201)
Model name:            POWER8E (raw), altivec supported
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,1,8,9,16,17,24,25,32,33
NUMA node1 CPU(s):     40,41,48,49,56,57,64,65,72,73
NUMA node16 CPU(s):    80,81,88,89,96,97,104,105,112,113
NUMA node17 CPU(s):    120,121,128,129,136,137,144,145,152,153

# ppc64_cpu --smt=8
# ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4*    5*    6*    7*
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*
Core   2:   16*   17*   18*   19*   20*   21*   22*   23*
Core   3:   24*   25*   26*   27*   28*   29*   30*   31*
Core   4:   32*   33*   34*   35*   36*   37*   38*   39*
Core   5:   40*   41*   42*   43*   44*   45*   46*   47*
Core   6:   48*   49*   50*   51*   52*   53*   54*   55*
Core   7:   56*   57*   58*   59*   60*   61*   62*   63*
Core   8:   64*   65*   66*   67*   68*   69*   70*   71*
Core   9:   72*   73*   74*   75*   76*   77*   78*   79*
Core  10:   80*   81*   82*   83*   84*   85*   86*   87*
Core  11:   88*   89*   90*   91*   92*   93*   94*   95*
Core  12:   96*   97*   98*   99*  100*  101*  102*  103*
Core  13:  104*  105*  106*  107*  108*  109*  110*  111*
Core  14:  112*  113*  114*  115*  116*  117*  118*  119*
Core  15:  120*  121*  122*  123*  124*  125*  126*  127*
Core  16:  128*  129*  130*  131*  132*  133*  134*  135*
Core  17:  136*  137*  138*  139*  140*  141*  142*  143*
Core  18:  144*  145*  146*  147*  148*  149*  150*  151*
Core  19:  152*  153*  154*  155*  156*  157*  158*  159*
# lscpu 
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                160
On-line CPU(s) list:   0-159
Thread(s) per core:    8
Core(s) per socket:    5
Socket(s):             4
NUMA node(s):          4
Model:                 2.1 (pvr 004b 0201)
Model name:            POWER8E (raw), altivec supported
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-39
NUMA node1 CPU(s):     40-79
NUMA node16 CPU(s):    80-119
NUMA node17 CPU(s):    120-159

In the SMT=2 case, 960 of the 1280 node_cpu_seconds_total series belong to offline CPUs and could be ignored (4 sockets * 5 cores per socket * 6 offline threads per core (8 - 2) * 8 modes).

# curl -s localhost:9100/metrics | egrep -w -v -e '(HELP|TYPE)' | grep node_cpu_seconds_total | wc -l
1280
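The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the topology numbers are copied from the lscpu output and the count of 8 modes comes from the text above; the constant and function names are made up for illustration.

```python
# Topology from the lscpu output in this issue; names are illustrative only.
SOCKETS = 4
CORES_PER_SOCKET = 5
THREADS_PER_CORE_MAX = 8  # SMT=8, every hardware thread online
THREADS_PER_CORE_NOW = 2  # SMT=2, the current setting
MODES = 8                 # CPU modes exported per logical CPU, as stated above


def series_count(threads_per_core: int) -> int:
    """Number of node_cpu_seconds_total series for a given SMT level."""
    return SOCKETS * CORES_PER_SOCKET * threads_per_core * MODES


total = series_count(THREADS_PER_CORE_MAX)   # all 160 logical CPUs
online = series_count(THREADS_PER_CORE_NOW)  # only the 40 online CPUs
offline = total - online                     # series for offline CPUs

print(total, online, offline)  # prints: 1280 320 960
```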

My feature request is to reduce the number of CPU metrics exposed for offline CPUs. Two alternatives come to mind:

  1. Ignore offline CPUs in node_exporter, so that no series are exported for them.
  2. Introduce a new label, online="0|1", and filter on it during the Prometheus scrape process.
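For either alternative, the exporter would first need to know which CPUs are online. On Linux the kernel reports this in /sys/devices/system/cpu/online, using the same comma-and-range list format that lscpu prints above. The following is a minimal sketch of parsing that format, not node_exporter's actual implementation; the function names parse_cpu_list and online_cpus are hypothetical.

```python
from pathlib import Path
from typing import List


def parse_cpu_list(text: str) -> List[int]:
    """Expand a kernel CPU list such as "0-1,8-9" into [0, 1, 8, 9]."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpus(path: str = "/sys/devices/system/cpu/online") -> List[int]:
    """Read the online CPU list from sysfs (Linux only)."""
    return parse_cpu_list(Path(path).read_text())


# The first entries of the SMT=2 list shown by lscpu above.
print(parse_cpu_list("0,1,8,9,16,17"))  # prints: [0, 1, 8, 9, 16, 17]
# The SMT=8 case, where every CPU is online.
print(len(parse_cpu_list("0-159")))     # prints: 160
```

With such a list available, alternative 1 would skip any CPU not in it, and alternative 2 would set the label value from membership in it.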

What did you want to see instead?

# curl -s localhost:9100/metrics | egrep -w -v -e '(HELP|TYPE)' | grep node_cpu_seconds_total | grep 'online="1"' | wc -l
320
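The same count could be done without grep. The sketch below assumes the proposed online label exists, which it does not in node_exporter today, and runs on a small hand-written sample rather than real exporter output; SAMPLE and count_online_series are illustrative names. Note that label values are quoted in the Prometheus text format, so the match is on online="1".

```python
# Hand-written sample in the Prometheus text exposition format.
# The "online" label is the one proposed in this issue, not an existing label.
SAMPLE = """\
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle",online="1"} 1000.5
node_cpu_seconds_total{cpu="0",mode="user",online="1"} 20.25
node_cpu_seconds_total{cpu="2",mode="idle",online="0"} 0
node_cpu_seconds_total{cpu="2",mode="user",online="0"} 0
"""


def count_online_series(text: str, metric: str = "node_cpu_seconds_total") -> int:
    """Count samples of `metric` that carry the label online="1"."""
    count = 0
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        if line.startswith(metric + "{") and 'online="1"' in line:
            count += 1
    return count


print(count_online_series(SAMPLE))  # prints: 2
```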
