Description
I'd love to see the CPU flags exported. /proc/cpuinfo, which is already used to export a number of metrics, has `flags` and `bugs` fields:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap xsaveopt arat md_clear flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
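A minimal sketch of the parsing step, written in Go because node_exporter is a Go project. It assumes the x86 layout shown above (one `processor` block per logical CPU, each with `flags` and `bugs` lines); on arm64 the equivalent field is named `Features`, which this sketch does not handle. The `CPUFeatures` type and `parseCPUInfo` function are hypothetical names for illustration, not existing node_exporter or procfs code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// CPUFeatures holds the "flags" and "bugs" fields of one processor block
// (hypothetical type; x86 /proc/cpuinfo layout assumed).
type CPUFeatures struct {
	Flags []string
	Bugs  []string
}

// parseCPUInfo scans /proc/cpuinfo-formatted text ("key : value" lines,
// blocks separated by blank lines) and returns one entry per processor.
func parseCPUInfo(text string) []CPUFeatures {
	var result []CPUFeatures
	var cur *CPUFeatures
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		key, value, found := strings.Cut(scanner.Text(), ":")
		if !found {
			continue // blank separator line or a line without a key
		}
		switch strings.TrimSpace(key) {
		case "processor":
			// Start a new block; re-take the pointer after append, since
			// append may move the backing array.
			result = append(result, CPUFeatures{})
			cur = &result[len(result)-1]
		case "flags":
			if cur != nil {
				cur.Flags = strings.Fields(value)
			}
		case "bugs":
			if cur != nil {
				cur.Bugs = strings.Fields(value)
			}
		}
	}
	return result
}

func main() {
	// Shortened sample input; a real collector would read /proc/cpuinfo.
	sample := "processor\t: 0\nflags\t\t: fpu vme avx2\nbugs\t\t: spectre_v1 mds\n\n" +
		"processor\t: 1\nflags\t\t: fpu vme avx2\nbugs\t\t: spectre_v1 mds\n"
	for i, cpu := range parseCPUInfo(sample) {
		fmt.Println(i, cpu.Flags, cpu.Bugs)
	}
}
```

In a real collector the input would come from `os.ReadFile("/proc/cpuinfo")`. Keys are matched exactly after trimming, so related fields such as `vmx flags` are not mistaken for `flags`.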
A pragmatic solution might be to add each of those as a (potentially long) label to node_cpu_info, or to add a new node_cpu_flags metric if we do not want to blow up the info metric.
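To make the node_cpu_flags option concrete, here is a sketch of what a single info-style sample (value always 1) could look like in the Prometheus text exposition format. The label names `flags` and `bugs`, the comma separator and the `formatFlagsMetric`/`joinSorted` helpers are hypothetical choices for illustration; a real collector would build the metric through client_golang rather than format text by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelEscaper applies the text exposition format's escaping rules for
// label values: backslash, double quote and line feed.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// joinSorted deduplicates and sorts the items so the label value is stable
// across scrapes and identical on hosts with the same feature set.
func joinSorted(items []string) string {
	set := make(map[string]struct{}, len(items))
	for _, it := range items {
		set[it] = struct{}{}
	}
	uniq := make([]string, 0, len(set))
	for it := range set {
		uniq = append(uniq, it)
	}
	sort.Strings(uniq)
	return strings.Join(uniq, ",")
}

// formatFlagsMetric renders one sample of a hypothetical node_cpu_flags
// metric carrying all flags and bugs as two label values.
func formatFlagsMetric(flags, bugs []string) string {
	return fmt.Sprintf("node_cpu_flags{bugs=\"%s\",flags=\"%s\"} 1",
		labelEscaper.Replace(joinSorted(bugs)),
		labelEscaper.Replace(joinSorted(flags)))
}

func main() {
	fmt.Println(formatFlagsMetric(
		[]string{"sse2", "avx2", "fpu", "avx2"},
		[]string{"spectre_v1", "mds"},
	))
}
```

Sorting and deduplicating keeps the label value identical on every host with the same CPU features, which is what keeps the number of distinct label values low. Because PromQL regex matchers are fully anchored, a single flag could then be selected with a matcher such as `flags=~"(.*,)?avx2(,.*)?"`.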
Why? We have observed that, even with configuration and cluster management in place, specific CPU flags were in some cases not enabled on VMs or even on physical servers. Running without AVX or virtualization features enabled leads to very odd performance issues, depending on where your application is scheduled.
This would add either no new time series or a single one per node, and even fewer new label key/value sets per Prometheus server, since the flags are expected to be identical across at least large sets of servers per cluster in typical setups.
I'd be happy to work on this and provide the implementation, but wanted to check for agreement first.