Collecting hypervisor metrics from Nova fails when a new Ironic node is enrolled but not fully set up #428
Description
We have been seeing gaps in gathering these metrics about hypervisors:
openstack-exporter/exporters/nova.go
Lines 195 to 227 in a993770
	for _, hypervisor := range allHypervisors {
		availabilityZone := ""
		if val, ok := hostToAzMap[hypervisor.Service.Host]; ok {
			availabilityZone = val
		}
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["running_vms"].Metric,
			prometheus.GaugeValue, float64(hypervisor.RunningVMs), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["current_workload"].Metric,
			prometheus.GaugeValue, float64(hypervisor.CurrentWorkload), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["vcpus_available"].Metric,
			prometheus.GaugeValue, float64(hypervisor.VCPUs), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["vcpus_used"].Metric,
			prometheus.GaugeValue, float64(hypervisor.VCPUsUsed), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["memory_available_bytes"].Metric,
			prometheus.GaugeValue, float64(hypervisor.MemoryMB*MEGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["memory_used_bytes"].Metric,
			prometheus.GaugeValue, float64(hypervisor.MemoryMBUsed*MEGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["local_storage_available_bytes"].Metric,
			prometheus.GaugeValue, float64(hypervisor.LocalGB*GIGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["local_storage_used_bytes"].Metric,
			prometheus.GaugeValue, float64(hypervisor.LocalGBUsed*GIGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
		ch <- prometheus.MustNewConstMetric(exporter.Metrics["free_disk_bytes"].Metric,
			prometheus.GaugeValue, float64(hypervisor.FreeDiskGB*GIGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	}
The exporter was reporting the following error:
exporter=nova error="failed to collect metric: running_vms, error: Free disk GB has unexpected type: <nil>"
(Raised from here: https://github.com/gophercloud/gophercloud/blob/d03d5d1008765750d755d14236bcad23cac90c8a/openstack/compute/v2/hypervisors/results.go#L208)
Under the hood, gophercloud issues essentially the following request to Nova to gather stats for the hypervisors:
curl -H "X-Auth-Token: $token" https://<vip>:8774/v2.1/os-hypervisors/detail
The response contains one hypervisor that indeed has free_disk_gb: null:
{
"id": 1240,
"hypervisor_hostname": "bab0a652-2260-467d-b2cf-b760053a3257",
"state": "up",
"status": "enabled",
"hypervisor_type": "ironic",
"hypervisor_version": 1,
"host_ip": "<omitted>",
"service": {
"id": 61,
"host": "controller3-ironic",
"disabled_reason": null
},
"vcpus": 0,
"memory_mb": 0,
"local_gb": 0,
"vcpus_used": 0,
"memory_mb_used": 0,
"local_gb_used": 0,
"free_ram_mb": null,
"free_disk_gb": null,
"current_workload": null,
"running_vms": null,
"disk_available_least": 0,
"cpu_info": ""
}
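Entries like this can be spotted by scanning the raw detail response for null stats fields. A minimal sketch; the helper name `findNullStats` and the hardcoded sample payload are ours, but the field names match the Nova API response shown above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// findNullStats scans a raw /os-hypervisors/detail response body and
// returns the hostnames of hypervisors whose free_disk_gb is null,
// i.e. the entries that trip up the extraction.
func findNullStats(body []byte) ([]string, error) {
	var resp struct {
		Hypervisors []map[string]any `json:"hypervisors"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var hosts []string
	for _, h := range resp.Hypervisors {
		// A JSON null decodes to an untyped nil in the map.
		if h["free_disk_gb"] == nil {
			hosts = append(hosts, fmt.Sprint(h["hypervisor_hostname"]))
		}
	}
	return hosts, nil
}

func main() {
	// Hardcoded sample standing in for the live API response.
	sample := []byte(`{"hypervisors": [
		{"hypervisor_hostname": "compute1", "free_disk_gb": 100},
		{"hypervisor_hostname": "bab0a652-2260-467d-b2cf-b760053a3257", "free_disk_gb": null}
	]}`)
	hosts, err := findNullStats(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("hypervisors with null free_disk_gb:", hosts)
}
```

Against the sample above this reports only the enrolling Ironic node.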
This is a baremetal node currently in enroll state:
openstack hypervisor show bab0a652-2260-467d-b2cf-b760053a3257
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| aggregates | [] |
| cpu_info | None |
| host_ip | <omitted> |
| hypervisor_hostname | bab0a652-2260-467d-b2cf-b760053a3257 |
| hypervisor_type | ironic |
| hypervisor_version | 1 |
| id | bab0a652-2260-467d-b2cf-b760053a3257 |
| service_host | controller3-ironic |
| service_id | 9a9752e6-d6ce-474e-a60c-edacf6f7b0d6 |
| state | up |
| status | enabled |
+---------------------+--------------------------------------+
openstack baremetal node show bab0a652-2260-467d-b2cf-b760053a3257 -c provision_state
+-----------------+--------+
| Field | Value |
+-----------------+--------+
| provision_state | enroll |
+-----------------+--------+
In short, whenever a baremetal node is in the enroll state, openstack-exporter fails to collect metrics for all hypervisors, not just the enrolling one.
I'm aware that these metrics are deprecated by Nova and that Placement is the recommended way to gather hypervisor resource data instead; I'm mainly raising this bug for visibility.