Collecting hypervisor metrics from Nova fails when a new Ironic node is enrolled but not fully set up #428

@MoteHue

We have been seeing gaps in gathering these metrics about hypervisors:

for _, hypervisor := range allHypervisors {
	availabilityZone := ""
	if val, ok := hostToAzMap[hypervisor.Service.Host]; ok {
		availabilityZone = val
	}
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["running_vms"].Metric,
		prometheus.GaugeValue, float64(hypervisor.RunningVMs), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["current_workload"].Metric,
		prometheus.GaugeValue, float64(hypervisor.CurrentWorkload), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["vcpus_available"].Metric,
		prometheus.GaugeValue, float64(hypervisor.VCPUs), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["vcpus_used"].Metric,
		prometheus.GaugeValue, float64(hypervisor.VCPUsUsed), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["memory_available_bytes"].Metric,
		prometheus.GaugeValue, float64(hypervisor.MemoryMB*MEGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["memory_used_bytes"].Metric,
		prometheus.GaugeValue, float64(hypervisor.MemoryMBUsed*MEGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["local_storage_available_bytes"].Metric,
		prometheus.GaugeValue, float64(hypervisor.LocalGB*GIGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["local_storage_used_bytes"].Metric,
		prometheus.GaugeValue, float64(hypervisor.LocalGBUsed*GIGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
	ch <- prometheus.MustNewConstMetric(exporter.Metrics["free_disk_bytes"].Metric,
		prometheus.GaugeValue, float64(hypervisor.FreeDiskGB*GIGABYTE), hypervisor.HypervisorHostname, availabilityZone, aggregatesLabel(hypervisor.Service.Host, hostToAggrMap))
}

The exporter was reporting the following error:
exporter=nova error="failed to collect metric: running_vms, error: Free disk GB has unexpected type: <nil>"
(Raised from here: https://github.com/gophercloud/gophercloud/blob/d03d5d1008765750d755d14236bcad23cac90c8a/openstack/compute/v2/hypervisors/results.go#L208)
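
For anyone who wants to reproduce the failure mode without an OpenStack deployment, here is a minimal sketch. It assumes the strict decoder behaves like a type switch that accepts only numeric values. `parseFreeDiskGB` is a hypothetical helper written for illustration, not a copy of the Gophercloud code linked above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseFreeDiskGB is a hypothetical stand-in for a strict decoder:
// it accepts only numeric values for free_disk_gb.
func parseFreeDiskGB(raw []byte) (int, error) {
	var s struct {
		FreeDiskGB interface{} `json:"free_disk_gb"`
	}
	if err := json.Unmarshal(raw, &s); err != nil {
		return 0, err
	}
	switch t := s.FreeDiskGB.(type) {
	case int:
		return t, nil
	case float64:
		// encoding/json decodes JSON numbers into float64 for interface{} targets.
		return int(t), nil
	default:
		// A JSON null decodes to a nil interface, which %T prints as <nil>.
		return 0, fmt.Errorf("Free disk GB has unexpected type: %T", t)
	}
}

func main() {
	_, err := parseFreeDiskGB([]byte(`{"free_disk_gb": null}`))
	fmt.Println(err)
}
```

If the real decoder works this way, a single `null` value is enough to make decoding of the whole hypervisor list fail, which would match the gaps described above.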

Gophercloud essentially makes the following request to Nova to gather stats for every hypervisor:

curl -H "X-Auth-Token: $token" https://<vip>:8774/v2.1/os-hypervisors/detail

One hypervisor in the response does indeed have free_disk_gb: null:

{
  "id": 1240,
  "hypervisor_hostname": "bab0a652-2260-467d-b2cf-b760053a3257",
  "state": "up",
  "status": "enabled",
  "hypervisor_type": "ironic",
  "hypervisor_version": 1,
  "host_ip": "<omitted>",
  "service": {
    "id": 61,
    "host": "controller3-ironic",
    "disabled_reason": null
  },
  "vcpus": 0,
  "memory_mb": 0,
  "local_gb": 0,
  "vcpus_used": 0,
  "memory_mb_used": 0,
  "local_gb_used": 0,
  "free_ram_mb": null,
  "free_disk_gb": null,
  "current_workload": null,
  "running_vms": null,
  "disk_available_least": 0,
  "cpu_info": ""
}
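
One possible way to tolerate a record like this is to decode the nullable fields into pointers and treat `null` as "no value". The sketch below is only an illustration. `nullableHypervisor`, `decodeHypervisor`, and `intOrZero` are hypothetical names, not part of Gophercloud or openstack-exporter, and it assumes Nova returns these fields as either integers or `null`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nullableHypervisor is a hypothetical struct covering only the fields
// that were null in the response above. Pointer fields stay nil on null.
type nullableHypervisor struct {
	Hostname        string `json:"hypervisor_hostname"`
	FreeRAMMB       *int   `json:"free_ram_mb"`
	FreeDiskGB      *int   `json:"free_disk_gb"`
	CurrentWorkload *int   `json:"current_workload"`
	RunningVMs      *int   `json:"running_vms"`
}

// decodeHypervisor parses one hypervisor object without failing on nulls.
func decodeHypervisor(raw []byte) (nullableHypervisor, error) {
	var h nullableHypervisor
	err := json.Unmarshal(raw, &h)
	return h, err
}

// intOrZero returns the pointed-to value, or 0 when the field was null or absent.
func intOrZero(p *int) int {
	if p == nil {
		return 0
	}
	return *p
}

func main() {
	raw := []byte(`{
		"hypervisor_hostname": "bab0a652-2260-467d-b2cf-b760053a3257",
		"free_ram_mb": null,
		"free_disk_gb": null,
		"current_workload": null,
		"running_vms": null
	}`)
	h, err := decodeHypervisor(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(h.Hostname, h.FreeDiskGB == nil, intOrZero(h.RunningVMs))
}
```

Whether a nil field should be exported as 0 or the metric skipped for that hypervisor is a design choice. Skipping avoids reporting a misleading zero for a node that is not yet set up.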

This is a baremetal node currently in enroll state:

openstack hypervisor show bab0a652-2260-467d-b2cf-b760053a3257
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| aggregates          | []                                   |
| cpu_info            | None                                 |
| host_ip             | <omitted>                            |
| hypervisor_hostname | bab0a652-2260-467d-b2cf-b760053a3257 |
| hypervisor_type     | ironic                               |
| hypervisor_version  | 1                                    |
| id                  | bab0a652-2260-467d-b2cf-b760053a3257 |
| service_host        | controller3-ironic                   |
| service_id          | 9a9752e6-d6ce-474e-a60c-edacf6f7b0d6 |
| state               | up                                   |
| status              | enabled                              |
+---------------------+--------------------------------------+
openstack baremetal node show bab0a652-2260-467d-b2cf-b760053a3257 -c provision_state
+-----------------+--------+
| Field           | Value  |
+-----------------+--------+
| provision_state | enroll |
+-----------------+--------+

In short, whenever a baremetal node is in the enroll state, we are unable to get hypervisor metrics from openstack-exporter.

I'm aware that these metrics are technically deprecated by Nova, and that Placement should be used for gathering hypervisor resources instead. I'm raising this bug mainly for visibility.
