Proposal: collect mdraid metrics from sysfs instead of parsing /proc/mdstat

## Proposal
I just discovered that the `node_md_blocks_synced` metric, which is currently parsed from `/proc/mdstat`, is not very useful for determining the resync / check progress percentage of an array. To be honest, I'm not really sure what it can be used for, because the way in which it is parsed has little bearing on the other metrics that are parsed from `/proc/mdstat`.

Take the following example of an array rebuilding:
```
md300 : active raid5 sdh[8] sdc[0] sdj[7] sdi[6] sdg[4] sdf[3] sde[2] sdd[1]
      3281959800 blocks super 1.0 level 5, 8k chunk, algorithm 2 [8/7] [UUUUU_UU]
      [=====>...............]  recovery = 26.4% (124074752/468851400) finish=207.9min speed=27635K/sec
      bitmap: 3/4 pages [12KB], 65536KB chunk
```
Parsing this, node_exporter provides me with two _nearly useful_ metrics:
- `node_md_blocks` = 3281959800
- `node_md_blocks_synced` = 124074752

One might be forgiven for thinking that one can simply divide `node_md_blocks_synced` by `node_md_blocks` and multiply by 100 to get a resync completion percentage. The problem, however, is that `node_md_blocks` is the whole array size expressed in 1 KB blocks (this is a ~3.1 TB array), whereas `node_md_blocks_synced` is the number of synced 1 KB blocks of the _drive which is being rebuilt_, i.e. one seventh of the entire array (raid5, 8 members).

Dividing node_md_blocks_synced by node_md_blocks is going to yield a number that is seven times too small, and the only way you could possibly know that you need to multiply _that_ number by seven would be if you knew a) the raid level, and b) the number of raid members.
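To make the mismatch concrete, here is a small sketch of the arithmetic using the numbers from the `/proc/mdstat` example above. The variable names and the `percent` helper are only illustrative; they are not part of node_exporter.

```python
# Values copied from the /proc/mdstat example above.
array_blocks = 3281959800   # node_md_blocks: whole array, in 1 KB blocks
synced_blocks = 124074752   # node_md_blocks_synced: progress on the rebuilt drive
member_blocks = 468851400   # per-drive total taken from the "recovery" line


def percent(done, total):
    """Return done/total expressed as a percentage."""
    return 100.0 * done / total


# What you get by dividing the two exported metrics.
naive = percent(synced_blocks, array_blocks)

# What you get when the denominator is the size of the drive being rebuilt.
actual = percent(synced_blocks, member_blocks)

print(f"naive:  {naive:.2f}%")
print(f"actual: {actual:.2f}%")
print(f"ratio:  {actual / naive:.1f}")
```

The second figure lines up with the `recovery = 26.4%` shown by the kernel, while the first is off by the factor of seven described above.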

The mdraid information contained in sysfs is a lot more detailed, and a lot easier to parse. The information that one would need to accurately calculate current rebuild progress can be read from `/sys/block/md*/md/sync_completed`, e.g.:
```
$ cat /sys/block/md300/md/sync_completed 
248115968 / 937702800
```
These numbers are in sectors, and the [md admin guide](https://www.kernel.org/doc/html/v5.11/admin-guide/md.html) literally says that you can divide these numbers to get a "fraction of the process that is complete."
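As a rough illustration of how little parsing this needs, here is a minimal sketch in Python (a real collector would be written in Go). The function names and the `sysfs_root` parameter are hypothetical, and I am assuming that any content which is not of the form `N / M` (for example when no sync is running) should simply be reported as "no progress available".

```python
from pathlib import Path


def parse_sync_completed(text):
    """Return (done, total) in sectors, or None if text is not 'N / M'."""
    parts = text.strip().split("/")
    if len(parts) != 2:
        return None
    try:
        done, total = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    return done, total


def sync_fraction(text):
    """Return the completed fraction (0.0 to 1.0), or None if unavailable."""
    parsed = parse_sync_completed(text)
    if parsed is None or parsed[1] == 0:
        return None
    done, total = parsed
    return done / total


def read_sync_fraction(device, sysfs_root="/sys/block"):
    """Read and parse sync_completed for one md device, e.g. 'md300'."""
    path = Path(sysfs_root) / device / "md" / "sync_completed"
    return sync_fraction(path.read_text())
```

With the sample output above, `sync_fraction("248115968 / 937702800")` gives roughly 0.2646, which is in line with the percentage that `/proc/mdstat` reported for the same rebuild.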

**Use case. Why is this important?**
The mdraid information in sysfs is substantially more detailed and more machine-readable than `/proc/mdstat`. I don't know whether there are any guarantees about the stability of the format of `/proc/mdstat`, and I've never seen any indication that it is intended to be machine-readable; on the contrary, it is formatted to be human-friendly.

The only information from `/proc/mdstat` which I don't think can be found in sysfs is the very first line, e.g. `Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]`, but node_exporter does not expose that information anyway. Everything else can be found in `/sys/block/md*/md/*`.

I am prepared to provide a PR which would completely refactor the mdadm collector (with a healthy dose of unit tests), if the Prometheus developers are interested. Ultimately I suspect this should find its way into the prometheus/procfs package, alongside the other sysfs parsers.
