fix: pool detection and metrics gathering for ZFS >= 2.1.x#10099
fix: pool detection and metrics gathering for ZFS >= 2.1.x#10099MyaLongmire merged 3 commits intoinfluxdata:masterfrom
Conversation
|
Did the lint rules change recently? It seems to be flagged a lot of things in the ZFS plugin README that I didn't touch. |
|
Hey @aaronjwood, thanks for looking into this. From your changes it looks like the PR is breaking backward compatibility, i.e. you cannot use a zfs < 2.1.x with the PR (e.g. by changing the proc-file path). Is my understanding correct? And if so, can we somehow test for the present zfs version to be compatible with both formats? |
Hi, it should not. I fall back to checking the old kstat path if the new one doesn't exist. There is also a fallback for how the actual file contents are parsed. I've also added some tests that write out a kstat with the new format/filename to exercise the new code paths. Do you see an issue with it? |
srebhan
left a comment
There was a problem hiding this comment.
@aaronjwood I now see your approach and it's fine IMO. Sorry for having missed it in the first pass...
Given that the format changed I suspect/fear that this might happen again in the future. So I
wonder if we should be prepared and approach this in a slightly more general way. How about moving the probing to a probeVersion function returning a format identifier (e.g. 1, 2, 3...). Based on this version we have different gather..._vX() functions for X in {1,2,3,...} that are tailored to the format of this version. This way we would have a clear structure and avoid complex if/else flow handlings dependent on versions. If there is common code, we can factor that into more general functions parameterized by the version-specific functions. What do you think?
No worries. What you suggest sounds good to me. I'll try and find some time in the coming weeks to change this. Things are a bit tight with the holidays coming up :) |
090ab3c to
dd60a24
Compare
995a6c5 to
1c77f17
Compare
|
FWIW I'll try out the changes I pushed up just now over the weekend. I don't have the ability to put a new binary on my server at the moment :) |
|
Yep, still looking good on my server. |
srebhan
left a comment
There was a problem hiding this comment.
This is an amazing update @aaronjwood! Some small leftovers, but we are almost good to go. I really love the new structure.
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
|
Should be all set now :) Thanks for the reviews. Looking forward to having this upstreamed! |
|
📦 Looks like new artifacts were built from this PR. Expand this list to get them here ! 🐯Artifact URLs |
srebhan
left a comment
There was a problem hiding this comment.
Looks good to me. Thanks for the nice work @aaronjwood!
MyaLongmire
left a comment
There was a problem hiding this comment.
Thank you for this pr!
(cherry picked from commit 666bfe3)
* origin/master: (133 commits) chore: restart service if it is already running and upgraded via RPM (influxdata#9970) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10237) fix: Handle duplicate registration of protocol-buffer files gracefully. (influxdata#10188) fix(http_listener_v2): fix panic on close (influxdata#10132) feat: add Vault input plugin (influxdata#10198) feat: support aws managed service for prometheus (influxdata#10202) fix: Make telegraf compile on Windows with golang 1.16.2 (influxdata#10246) Update changelog feat: Modbus add per-request tags (influxdata#10231) fix: Implement NaN and inf handling for elasticsearch output (influxdata#10196) feat: add nomad input plugin (influxdata#10106) fix: Print loaded plugins and deprecations for once and test (influxdata#10205) fix: eliminate MIB dependency for ifname processor (influxdata#10214) feat: Optimize locking for SNMP MIBs loading. (influxdata#10206) feat: Add SMART plugin concurrency configuration option, nvme-cli v1.14+ support and lint fixes. (influxdata#10150) feat: update configs (influxdata#10236) fix(inputs/kube_inventory): set TLS server name config properly (influxdata#9975) fix: Sudden close of Telegraf caused by OPC UA input plugin (influxdata#10230) fix: bump github.com/eclipse/paho.mqtt.golang from 1.3.0 to 1.3.5 (influxdata#9913) fix: json_v2 parser timestamp setting (influxdata#10221) fix: ensure graylog spec fields not prefixed with '_' (influxdata#10209) docs: remove duplicate links in CONTRIBUTING.md (influxdata#10218) fix: pool detection and metrics gathering for ZFS >= 2.1.x (influxdata#10099) fix: parallelism fix for ifname processor (influxdata#10007) chore: Forbids "log" package only for aggregators, inputs, outputs, parsers and processors (influxdata#10191) docs: address documentation gap when running telegraf in k8s (influxdata#10215) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10211) fix: mqtt topic extracting no longer requires all three fields (influxdata#10208) fix: windows service - graceful shutdown of telegraf (influxdata#9616) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10201) feat: Modbus support multiple slaves (gateway feature) (influxdata#9279) fix: Revert unintented corruption of the Makefile from influxdata#10200. (influxdata#10203) chore: remove triggering update-config bot in CI (influxdata#10195) Update changelog feat: Implement deprecation infrastructure (influxdata#10200) fix: extra lock on init for safety (influxdata#10199) fix: resolve influxdata#10027 (influxdata#10112) fix: register bigquery to output plugins influxdata#10177 (influxdata#10178) fix: sysstat use unique temp file vs hard-coded (influxdata#10165) refactor: snmp to use gosmi (influxdata#9518) ...
Required for all PRs:
partially resolves #8862
I moved to ZFS 2.1.1 the other day and noticed my pool metrics broke. After a bit of investigation I found out that things have changed significantly under the hood. I have this diff deployed to one of my servers and can show it working:
