Skip to content

fix: pool detection and metrics gathering for ZFS >= 2.1.x#10099

Merged
MyaLongmire merged 3 commits intoinfluxdata:masterfrom
aaronjwood:zfs2x
Dec 6, 2021
Merged

fix: pool detection and metrics gathering for ZFS >= 2.1.x#10099
MyaLongmire merged 3 commits intoinfluxdata:masterfrom
aaronjwood:zfs2x

Conversation

@aaronjwood
Copy link
Copy Markdown
Contributor

@aaronjwood aaronjwood commented Nov 12, 2021

Required for all PRs:

partially resolves #8862

I moved to ZFS 2.1.1 the other day and noticed my pool metrics broke. After a bit of investigation I found out that things have changed significantly under the hood. I have this diff deployed to one of my servers and can show it working:
Screenshot from 2021-11-12 12-44-09

@telegraf-tiger telegraf-tiger bot added the fix pr to fix corresponding bug label Nov 12, 2021
@aaronjwood
Copy link
Copy Markdown
Contributor Author

Did the lint rules change recently? It seems to be flagged a lot of things in the ZFS plugin README that I didn't touch.

@aaronjwood aaronjwood changed the title Fix pool detection and metrics gathering for ZFS >= 2.1.x fix (inputs/zfs): pool detection and metrics gathering for ZFS >= 2.1.x Nov 12, 2021
@srebhan
Copy link
Copy Markdown
Member

srebhan commented Nov 15, 2021

Hey @aaronjwood, thanks for looking into this. From your changes it looks like the PR is breaking backward compatibility, i.e. you cannot use a zfs < 2.1.x with the PR (e.g. by changing the proc-file path). Is my understanding correct? And if so, can we somehow test for the present zfs version to be compatible with both formats?

@aaronjwood
Copy link
Copy Markdown
Contributor Author

Hey @aaronjwood, thanks for looking into this. From your changes it looks like the PR is breaking backward compatibility, i.e. you cannot use a zfs < 2.1.x with the PR (e.g. by changing the proc-file path). Is my understanding correct? And if so, can we somehow test for the present zfs version to be compatible with both formats?

Hi, it should not. I fall back to checking the old kstat path if the new one doesn't exist. There is also a fallback for how the actual file contents are parsed. I've also added some tests that write out a kstat with the new format/filename to exercise the new code paths.

Do you see an issue with it?

Copy link
Copy Markdown
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronjwood I now see your approach and it's fine IMO. Sorry for having missed it in the first pass...

Given that the format changed I suspect/fear that this might happen again in the future. So I
wonder if we should be prepared and approach this in a slightly more general way. How about moving the probing to a probeVersion function returning a format identifier (e.g. 1, 2, 3...). Based on this version we have different gather..._vX() functions for X in {1,2,3,...} that are tailored to the format of this version. This way we would have a clear structure and avoid complex if/else flow handlings dependent on versions. If there is common code, we can factor that into more general functions parameterized by the version-specific functions. What do you think?

@aaronjwood
Copy link
Copy Markdown
Contributor Author

@aaronjwood I now see your approach and it's fine IMO. Sorry for having missed it in the first pass...

Given that the format changed I suspect/fear that this might happen again in the future. So I
wonder if we should be prepared and approach this in a slightly more general way. How about moving the probing to a probeVersion function returning a format identifier (e.g. 1, 2, 3...). Based on this version we have different gather..._vX() functions for X in {1,2,3,...} that are tailored to the format of this version. This way we would have a clear structure and avoid complex if/else flow handlings dependent on versions. If there is common code, we can factor that into more general functions parameterized by the version-specific functions. What do you think?

No worries. What you suggest sounds good to me. I'll try and find some time in the coming weeks to change this. Things are a bit tight with the holidays coming up :)

@srebhan srebhan self-assigned this Nov 17, 2021
@aaronjwood aaronjwood force-pushed the zfs2x branch 3 times, most recently from 090ab3c to dd60a24 Compare November 27, 2021 01:06
@aaronjwood aaronjwood changed the title fix (inputs/zfs): pool detection and metrics gathering for ZFS >= 2.1.x fix: pool detection and metrics gathering for ZFS >= 2.1.x Nov 27, 2021
@aaronjwood aaronjwood force-pushed the zfs2x branch 2 times, most recently from 995a6c5 to 1c77f17 Compare November 27, 2021 01:13
@aaronjwood
Copy link
Copy Markdown
Contributor Author

FWIW I'll try out the changes I pushed up just now over the weekend. I don't have the ability to put a new binary on my server at the moment :)

@aaronjwood
Copy link
Copy Markdown
Contributor Author

Yep, still looking good on my server.

Copy link
Copy Markdown
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an amazing update @aaronjwood! Some small leftovers, but we are almost good to go. I really love the new structure.

aaronjwood and others added 2 commits November 30, 2021 10:03
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
@aaronjwood
Copy link
Copy Markdown
Contributor Author

Should be all set now :) Thanks for the reviews. Looking forward to having this upstreamed!

@telegraf-tiger
Copy link
Copy Markdown
Contributor

Copy link
Copy Markdown
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for the nice work @aaronjwood!

@srebhan srebhan added plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. labels Dec 1, 2021
Copy link
Copy Markdown
Contributor

@MyaLongmire MyaLongmire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this pr!

@MyaLongmire MyaLongmire merged commit 666bfe3 into influxdata:master Dec 6, 2021
MyaLongmire pushed a commit that referenced this pull request Dec 8, 2021
powersj pushed a commit to powersj/telegraf that referenced this pull request Jan 21, 2022
phemmer added a commit to phemmer/telegraf that referenced this pull request Feb 18, 2022
* origin/master: (133 commits)
  chore: restart service if it is already running and upgraded via RPM (influxdata#9970)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10237)
  fix: Handle duplicate registration of protocol-buffer files gracefully. (influxdata#10188)
  fix(http_listener_v2): fix panic on close (influxdata#10132)
  feat: add Vault input plugin (influxdata#10198)
  feat: support aws managed service for prometheus (influxdata#10202)
  fix: Make telegraf compile on Windows with golang 1.16.2 (influxdata#10246)
  Update changelog
  feat: Modbus add per-request tags (influxdata#10231)
  fix: Implement NaN and inf handling for elasticsearch output (influxdata#10196)
  feat: add nomad input plugin (influxdata#10106)
  fix: Print loaded plugins and deprecations for once and test (influxdata#10205)
  fix: eliminate MIB dependency for ifname processor (influxdata#10214)
  feat: Optimize locking for SNMP MIBs loading. (influxdata#10206)
  feat: Add SMART plugin concurrency configuration option, nvme-cli v1.14+ support and lint fixes. (influxdata#10150)
  feat: update configs (influxdata#10236)
  fix(inputs/kube_inventory): set TLS server name config properly (influxdata#9975)
  fix: Sudden close of Telegraf caused by OPC UA input plugin (influxdata#10230)
  fix: bump github.com/eclipse/paho.mqtt.golang from 1.3.0 to 1.3.5 (influxdata#9913)
  fix: json_v2 parser timestamp setting (influxdata#10221)
  fix: ensure graylog spec fields not prefixed with '_' (influxdata#10209)
  docs: remove duplicate links in CONTRIBUTING.md (influxdata#10218)
  fix: pool detection and metrics gathering for ZFS >= 2.1.x (influxdata#10099)
  fix: parallelism fix for ifname processor (influxdata#10007)
  chore: Forbids "log" package only for aggregators, inputs, outputs, parsers and processors (influxdata#10191)
  docs: address documentation gap when running telegraf in k8s (influxdata#10215)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10211)
  fix: mqtt topic extracting no longer requires all three fields (influxdata#10208)
  fix: windows service - graceful shutdown of telegraf (influxdata#9616)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10201)
  feat: Modbus support multiple slaves (gateway feature) (influxdata#9279)
  fix: Revert unintented corruption of the Makefile from influxdata#10200. (influxdata#10203)
  chore: remove triggering update-config bot in CI (influxdata#10195)
  Update changelog
  feat: Implement deprecation infrastructure (influxdata#10200)
  fix: extra lock on init for safety (influxdata#10199)
  fix: resolve influxdata#10027 (influxdata#10112)
  fix: register bigquery to output plugins influxdata#10177 (influxdata#10178)
  fix: sysstat use unique temp file vs hard-coded (influxdata#10165)
  refactor: snmp to use gosmi (influxdata#9518)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix pr to fix corresponding bug plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Updated ZFS measurements for OpenZFS 2.0

3 participants