Skip to content

feat: Add rocm_smi input to monitor AMD GPUs#9602

Merged
reimda merged 9 commits intoinfluxdata:masterfrom
mconcas:rocm-smi-plugin
Sep 2, 2021
Merged

feat: Add rocm_smi input to monitor AMD GPUs#9602
reimda merged 9 commits intoinfluxdata:masterfrom
mconcas:rocm-smi-plugin

Conversation

@mconcas
Copy link
Copy Markdown
Contributor

@mconcas mconcas commented Aug 7, 2021

Required for all PRs:

  • Updated associated README.md.
  • Wrote appropriate unit tests.
  • Pull request title or commits are in conventional commit format (e.g. feat: or fix:)

resolves #9601

Added an input plugin to gather AMD GPUs metrics using ROCm System Management Interface rocm-smi.

@telegraf-tiger telegraf-tiger bot added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Aug 7, 2021
@mconcas mconcas force-pushed the rocm-smi-plugin branch 9 times, most recently from 7fe0ca0 to 961218a Compare August 10, 2021 12:02
@mconcas
Copy link
Copy Markdown
Contributor Author

mconcas commented Aug 19, 2021

Hello, any feedback on this?

Copy link
Copy Markdown
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically, this is just inputs.exec + parsers.json into one? I don't see the need to add this plugin, while the same can already be achieved by setting some configurations.

@mconcas
Copy link
Copy Markdown
Contributor Author

mconcas commented Aug 31, 2021

So basically, this is just inputs.exec + parsers.json into one? I don't see the need to add this plugin, while the same can already be achieved by setting some configurations.

As a general comment: the approach I chose was heavily inspired to the nvidia-smi plugin that was already present in Telegraf, so I went straight with that, I admittedly was not aware of the possibility to combine plugins so I can't tell if the combination of plugins you mention is enough to fullfill the case.
A part from the parsing there is indeed some data digestion to produce indirect metrics and tag/fields names, in future iteration this can become more complex. I'll have a look into it.

@reimda
Copy link
Copy Markdown
Contributor

reimda commented Aug 31, 2021

So basically, this is just inputs.exec + parsers.json into one? I don't see the need to add this plugin, while the same can already be achieved by setting some configurations.

As a general comment: the approach I chose was heavily inspired to the nvidia-smi plugin that was already present in Telegraf, so I went straight with that, I admittedly was not aware of the possibility to combine plugins so I can't tell if the combination of plugins you mention is enough to fullfill the case.
A part from the parsing there is indeed some data digestion to produce indirect metrics and tag/fields names, in future iteration this can become more complex. I'll have a look into it.

I don't mind adding a plugin that just runs a program and parses the json if it's commonly used. Without a plugin for this, each user needs to go through the config process, including learning to use the exec and json plugins and coming up with a good metric format. AMD GPUs are very common so a dedicated plugin could be useful.

If you can quickly make the changes Hipska requested I think we can get this into the 1.20.0 release candidate which comes out Sept 1 (tomorrow).

@mconcas mconcas force-pushed the rocm-smi-plugin branch 3 times, most recently from 7c0cd85 to 053aea5 Compare August 31, 2021 21:44
@mconcas mconcas requested a review from reimda September 1, 2021 05:10
Copy link
Copy Markdown
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you still include an example metric in influx format please? (It could be with data from one of the tests if you like)

@reimda reimda merged commit 04c3e9b into influxdata:master Sep 2, 2021
@mconcas mconcas deleted the rocm-smi-plugin branch September 3, 2021 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add plugin to monitor AMD GPUs using ROCm System Management Interface rocm-smi

3 participants