Conversation
Looks like new artifacts were built from this PR. Get them here!Artifact URLs |
|
I am unsure of the progress of this PR - is it only the tests missing? |
|
@henriknoerr did you also test how long it takes to startup alone? The reason I'm asking is because if you have lots of unresponding snmp devices, it takes a while for telegraf to shut down since it will wait for timeouts and retries on those.. |
I did not know the shutdown process waited for all timeouts - I will look into this. |
390765c to
c91ce58
Compare
And yes indeed, I was blinded by this snmp MIB PR. Fixing retries and timeout values made a huge difference. |
|
If a node can't be translated it gives an error. For example |
|
Correct, but we have the OID, so telegraf can continue and do its work. |
|
I changed to error to be a warning so telegraf can continue doing its thing. I also added .999 back to the tests. Sorry for the confusion, I was miss understanding the problem :) |
|
That is super weird. I have no idea what happened. Dave and I will try and fix it. Sorry for the confusion :) |
34d34a7 to
0ea43b3
Compare
|
🥳 This pull request decreases the Telegraf binary size by -2.47 % for linux amd64 (new size: 127.5 MB, nightly size 130.7 MB) 📦 Looks like new artifacts were built from this PR. Expand this list to get them here! 🐯Artifact URLs |
|
Why can't it be added without changing functionality? |
…th variable, clean up init, update readme
|
@Hipska
Our solution was to have one global Hopefully, this solution puts your worries to rest about the global config paths :) |
Hipska
left a comment
There was a problem hiding this comment.
I would rename the filename of internal/snmp/mib.go into translate.go or smi.go which better fits the contained code.
Hipska
left a comment
There was a problem hiding this comment.
Perfect, seems ready to me. Don't forget to also revert the other section.
srebhan
left a comment
There was a problem hiding this comment.
Hey @MyaLongmire, I think LoadMibsFromPath requires proper locking and protection of gosmi.Init() to be executed only once.
| // must init, append path for each directory, load module for every file | ||
| // or gosmi will fail without saying why | ||
| func LoadMibsFromPath(paths []string, log telegraf.Logger) error { | ||
| gosmi.Init() |
There was a problem hiding this comment.
AFAICS, this function is called from each inputs.snmp plugin. Is this correct? If so, this means that you call gosmi.Init() multiple time with gosmi having a global state if my interpretation is correct...
If this is the case, chances are high that you get side-effects between the different plugin-instances potentially ranging from race-conditions when adding the modules to mutual overwriting of the loaded plugins when Init() is called multiple times... To reduce those effects you should init gosmi only once (using sync.Once) and guard this loading function with a lock. I don't know SNMP, but let's hope that there are no collisions possible between multiple (contradicting) mib-definitions...
There was a problem hiding this comment.
we do call gosmi.Init more than once but the library has a built-in check to see if it has already been initialized or not.
There was a problem hiding this comment.
I have also added a mutex
There was a problem hiding this comment.
Hey @MyaLongmire, I don't see any check... When calling gosmi.Init() it calls this function which in turn calls smi.Init() that can be found here. Already at this point, multiple files are reread etc. Additionally internal.Init() is called also modifying data. While I agree that currently nothing "critical" is changed there it won't hurt to be on the safe side by
| gosmi.Init() | |
| once.Do(gosmi.Init) |
and place var once sync.Once in the package scope, to be sure we call that function only once at least in that package. It would be even better to make sure it's not called elsewhere, but that's probably out of scope for this PR...
There was a problem hiding this comment.
gosmi.Init() always calls smi.Init("gosmi") which always calls internal.Init(handleName) with the same handleName. That function does check whether it already has an internal handle of that name. That's why it looked to me like calling gosmi.Init() multiple times is safe.
It's a good idea to be defensive in our code and add sync.Once() anyway. I'm not sure gosmi intends it to be safe to call gosmi.Init() multiple times. I didn't see a guarantee in the docs, so even if it is now it might not be in a future version of gosmi.
|
📦 Looks like new artifacts were built from this PR. Expand this list to get them here ! 🐯Artifact URLs |
This reverts commit 7675ce6.
* origin/master: (133 commits) chore: restart service if it is already running and upgraded via RPM (influxdata#9970) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10237) fix: Handle duplicate registration of protocol-buffer files gracefully. (influxdata#10188) fix(http_listener_v2): fix panic on close (influxdata#10132) feat: add Vault input plugin (influxdata#10198) feat: support aws managed service for prometheus (influxdata#10202) fix: Make telegraf compile on Windows with golang 1.16.2 (influxdata#10246) Update changelog feat: Modbus add per-request tags (influxdata#10231) fix: Implement NaN and inf handling for elasticsearch output (influxdata#10196) feat: add nomad input plugin (influxdata#10106) fix: Print loaded plugins and deprecations for once and test (influxdata#10205) fix: eliminate MIB dependency for ifname processor (influxdata#10214) feat: Optimize locking for SNMP MIBs loading. (influxdata#10206) feat: Add SMART plugin concurrency configuration option, nvme-cli v1.14+ support and lint fixes. (influxdata#10150) feat: update configs (influxdata#10236) fix(inputs/kube_inventory): set TLS server name config properly (influxdata#9975) fix: Sudden close of Telegraf caused by OPC UA input plugin (influxdata#10230) fix: bump github.com/eclipse/paho.mqtt.golang from 1.3.0 to 1.3.5 (influxdata#9913) fix: json_v2 parser timestamp setting (influxdata#10221) fix: ensure graylog spec fields not prefixed with '_' (influxdata#10209) docs: remove duplicate links in CONTRIBUTING.md (influxdata#10218) fix: pool detection and metrics gathering for ZFS >= 2.1.x (influxdata#10099) fix: parallelism fix for ifname processor (influxdata#10007) chore: Forbids "log" package only for aggregators, inputs, outputs, parsers and processors (influxdata#10191) docs: address documentation gap when running telegraf in k8s (influxdata#10215) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10211) fix: mqtt topic extracting no longer requires all three fields (influxdata#10208) fix: windows service - graceful shutdown of telegraf (influxdata#9616) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10201) feat: Modbus support multiple slaves (gateway feature) (influxdata#9279) fix: Revert unintented corruption of the Makefile from influxdata#10200. (influxdata#10203) chore: remove triggering update-config bot in CI (influxdata#10195) Update changelog feat: Implement deprecation infrastructure (influxdata#10200) fix: extra lock on init for safety (influxdata#10199) fix: resolve influxdata#10027 (influxdata#10112) fix: register bigquery to output plugins influxdata#10177 (influxdata#10178) fix: sysstat use unique temp file vs hard-coded (influxdata#10165) refactor: snmp to use gosmi (influxdata#9518) ...
Required for all PRs:
resolves #3784
resolves #6450
resolves #5720
No longer calls
snmptranslateand parses the output. Now it relies ongosmilibrary to parse mib files directly to increase performance. Moved all gosmi code into internal/snmp