class_infiniband: continue on syscall EINVAL #705
Merged
SuperQ merged 1 commit into prometheus:master on Apr 19, 2025
prometheus#704

In some very bleeding-edge configurations, syscalls against some IB counters will return `invalid argument`. This bubbles back to callers (e.g. node_exporter) in a very bad way: all IB metrics collection fails when a single IB port or a group of IB ports returns this error. Suspect this has always been a possible error case, but it's the first time we're seeing it, in a very new hardware deployment.

Signed-off-by: Michael Fuller <mfuller@lambdal.com>
@SuperQ @pgier @discordianfish Could you please take a look?
Addresses #704, in a very naive way.

In some very bleeding-edge configurations, syscalls against some IB counters will return `invalid argument`. This bubbles back to callers (e.g. node_exporter) in a very bad way: all IB metrics collection fails when a single IB port or a group of IB ports returns this error.

Suspect this has always been a possible error case (the syscall code path in procfs has been in place for ages), but it's the first time we're seeing it, in a very new hardware deployment.