fix(datapath): disable VF info collection in netlink#43517
Conversation
|
overall code looks good to me. there's a failed test, but it could be a flake. havent looked into it yet. hopefully someone can hit the retry |
|
/ci-integration |
|
/test |
|
Hi, what problem does this solve, exactly? Is there an existing issue that points out why the lock being held is a problem? Thanks. |
|
@pasteley , can you add a release note message to the pr description? |
|
/test |
|
/ci-ginkgo |
|
https://github.com/cilium/cilium/actions/runs/20848915321 looks like a flake. i see a similar failure on other branches too |
|
/ci-l7 |
ti-mo
left a comment
There was a problem hiding this comment.
Thanks, not sure how I missed the issue link. Left a few comments outlining my thoughts and the way forward for this package.
Disable Virtual Function information collection in netlink handle operations to avoid hold of `rtnl_mutex` on systems with SR-IOV devices. Fixes: cilium#43516 Signed-off-by: pasteley <ceasebeing@gmail.com>
|
/test |
ti-mo
left a comment
There was a problem hiding this comment.
I just realized this completely sidesteps the purpose of the safenetlink package, since NewHandle returns a netlink.Handle, whose methods aren't intercepted by the safenetlink package. I think we need to introduce a safenetlink.Handle that wraps the netlink one and also wraps its methods with a retry loop.
@gandro PTAL
|
Took a look - since we moved to Calling unsafe functions on the handle require you to use If we introduce a wrapper, then we have to make sure that all wrapped functions are safe. So from a safety perspective, I think this PR is fine as is. But it would probably be a potential improvement if we wrapped |
|
On a only slightly related node: What we could consider is making sure that Line 63 in dcf00f5 That way, new code would have to always use |
|
@pasteley I'd like to hold off on this PR a bit as it runs the risk of regressing. In its current state, it doesn't handle |
I'm not sure I follow? Our linter does complain if you call a function on |
Ohhh, I see. We already have a bunch of places where Handle is used, forbidigo silenced, and an explicit retry loop added. Of course, then we can merge this as-is. Just need to figure out how to disable VF collection by default if/when we manage to get a retry mechanism added upstream. I'll spend some time on that when I make an upstream proposal. |
|
@cilium/sig-datapath ptal |
smagnani96
left a comment
There was a problem hiding this comment.
LGTM. I don't see us using the retrieved link.Attrs().Vfs in our codebase nowadays.
Disable Virtual Function information collection in netlink handle operations to avoid hold of
rtnl_mutexon systems with SR-IOV devices.Fixes: #43516