[nvmdev] fix bug in construction of parent PCI device #43

Merged
cdesiniotis merged 2 commits into NVIDIA:main from cdesiniotis:fix-nvmdev-new-parent-device
Jul 16, 2024
@cdesiniotis
Contributor

When constructing NvidiaPCIDevice objects for each 'parent' device in the '/sys/class/mdev_bus' directory, use the default PCI devices root '/sys/bus/pci/devices'. All devices in '/sys/class/mdev_bus' will have a corresponding directory at '/sys/bus/pci/devices'.

Starting with bf3f431, construction of the NvidiaPCIDevice object fails when attempting to detect the physfn. When SR-IOV is used, all the VFs show up under '/sys/class/mdev_bus', but the physfn only shows up under '/sys/bus/pci/devices'.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
@cdesiniotis cdesiniotis marked this pull request as draft July 15, 2024 19:26
@klueska
Collaborator

klueska commented Jul 15, 2024

The commit you linked is a broken link. What was the change that broke things?

@cdesiniotis
Contributor Author

bf3f431

@klueska
Collaborator

klueska commented Jul 15, 2024

@PiotrProkop can you take a look at this?

@cdesiniotis
Contributor Author

To provide more context, a call to nvmdev.GetAllParentDevices() results in the following error:

error getting all parent devices: error constructing NVIDIA parent device: failed to construct NVIDIA PCI device: unable to detect physfn for 0000:3b:00.4: unable to read PCI device vendor id for 0000:3b:00.0: open /sys/class/mdev_bus/0000:3b:00.0/vendor: no such file or directory

0000:3b:00.0 is the PF for 0000:3b:00.4 (VF). 0000:3b:00.4 and all the other VFs will have entries at /sys/class/mdev_bus, but the PF does not.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
@cdesiniotis cdesiniotis force-pushed the fix-nvmdev-new-parent-device branch from 7374add to 7c3222d Compare July 15, 2024 21:35
@cdesiniotis cdesiniotis marked this pull request as ready for review July 15, 2024 21:36
@PiotrProkop
Contributor

Good catch!
/lgtm

@cdesiniotis cdesiniotis merged commit d3091e7 into NVIDIA:main Jul 16, 2024

4 participants