Fix when running nsdperf over RoCE
The original version used only the first GID of a given network interface. This fix collects every GID it can find for the interface into a vector, then finds the first interface name and uses it to run the tests.
In order to show the GIDs of a Mellanox ConnectX-6 adapter use:
[root@localhost ~]# show_gids mlx5_0
DEV     PORT  INDEX  GID                                      IPv4         VER  DEV
mlx5_0  1     0      fe80:0000:0000:0000:0e42:a1ff:fe5d:4db8               v1   ens1f0
mlx5_0  1     1      fe80:0000:0000:0000:0e42:a1ff:fe5d:4db8               v2   ens1f0
mlx5_0  1     2      fe80:0000:0000:0000:1186:a6b1:3f0b:c441               v1   ens1f0
mlx5_0  1     3      fe80:0000:0000:0000:1186:a6b1:3f0b:c441               v2   ens1f0
mlx5_0  1     4      0000:0000:0000:0000:0000:ffff:ac13:003c  172.19.0.60  v1   ens1f0
mlx5_0  1     5      0000:0000:0000:0000:0000:ffff:ac13:003c  172.19.0.60  v2   ens1f0
n_gids_found=6
Note that in this case the original nsdperf version fails because it uses only the first GID, and nsdperf exits with the error below:
[root@localhost ~]# ./nsdperf-rdma -r mlx5_0/1 -s -d
05:32:39.904017 nsdperf-rdma 1.28 server started
Connection from 172.19.4.2
05:32:49.594867 got msg Version ID 2 len 0 from 172.19.4.2/0
05:32:49.620581 got msg Parms ID 4 len 56 from 172.19.4.2/0
05:32:49.642896 RDMA port mlx5_0:1 has no address
05:32:49.643393 sending msg ReplyErr ID 4 len 24 to 172.19.4.2/0
05:33:22.192687 got msg Kill ID 6 len 0 from 172.19.4.2/0
Connection to 172.19.4.2/0 broken
05:33:22.193039 Closed connection to 172.19.4.2/0
Thanks a lot. As soon as we can test this in our lab, we will merge it. Many thanks for the work.
@cristeab Sorry for the delay
We have tested it and it works nicely. Thank you so much for the effort you put into this. You are going to have to bear with us here a little bit; let me explain.
We (and I especially) were not expecting to get contributions to the code this "soon". We have an internal repository with nsdperf version 1.29, where RoCE support is already there plus other things. But your contribution has clearly shown a few things.
First and foremost, the current model we use to develop this tool does not work as-is. By using internal repos instead of a public one, we are de facto alienating non-IBMers from helping us with the development, at best.
Second, now that for once we have a good contribution, we are not sure how to proceed: we have 1.29 there with this and other changes.
And last but certainly not least, we need to change how we work here. It won't happen overnight, but the conversations have started already and I hope we can come up with something clearer later on. To you, and to any other collaborator: please bear with us a bit longer.
Thanks a lot for your effort, but for now I will not merge the changes until we have a clearer idea of how to proceed here: whether we fully move 1.29 here, or use a combination of your changes and 1.29.
Once again, I really thank you for what you have done. I hope you do not feel it is going to waste, even if we go with 1.29. Even if that is the case, this clearly shows that we need to change how we move forward with this excellent network benchmark tool.