So SR-IOV support in my Intel I350-T2V2 decided to stop working recently (or maybe it never worked - I can’t be sure), meaning it was time to pick up a new SR-IOV NIC for testing/development. I settled on a used Mellanox ConnectX-3 from eBay because it supported SR-IOV and other cool things like RDMA over Converged Ethernet (RoCE), the Mellanox guys I’ve dealt with in Nova have been a great bunch and, most crucially at the time, it was cheap. For what it’s worth, I also got a pair of SFP cables.
The first time I configured this, I followed the instructions from the
Mellanox website. This mandated downloading a tarball and using their custom
installer script, mlnxofedinstall, to install the drivers and various tools.
It was only when I later reinstalled the OS on this machine that I discovered
this was wholly unnecessary: Ubuntu 16.04 (and presumably 18.04) already
include everything you need to configure and use these NICs. As such, here is
“Stephen’s Guide to Using Mellanox ConnectX-3 Cards Without All That C***”.
Prerequisites
It should go without saying, but you need a Mellanox ConnectX-3 card for this to be of any use. In addition, I’m using Ubuntu 16.04 because that’s what the OpenStack gate uses, but I think most of this stuff is packaged on Fedora too.
Enable SR-IOV in the firmware
The ConnectX-3 allows you to configure the number of VFs available on the
device. To do this, the official guide would have you run the mlxconfig tool,
which is installed by the aforementioned mlnxofedinstall tool. However,
Mellanox have an open source version of this tool, mstconfig, which fulfils
the same purpose and is available as part of the mstflint package. Install
this:
$ sudo apt install mstflint
Once installed, inspect the current configuration of the device. To do this, you need to find the PCI address of the device, which is pretty easy when you only have one such device in your system:
$ lspci | grep Mellanox
02:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
$ sudo mstconfig -d 02:00.0 query
Device #1:
----------
Device type: ConnectX3
PCI device: 02:00.0
Configurations: Current
SRIOV_EN 1
NUM_OF_VFS 8
LOG_BAR_SIZE 3
BOOT_OPTION_ROM_EN_P1 1
BOOT_VLAN_EN_P1 0
BOOT_RETRY_CNT_P1 0
LEGACY_BOOT_PROTOCOL_P1 1
BOOT_VLAN_P1 1
BOOT_OPTION_ROM_EN_P2 1
BOOT_VLAN_EN_P2 0
BOOT_RETRY_CNT_P2 0
LEGACY_BOOT_PROTOCOL_P2 1
BOOT_VLAN_P2 1
As you can see, I already have SR-IOV enabled (SRIOV_EN=1) and have enabled
eight VFs (NUM_OF_VFS=8). If this wasn’t the case though, you’d need to
configure these attributes. You can do so using the mstconfig tool again.
For example:
$ sudo mstconfig -d 02:00.0 set SRIOV_EN=1 NUM_OF_VFS=8
Device #1:
----------
Device type: ConnectX3
PCI device: 02:00.0
Configurations: Current New
SRIOV_EN 1 1
NUM_OF_VFS 8 8
LOG_BAR_SIZE 3 3
BOOT_OPTION_ROM_EN_P1 1 1
BOOT_VLAN_EN_P1 0 0
BOOT_RETRY_CNT_P1 0 0
LEGACY_BOOT_PROTOCOL_P1 1 1
BOOT_VLAN_P1 1 1
BOOT_OPTION_ROM_EN_P2 1 1
BOOT_VLAN_EN_P2 0 0
BOOT_RETRY_CNT_P2 0 0
LEGACY_BOOT_PROTOCOL_P2 1 1
BOOT_VLAN_P2 1 1
Apply new Configuration? ? (y/n) [n] :
If you apply the configuration, you should now reboot and then inspect the configuration again to ensure it has persisted:
$ sudo mstconfig -d 02:00.0 query
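If you’d rather not eyeball the output, something like the following could do the check for you. This is only a minimal sketch: check_sriov is a made-up helper name, and it assumes the query output uses the plain "NAME VALUE" layout shown earlier in this post (other mstflint versions may format values differently).

```shell
#!/bin/sh
# Hypothetical helper (not part of mstflint): read `mstconfig ... query`
# output on stdin and succeed only if SR-IOV is enabled with at least
# the requested number of VFs.
check_sriov() {
    want_vfs="$1"
    awk -v want="$want_vfs" '
        $1 == "SRIOV_EN"   { en = $2 }
        $1 == "NUM_OF_VFS" { vfs = $2 }
        END {
            # Print what was found, then set the exit status.
            printf "SRIOV_EN=%s NUM_OF_VFS=%s\n", en, vfs
            exit !(en == 1 && vfs >= want)
        }
    '
}

# Usage (the PCI address is the one from this post; yours will differ):
#   sudo mstconfig -d 02:00.0 query | check_sriov 8 && echo "firmware OK"
```

The function only reads text from stdin, so you can also feed it saved output from before and after the reboot to compare.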
Once the device’s firmware is configured, we can move on to configuring the driver.
Enable SR-IOV in the driver
As with mstconfig above, Ubuntu 16.04 also provides in-tree alternatives to
the drivers provided in the tarball o’ doom. Better yet, these drivers are
provided and enabled by default: all we need to do is configure them.
As noted in the original guide, this can be done by creating (or editing) the
/etc/modprobe.d/mlx4_core.conf file. Add the following to that file:
options mlx4_core num_vfs=4,4,0 port_type_array=2,2 probe_vf=4,4,0
Reproducing (in slightly modified form) from the guide, this means:
num_vfs - is the number of VFs required for this server; in this example, 8 VFs in total.
port_type_array - is the port type of each interface: 1 is for InfiniBand, 2 for Ethernet. In this example, both ports are Ethernet.
probe_vf - is the number of VFs to be probed in the hypervisor. Probed in the hypervisor means that the VF will also have an interface in the hypervisor (e.g. it can be seen using the command ifconfig -a). In this example, all VFs are probed (probe_vf matches num_vfs), so each one gets its own interface. If no VFs were probed, running ifconfig would show no new per-VF interfaces. If probe_vf was equal to 1, for example, we would get 2 new interfaces in the hypervisor, one for each port. Probed VFs can be used by the IT administrator to monitor the traffic on that hypervisor without needing to log in to the VM itself.
In this example, we will have 4 VFs on the first physical port and 4 on the other. The trailing 0 indicates that you don’t want any dual-port VFs, i.e. VFs that span both ports. Refer to the Mellanox docs for more information.
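To make the three-value format a little more concrete, here is a small sketch that splits one of these values into its parts. describe_vfs is a hypothetical helper of my own naming, and it assumes the "port1,port2,dual-port" form described above rather than the single-value form.

```shell
#!/bin/sh
# Hypothetical helper: split an mlx4_core num_vfs/probe_vf value of the
# form "port1,port2,dual" into its parts and print the total.
# Assumes the three-value form; a single value means something different.
describe_vfs() {
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the argument on commas
    IFS=$old_ifs
    p1=${1:-0}; p2=${2:-0}; dual=${3:-0}
    echo "port1=$p1 port2=$p2 dual-port=$dual total=$((p1 + p2 + dual))"
}

describe_vfs 4,4,0
```

For the value used in this post, that prints port1=4 port2=4 dual-port=0 total=8, which matches the NUM_OF_VFS=8 set in the firmware.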
Of these, the probe_vf one is particularly important. Without this, you’ll
see the VFs listed under their parent PF with ip link but each VF will not
have its own netdev. Nova requires that these devices do have their own netdev
so this is a necessity.
Once this is configured, save the file and reload the driver:
$ sudo modprobe -r mlx4_en mlx4_ib
$ sudo modprobe mlx4_en
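To confirm the driver actually picked up the new options, you can read them back from sysfs. This is a minimal sketch, assuming mlx4_core exposes these parameters under /sys/module/mlx4_core/parameters on your kernel; show_mlx4_params is a hypothetical helper, not something the packages ship.

```shell
#!/bin/sh
# Hypothetical helper: print the mlx4_core options discussed above as the
# loaded module sees them. The directory is an argument so the function
# can be pointed elsewhere; the default path is an assumption.
show_mlx4_params() {
    params_dir="${1:-/sys/module/mlx4_core/parameters}"
    for p in num_vfs port_type_array probe_vf; do
        if [ -r "$params_dir/$p" ]; then
            echo "$p=$(cat "$params_dir/$p")"
        else
            echo "$p=<unavailable>"
        fi
    done
}

show_mlx4_params
```

If the values don’t match what is in /etc/modprobe.d/mlx4_core.conf, the module probably wasn’t fully unloaded before being loaded again.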
You should now see the devices listed in ip link:
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 0c:c4:7a:d8:bd:72 brd ff:ff:ff:ff:ff:ff
3: enp6s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 0c:c4:7a:d8:bd:73 brd ff:ff:ff:ff:ff:ff
6: enp2s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c0 state DOWN mode DEFAULT group default qlen 1000
link/ether e4:1d:2d:4c:47:c0 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
7: enp2s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c1 state DOWN mode DEFAULT group default qlen 1000
link/ether e4:1d:2d:4c:47:c1 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
8: enp2s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c0 state DOWN mode DEFAULT group default qlen 1000
link/ether ce:c9:04:d2:00:a4 brd ff:ff:ff:ff:ff:ff
9: enp2s0f1d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c1 state DOWN mode DEFAULT group default qlen 1000
link/ether ce:20:d7:8b:38:6c brd ff:ff:ff:ff:ff:ff
10: enp2s0f2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c0 state DOWN mode DEFAULT group default qlen 1000
link/ether fe:a0:dc:21:1f:4c brd ff:ff:ff:ff:ff:ff
11: enp2s0f2d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c1 state DOWN mode DEFAULT group default qlen 1000
link/ether 46:a5:f9:9c:ee:27 brd ff:ff:ff:ff:ff:ff
....
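Since Nova cares about each VF having its own netdev, it can be handy to check that directly rather than matching up names by eye. The following is a rough sketch that walks the standard virtfnN links under the PF’s PCI device directory in sysfs; list_vf_netdevs is a hypothetical helper name, and the path in the usage comment assumes the PCI address from this post.

```shell
#!/bin/sh
# Hypothetical helper: for a PF's PCI device directory in sysfs, print each
# VF link alongside the netdev(s) bound to it, or "-" when there are none.
list_vf_netdevs() {
    pf_dir="$1"
    for vf in "$pf_dir"/virtfn*; do
        [ -e "$vf" ] || continue    # no VFs: the glob did not match
        # Join the netdev names (one per line from ls) with spaces.
        netdevs=$(ls "$vf/net" 2>/dev/null | paste -sd ' ' -)
        echo "$(basename "$vf"): ${netdevs:--}"
    done
}

# Usage (note the 0000: PCI domain prefix in sysfs paths):
#   list_vf_netdevs /sys/bus/pci/devices/0000:02:00.0
```

Any VF that shows up with a "-" has no netdev in the hypervisor, which usually points back at the probe_vf setting.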
Next steps
Now that everything is configured, it’s time to start using it. I dove straight
in with OpenStack. Feel free to use the local.conf I used to deploy this with
DevStack. The neutron SR-IOV docs are probably worth a look too. These are
based on the Rocky release (August 2018) so they probably won’t age well, but
they are a starting point.