Configuring SR-IOV for a Mellanox ConnectX-3 NIC

So SR-IOV support in my Intel I350-T2V2 decided to stop working recently (or maybe it never worked - I can’t be sure), meaning it was time to pick up a new SR-IOV NIC for testing/development. I settled on a used Mellanox ConnectX-3 from eBay because it supported SR-IOV and other cool things like RDMA over Converged Ethernet (RoCE), the Mellanox guys I’ve dealt with in Nova have been a great bunch and, most crucially at the time, it was cheap. For what it’s worth, I also got a pair of SFP cables.

The first time I configured this, I followed the instructions from the Mellanox website. This mandated downloading a tarball and using their custom installer script, mlnxofedinstall, to install the drivers and various tools. It was only when I later reinstalled the OS on this machine that I discovered this was wholly unnecessary: Ubuntu 16.04 (and presumably 18.04) already includes everything you need to configure and use these NICs. As such, here is “Stephen’s Guide to Using Mellanox ConnectX-3 Cards Without All That C***”.

Prerequisites

It should go without saying, but you need a Mellanox ConnectX-3 card for this to be of any use. In addition, I’m using Ubuntu 16.04 because that’s what the OpenStack gate uses, but I think most of this stuff is packaged on Fedora too.

Enable SR-IOV in the firmware

The ConnectX-3 allows you to configure the number of VFs available on the device. To do this, the official guide would have you run the mlxconfig tool, which is installed by the aforementioned mlnxofedinstall script. However, Mellanox have an open source version of this tool, mstconfig, which fulfils the same purpose and is available as part of the mstflint package. Install this:

$ sudo apt install mstflint

Once installed, inspect the current configuration of the device. To do this, you need to find the PCI address of the device, which is pretty easy when you only have one such device in your system:

$ lspci | grep Mellanox
02:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
$ sudo mstconfig -d 02:00.0 query

Device #1:
----------

Device type:    ConnectX3
PCI device:     02:00.0

Configurations:                              Current
         SRIOV_EN                            1
         NUM_OF_VFS                          8
         LOG_BAR_SIZE                        3
         BOOT_OPTION_ROM_EN_P1               1
         BOOT_VLAN_EN_P1                     0
         BOOT_RETRY_CNT_P1                   0
         LEGACY_BOOT_PROTOCOL_P1             1
         BOOT_VLAN_P1                        1
         BOOT_OPTION_ROM_EN_P2               1
         BOOT_VLAN_EN_P2                     0
         BOOT_RETRY_CNT_P2                   0
         LEGACY_BOOT_PROTOCOL_P2             1
         BOOT_VLAN_P2                        1

As you can see, I already have SR-IOV enabled (SRIOV_EN=1) and have enabled eight VFs (NUM_OF_VFS=8). If this wasn’t the case though, you’d need to configure these attributes. You can do so using the mstconfig tool again. For example:

$ sudo mstconfig -d 02:00.0 set SRIOV_EN=1 NUM_OF_VFS=8

Device #1:
----------

Device type:    ConnectX3
PCI device:     02:00.0

Configurations:                              Current         New
         SRIOV_EN                            1               1
         NUM_OF_VFS                          8               8
         LOG_BAR_SIZE                        3               3
         BOOT_OPTION_ROM_EN_P1               1               1
         BOOT_VLAN_EN_P1                     0               0
         BOOT_RETRY_CNT_P1                   0               0
         LEGACY_BOOT_PROTOCOL_P1             1               1
         BOOT_VLAN_P1                        1               1
         BOOT_OPTION_ROM_EN_P2               1               1
         BOOT_VLAN_EN_P2                     0               0
         BOOT_RETRY_CNT_P2                   0               0
         LEGACY_BOOT_PROTOCOL_P2             1               1
         BOOT_VLAN_P2                        1               1

 Apply new Configuration? ? (y/n) [n] :

If you applied a new configuration, you should now reboot and then inspect the configuration to ensure it has persisted:

$ sudo mstconfig -d 02:00.0 query
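
If you’d rather not eyeball that output after every reboot, the check can be scripted. The following is only a rough sketch: mst_value and sriov_ready are names I made up for illustration, and it assumes the query output looks like the listing above, with the setting name in the first column and a plain numeric value in the second (other mstflint versions may format values differently).

```shell
# Read `mstconfig ... query` output on stdin and print the current value of
# the setting named in $1 (e.g. SRIOV_EN). Fails if the setting is absent.
mst_value() {
    awk -v key="$1" '
        $1 == key { print $2; found = 1 }
        END { exit !found }
    '
}

# Read the same output on stdin and succeed only if SR-IOV is enabled and
# at least $1 VFs are configured in the firmware.
sriov_ready() {
    config=$(cat)
    enabled=$(printf '%s\n' "$config" | mst_value SRIOV_EN) || return 1
    vfs=$(printf '%s\n' "$config" | mst_value NUM_OF_VFS) || return 1
    [ "$enabled" = "1" ] && [ "$vfs" -ge "$1" ]
}
```

With those defined, something like sudo mstconfig -d 02:00.0 query | sriov_ready 8 && echo "firmware looks good" should do the job.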

Once the device’s firmware is configured, we can move on to configuring the driver.

Enable SR-IOV in the driver

As with mstconfig above, Ubuntu 16.04 also provides in-tree alternatives to the drivers provided in the tarball o’ doom. Better yet, these drivers are provided and enabled by default: all we need to do is configure them.

As noted in the original guide, this can be done by creating (or editing) the /etc/modprobe.d/mlx4_core.conf file. Add the following to that file:

options mlx4_core num_vfs=4,4,0 port_type_array=2,2 probe_vf=4,4,0

Reproducing (in slightly modified form) from the guide, this means:

  • num_vfs - is the number of VFs required for this server. In this example that is 8 VFs in total, 4 on each port.

  • port_type_array - is the port type of the interface: 1 is for InfiniBand, 2 for Ethernet. In this example, both ports are Ethernet.

  • probe_vf - is the number of VFs to be probed in the hypervisor. Probed in the hypervisor means that the VF will also have an interface in the hypervisor (e.g. it can be seen using the command ifconfig). With probe_vf set to 0 there are no probed VFs, and running ifconfig shows no new per-VF interfaces. If probe_vf were equal to 1, for example, we would get 2 new interfaces in the hypervisor (check ifconfig -a), one for each port.

    Probed VFs can be used by the IT administrator to monitor the traffic on that hypervisor without having to log in to the VM itself.

    In this example, we will have 4 VFs on the first physical port and 4 on the other, and all eight are probed. The values are given in the order port1,port2,port1+2, so the trailing 0 means we don’t want any VFs that span both ports. Refer to the Mellanox docs for more information.

Of these, the probe_vf one is particularly important. Without this, you’ll see the VFs listed under their parent PF with ip link but each VF will not have its own netdev. Nova requires that these devices do have their own netdev so this is a necessity.
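
The arithmetic behind those triplets is easy to get wrong when editing the file by hand, so here is a small sanity check. Treat it as a sketch: triplet_total and fits_firmware are hypothetical helper names, and it rests on my assumption that the driver cannot enable more VFs than the NUM_OF_VFS value set in the firmware earlier.

```shell
# Sum a comma-separated triplet such as "4,4,0" (port1,port2,port1+2).
triplet_total() {
    echo "$1" | awk -F, '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }'
}

# Succeed if the triplet in $1 fits within the firmware VF count in $2.
# Assumption: the driver cannot enable more VFs than NUM_OF_VFS allows.
fits_firmware() {
    [ "$(triplet_total "$1")" -le "$2" ]
}
```

For the values used here, triplet_total 4,4,0 prints 8, which matches the eight VFs configured in the firmware, so fits_firmware 4,4,0 8 succeeds.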

Once this is configured, save the file and reload the driver.

$ sudo modprobe -r mlx4_en mlx4_ib
$ sudo modprobe mlx4_en
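
To confirm the reloaded driver actually picked up the new options, you can read them back from sysfs. This sketch assumes mlx4_core exposes its parameters under /sys/module/mlx4_core/parameters/, as modules with readable parameters normally do; mlx4_param is a made-up helper name, and the optional second argument exists only so the function can be pointed at a fake tree.

```shell
# Print the value of the mlx4_core module parameter named in $1 as the
# running kernel sees it. $2 optionally overrides the sysfs root (/sys).
mlx4_param() {
    param_file="${2:-/sys}/module/mlx4_core/parameters/$1"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "mlx4_core parameter '$1' not found (is the module loaded?)" >&2
        return 1
    fi
}
```

If everything went to plan, mlx4_param num_vfs and mlx4_param probe_vf should echo back the values from the modprobe.d file.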

You should now see the devices listed in ip link:

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:d8:bd:72 brd ff:ff:ff:ff:ff:ff
3: enp6s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:d8:bd:73 brd ff:ff:ff:ff:ff:ff
6: enp2s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c0 state DOWN mode DEFAULT group default qlen 1000
    link/ether e4:1d:2d:4c:47:c0 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
7: enp2s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c1 state DOWN mode DEFAULT group default qlen 1000
    link/ether e4:1d:2d:4c:47:c1 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
8: enp2s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:c9:04:d2:00:a4 brd ff:ff:ff:ff:ff:ff
9: enp2s0f1d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c1 state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:20:d7:8b:38:6c brd ff:ff:ff:ff:ff:ff
10: enp2s0f2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c0 state DOWN mode DEFAULT group default qlen 1000
    link/ether fe:a0:dc:21:1f:4c brd ff:ff:ff:ff:ff:ff
11: enp2s0f2d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid e41d2d03004c47c1 state DOWN mode DEFAULT group default qlen 1000
    link/ether 46:a5:f9:9c:ee:27 brd ff:ff:ff:ff:ff:ff
....
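
That’s a lot of output to scan by eye. As a rough sketch, the helper below (count_vfs is just a name I picked) reads ip link output and prints how many VFs each PF reports; it assumes the layout shown above, where each interface starts with an "N: name:" line and each VF appears on its own line beginning with "vf".

```shell
# Read `ip link` output on stdin and print "<interface> <vf count>" for
# every interface that lists at least one VF.
count_vfs() {
    awk '
        # Interface header line, e.g. "6: enp2s0: <BROADCAST,...>"
        $1 ~ /^[0-9]+:$/ { iface = $2; sub(/:$/, "", iface) }
        # Per-VF line, e.g. "vf 0 MAC 00:00:00:00:00:00, ..."
        $1 == "vf" { vfs[iface]++ }
        END { for (i in vfs) print i, vfs[i] }
    ' | LC_ALL=C sort
}
```

Running ip link | count_vfs against the output above should report 8 VFs for each of enp2s0 and enp2s0d1.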

Next steps

Now that everything is configured, it’s time to start using it. I dove straight in with OpenStack. Feel free to use the local.conf I used to deploy this with DevStack. The neutron SR-IOV docs are probably worth a look too. These are based on the Rocky release (August 2018) so they probably won’t age well, but they are a starting point.
