In this article, we describe how to procure NICs, how to configure vDPA with Linux Bridge and ovs instead of ovs-dpdk, and the issues we are currently facing.
Since the topics are diverse, you may read them in any order.
This article is a sequel to the previous article.*1
A list of related articles is provided at the end of this article.
1.Topics
2.How to procure a NIC
There are three drivers in the Linux kernel that support vDPA.
linux/drivers/vdpa at master · torvalds/linux · GitHub
The NICs that are supported by each driver are as follows.
driver | supported NICs | remarks |
ifcvf | Intel SmartNIC N3000 series, N5000 series | Expensive ($4k) and difficult for individuals to purchase |
mlx5 | Mellanox ConnectX-6 Dx, BlueField SmartNIC | The ConnectX-6 Dx is expensive ($1.2k) but can be purchased by individuals |
vdpa_sim | vDPA device simulator. It is included in the standard CentOS 8.3 kernel, and you can check the details with the modinfo vdpa_sim command. | Since it is a simulator, it cannot communicate with external networks. |
Therefore, we think it is practical to purchase a ConnectX-6 Dx.
The model numbers officially supported by the DPDK mlx5 vDPA driver, including the ConnectX-6 Dx, are listed on the following page.
5. MLX5 vDPA driver — Data Plane Development Kit 21.05.0 documentation
Mellanox® ConnectX®-6 200G MCX654106A-HCAT (2x200G)
Mellanox® ConnectX®-6 Dx EN 25G MCX621102AN-ADAT (2x25G)
Mellanox® ConnectX®-6 Dx EN 100G MCX623106AN-CDAT (2x100G)
Mellanox® ConnectX®-6 Dx EN 200G MCX623105AN-VDAT (1x200G)
Mellanox® BlueField SmartNIC 25G MBF1M332A-ASCAT (2x25G)
Of the above, the "MCX623106AN-CDAT" can be purchased at the following site.*2
NVIDIA Mellanox MCX623106AN-CDAT ConnectX®-6 Dx EN Network Interface Card
https://www.fs.com/products/119646.html
When we procured our NIC, the above site did not yet carry it, so we bought the HPE-branded "MCX623106AS-CDAT" on eBay.*3
3.How to configure Linux Bridge
In the previous article, we used ovs-dpdk for the following three reasons.
- With ovs-dpdk, you need to explicitly specify the representor together with the PF's PCI address (bus:device.function), e.g. dpdk-devargs=0000:07:00.0,representor=[0]. We thought this would help you understand the relationship between the PF, VF, and VF representor (VF rep).
- Of ovs-dpdk, ovs, and Linux Bridge, ovs-dpdk requires the most time-consuming and complicated configuration. We thought that once you understand the ovs-dpdk configuration, you can easily understand the other cases.
- We are planning to run TRex on a virtual machine in the future, and we want to run load tests in each of the ovs-dpdk, ovs, and Linux Bridge environments.
Note
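For example, in the case of vhost_vdpa, you can see the relationship between the PF, VF, and VF rep by tracing the PCI addresses and device names in the output below: the VF rep ens2f0_0 is specified through its PF (0000:07:00.0) with representor=[0], the corresponding VF ens2f0v0 is at 0000:07:00.2, that VF is bound to vhost_vdpa as vdpa0, and the VM uses /dev/vhost-vdpa-0.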
ovs-vsctl add-port br30-ovs ens2f0_0 -- set Interface ens2f0_0 type=dpdk options:dpdk-devargs=0000:07:00.0,representor=[0]

[root@c83g155 ~]# lshw -businfo -c network
Bus info          Device     Class      Description
========================================================
pci@0000:07:00.0  ens2f0     network    MT2892 Family [ConnectX-6 Dx]
pci@0000:07:00.1  ens2f1     network    MT2892 Family [ConnectX-6 Dx]
pci@0000:07:00.2  ens2f0v0   network    ConnectX Family mlx5Gen Virtual Function
pci@0000:07:01.2  ens2f1v0   network    ConnectX Family mlx5Gen Virtual Function
pci@0000:07:00.0  ens2f0_0   network    Ethernet interface
pci@0000:07:00.1  ens2f1_0   network    Ethernet interface

[root@c83g155 ~]# ls -Fal /sys/bus/vdpa/drivers/vhost_vdpa
total 0
drwxr-xr-x 2 root root    0 Apr 8 00:06 ./
drwxr-xr-x 3 root root    0 Apr 7 23:49 ../
--w------- 1 root root 4096 Apr 8 00:07 bind
lrwxrwxrwx 1 root root    0 Apr 8 00:07 module -> ../../../../module/vhost_vdpa/
--w------- 1 root root 4096 Apr 8 00:06 uevent
--w------- 1 root root 4096 Apr 8 00:07 unbind
lrwxrwxrwx 1 root root    0 Apr 8 00:07 vdpa0 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:00.2/vdpa0/
lrwxrwxrwx 1 root root    0 Apr 8 00:07 vdpa1 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:01.2/vdpa1/

qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-m 8G \
-hda /var/lib/libvirt/images/c77g1532.qcow2 \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \
-device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on,iommu_platform=on,disable-legacy=on \
-nographic \
2>&1 | tee vm153.log
This section describes how to configure Linux Bridge.
Make sure you have completed "6. Change to SR-IOV switchdev mode" in the previous article.
3-1.Creating the Linux Bridge
nmcli connection add type bridge autoconnect yes con-name br30ens2f0 ifname br30ens2f0
nmcli connection modify br30ens2f0 bridge.stp no
nmcli connection modify br30ens2f0 ipv4.method disabled ipv6.method ignore
nmcli connection up br30ens2f0
nmcli connection add type bridge-slave ifname ens2f0 master br30ens2f0
nmcli connection add type bridge-slave ifname ens2f0_0 master br30ens2f0

nmcli connection add type bridge autoconnect yes con-name br31ens2f1 ifname br31ens2f1
nmcli connection modify br31ens2f1 bridge.stp no
nmcli connection modify br31ens2f1 ipv4.method disabled ipv6.method ignore
nmcli connection up br31ens2f1
nmcli connection add type bridge-slave ifname ens2f1 master br31ens2f1
nmcli connection add type bridge-slave ifname ens2f1_0 master br31ens2f1
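Since the two bridges differ only in their names, the same nmcli commands can be wrapped in a small function. This is a minimal sketch, not the procedure we actually ran: the helper names make_bridge and run are hypothetical, and by default (DRY_RUN=1) it only prints the nmcli commands so they can be reviewed before running it as root with DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: one Linux Bridge per PF, each enslaving the PF uplink and its VF representor.
# Names follow the article (br30ens2f0 = ens2f0 + ens2f0_0, br31ens2f1 = ens2f1 + ens2f1_0).
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it as-is.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# make_bridge <bridge> <pf-netdev> <vf-representor>  (hypothetical helper)
make_bridge() {
  local br=$1 pf=$2 rep=$3
  run nmcli connection add type bridge autoconnect yes con-name "$br" ifname "$br"
  run nmcli connection modify "$br" bridge.stp no
  run nmcli connection modify "$br" ipv4.method disabled ipv6.method ignore
  run nmcli connection up "$br"
  run nmcli connection add type bridge-slave ifname "$pf" master "$br"
  run nmcli connection add type bridge-slave ifname "$rep" master "$br"
}

make_bridge br30ens2f0 ens2f0 ens2f0_0
make_bridge br31ens2f1 ens2f1 ens2f1_0
```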
3-2.Checking the Linux Bridge
nmcli con show

[root@c83g155 ~]# nmcli con show
NAME                   UUID                                  TYPE      DEVICE
System eno1            c68c8589-0538-477f-8921-d1da3a50a7c1  ethernet  eno1
br11eno2               15c513f2-8180-44e7-b5a5-af8a61574118  bridge    br11eno2
br30ens2f0             c3d898f3-c3c4-4f84-ae94-4c0d663bd02c  bridge    br30ens2f0
br31ens2f1             0704f4ff-d502-4541-b2a9-b6bc7291ae47  bridge    br31ens2f1
virbr0                 28699aa1-57bf-4686-86f4-7625a2311628  bridge    virbr0
bridge-slave-eno2      4e9db04a-0e3a-4099-8726-0c3fc5b09d78  ethernet  eno2
bridge-slave-ens2f0    d612bf46-c387-4883-aef1-b37b5d29aca2  ethernet  ens2f0
bridge-slave-ens2f0_0  173c7792-6df9-41af-b6ab-9a4f66567d95  ethernet  ens2f0_0
bridge-slave-ens2f1    2a90162e-585a-4c5e-ac52-73ca935c80d0  ethernet  ens2f1
bridge-slave-ens2f1_0  327e0276-f90c-4dd2-89fd-2aeeabd6c69f  ethernet  ens2f1_0
eno3                   8bcc3718-cdd2-40f9-a484-d25ccc5b9a11  ethernet  --
eno4                   3e0a2034-acbb-49c6-a63f-6438213c51c3  ethernet  --
ens1f0                 fd0d2166-840e-4e1a-bfbd-55c0df2ad791  ethernet  --
ens1f1                 f9c8d336-f15b-4b8b-91c9-14d5094d7fd7  ethernet  --
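Besides nmcli, bridge membership can also be checked from sysfs: when a netdev is enslaved to a bridge, the kernel creates a /sys/class/net/<netdev>/master symlink pointing to the bridge. The following sketch reports whether the PF and its representor ended up in the expected bridge. The helper names port_master and check_ports are hypothetical, and SYSFS_NET exists only so that the functions can be exercised without real NICs.

```shell
#!/usr/bin/env bash
# Sketch: verify bridge membership via the /sys/class/net/<netdev>/master symlink.
# SYSFS_NET is overridable only for testing; on the host it is /sys/class/net.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# port_master <netdev>: print the upper (master) device of <netdev>; fail if there is none.
port_master() {
  local link="$SYSFS_NET/$1/master"
  [ -L "$link" ] || return 1
  basename "$(readlink "$link")"
}

# check_ports <bridge> <netdev>...: report each port; return non-zero if any is not in <bridge>.
check_ports() {
  local br=$1 rc=0 p m
  shift
  for p in "$@"; do
    if m=$(port_master "$p") && [ "$m" = "$br" ]; then
      echo "OK   $p -> $br"
    else
      echo "FAIL $p is not a port of $br"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the host:
#   check_ports br30ens2f0 ens2f0 ens2f0_0 && check_ports br31ens2f1 ens2f1 ens2f1_0
```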
3-3.Enabling the vhost_vdpa module : For vhost_vdpa
If the virtio_vdpa module is loaded, unload it with the rmmod virtio_vdpa command, and then load vhost_vdpa.
rmmod virtio_vdpa
modprobe vhost_vdpa
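The unload-then-load step here and its reverse in 3-5 can be expressed as one helper that only unloads the competing module when it is actually loaded. This sketch assumes that a loaded module is visible as /sys/module/<name>; the function name use_vdpa_bus is hypothetical, SYS_MODULE is overridable only for testing, and with the default DRY_RUN=1 it only prints the rmmod/modprobe commands.

```shell
#!/usr/bin/env bash
# Sketch: make exactly one vDPA bus driver (vhost_vdpa or virtio_vdpa) active.
# ASSUMPTION: a loaded module appears as the directory /sys/module/<name>.
DRY_RUN=${DRY_RUN:-1}
SYS_MODULE=${SYS_MODULE:-/sys/module}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# use_vdpa_bus <vhost_vdpa|virtio_vdpa>  (hypothetical helper)
use_vdpa_bus() {
  local want=$1 other
  case "$want" in
    vhost_vdpa)  other=virtio_vdpa ;;
    virtio_vdpa) other=vhost_vdpa ;;
    *) echo "unknown module: $want" >&2; return 2 ;;
  esac
  # Unload the competing driver only if it is currently loaded.
  if [ -d "$SYS_MODULE/$other" ]; then
    run rmmod "$other"
  fi
  run modprobe "$want"
}

# 3-3 (vhost_vdpa):  use_vdpa_bus vhost_vdpa
# 3-5 (virtio_vdpa): use_vdpa_bus virtio_vdpa
```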
3-4.Start the virtual machine : For vhost_vdpa
c77g153 (8)
qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-m 8G \
-hda /var/lib/libvirt/images/c77g1532.qcow2 \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \
-device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on,iommu_platform=on,disable-legacy=on \
-nographic \
2>&1 | tee vm153.log

c77g159 (9)
qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-m 8G \
-hda /var/lib/libvirt/images/c77g1592.qcow2 \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-1,id=vhost-vdpa1 \
-device virtio-net-pci,netdev=vhost-vdpa1,page-per-vq=on,iommu_platform=on,disable-legacy=on \
-nographic \
2>&1 | tee vm159.log
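The two qemu invocations differ only in the vhost-vdpa index, the disk image, and the log file, so they can be generated by one function. This is a sketch under that assumption: start_vdpa_vm is a hypothetical name, the qemu options are copied from the commands above, and with the default DRY_RUN=1 it prints the command instead of starting the VM.

```shell
#!/usr/bin/env bash
# Sketch: start a guest on /dev/vhost-vdpa-<N> with the same qemu options as in 3-4.
DRY_RUN=${DRY_RUN:-1}

# start_vdpa_vm <vhost-vdpa index> <qcow2 image> <log file>  (hypothetical helper)
start_vdpa_vm() {
  local idx=$1 image=$2 log=$3
  local cmd=(
    qemu-system-x86_64
    -enable-kvm
    -cpu host
    -m 8G
    -hda "$image"
    -netdev "type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-${idx},id=vhost-vdpa${idx}"
    -device "virtio-net-pci,netdev=vhost-vdpa${idx},page-per-vq=on,iommu_platform=on,disable-legacy=on"
    -nographic
  )
  if [ "$DRY_RUN" = 1 ]; then
    echo "${cmd[*]}"
  else
    # The guest console stays on this terminal and is also written to the log file.
    "${cmd[@]}" 2>&1 | tee "$log"
  fi
}

# c77g153: start_vdpa_vm 0 /var/lib/libvirt/images/c77g1532.qcow2 vm153.log
# c77g159: start_vdpa_vm 1 /var/lib/libvirt/images/c77g1592.qcow2 vm159.log
```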
3-5.Enabling the virtio_vdpa module : For virtio_vdpa
If the vhost_vdpa module is loaded, unload it with the rmmod vhost_vdpa command, and then load virtio_vdpa.
rmmod vhost_vdpa
modprobe virtio_vdpa
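After switching modules, it is worth confirming which bus driver the vDPA devices are actually bound to, since the VM start procedure differs between vhost_vdpa and virtio_vdpa. This sketch resolves the standard driver-model symlink /sys/bus/vdpa/devices/<device>/driver for each device; after this step the devices would be expected to show virtio_vdpa, and after 3-3 vhost_vdpa. The function name vdpa_drivers is hypothetical, and VDPA_SYSFS is overridable only for testing.

```shell
#!/usr/bin/env bash
# Sketch: list each vDPA device and the bus driver it is bound to.
VDPA_SYSFS=${VDPA_SYSFS:-/sys/bus/vdpa/devices}

vdpa_drivers() {
  local dev drv
  for dev in "$VDPA_SYSFS"/*; do
    [ -e "$dev" ] || continue   # no vDPA devices: the glob did not match anything
    if [ -L "$dev/driver" ]; then
      drv=$(basename "$(readlink "$dev/driver")")
    else
      drv="(unbound)"
    fi
    echo "$(basename "$dev") $drv"
  done
}

# Usage on the host: vdpa_drivers
```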
3-6.Configuring dpdk-vdpa : For virtio_vdpa
/usr/src/dpdk/build/examples/dpdk-vdpa \
--socket-mem 1024,1024 \
-a 0000:07:00.2,class=vdpa \
-a 0000:07:01.2,class=vdpa \
--log-level=pmd,debug -- -i
Then, at the dpdk-vdpa interactive prompt, create a vhost-user socket for each VF:
create /tmp/sock-virtio0 0000:07:00.2
create /tmp/sock-virtio1 0000:07:01.2
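Because the VMs in 3-7 connect to the sockets created by the create commands above, starting them too early fails. Assuming dpdk-vdpa runs in its default vhost-user server mode (so that it creates /tmp/sock-virtio0 and /tmp/sock-virtio1 itself), a small wait loop like the following could be run in another terminal before virsh start. The function name wait_for_socket and the 10-second default timeout are arbitrary choices for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: wait until a path exists as a UNIX domain socket, or give up after a timeout.

# wait_for_socket <path> [timeout-seconds]  (hypothetical helper)
wait_for_socket() {
  local path=$1 timeout=${2:-10} waited=0
  until [ -S "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage:
#   wait_for_socket /tmp/sock-virtio0 && wait_for_socket /tmp/sock-virtio1 \
#     && virsh start c77g153 && virsh start c77g159
```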
3-7.Start a virtual machine : For virtio_vdpa
c77g153
virsh start c77g153; virsh console c77g153

c77g159
virsh start c77g159; virsh console c77g159
4.How to configure ovs (without dpdk)
Configuring ovs is very simple, just like configuring a Linux Bridge.
We do not use dpdk here, but we do configure HW offload.
4-1.Initial configuration of ovs
systemctl start openvswitch
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true other_config:tc-policy=none
systemctl restart openvswitch
ovs-vsctl get Open_vSwitch . other_config
The configuration values for other_config should look like the following.
Please refer to the previous article for how to delete other configuration values (dpdk-init, etc.).
ovs-vsctl get Open_vSwitch . other_config

[root@c83g155 ~]# ovs-vsctl get Open_vSwitch . other_config
{hw-offload="true", tc-policy=none}
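The expected state can also be checked mechanically. This sketch reads the map printed by ovs-vsctl get Open_vSwitch . other_config from stdin and flags a missing hw-offload="true" or leftover dpdk-* keys from the ovs-dpdk setup. The function name check_other_config is hypothetical, and it only pattern-matches the text format shown above.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the other_config map (read on stdin) for the 4-1 state.
check_other_config() {
  local cfg rc=0
  cfg=$(cat)
  case "$cfg" in
    *'hw-offload="true"'*) echo "OK   hw-offload is enabled" ;;
    *) echo "FAIL hw-offload is not \"true\""; rc=1 ;;
  esac
  case "$cfg" in
    *dpdk-*) echo "FAIL dpdk-* keys are still set (see the previous article)"; rc=1 ;;
    *) echo "OK   no dpdk-* keys" ;;
  esac
  return "$rc"
}

# Usage: ovs-vsctl get Open_vSwitch . other_config | check_other_config
```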
4-2.Configuring ovs
ovs-vsctl add-br br30-ovs
ovs-vsctl add-port br30-ovs ens2f0
ovs-vsctl add-port br30-ovs ens2f0_0
ovs-vsctl add-br br31-ovs
ovs-vsctl add-port br31-ovs ens2f1
ovs-vsctl add-port br31-ovs ens2f1_0
ovs-vsctl show
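The same bridges can be written as a loop over bridge/PF/representor triples. In this sketch, the standard ovs-vsctl option --may-exist makes re-running harmless when a bridge or port already exists; the function name setup_ovs_bridges and the colon-separated argument format are hypothetical, and with the default DRY_RUN=1 it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: create one ovs bridge per PF with the PF and its VF representor as ports.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# setup_ovs_bridges <bridge>:<pf-netdev>:<vf-representor>...
setup_ovs_bridges() {
  local spec br pf rep
  for spec in "$@"; do
    IFS=: read -r br pf rep <<< "$spec"
    run ovs-vsctl --may-exist add-br "$br"
    run ovs-vsctl --may-exist add-port "$br" "$pf"
    run ovs-vsctl --may-exist add-port "$br" "$rep"
  done
}

setup_ovs_bridges br30-ovs:ens2f0:ens2f0_0 br31-ovs:ens2f1:ens2f1_0
```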
The resulting configuration is as follows.
ovs-vsctl show

[root@c83g155 ~]# ovs-vsctl show
59a34ea2-ca80-48b9-8b14-a656c79bc451
    Bridge br31-ovs
        Port ens2f1
            Interface ens2f1
        Port ens2f1_0
            Interface ens2f1_0
        Port br31-ovs
            Interface br31-ovs
                type: internal
    Bridge br30-ovs
        Port br30-ovs
            Interface br30-ovs
                type: internal
        Port ens2f0
            Interface ens2f0
        Port ens2f0_0
            Interface ens2f0_0
    ovs_version: "2.14.1"
4-3.Start a virtual machine
The procedure for starting a virtual machine is the same as in 3-3 to 3-7.
4-4.Status check
ConsoleA
vhost_vdpa
Apr 15 10:00:53 c83g155 kernel: mlx5_core 0000:07:00.2: mlx5_vdpa_set_status:1786:(pid 8284): performing device reset
Apr 15 10:00:53 c83g155 kvm[8292]: 1 guest now active
Apr 15 10:00:53 c83g155 dbus-daemon[2123]: [session uid=0 pid=2123] Activating via systemd: service name='org.freedesktop.Tracker1.Miner.Extract' unit='tracker-extract.service' requested by ':1.73' (uid=0 pid=2781 comm="/usr/libexec/tracker-miner-fs ")
Apr 15 10:00:53 c83g155 systemd[1964]: Starting Tracker metadata extractor...
Apr 15 10:00:53 c83g155 dbus-daemon[2123]: [session uid=0 pid=2123] Successfully activated service 'org.freedesktop.Tracker1.Miner.Extract'
Apr 15 10:00:53 c83g155 systemd[1964]: Started Tracker metadata extractor.
Apr 15 10:01:16 c83g155 kernel: mlx5_core 0000:07:00.2: mlx5_vdpa_handle_set_map:473:(pid 8291): memory map update
Apr 15 10:01:18 c83g155 ovs-vswitchd[8061]: ovs|00001|odp_util(handler4)|ERR|internal error parsing flow key recirc_id(0),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=00:11:22:33:44:00,dst=01:00:5e:00:00:16),eth_type(0x0800),ipv4(src=192.168.30.153,dst=224.0.0.22,proto=2,tos=0xc0,ttl=1,frag=no)
Apr 15 10:01:18 c83g155 ovs-vswitchd[8061]: ovs|00002|odp_util(handler4)|ERR|internal error parsing flow key recirc_id(0),dp_hash(0),skb_priority(0),in_port(5),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=00:11:22:33:44:00,dst=01:00:5e:00:00:16),eth_type(0x0800),ipv4(src=192.168.30.153,dst=224.0.0.22,proto=2,tos=0xc0,ttl=1,frag=no)
Apr 15 10:01:18 c83g155 kernel: tc mirred to Houston: device br30-ovs is down
Apr 15 10:01:18 c83g155 kernel: tc mirred to Houston: device br31-ovs is down
=====================snip=====================
Apr 15 10:01:36 c83g155 systemd[1964]: tracker-extract.service: Succeeded.
You will see the message "tc mirred to Houston: device br30-ovs is down", but this is not a problem.
virtio_vdpa
The same messages as with ovs-dpdk are output.
ConsoleB
[root@c83g155 ~]# ovs-ofctl -O OpenFlow14 dump-ports br30-ovs
OFPST_PORT reply (OF1.4) (xid=0x2): 3 ports
  port LOCAL: rx pkts=0, bytes=0, drop=13, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
           duration=302.134s
  port  ens2f0: rx pkts=2557039, bytes=260820713, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=2557043, bytes=260821159, drop=0, errs=0, coll=0
           duration=302.105s
  port  "ens2f0_0": rx pkts=2557014, bytes=250589621, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=33992436, bytes=3162372357, drop=0, errs=0, coll=0
           duration=302.085s
ConsoleC
[root@c83g155 ~]# ovs-ofctl -O OpenFlow14 dump-ports br31-ovs
OFPST_PORT reply (OF1.4) (xid=0x2): 3 ports
  port LOCAL: rx pkts=0, bytes=0, drop=13, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
           duration=308.302s
  port  ens2f1: rx pkts=2557043, bytes=260821159, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=2557039, bytes=260820713, drop=0, errs=0, coll=0
           duration=308.279s
  port  "ens2f1_0": rx pkts=2557014, bytes=250589517, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=33999072, bytes=3162887345, drop=0, errs=0, coll=0
           duration=308.258s
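To compare the counters of ConsoleB and ConsoleC without reading them by eye, the rx/tx packet counts of one port can be extracted with awk. In the output above, for example, ens2f0 tx pkts=2557043 equals ens2f1 rx pkts=2557043. This sketch (port_pkts is a hypothetical name) assumes the dump-ports layout shown above, where a "port NAME:" token is followed by rx pkts= and tx pkts= fields; it accepts quoted and unquoted port names, and it also works if the output has been flattened onto one line.

```shell
#!/usr/bin/env bash
# Sketch: print "<port> rx=<pkts> tx=<pkts>" for one port of `ovs-ofctl dump-ports` output.
# port_pkts <port>  (reads the dump-ports output on stdin; fails if the port is absent)
port_pkts() {
  awk -v want="$1" '
    {
      for (i = 1; i <= NF; i++) {
        # "port NAME:" starts a new port record; strip the quotes and the colon.
        if ($i == "port") {
          cur = $(i + 1)
          gsub(/[":]/, "", cur)
        }
        # Remember the "pkts=N," token that follows rx / tx for the current port.
        if ($i == "rx" && $(i + 1) ~ /^pkts=/) rx[cur] = $(i + 1)
        if ($i == "tx" && $(i + 1) ~ /^pkts=/) tx[cur] = $(i + 1)
      }
    }
    END {
      r = rx[want]; t = tx[want]
      if (r == "" && t == "") exit 1
      gsub(/pkts=|,/, "", r); gsub(/pkts=|,/, "", t)
      print want " rx=" r " tx=" t
    }'
}

# Usage: ovs-ofctl -O OpenFlow14 dump-ports br30-ovs | port_pkts ens2f0_0
```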
ConsoleD
The same messages as with ovs-dpdk are output.
5.Issue 1: How to start a virtual machine
2021/06/01 Update
"How to boot VM with libvirtd" has been added to the following article.
How to set up vDPA with vhost_vdpa for VMs - Metonymical Deflection
From here on, we describe what we tried that did not work.
We have described the following two methods for starting a virtual machine.
- vhost_vdpa: starting a virtual machine directly with qemu
- virtio_vdpa: using libvirt to start a virtual machine
This is because, as the following table shows, the other combinations failed to communicate.
starting method | vhost_vdpa | virtio_vdpa |
qemu | OK | NG |
libvirt | NG | OK |
In this section, we describe the specific settings with which communication was not possible.
5-1.Starting a virtual machine with qemu and virtio_vdpa
The virtual machine can be started with the following command, but we could not communicate with it.
qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-m 8G \
-hda /var/lib/libvirt/images/c77g1532.qcow2 \
-mem-prealloc \
-chardev socket,id=char0,path=/tmp/sock-virtio0 \
-netdev type=vhost-user,id=vdpa,chardev=char0 \
-device virtio-net-pci,netdev=vdpa,mac=00:11:22:33:44:00,page-per-vq=on \
-nographic \
2>&1 | tee vm153.log
We would like to translate the following libvirt settings into equivalent qemu options.
<currentMemory unit='KiB'>4194304</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB'/>
  </hugepages>
</memoryBacking>
<cpu mode='custom' match='exact' check='partial'>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>
According to qemu's help, NUMA can be configured, but we have not yet figured out how to configure hugepages for qemu, so we will continue investigating.
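For reference, one commonly used way to express hugepage-backed, shared guest memory with plain qemu options (it appears in DPDK's vhost-user documentation) is a memory-backend-file object attached to a NUMA node. The following is an untested sketch of how the libvirt settings above might map to qemu options; whether it resolves the communication problem in 5-1 has not been verified. Assumptions: 1 GiB hugepages are mounted as hugetlbfs at HUGEPAGE_MOUNT (/dev/hugepages is only a common default), 4 GiB of memory and 2 vCPUs mirror the XML, and the function name start_vhost_user_vm is hypothetical. With the default DRY_RUN=1 it only prints the command.

```shell
#!/usr/bin/env bash
# Untested sketch: qemu-only counterpart of the libvirt <memoryBacking>/<numa> settings.
# ASSUMPTION: 1 GiB hugepages are mounted (hugetlbfs) at HUGEPAGE_MOUNT.
DRY_RUN=${DRY_RUN:-1}
HUGEPAGE_MOUNT=${HUGEPAGE_MOUNT:-/dev/hugepages}

# start_vhost_user_vm <qcow2 image> <vhost-user socket> <mac> <log file>  (hypothetical helper)
start_vhost_user_vm() {
  local image=$1 sock=$2 mac=$3 log=$4 mem=4G
  # Guest RAM comes from a hugetlbfs-backed object with share=on so that the
  # vhost-user backend (dpdk-vdpa) can map it; it is attached to NUMA node 0
  # with vCPUs 0-1, mirroring <cell id='0' cpus='0-1' ... memAccess='shared'/>.
  local cmd=(
    qemu-system-x86_64
    -enable-kvm
    -cpu host
    -smp 2
    -m "$mem"
    -object "memory-backend-file,id=mem0,size=${mem},mem-path=${HUGEPAGE_MOUNT},share=on,prealloc=on"
    -numa "node,memdev=mem0,cpus=0-1"
    -hda "$image"
    -chardev "socket,id=char0,path=${sock}"
    -netdev "type=vhost-user,id=vdpa,chardev=char0"
    -device "virtio-net-pci,netdev=vdpa,mac=${mac},page-per-vq=on"
    -nographic
  )
  if [ "$DRY_RUN" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}" 2>&1 | tee "$log"
  fi
}

# Usage: start_vhost_user_vm /var/lib/libvirt/images/c77g1532.qcow2 /tmp/sock-virtio0 00:11:22:33:44:00 vm153.log
```

The -m value and the size of the memory object must match, which is why both are derived from the same mem variable.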
5-2.Starting a virtual machine with libvirt and vhost_vdpa
In the case of vhost_vdpa, vDPA devices can be configured in XML.*4
https://libvirt.org/formatdomain.html#vdpa-devices
The configuration is as follows.
<devices>
  ......
  <interface type='vdpa'>
    <source dev='/dev/vhost-vdpa-0'/>
  </interface>
  ......
</devices>
In addition to the above settings, we also tried the following settings, but were still unable to communicate.
virt-xml c77g153 --edit --qemu-commandline='-device'
virt-xml c77g153 --edit --qemu-commandline='virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on,iommu_platform=on,disable-legacy=on'
Because we think it is better to manage virtual machines uniformly with libvirt, we will prioritize investigating the method in 5-2.
6.Issue 2: VF relay configuration
The following Mellanox site describes how to set up Hardware vDPA with VF Relay.
VirtIO Acceleration through VF Relay (Software & Hardware vDPA)
https://docs.mellanox.com/pages/viewpage.action?pageId=43718786#OVSOffloadUsingASAP%C2%B2Direct-swvdpaVirtIOAccelerationthroughVFRelay(Software&HardwarevDPA)
A sample configuration is provided, and the key point is "vdpa-socket-path=<sock path>".
ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
options:vdpa-socket-path=<sock path> \
options:vdpa-accelerator-devargs=\
options:dpdk-devargs= ,representor=[id] \
options: vdpa-max-queues = \
options: vdpa-sw=
Specifically, the configuration is as follows.
ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
options:vdpa-socket-path=/var/run/virtio-forwarder/sock0 \
options:vdpa-accelerator-devargs=0000:01:00.2 \
options:dpdk-devargs=0000:01:00.0,representor=[0] \
options:vdpa-max-queues=8
The socket file "/var/run/virtio-forwarder/sock0" must be prepared in advance for the VF relay by using "virtio-forwarder".
Virtio-forwarder Documentation — Virtio-forwarder 1.2.99.19 documentation
However, when we tried to install virtio-forwarder following the steps on the above site, the installation failed with the following error.
Error:
 Problem: package dpdk-19.11.3-1.el8.x86_64 requires libmlx4.so.1()(64bit), but none of the providers can be installed
  - package dpdk-19.11.3-1.el8.x86_64 requires libmlx4.so.1(MLX4_1.0)(64bit), but none of the providers can be installed
……
For this reason, we plan to build virtio-forwarder from source, linking it against the libraries of "dpdk21.02", which we are using this time.
As stated on the Mellanox site, this method has not been accepted by upstream ovs-dpdk, so we will investigate it with lower priority.*5
7.Finally
The sites that we referred to are listed in the text.
We do not know whether vDPA will become more widespread or disappear in the future, but we expect its true value to be realized when it is combined with technologies such as ScalableIOV.
01.org
In the next article, we would like to describe in detail how to configure vDPA in a Kubernetes environment.
Related Articles
No | vm(qemu)/k8s | k8s Pod/VMI | vDPA Framework | vDPA Type | SR-IOV mode | Related Articles |
1 | vm | - | kernel | vhost | legacy | Not started |
2 | vm | - | kernel | vhost | switchdev | How to set up vDPA with vhost_vdpa for VMs - Metonymical Deflection |
3 | vm | - | kernel | virtio | legacy | Not started |
4 | vm | - | kernel | virtio | switchdev | Not started |
5 | vm | - | dpdk | vhost | legacy | Not started |
6 | vm | - | dpdk | vhost | switchdev | Not started |
7 | vm | - | dpdk | virtio | legacy | Not started |
8 | vm | - | dpdk | virtio | switchdev | How to set up vDPA with virtio_vdpa for VMs - Metonymical Deflection |
9 | k8s | pod | kernel | vhost | legacy | How to set up vDPA with vhost_vdpa for Kubernetes - Metonymical Deflection |
10 | k8s | pod | kernel | vhost | switchdev | How to set up vDPA with vhost_vdpa for Kubernetes + Accelerated Bridge CNI - Metonymical Deflection |
11 | k8s | pod | kernel | virtio | legacy | Not started |
12 | k8s | pod | kernel | virtio | switchdev | Not started |
13 | k8s | pod | dpdk | client | legacy | Not started |
14 | k8s | pod | dpdk | client | switchdev | Not started |
15 | k8s | pod | dpdk | server | legacy | Not started |
16 | k8s | pod | dpdk | server | switchdev | Not started |
*1:This article is intended for those who have already read the previous article, so detailed explanations are omitted.
*2:The 100Gbps DACs used in the previous article were purchased from this site.
*3:The MCX623106AN-CDAT (the officially supported NIC) and the MCX623106AS-CDAT (the NIC we bought) have slightly different model numbers, but this caused no problems.
*4:Supported since libvirt v6.9.0 (QEMU only); requires QEMU 5.1.0 or newer.
*5:There is one more thing we are curious about: "type=dpdkvdpa". When we built ovs v2.15.0 from source and specified this type, it was rejected with an error. Therefore, unless you use the ovs installed in "2-3. Installing Mellanox driver (OFED)" in the previous article, you may not be able to configure VF relay.