Metonymical Deflection

Casual notes on everyday life, with the occasional IT infrastructure post

How to set up vDPA with virtio_vdpa for VMs

In this article, we will describe how to set up communication between VMs (virtual machines) using the virtio_vdpa module.

The following is a list of related articles.

How to set up vDPA with vhost_vdpa for VMs
How to set up vDPA with vhost_vdpa for Kubernetes
How to set up vDPA with vhost_vdpa for Kubernetes + Accelerated Bridge CNI
How to set up vDPA - appendix -
How to set up Scalable Function with vdpa for VMs

1.Overview

1-1.Environment
IA server                        : ProLiant DL360p Gen8 or DL360 Gen9
System ROM                       : P71 01/22/2018
NIC                              : Mellanox ConnectX-6 Dx (MCX623106AS-CDAT)
OS                               : CentOS8.3(2011)
Kernel                           : 5.11.11-1.el8.elrepo.x86_64
Installed Environment Groups     : 
  @^graphical-server-environment
  @container-management
  @development
  @virtualization-client
  @virtualization-hypervisor
  @virtualization-tools 
Mellanox OFED                    : v5.2-2.2.0.0
qemu-kvm                         : v6.0.0-rc1
DPDK                             : v21.02
ovs                              : v2.14.1
1-2.Overall flow

Advance preparation
Kernel update
Building qemu
Building dpdk
Change to SR-IOV switchdev mode
Configure ovs-dpdk and VM : Different from previous article
Operation check : Different from previous article

Note
Since many items are the same as in the previous article, the items that differ are marked "Different from previous article".
If your environment is already set up as described in the previous article, please reboot the host OS and start reading from "Change to SR-IOV switchdev mode".

1-3.Overall structure

The following points are different from the previous article.

this article (1) /tmp/sock-virtio0
previous article (1) /dev/vhost-vdpa-0

fig.1
fig.1 is a simplified description and omits the internal architecture. The actual configuration is closer to fig.2 below.

fig.2

Quoted from Red Hat's Blog
vDPA kernel framework part 3: usage for VMs and containers

The orange dotted lines (A) and (B) correspond to fig.1 and fig.2, respectively.
Furthermore, in fig.2, the actual traffic flow is described in blue and red letters. *1

In fig.2, PF and VF of SR-IOV are written respectively, and "VF rep" is written in addition to them.
It should be noted that the bsf (Bus, Slot, Function) numbers of PF and VF rep are the same.

PF        VF0        VF0 rep
ens2f0    ens2f0v0   ens2f0_0
07:00.0   07:00.2    07:00.0

rep=representor is an interface specific to switchdev mode in SR-IOV, and is created by enabling switchdev mode.
In contrast to switchdev mode, the conventional SR-IOV VF mode is called legacy mode, and the two must be clearly distinguished.
In addition, switchdev mode is a mandatory requirement for ConnectX-6 Dx to enable the vDPA HW offload.

2.Advance preparation

Although not specifically covered here, SELinux and the firewall are disabled and NTP time synchronization is configured in advance.

2-1.Enabling HugePage and IOMMU
sed -i -e "/GRUB_CMDLINE_LINUX=/s/\"$/ default_hugepagesz=1G hugepagesz=1G hugepages=16\"/g" /etc/default/grub
sed -i -e "/GRUB_CMDLINE_LINUX=/s/\"$/ intel_iommu=on iommu=pt pci=realloc\"/g" /etc/default/grub
grub2-mkconfig -o /etc/grub2.cfg

Next, implement the mount settings for HugePage. It will be mounted automatically the next time the OS boots.

vi /etc/fstab

nodev  /dev/hugepages hugetlbfs pagesize=1GB    0 0
2-2.SR-IOV VF settings

Configure the SR-IOV VF settings; you can increase the number of VFs, but for the sake of simplicity, we have set the number of VFs to "1". In addition, setting the MAC address is mandatory. *2

vi /etc/rc.local

echo 1 > /sys/class/net/ens2f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens2f1/device/sriov_numvfs
sleep 1
ip link set ens2f0 vf 0 mac 00:11:22:33:44:00
ip link set ens2f1 vf 0 mac 00:11:22:33:44:10
sleep 1
exit 0

chmod +x /etc/rc.d/rc.local
2-3.Install the Mellanox driver (OFED)

You can download the iso file from the Mellanox website (Mellanox Download Site).
Please save the downloaded iso file to /root/tmp/.
The following command will install the Mellanox driver, but it will also install ovs v2.14.1 at the same time.

dnf -y install tcl tk unbound && \
mount -t iso9660 -o loop /root/tmp/MLNX_OFED_LINUX-5.2-2.2.0.0-rhel8.3-x86_64.iso /mnt && \
/mnt/mlnxofedinstall --upstream-libs --dpdk --ovs-dpdk --with-mft --with-mstflint

After the installation is complete, reboot.

reboot

After the reboot is complete, check the HugePage.

cat /proc/meminfo | grep Huge
grep hugetlbfs /proc/mounts

[root@c83g155 ~]# cat /proc/meminfo | grep Huge
AnonHugePages:    452608 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      16
HugePages_Free:       16
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        16777216 kB

[root@c83g155 ~]# grep hugetlbfs /proc/mounts
nodev /dev/hugepages hugetlbfs rw,relatime,pagesize=1024M 0 0
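
The HugePage check above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name check_hugepages and its arguments are hypothetical, and the expected values (1 GiB pages, 16 pages) are simply the ones configured in 2-1.

```shell
# check_hugepages [MEMINFO_FILE] [WANT_PAGES]
# Hypothetical helper: succeeds when the meminfo-style file reports
# 1 GiB huge pages and at least WANT_PAGES of them in total.
check_hugepages() {
    local meminfo="${1:-/proc/meminfo}" want="${2:-16}"
    local total size_kb
    # Second field of each line is the numeric value.
    total=$(awk '/^HugePages_Total:/ {print $2}' "$meminfo")
    size_kb=$(awk '/^Hugepagesize:/ {print $2}' "$meminfo")
    # 1048576 kB = 1 GiB, the page size set on the kernel command line in 2-1.
    [ "${size_kb:-0}" -eq 1048576 ] && [ "${total:-0}" -ge "$want" ]
}
```

For example, `check_hugepages /proc/meminfo 16 && echo "HugePages OK"` prints the message only when both conditions hold.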

3.Kernel update

As of April 8, 2021, the vDPA-related modules are being updated frequently, so install the latest kernel.

3-2.Installation of Kernel
dnf list installed | grep kernel
dnf -y --enablerepo=elrepo-kernel install kernel-ml kernel-ml-devel
dnf list installed | grep kernel
reboot

Check the currently installed Kernel.
Install kernel-ml and kernel-ml-devel *3
Check the installed Kernel.
Reboot

3-3.Install Kernel headers, etc.
uname -r
dnf -y swap --enablerepo=elrepo-kernel kernel-headers -- kernel-ml-headers && \
dnf -y remove kernel-tools kernel-tools-libs && \
dnf -y --enablerepo=elrepo-kernel install kernel-ml-tools kernel-ml-tools-libs
dnf list installed | grep kernel

Check the currently running Kernel Version.
Swap kernel-headers for kernel-ml-headers.
Remove the existing kernel-tools and kernel-tools-libs.
Install kernel-ml-tools and kernel-ml-tools-libs.
Check the installed Kernel.

If you get the following output, you are good to go.

[root@c83g155 ~]# dnf list installed | grep kernel
kernel.x86_64                                      4.18.0-240.el8                                @anaconda
kernel-core.x86_64                                 4.18.0-240.el8                                @anaconda
kernel-devel.x86_64                                4.18.0-240.el8                                @anaconda
kernel-ml.x86_64                                   5.11.11-1.el8.elrepo                          @elrepo-kernel
kernel-ml-core.x86_64                              5.11.11-1.el8.elrepo                          @elrepo-kernel
kernel-ml-devel.x86_64                             5.11.11-1.el8.elrepo                          @elrepo-kernel
kernel-ml-headers.x86_64                           5.11.11-1.el8.elrepo                          @elrepo-kernel
kernel-ml-modules.x86_64                           5.11.11-1.el8.elrepo                          @elrepo-kernel
kernel-ml-tools.x86_64                             5.11.11-1.el8.elrepo                          @elrepo-kernel
kernel-ml-tools-libs.x86_64                        5.11.11-1.el8.elrepo                          @elrepo-kernel
kernel-modules.x86_64                              4.18.0-240.el8                                @anaconda
kmod-kernel-mft-mlnx.x86_64                        4.16.1-1.rhel8u3                              @System
kmod-mlnx-ofa_kernel.x86_64                        5.2-OFED.5.2.2.2.0.1.rhel8u3                  @System
mlnx-ofa_kernel.x86_64                             5.2-OFED.5.2.2.2.0.1.rhel8u3                  @System
mlnx-ofa_kernel-devel.x86_64                       5.2-OFED.5.2.2.2.0.1.rhel8u3                  @System
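
After the reboot it can be useful to confirm that the host is actually running the newer kernel rather than the stock 4.18 one. The sketch below is an illustration only: kernel_at_least is a hypothetical helper, it relies on GNU sort -V, and the 5.11 threshold is taken from the environment in 1-1, not from any documented minimum.

```shell
# kernel_at_least RUNNING MINIMUM
# Hypothetical helper: succeeds when RUNNING is the same as or newer than
# MINIMUM according to GNU "sort -V" version ordering.
kernel_at_least() {
    local running="$1" minimum="$2"
    # If MINIMUM sorts first (or both are equal), RUNNING >= MINIMUM.
    [ "$(printf '%s\n%s\n' "$minimum" "$running" | sort -V | head -n1)" = "$minimum" ]
}
```

Usage: `kernel_at_least "$(uname -r)" 5.11 || echo "still on the old kernel"`.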

4.Building qemu

4-1.Enabling the PowerTools Repository
vi /etc/yum.repos.d/CentOS-Linux-PowerTools.repo

enabled=1
4-2.Install the necessary packages

In addition to qemu, we have also installed the packages that are required for the dpdk build.

dnf -y install cmake gcc libnl3-devel libudev-devel make numactl numactl-devel \
pkgconfig valgrind-devel pandoc libibverbs libmlx5 libmnl-devel meson ninja-build \
glibc-utils glib2 glib2-devel pixman pixman-devel zlib zlib-devel \
usbredir-devel spice-server-devel && \
wget https://cbs.centos.org/kojifiles/packages/pyelftools/0.26/1.el8/noarch/python3-pyelftools-0.26-1.el8.noarch.rpm && \
dnf -y localinstall python3-pyelftools-0.26-1.el8.noarch.rpm
4-3.Building qemu
cd /usr/src && \
git clone https://github.com/qemu/qemu.git && \
cd qemu/ && \
git checkout v6.0.0-rc1 && \
mkdir build && \
cd build/ && \
../configure --enable-vhost-vdpa --target-list=x86_64-softmmu && \
make -j"$(nproc)" && \
make install

Checking Version after Installation

/usr/local/bin/qemu-system-x86_64 --version

[root@c83g155 ~]# /usr/local/bin/qemu-system-x86_64 --version
QEMU emulator version 5.2.91 (v6.0.0-rc1)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
4-4.Change qemu execution path
mv /usr/libexec/qemu-kvm /usr/libexec/qemu-kvm.org
ln -s /usr/local/bin/qemu-system-x86_64 /usr/libexec/qemu-kvm
4-5.Change the user to run qemu
vi /etc/libvirt/qemu.conf

user = "root"  # uncomment this line
group = "root"  # uncomment this line

5.Building dpdk

5-1.Building dpdk
cd /usr/src/ && \
git clone git://dpdk.org/dpdk && \
cd dpdk && \
git checkout v21.02 && \
meson -Dexamples=all build && \
ninja -C build && \
ninja -C build install
5-2.Links to dpdk-related libraries

Create a new file with vi and include the path of lib.

vi /etc/ld.so.conf.d/libdpdk.conf

/usr/src/dpdk/build/lib

After running ldconfig, make sure the libs are linked.

ldconfig
ldconfig -p |grep dpdk

The libraries are linked correctly if the output looks like the following.

[root@c83g155 dpdk]# ldconfig -p |grep dpdk
        librte_vhost.so.21 (libc6,x86-64) => /usr/src/dpdk/build/lib/librte_vhost.so.21
        librte_vhost.so (libc6,x86-64) => /usr/src/dpdk/build/lib/librte_vhost.so
        librte_timer.so.21 (libc6,x86-64) => /usr/src/dpdk/build/lib/librte_timer.so.21
============ s n i p ============

Now, reboot once again.

reboot

6.Change to SR-IOV switchdev mode

6-1.Check the current operation mode.
lshw -businfo -c network
devlink dev eswitch show pci/0000:07:00.0
devlink dev eswitch show pci/0000:07:00.1

Check the bsf (bus, slot, function) number of the PCI device.
Check the status of 07:00.0 (ens2f0)
Check the status of 07:00.1 (ens2f1)

The output will look like the following

[root@c83g155 ~]# lshw -businfo -c network
Bus info          Device      Class          Description
========================================================
pci@0000:04:00.0  ens1f0      network        82599ES 10-Gigabit SFI/SFP+ Network Connection
pci@0000:04:00.1  ens1f1      network        82599ES 10-Gigabit SFI/SFP+ Network Connection
pci@0000:03:00.0  eno1        network        NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.1  eno2        network        NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.2  eno3        network        NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.3  eno4        network        NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:07:00.0  ens2f0      network        MT2892 Family [ConnectX-6 Dx]
pci@0000:07:00.1  ens2f1      network        MT2892 Family [ConnectX-6 Dx]
pci@0000:07:00.2  ens2f0v0    network        ConnectX Family mlx5Gen Virtual Function
pci@0000:07:01.2  ens2f1v0    network        ConnectX Family mlx5Gen Virtual Function

[root@c83g155 ~]# devlink dev eswitch show pci/0000:07:00.0
pci/0000:07:00.0: mode legacy inline-mode none encap disable

[root@c83g155 ~]# devlink dev eswitch show pci/0000:07:00.1
pci/0000:07:00.1: mode legacy inline-mode none encap disable
6-2.Changing the operating mode

Note that the bsf numbers are slightly different.*4

echo 0000:07:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind && \
echo 0000:07:01.2 > /sys/bus/pci/drivers/mlx5_core/unbind && \
devlink dev eswitch set pci/0000:07:00.0 mode switchdev && \
devlink dev eswitch set pci/0000:07:00.1 mode switchdev && \
echo 0000:07:00.2 > /sys/bus/pci/drivers/mlx5_core/bind && \
echo 0000:07:01.2 > /sys/bus/pci/drivers/mlx5_core/bind

Unbind the mlx5_core driver for VF.

07:00.2 ens2f0v0
07:01.2 ens2f1v0

Change the PF operation mode to switchdev.

07:00.0 ens2f0
07:00.1 ens2f1

Rebind the mlx5_core driver of VF.

07:00.2 ens2f0v0
07:01.2 ens2f1v0
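
The unbind, mode change, and rebind order described above can be sketched as a small generator that only prints the commands for one PF, so they can be reviewed before being run. The function name print_switchdev_cmds is hypothetical; the sysfs paths and the devlink syntax are the ones used in this section.

```shell
# print_switchdev_cmds PF_ADDR VF_ADDR...
# Hypothetical helper: prints (does not execute) the commands that
# unbind the VFs, switch the PF to switchdev mode, and rebind the VFs.
print_switchdev_cmds() {
    local pf="$1"; shift
    local vf
    # 1. Unbind mlx5_core from every VF of this PF.
    for vf in "$@"; do
        printf 'echo %s > /sys/bus/pci/drivers/mlx5_core/unbind\n' "$vf"
    done
    # 2. Change the PF eswitch mode.
    printf 'devlink dev eswitch set pci/%s mode switchdev\n' "$pf"
    # 3. Rebind mlx5_core to every VF.
    for vf in "$@"; do
        printf 'echo %s > /sys/bus/pci/drivers/mlx5_core/bind\n' "$vf"
    done
}
```

Usage: `print_switchdev_cmds 0000:07:00.0 0000:07:00.2` prints the three commands for ens2f0; pipe the output to `sh` only after checking it.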
6-3.Check the operation mode after the change.
devlink dev eswitch show pci/0000:07:00.0
devlink dev eswitch show pci/0000:07:00.1

Changed to switchdev mode.

[root@c83g155 ~]# devlink dev eswitch show pci/0000:07:00.0
pci/0000:07:00.0: mode switchdev inline-mode none encap enable

[root@c83g155 ~]# devlink dev eswitch show pci/0000:07:00.1
pci/0000:07:00.1: mode switchdev inline-mode none encap enable
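
To check the mode from a script instead of reading it by eye, the value after the "mode" keyword can be extracted. This is a minimal sketch that assumes the one-line output format shown above; eswitch_mode is a hypothetical name.

```shell
# eswitch_mode
# Hypothetical helper: reads "devlink dev eswitch show" output on stdin
# and prints the word that follows the "mode" keyword.
eswitch_mode() {
    # "inline-mode" is a different field and does not match "mode" exactly.
    awk '{ for (i = 1; i < NF; i++) if ($i == "mode") { print $(i + 1); exit } }'
}
```

Usage: `devlink dev eswitch show pci/0000:07:00.0 | eswitch_mode` should print `switchdev` after the change in 6-2.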

The VF representors have been added.

[root@c83g155 ~]# lshw -businfo -c network
Bus info          Device      Class          Description
========================================================
pci@0000:04:00.0  ens1f0      network        82599ES 10-Gigabit SFI/SFP+ Network Connection
pci@0000:04:00.1  ens1f1      network        82599ES 10-Gigabit SFI/SFP+ Network Connection
pci@0000:03:00.0  eno1        network        NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.1  eno2        network        NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.2  eno3        network        NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.3  eno4        network        NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:07:00.0  ens2f0      network        MT2892 Family [ConnectX-6 Dx]
pci@0000:07:00.1  ens2f1      network        MT2892 Family [ConnectX-6 Dx]
pci@0000:07:00.2  ens2f0v0    network        ConnectX Family mlx5Gen Virtual Function
pci@0000:07:01.2  ens2f1v0    network        ConnectX Family mlx5Gen Virtual Function
pci@0000:07:00.0  ens2f0_0    network        Ethernet interface
pci@0000:07:00.1  ens2f1_0    network        Ethernet interface

In addition, make sure that the HW offload function of the NIC is enabled.

ethtool -k ens2f0 |grep tc
ethtool -k ens2f1 |grep tc

[root@c83g155 ~]# ethtool -k ens2f0 |grep tc
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
hw-tc-offload: on

[root@c83g155 ~]# ethtool -k ens2f1 |grep tc
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
hw-tc-offload: on
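
Since "grep tc" also matches the TCP segmentation lines, a script may prefer to pick out only the hw-tc-offload value. The following is a sketch under the assumption that ethtool prints the feature as "hw-tc-offload: on" at the start of a line, as in the output above; hw_tc_offload_state is a hypothetical name.

```shell
# hw_tc_offload_state
# Hypothetical helper: reads "ethtool -k <iface>" output on stdin and
# prints the value of the hw-tc-offload feature (for example "on").
hw_tc_offload_state() {
    # Split on ": " so that $1 is the feature name and $2 its value.
    awk -F': *' '$1 == "hw-tc-offload" { print $2; exit }'
}
```

Usage: `[ "$(ethtool -k ens2f0 | hw_tc_offload_state)" = "on" ] || echo "hw-tc-offload is not enabled on ens2f0"`.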

7.Configure ovs-dpdk and VM : Different from previous article

7-1.Overall Flow - Overview -

Configure the settings in the order (1)-(9) described in fig.1 below.
fig.1

  1. Enabling the virtio_vdpa module and configuring dpdk-vdpa : (1) : Different from previous article
  2. Initial configuration of ovs
  3. Configuration of br30-ovs: (2)(3)(4)
  4. Configuration of br31-ovs: (5)(6)(7)
  5. Configure and start virtual machine c77g153: (8) : Different from previous article
  6. Configure and start virtual machine c77g159: (9) : Different from previous article
7-2.Overall flow - Commands only -

Run the following commands.
Detailed explanations follow; if you do not need them, just execute the commands.

1.Enabling the virtio_vdpa module and configuring dpdk-vdpa
(1)
modprobe virtio_vdpa

/usr/src/dpdk/build/examples/dpdk-vdpa \
--socket-mem 1024,1024 \
-a 0000:07:00.2,class=vdpa \
-a 0000:07:01.2,class=vdpa \
--log-level=pmd,debug -- -i

create /tmp/sock-virtio0 0000:07:00.2
create /tmp/sock-virtio1 0000:07:01.2

2.Initial configuration of ovs
systemctl start openvswitch
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true other_config:tc-policy=none
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=1024,1024
ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-extra=" \
-w 0000:07:00.0,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=0 \
-w 0000:07:00.1,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=0"
systemctl restart openvswitch

3.Configuration of br30-ovs
(2)
ovs-vsctl add-br br30-ovs -- set bridge br30-ovs datapath_type=netdev
(3)
ovs-vsctl add-port br30-ovs ens2f0 -- set Interface ens2f0 type=dpdk options:dpdk-devargs=0000:07:00.0
(4)
ovs-vsctl add-port br30-ovs ens2f0_0 -- set Interface ens2f0_0 type=dpdk options:dpdk-devargs=0000:07:00.0,representor=[0]

4.Configuration of br31-ovs
(5)
ovs-vsctl add-br br31-ovs -- set bridge br31-ovs datapath_type=netdev
(6)
ovs-vsctl add-port br31-ovs ens2f1 -- set Interface ens2f1 type=dpdk options:dpdk-devargs=0000:07:00.1
(7)
ovs-vsctl add-port br31-ovs ens2f1_0 -- set Interface ens2f1_0 type=dpdk options:dpdk-devargs=0000:07:00.1,representor=[0]

5.Configure and start virtual machine c77g153
(8)
virsh edit c77g153
<currentMemory unit='KiB'>4194304</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB'/>
  </hugepages>
</memoryBacking>

<cpu mode='custom' match='exact' check='partial'>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>

virt-xml c77g153 --edit --qemu-commandline='-mem-prealloc'
virt-xml c77g153 --edit --qemu-commandline='-chardev'
virt-xml c77g153 --edit --qemu-commandline='socket,id=charnet1,path=/tmp/sock-virtio0'
virt-xml c77g153 --edit --qemu-commandline='-netdev'
virt-xml c77g153 --edit --qemu-commandline='vhost-user,chardev=charnet1,queues=16,id=hostnet1'
virt-xml c77g153 --edit --qemu-commandline='-device'
virt-xml c77g153 --edit --qemu-commandline='virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=00:11:22:33:44:00,addr=0x6,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024'

6.Configure and start virtual machine c77g159
(9)
virsh edit c77g159
<currentMemory unit='KiB'>4194304</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB'/>
  </hugepages>
</memoryBacking>

<cpu mode='custom' match='exact' check='partial'>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>

virt-xml c77g159 --edit --qemu-commandline='-mem-prealloc'
virt-xml c77g159 --edit --qemu-commandline='-chardev'
virt-xml c77g159 --edit --qemu-commandline='socket,id=charnet2,path=/tmp/sock-virtio1'
virt-xml c77g159 --edit --qemu-commandline='-netdev'
virt-xml c77g159 --edit --qemu-commandline='vhost-user,chardev=charnet2,queues=16,id=hostnet2'
virt-xml c77g159 --edit --qemu-commandline='-device'
virt-xml c77g159 --edit --qemu-commandline='virtio-net-pci,mq=on,vectors=6,netdev=hostnet2,id=net1,mac=00:11:22:33:44:10,addr=0x7,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024'
7-3.Enabling the virtio_vdpa module and configuring dpdk-vdpa:(1) : Different from previous article

Enabling the virtio_vdpa module
We will check the changes before and after executing the modprobe virtio_vdpa command.

Before running modprobe virtio_vdpa

lsmod |grep vd
ls -Fal /sys/bus/vdpa/drivers/virtio_vdpa

[root@c83g155 ~]# lsmod |grep vd
mlx5_vdpa              45056  0
vhost_iotlb            16384  2 vhost,mlx5_vdpa
vdpa                   16384  1 mlx5_vdpa
mlx5_core            1216512  2 mlx5_vdpa,mlx5_ib

[root@c83g155 ~]# ls -Fal /sys/bus/vdpa/drivers/virtio_vdpa
ls: cannot access '/sys/bus/vdpa/drivers/virtio_vdpa': No such file or directory

After running modprobe virtio_vdpa

modprobe virtio_vdpa
lsmod |grep vd
ls -Fal /sys/bus/vdpa/drivers/virtio_vdpa

[root@c83g155 ~]# lsmod |grep vd
virtio_vdpa            16384  0
mlx5_vdpa              45056  0
vhost_iotlb            16384  1 mlx5_vdpa
vdpa                   16384  2 virtio_vdpa,mlx5_vdpa
mlx5_core            1216512  2 mlx5_vdpa,mlx5_ib

[root@c83g155 ~]# ls -Fal /sys/bus/vdpa/drivers/virtio_vdpa
total 0
drwxr-xr-x 2 root root    0 Apr 12 21:00 ./
drwxr-xr-x 3 root root    0 Apr 12 21:00 ../
--w------- 1 root root 4096 Apr 12 21:00 bind
lrwxrwxrwx 1 root root    0 Apr 12 21:00 module -> ../../../../module/virtio_vdpa/
--w------- 1 root root 4096 Apr 12 21:00 uevent
--w------- 1 root root 4096 Apr 12 21:00 unbind
lrwxrwxrwx 1 root root    0 Apr 12 21:00 vdpa0 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:00.2/vdpa0/
lrwxrwxrwx 1 root root    0 Apr 12 21:00 vdpa1 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:01.2/vdpa1/

From the above output results, we can confirm the following.

  • 0000:07:00.2/vdpa0 and 0000:07:01.2/vdpa1 are controlled by the virtio_vdpa driver
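
The same confirmation can be scripted by listing the vdpaN symlinks in the driver directory. This is a minimal sketch; list_virtio_vdpa_devs is a hypothetical name, and the optional directory argument exists only so the function can be tried against a test directory.

```shell
# list_virtio_vdpa_devs [DRIVER_DIR]
# Hypothetical helper: prints the names of the vdpa devices bound to the
# virtio_vdpa driver. Returns 1 if the driver directory does not exist,
# i.e. the module has not been loaded.
list_virtio_vdpa_devs() {
    local drv_dir="${1:-/sys/bus/vdpa/drivers/virtio_vdpa}"
    local entry
    [ -d "$drv_dir" ] || return 1
    for entry in "$drv_dir"/vdpa*; do
        # Bound devices appear as symlinks named vdpaN.
        [ -L "$entry" ] && basename "$entry"
    done
    return 0
}
```

On the host in this article, `list_virtio_vdpa_devs` would be expected to list vdpa0 and vdpa1 once modprobe virtio_vdpa has been run.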

Configuring dpdk-vdpa
Next, run the dpdk-vdpa command.

/usr/src/dpdk/build/examples/dpdk-vdpa \
--socket-mem 1024,1024 \
-a 0000:07:00.2,class=vdpa \
-a 0000:07:01.2,class=vdpa \
--log-level=pmd,debug -- -i

When the prompt changes to "vdpa>", execute the following command.

create /tmp/sock-virtio0 0000:07:00.2
create /tmp/sock-virtio1 0000:07:01.2

Connect to the host OS via ssh in another terminal and confirm that the sock file has been generated using the following command.

[root@c83g155 ~]# ls -Fal /tmp
total 36
drwxrwxrwt. 17 root root 4096 Apr 12 21:08 ./
dr-xr-xr-x. 17 root root  244 Apr  7 20:30 ../
-rw-r--r--   1 root root 1874 Apr  7 20:30 anaconda.log
===================== s n i p =====================
srwxr-xr-x   1 root root    0 Apr 12 21:08 sock-virtio0=
srwxr-xr-x   1 root root    0 Apr 12 21:08 sock-virtio1=
drwx------   3 root root   17 Apr 12 19:56 systemd-private-f5b122148a7c4019be8cf0116bd9f2cc-chronyd.service-IEe7hb/
===================== s n i p =====================
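
If the VMs are started from a script, it helps to wait until dpdk-vdpa has actually created the sock files. The sketch below polls for a UNIX socket with the standard `-S` file test; wait_for_socket and its default timeout of 10 seconds are assumptions made for illustration.

```shell
# wait_for_socket PATH [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until PATH exists and is a
# UNIX domain socket. Returns 1 if it does not appear within the timeout.
wait_for_socket() {
    local path="$1" timeout="${2:-10}" waited=0
    while [ ! -S "$path" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Usage: `wait_for_socket /tmp/sock-virtio0 30 && wait_for_socket /tmp/sock-virtio1 30` before starting the virtual machines.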

Note
The following is an example of output from the dpdk-vdpa command.

[root@c83g155 ~]# /usr/src/dpdk/build/examples/dpdk-vdpa \
> --socket-mem 1024,1024 \
> -a 0000:07:00.2,class=vdpa \
> -a 0000:07:01.2,class=vdpa \
> --log-level=pmd,debug -- -i
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available 2048 kB hugepages reported
EAL: Probing VFIO support...
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:07:00.2 (socket 0)
mlx5_vdpa: Checking device "mlx5_3"..
mlx5_vdpa: Checking device "mlx5_2"..
mlx5_vdpa: PCI information matches for device "mlx5_2".
common_mlx5: Netlink "devlink" family ID is 20.
common_mlx5: ROCE is enabled for device "0000:07:00.2".
common_mlx5: Device 0000:07:00.2 ROCE was disabled by Netlink successfully.
common_mlx5: Device "0000:07:00.2" was reloaded by Netlink successfully.
mlx5_vdpa: ROCE is disabled by Netlink successfully.
mlx5_vdpa: Checking device "mlx5_3"..
mlx5_vdpa: Checking device "mlx5_1"..
mlx5_vdpa: Checking device "mlx5_0"..
mlx5_vdpa: Checking device "mlx5_2"..
mlx5_vdpa: event mode is 1.
mlx5_vdpa: event_us is 0 us.
mlx5_vdpa: no traffic time is 2 s.
EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:07:01.2 (socket 0)
mlx5_vdpa: Checking device "mlx5_3"..
mlx5_vdpa: PCI information matches for device "mlx5_3".
common_mlx5: Netlink "devlink" family ID is 20.
common_mlx5: ROCE is enabled for device "0000:07:01.2".
common_mlx5: Device 0000:07:01.2 ROCE was disabled by Netlink successfully.
common_mlx5: Device "0000:07:01.2" was reloaded by Netlink successfully.
mlx5_vdpa: ROCE is disabled by Netlink successfully.
mlx5_vdpa: Checking device "mlx5_1"..
mlx5_vdpa: Checking device "mlx5_0"..
mlx5_vdpa: Checking device "mlx5_2"..
mlx5_vdpa: Checking device "mlx5_3"..
mlx5_vdpa: event mode is 1.
mlx5_vdpa: event_us is 0 us.
mlx5_vdpa: no traffic time is 2 s.
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
vdpa>                                     < < < < After executing the command, the prompt changes to "vdpa>".
vdpa> create /tmp/sock-virtio0 0000:07:00.2
VHOST_CONFIG: vhost-user server: socket created, fd: 83
VHOST_CONFIG: bind to /tmp/sock-virtio0
vdpa> create /tmp/sock-virtio1 0000:07:01.2
VHOST_CONFIG: vhost-user server: socket created, fd: 86
VHOST_CONFIG: bind to /tmp/sock-virtio1
vdpa>

Please keep this terminal as it is, as we will use it in the operation check later.

7-4.Initial configuration of ovs

Since ovs has already been installed, start the service from systemctl.*5

systemctl start openvswitch
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true other_config:tc-policy=none
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=1024,1024
ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-extra=" \
-w 0000:07:00.0,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=0 \
-w 0000:07:00.1,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=0"
systemctl restart openvswitch

Start the ovs service
Initialize dpdk
HW offload and tc-policy configuration
Memory allocation
IOMMU configuration for vhost
Configure the representors
Restart the ovs service (to reflect the above settings)

Use the following command to check the settings.

ovs-vsctl get Open_vSwitch . other_config

[root@c83g155 ~]# ovs-vsctl get Open_vSwitch . other_config
{dpdk-extra=" -w 0000:07:00.0,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=0 -w 0000:07:00.1,representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=0", dpdk-init="true", dpdk-socket-mem="1024,1024", hw-offload="true", tc-policy=none, vhost-iommu-support="true"}
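
The one-line map above is hard to read, so a script can check it for the expected settings instead. This is a rough sketch based on plain substring matching; missing_settings is a hypothetical name, and the fragments passed to it must be written exactly as ovs-vsctl prints them (including the double quotes).

```shell
# missing_settings FRAGMENT...
# Hypothetical helper: reads the other_config map on stdin and prints
# every FRAGMENT (for example hw-offload="true") that does not occur in it.
missing_settings() {
    local cfg frag
    cfg=$(cat)
    for frag in "$@"; do
        case "$cfg" in
            *"$frag"*) ;;                     # fragment found, nothing to report
            *) printf '%s\n' "$frag" ;;       # fragment missing
        esac
    done
}
```

Usage: `ovs-vsctl get Open_vSwitch . other_config | missing_settings 'dpdk-init="true"' 'hw-offload="true"' 'tc-policy=none'` prints nothing when all three settings are present.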

Note 1:
Here is a supplementary explanation of other_config:dpdk-extra.
There is the following correspondence between the output results of "lshw -businfo -c network" and the commands configured in "other_config:dpdk-extra".

0000:07:00.0 ens2f0_0 -w 0000:07:00.0,representor=[0]
0000:07:00.1 ens2f1_0 -w 0000:07:00.1,representor=[0]
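
Because each PF contributes one "-w" entry with the same device arguments, the dpdk-extra value can be assembled from the list of PF addresses. The following is a minimal sketch assuming a single VF (representor=[0]) per PF as in this article; build_dpdk_extra is a hypothetical name.

```shell
# build_dpdk_extra PF_ADDR...
# Hypothetical helper: prints the dpdk-extra string used in this article,
# with one "-w" entry per PF address and representor=[0] for VF0.
build_dpdk_extra() {
    local pf out=""
    for pf in "$@"; do
        # Separate entries with a single space; no leading space on the first one.
        out="${out:+$out }-w ${pf},representor=[0],dv_flow_en=1,dv_esw_en=1,dv_xmeta_en=0"
    done
    printf '%s\n' "$out"
}
```

Usage: `ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="$(build_dpdk_extra 0000:07:00.0 0000:07:00.1)"`.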

Note 2:
Here is a supplementary explanation of other_config:tc-policy.
The following options can be set for tc-policy.

none    : adds a TC rule to both the software and the hardware (default)
skip_sw : adds a TC rule only to the hardware
skip_hw : adds a TC rule only to the software

Note 3:
If you want to remove a setting, run the following command.
"dpdk-extra" here is the key; replace it with whichever key you want to delete, such as "dpdk-init" or "hw-offload".

ovs-vsctl remove Open_vSwitch . other_config dpdk-extra
7-5.Configuration of br30-ovs : (2)(3)(4)

Create the first bridge.

(2)
ovs-vsctl add-br br30-ovs -- set bridge br30-ovs datapath_type=netdev
(3)
ovs-vsctl add-port br30-ovs ens2f0 -- set Interface ens2f0 type=dpdk options:dpdk-devargs=0000:07:00.0
(4)
ovs-vsctl add-port br30-ovs ens2f0_0 -- set Interface ens2f0_0 type=dpdk options:dpdk-devargs=0000:07:00.0,representor=[0]

(2) Create a bridge
(3) Create the uplink (specify PF and set the interface for the external NW)
(4) Create the downlink (specify the VF representor and set the interface for the VM)

Check the settings with the following command.

[root@c83g155 ~]# ovs-vsctl show
59a34ea2-ca80-48b9-8b14-a656c79bc451
    Bridge br30-ovs
        datapath_type: netdev
        Port br30-ovs
            Interface br30-ovs
                type: internal
        Port ens2f0_0
            Interface ens2f0_0
                type: dpdk
                options: {dpdk-devargs="0000:07:00.0,representor=[0]"}
        Port ens2f0
            Interface ens2f0
                type: dpdk
                options: {dpdk-devargs="0000:07:00.0"}
    ovs_version: "2.14.1"
7-6.Configuration of br31-ovs : (5)(6)(7)

Create the second bridge.

(5)
ovs-vsctl add-br br31-ovs -- set bridge br31-ovs datapath_type=netdev
(6)
ovs-vsctl add-port br31-ovs ens2f1 -- set Interface ens2f1 type=dpdk options:dpdk-devargs=0000:07:00.1
(7)
ovs-vsctl add-port br31-ovs ens2f1_0 -- set Interface ens2f1_0 type=dpdk options:dpdk-devargs=0000:07:00.1,representor=[0]

Same as (2), (3), and (4).

Check the settings with the following command. The br31-ovs bridge is the part that has been added.

[root@c83g155 ~]# ovs-vsctl show
59a34ea2-ca80-48b9-8b14-a656c79bc451
    Bridge br31-ovs
        datapath_type: netdev
        Port ens2f1_0
            Interface ens2f1_0
                type: dpdk
                options: {dpdk-devargs="0000:07:00.1,representor=[0]"}
        Port ens2f1
            Interface ens2f1
                type: dpdk
                options: {dpdk-devargs="0000:07:00.1"}
        Port br31-ovs
            Interface br31-ovs
                type: internal
    Bridge br30-ovs
        datapath_type: netdev
        Port br30-ovs
            Interface br30-ovs
                type: internal
        Port ens2f0_0
            Interface ens2f0_0
                type: dpdk
                options: {dpdk-devargs="0000:07:00.0,representor=[0]"}
        Port ens2f0
            Interface ens2f0
                type: dpdk
                options: {dpdk-devargs="0000:07:00.0"}
    ovs_version: "2.14.1"
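
Both bridges follow the same three-step pattern (bridge, uplink on the PF, downlink on the VF representor), so the commands can be generated from a few parameters. This sketch only prints the commands; print_bridge_cmds is a hypothetical name, and the representor interface name is assumed to be "<PF name>_<VF index>" as on this host.

```shell
# print_bridge_cmds BRIDGE PF_IFACE PF_ADDR [VF_INDEX]
# Hypothetical helper: prints (does not execute) the three ovs-vsctl
# commands used in 7-5 and 7-6 for one bridge.
print_bridge_cmds() {
    local bridge="$1" pf_if="$2" pci="$3" vf_idx="${4:-0}"
    # Assumption: representor naming follows the ens2f0 -> ens2f0_0 pattern.
    local rep_if="${pf_if}_${vf_idx}"
    printf 'ovs-vsctl add-br %s -- set bridge %s datapath_type=netdev\n' \
        "$bridge" "$bridge"
    printf 'ovs-vsctl add-port %s %s -- set Interface %s type=dpdk options:dpdk-devargs=%s\n' \
        "$bridge" "$pf_if" "$pf_if" "$pci"
    printf 'ovs-vsctl add-port %s %s -- set Interface %s type=dpdk options:dpdk-devargs=%s,representor=[%s]\n' \
        "$bridge" "$rep_if" "$rep_if" "$pci" "$vf_idx"
}
```

Usage: `print_bridge_cmds br30-ovs ens2f0 0000:07:00.0` and `print_bridge_cmds br31-ovs ens2f1 0000:07:00.1` reproduce the commands (2)-(4) and (5)-(7).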
7-7.Configure and start virtual machine c77g153 : (8) : Different from previous article

Please upload the qcow2 file to "/var/lib/libvirt/images/".
In this article, the qcow2 file with CentOS7.7 installed was prepared beforehand.
Additionally, after creating the virtual machine with virt-manager, edit it with the "virsh edit" and "virt-xml" commands.*6

Log in to the host OS via VNC or other means, and start virt-manager.
When creating a new virtual machine, delete the following [1]-[5] devices.*7
(Screenshot: virt-manager device list with devices [1]-[5] marked)
After booting the VM, shut it down once.
After shutdown, the device configuration should look like the following.
The NICs listed here are not used for vDPA, but they let you ssh into the VM, so assign a management IP to them if needed.
(Screenshot: device configuration after shutdown)

After shutdown, use the virsh edit command to perform the following settings.

(8)
virsh edit c77g153

<currentMemory unit='KiB'>4194304</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB'/>
  </hugepages>
</memoryBacking>

<cpu mode='custom' match='exact' check='partial'>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>

After returning to the shell prompt, apply the following settings with the virt-xml command.

(8)
virt-xml c77g153 --edit --qemu-commandline='-mem-prealloc'
virt-xml c77g153 --edit --qemu-commandline='-chardev'
virt-xml c77g153 --edit --qemu-commandline='socket,id=charnet1,path=/tmp/sock-virtio0'
virt-xml c77g153 --edit --qemu-commandline='-netdev'
virt-xml c77g153 --edit --qemu-commandline='vhost-user,chardev=charnet1,queues=16,id=hostnet1'
virt-xml c77g153 --edit --qemu-commandline='-device'
virt-xml c77g153 --edit --qemu-commandline='virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=00:11:22:33:44:00,addr=0x6,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024'

-mem-prealloc          : We have not been able to confirm the details, but this appears to be mandatory, since it is used for exchanging virtqueues with the Platform IOMMU shown in fig.2.
path=/tmp/sock-virtio0 : Explicitly specifies the sock file created by dpdk-vdpa.
mq=on                  : Enables multi-queue.
page-per-vq=on         : Required in order to use virtqueues.

Note
When you run the virt-xml command, you will see the following WARNING message; it can be ignored.

WARNING  XML did not change after domain define. You may have changed a value that libvirt is setting by default.
7-8.Configure and start virtual machine c77g159 : (9) : Different from previous article

Same as 7-7, except for the VM name, the sock file (/tmp/sock-virtio1), the chardev and netdev ids, the MAC address, and the PCI addr.

(9)
virsh edit c77g159

<currentMemory unit='KiB'>4194304</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB'/>
  </hugepages>
</memoryBacking>

<cpu mode='custom' match='exact' check='partial'>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>

After returning to the shell prompt, apply the following settings with the virt-xml command.

(9)
virt-xml c77g159 --edit --qemu-commandline='-mem-prealloc'
virt-xml c77g159 --edit --qemu-commandline='-chardev'
virt-xml c77g159 --edit --qemu-commandline='socket,id=charnet2,path=/tmp/sock-virtio1'
virt-xml c77g159 --edit --qemu-commandline='-netdev'
virt-xml c77g159 --edit --qemu-commandline='vhost-user,chardev=charnet2,queues=16,id=hostnet2'
virt-xml c77g159 --edit --qemu-commandline='-device'
virt-xml c77g159 --edit --qemu-commandline='virtio-net-pci,mq=on,vectors=6,netdev=hostnet2,id=net1,mac=00:11:22:33:44:10,addr=0x7,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024'
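
To confirm that the options were actually appended, the qemu:arg entries can be listed from the domain XML. This is a minimal sketch, assuming that libvirt writes each argument as <qemu:arg value='...'/> on its own line with a single-quoted value; the function name list_qemu_args is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read domain XML on stdin and print the value of every
# <qemu:arg .../> element, one per line, in document order.
# Assumes one element per line with a single-quoted value attribute.
list_qemu_args() {
    sed -n "s/.*<qemu:arg value='\([^']*\)'.*/\1/p"
}

# Usage on the host (requires libvirt):
#   virsh dumpxml c77g159 | list_qemu_args
```

The printed list should contain the seven values from (9) in the same order in which they were added.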

8.Operation check : Different from previous article

8-1.advance preparation

Prepare five consoles on hostOS c83g155.

ConsoleA : (already started in 7-3)                          : To refer to the dpdk-vdpa log
ConsoleB : watch ovs-ofctl -O OpenFlow14 dump-ports br30-ovs : To check the packet count on c77g153
ConsoleC : watch ovs-ofctl -O OpenFlow14 dump-ports br31-ovs : To check the packet count on c77g159
ConsoleD : virsh start c77g153; virsh console c77g153        : For the console of virtual machine c77g153
ConsoleE : virsh start c77g159; virsh console c77g159        : For the console of virtual machine c77g159
8-2.Booting the VM

ConsoleA was already started in debug mode when the dpdk-vdpa command was executed in 7-3.
On ConsoleB and ConsoleC, run the above commands before starting the VMs.
Then, on ConsoleD, start c77g153 with the above command.
Wait a few seconds, then start c77g159 on ConsoleE with the above command.
Send a ping from either c77g153 or c77g159.
As an example, following fig.1, execute ping 192.168.30.159 -f from c77g153.
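
The start order above can also be sketched as a small script. This is only an illustration under stated assumptions: the function name start_vms_in_order and the RUNNER variable are hypothetical, the 5-second pause stands in for "a few seconds", and the script does not open the VM consoles that ConsoleD and ConsoleE provide.

```shell
#!/bin/sh
# Hypothetical helper: start the given VMs one by one, pausing between them.
# Args: delay in seconds, then the VM names in start order.
# Set RUNNER=echo to print the commands instead of running them.
start_vms_in_order() {
    delay=$1
    shift
    first=yes
    for vm in "$@"; do
        # Pause before every VM except the first one.
        [ "$first" = yes ] || ${RUNNER:-} sleep "$delay"
        first=no
        ${RUNNER:-} virsh start "$vm"
    done
}

# Dry run of the order used in this section (prints the commands only).
RUNNER=echo start_vms_in_order 5 c77g153 c77g159
```

Without RUNNER=echo the same call runs virsh start for real, so it should only be used on the host where both domains are defined.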

fig.1 (overall structure, shown in 1-3)

The output is shown below. The points of interest are explained in the Note after the ConsoleD output.

ConsoleA
The ConsoleA log is an excerpt.
The full output has been saved to this link.

vdpa> 
VHOST_CONFIG: new vhost user connection is 87
VHOST_CONFIG: new device, handle is 0
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
===================== s n i p =====================
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: negotiated Virtio features: 0x140601803
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
VHOST_CONFIG: guest memory region size: 0x80000000
         guest physical addr: 0x0
         guest virtual  addr: 0x7faa40000000
         host  virtual  addr: 0x7f8080000000
         mmap addr : 0x7f8080000000
         mmap size : 0x80000000
         mmap align: 0x40000000
         mmap off  : 0x0
VHOST_CONFIG: guest memory region size: 0x80000000
         guest physical addr: 0x100000000
         guest virtual  addr: 0x7faac0000000
         host  virtual  addr: 0x7f8000000000
         mmap addr : 0x7f7f80000000
         mmap size : 0x100000000
         mmap align: 0x40000000
         mmap off  : 0x80000000
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
===================== s n i p =====================
new port /tmp/sock-virtio0, device : 0000:07:00.2
mlx5_vdpa: Cannot get vhost MTU - -95.
mlx5_vdpa: MTU cannot be set on device 0000:07:00.2.
mlx5_vdpa: Region 0: HVA 0x7f8080000000, GPA 0x0, size 0x80000000.
mlx5_vdpa: Region 1: HVA 0x7f8000000000, GPA 0x100000000, size 0x80000000.
mlx5_vdpa: Indirect mkey mode is KLM Fixed Buffer Size.
mlx5_vdpa: Memory registration information: nregions = 2, mem_size = 0x180000000, GCD = 0x80000000, klm_fbs_entries_num = 0x3, klm_entries_num = 0x3.
mlx5_vdpa: Dump fill Mkey = 1792.
mlx5_vdpa: Registered error interrupt for device0.
mlx5_vdpa: VAR address of doorbell mapping is 0x7f8157669000.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 0.
mlx5_vdpa: Register fd 123 interrupt for virtq 0.
mlx5_vdpa: vid 0 virtq 0 was created successfully.
mlx5_vdpa: Virtq 0 notifier state is enabled.
mlx5_vdpa: Ring virtq 0 doorbell.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 1.
mlx5_vdpa: Register fd 89 interrupt for virtq 1.
mlx5_vdpa: vid 0 virtq 1 was created successfully.
mlx5_vdpa: Virtq 1 notifier state is enabled.
mlx5_vdpa: Ring virtq 1 doorbell.
mlx5_vdpa: vDPA device 0 was configured.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:127
mlx5_vdpa: Update virtq 1 status enable -> disable.
mlx5_vdpa: vid 0 virtq 1 was stopped.
mlx5_vdpa: Query vid 0 vring 1: hw_available_idx=0, hw_used_index=0
mlx5_vdpa: Update virtq 1 status disable -> enable.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 1.
mlx5_vdpa: Register fd 89 interrupt for virtq 1.
mlx5_vdpa: vid 0 virtq 1 was created successfully.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 0
===================== s n i p =====================
mlx5_vdpa: Update virtq 2 status disable -> enable.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 2.
mlx5_vdpa: Register fd 90 interrupt for virtq 2.
mlx5_vdpa: vid 0 virtq 2 was created successfully.
mlx5_vdpa: Virtq 2 notifier state is enabled.
mlx5_vdpa: Ring virtq 2 doorbell.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 3
mlx5_vdpa: Update virtq 3 status disable -> enable.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 3.
mlx5_vdpa: Register fd 91 interrupt for virtq 3.
mlx5_vdpa: vid 0 virtq 3 was created successfully.
===================== s n i p =====================
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 0 to qp idx: 31
mlx5_vdpa: Virtq 3 notifier state is enabled.
mlx5_vdpa: Ring virtq 3 doorbell.
mlx5_vdpa: Device 0000:07:00.2 virtq 3 cq 2277 event was captured. Timer is off, cq ci is 1.
mlx5_vdpa: Device 0000:07:00.2 virtq 1 cq 2270 event was captured. Timer is on, cq ci is 1.
mlx5_vdpa: Device 0000:07:00.2 traffic was stopped.
mlx5_vdpa: Device 0000:07:00.2 virtq 3 cq 2277 event was captured. Timer is off, cq ci is 18.
mlx5_vdpa: Device 0000:07:00.2 traffic was stopped.

ConsoleB

[root@c83g155 ~]# ovs-ofctl -O OpenFlow14 dump-ports br30-ovs
OFPST_PORT reply (OF1.4) (xid=0x2): 3 ports
  port  ens2f0: rx pkts=159317, bytes=15614385, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=159318, bytes=15614457, drop=0, errs=0, coll=?
           duration=173.964s
           rx rfc2819 broadcast_packets=2,
           tx rfc2819 multicast_packets=53, broadcast_packets=1,
           CUSTOM Statistics
                      ovs_tx_failure_drops=0, ovs_tx_mtu_exceeded_drops=0, ovs_tx_qos_drops=0,
                      ovs_rx_qos_drops=0, ovs_tx_invalid_hwol_drops=0, rx_missed_errors=0,
                      rx_errors=0, tx_errors=0, rx_mbuf_allocation_errors=0,
                      rx_q0_errors=0, rx_wqe_errors=0, rx_phy_crc_errors=0,
                      rx_phy_in_range_len_errors=0, rx_phy_symbol_errors=0, tx_phy_errors=0,
                      tx_pp_missed_interrupt_errors=0, tx_pp_rearm_queue_errors=0, tx_pp_clock_queue_errors=0,
                      tx_pp_timestamp_past_errors=0, tx_pp_timestamp_future_errors=0,
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=54, errs=0, coll=0
           duration=173.957s
  port  "ens2f0_0": rx pkts=159318, bytes=15614457, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=159317, bytes=15614385, drop=0, errs=0, coll=?
           duration=173.729s
           CUSTOM Statistics
                      ovs_tx_failure_drops=0, ovs_tx_mtu_exceeded_drops=0, ovs_tx_qos_drops=0,
                      ovs_rx_qos_drops=0, ovs_tx_invalid_hwol_drops=0, rx_missed_errors=0,
                      rx_errors=0, tx_errors=0, rx_mbuf_allocation_errors=0,
                      rx_q0_errors=0, tx_pp_missed_interrupt_errors=0, tx_pp_rearm_queue_errors=0,
                      tx_pp_clock_queue_errors=0, tx_pp_timestamp_past_errors=0, tx_pp_timestamp_future_errors=0,

ConsoleC

[root@c83g155 ~]# ovs-ofctl -O OpenFlow14 dump-ports br31-ovs
OFPST_PORT reply (OF1.4) (xid=0x2): 3 ports
  port  ens2f1: rx pkts=159318, bytes=15614493, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=159317, bytes=15614349, drop=0, errs=0, coll=?
           duration=180.549s
           rx rfc2819 broadcast_packets=2,
           tx rfc2819 multicast_packets=53, broadcast_packets=1,
           CUSTOM Statistics
                      ovs_tx_failure_drops=0, ovs_tx_mtu_exceeded_drops=0, ovs_tx_qos_drops=0,
                      ovs_rx_qos_drops=0, ovs_tx_invalid_hwol_drops=0, rx_missed_errors=0,
                      rx_errors=0, tx_errors=0, rx_mbuf_allocation_errors=0,
                      rx_q0_errors=0, rx_wqe_errors=0, rx_phy_crc_errors=0,
                      rx_phy_in_range_len_errors=0, rx_phy_symbol_errors=0, tx_phy_errors=0,
                      tx_pp_missed_interrupt_errors=0, tx_pp_rearm_queue_errors=0, tx_pp_clock_queue_errors=0,
                      tx_pp_timestamp_past_errors=0, tx_pp_timestamp_future_errors=0,
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=54, errs=0, coll=0
           duration=181.910s
  port  "ens2f1_0": rx pkts=159317, bytes=15614349, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=159318, bytes=15614493, drop=0, errs=0, coll=?
           duration=180.861s
           CUSTOM Statistics
                      ovs_tx_failure_drops=0, ovs_tx_mtu_exceeded_drops=0, ovs_tx_qos_drops=0,
                      ovs_rx_qos_drops=0, ovs_tx_invalid_hwol_drops=0, rx_missed_errors=0,
                      rx_errors=0, tx_errors=0, rx_mbuf_allocation_errors=0,
                      rx_q0_errors=0, tx_pp_missed_interrupt_errors=0, tx_pp_rearm_queue_errors=0,
                      tx_pp_clock_queue_errors=0, tx_pp_timestamp_past_errors=0, tx_pp_timestamp_future_errors=0,
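
The counters in ConsoleB and ConsoleC can be reduced to one line per port, which makes it easier to see that the packets received on ens2f0 match those transmitted on ens2f0_0, and vice versa. The following is a minimal sketch that assumes the dump-ports layout shown above; the function name port_pkt_counts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ovs-ofctl dump-ports" output on stdin and print
# "<port> <rx pkts> <tx pkts>" for every port.
# Assumes the layout shown above: a "port NAME: rx pkts=N, ..." line
# followed by a "tx pkts=N, ..." line.
port_pkt_counts() {
    awk '
        /^ *port / {
            name = $2; gsub(/[":]/, "", name)      # strip quotes and colon
            rx = $4; sub(/^pkts=/, "", rx); sub(/,$/, "", rx)
            next
        }
        /^ *tx pkts=/ {
            tx = $2; sub(/^pkts=/, "", tx); sub(/,$/, "", tx)
            print name, rx, tx
        }
    '
}

# Usage on the host:
#   ovs-ofctl -O OpenFlow14 dump-ports br30-ovs | port_pkt_counts
```

For the ConsoleB output above, this prints 159317 and 159318 for ens2f0 and the mirrored pair 159318 and 159317 for ens2f0_0.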

ConsoleD

[root@c77g153 ~]# ping 192.168.30.159 -f
PING 192.168.30.159 (192.168.30.159) 56(84) bytes of data.
.
--- 192.168.30.159 ping statistics ---
159288 packets transmitted, 159288 received, 0% packet loss, time 24357ms
rtt min/avg/max/mdev = 0.069/0.086/60.812/0.202 ms, pipe 5, ipg/ewma 0.152/0.101 ms

Note

mlx5_vdpa: Cannot get vhost MTU - -95. : This MTU message is output, but it is not a problem.
mlx5_vdpa: vid 0 virtq 0 was created successfully. : Indicates that the virtq was created successfully.
mlx5_vdpa: Device 0000:07:00.2 traffic was stopped. : This message appears some time after the virtual machine starts, but it does not mean that sending and receiving traffic has stopped, so it is not a problem.
ens2f0, "ens2f0_0" : The tx/rx packet counts and byte counts of each port are increasing.
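
The log messages described in this Note can also be counted mechanically. The following is a minimal sketch, assuming the dpdk-vdpa output from ConsoleA has been saved to a file; the function name summarize_vdpa_log and the file path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a dpdk-vdpa log on stdin and count the messages
# that indicate a successful setup.
summarize_vdpa_log() {
    awk '
        /mlx5_vdpa: vid [0-9]+ virtq [0-9]+ was created successfully/ { created++ }
        /mlx5_vdpa: vDPA device [0-9]+ was configured/ { configured++ }
        END { printf "virtq_created=%d device_configured=%d\n", created, configured }
    '
}

# Usage, assuming the log was saved to a file (hypothetical path):
#   summarize_vdpa_log < /tmp/dpdk-vdpa.log
```

A virtq that is stopped and re-enabled is logged as created again, so virtq_created can be larger than the number of queues.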

That's all.

9.Finally

We referred to the following websites.
https://www.redhat.com/en/blog?search=vdpa
https://docs.mellanox.com/pages/viewpage.action?pageId=43718786
https://community.mellanox.com/s/article/Basic-Debug-utilities-with-OVS-DPDK-offload-ASAP-Direct
https://static.sched.com/hosted_files/dpdkuserspace2020/ab/vDPA%20-%20DPDK%20Userspace%202020.pdf
https://netdevconf.info/1.2/slides/oct6/04_gerlitz_efraim_introduction_to_switchdev_sriov_offloads.pdf
https://www.mail-archive.com/dev@dpdk.org/msg175938.html
https://www.spinics.net/lists/netdev/msg693858.html
http://yunazuno.hatenablog.com/entry/2018/07/08/215118
https://ameblo.jp/makototgc/entry-12579674054.html
https://www.jianshu.com/p/091b60ea72dc
https://doc.dpdk.org/guides/sample_app_ug/vdpa.html#build <-added

In the next article, as an extra chapter, we plan to describe how to procure NICs, how to configure setups other than ovs-dpdk, and what issues we are facing.

No  vm(qemu)/k8s  k8s Pod/VMI  vDPA Framework  vDPA Type  SR-IOV mode  Related Articles
1   vm            -            kernel          vhost      legacy       Not started
2   vm            -            kernel          vhost      switchdev    How to set up vDPA with vhost_vdpa for VMs - Metonymical Deflection
3   vm            -            kernel          virtio     legacy       Not started
4   vm            -            kernel          virtio     switchdev    Not started
5   vm            -            dpdk            vhost      legacy       Not started
6   vm            -            dpdk            vhost      switchdev    Not started
7   vm            -            dpdk            virtio     legacy       Not started
8   vm            -            dpdk            virtio     switchdev    How to set up vDPA with virtio_vdpa for VMs - Metonymical Deflection (this article)
9   k8s           pod          kernel          vhost      legacy       How to set up vDPA with vhost_vdpa for Kubernetes - Metonymical Deflection
10  k8s           pod          kernel          vhost      switchdev    How to set up vDPA with vhost_vdpa for Kubernetes + Accelerated Bridge CNI - Metonymical Deflection
11  k8s           pod          kernel          virtio     legacy       Not started
12  k8s           pod          kernel          virtio     switchdev    Not started
13  k8s           pod          dpdk            client     legacy       Not started
14  k8s           pod          dpdk            client     switchdev    Not started
15  k8s           pod          dpdk            server     legacy       Not started
16  k8s           pod          dpdk            server     switchdev    Not started

Other related articles
How to set up vDPA - appendix - - Metonymical Deflection

*1:This is a description of what I understand. If the content is incorrect, please point it out.

*2:We have confirmed that if the MAC address is not set, the VM will not recognize the VF after VM startup.

*3:core and modules will be installed at the same time

*4:The "0000" in front of the bsf number is called the Domain number. I have never seen a value other than "0000", so I don't think you need to worry too much about it.

*5:It has already been installed in 2-3.

*6:We will describe the details in the extra chapter. With vhost_vdpa in the previous article, we were able to start the virtual machine with virt-manager, but we were not able to communicate with it. For this reason, with vhost_vdpa we booted the VM directly from qemu-kvm.

*7:This is because related packages such as spice were not installed when qemu was built, and the virtual machine could not be started without removing these devices. Since this is not directly related to vDPA, we will not discuss how to deal with these issues.