This article describes how to configure vDPA (virtio data path acceleration) in a Kubernetes environment. *1
For more information on vDPA, please refer to the official Red Hat blog.
In this blog, we will describe communication between Pods in Kubernetes (referred to below as "k8s") using the vhost_vdpa module.
The following is a list of related articles.
1.Overview
1-1.Environment
1. ControlPlane
VMware : VMware(R) Workstation 15 Pro 15.5.1 build-15018445
2. Worker
IA server : ProLiant DL360p Gen8 or DL360 Gen9
System ROM : P71 01/22/2018
NIC : Mellanox ConnectX-6 Dx (MCX623106AS-CDAT)
Mellanox OFED : v5.3-1.0.0.1
3. ControlPlane & Worker (common)
OS : CentOS 8.3 (2011)
Kernel (ControlPlane) : 4.18.0-240.el8.x86_64
Kernel (Worker) : 5.12.7-1.el8.elrepo.x86_64
Installed Environment Groups : @^graphical-server-environment @container-management @development @virtualization-client @virtualization-hypervisor @virtualization-tools
Kubernetes : 1.21.1
Docker-CE : 20.10.6
flannel : latest
Multus : latest
sriov-cni : latest
sriov-network-device-plugin : latest
1-2.Overall flow
- Advance preparation
- Kernel update
- Build k8s Cluster & flannel
- Build and deploy vDPA (SR-IOV) related
- Deploy Pod
- Operation check
- Advanced configuration
Steps 1 to 3 are covered by a relatively large amount of existing documentation, so we will skip the non-essential parts.
Step 7 describes more detailed settings.
1-3.Overall structure
Loop connection using a DAC(Direct Attached Cable). *2
fig.1
fig.1 is a simplified description that omits the internal architecture. The actual configuration looks like the following.
fig.2
Quoted from Red Hat's github
The implementation in the GitHub repository above uses SR-IOV legacy mode.
For this reason, this blog also describes the configuration in legacy mode. *3
2.Advance preparation
Although not specifically mentioned below, disabling SELinux, disabling the firewall, and configuring NTP time synchronization are done in advance.
2-1.Swap and Hosts file settings : CP (ControlPlane) & Worker
Disable Swap
vi /etc/fstab
#/dev/mapper/cl-swap     swap                    swap    defaults        0 0

Hosts file settings
vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 c80g105.md.jp c80g105
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.11.151 c83g151 c83g151.md.jp
192.168.11.152 c83g152 c83g152.md.jp
2-2.Enabling HugePage and IOMMU : Worker
sed -i -e "/GRUB_CMDLINE_LINUX=/s/\"$/ default_hugepagesz=1G hugepagesz=1G hugepages=16\"/g" /etc/default/grub
sed -i -e "/GRUB_CMDLINE_LINUX=/s/\"$/ intel_iommu=on iommu=pt pci=realloc\"/g" /etc/default/grub
grub2-mkconfig -o /etc/grub2.cfg
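After the reboot, you can confirm that these parameters actually took effect by inspecting /proc/cmdline. Below is a minimal sketch of such a check; the helper name check_cmdline and the exact parameter list are our own (matching the grub edits above), not part of the original procedure:

```shell
# Check that every required boot parameter appears in a kernel command line.
# Usage: check_cmdline "<cmdline string>"
# Prints any missing parameters and returns non-zero if one is absent.
check_cmdline() {
  cmdline="$1"
  missing=0
  for param in default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt; do
    case " $cmdline " in
      *" $param "*) ;;                       # parameter present, nothing to do
      *) echo "missing: $param"; missing=1 ;;
    esac
  done
  return $missing
}

# On the Worker, after the reboot:
# check_cmdline "$(cat /proc/cmdline)"
```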
Next, implement the mount settings for HugePage. It will be mounted automatically the next time the OS boots.
vi /etc/fstab
nodev  /dev/hugepages hugetlbfs pagesize=1GB    0 0
2-3.SR-IOV VF settings : Worker
Configure the SR-IOV VF settings; you can increase the number of VFs, but for the sake of simplicity, we have set the number of VFs to "1". In addition, setting the MAC address is mandatory. *4
vi /etc/rc.local
echo 1 > /sys/class/net/ens2f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens2f1/device/sriov_numvfs
sleep 1
ip link set ens2f0 vf 0 mac 00:11:22:33:44:00
ip link set ens2f1 vf 0 mac 00:11:22:33:44:10
sleep 1
exit 0

chmod +x /etc/rc.d/rc.local
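If you later raise the VF count, writing these rc.local lines by hand becomes error-prone. As a sketch only (the helper gen_vf_setup and its per-PF MAC numbering scheme are ours, not part of the original setup), the commands can be generated instead of hand-written:

```shell
# Emit the sriov_numvfs / MAC assignment commands for one PF.
# Usage: gen_vf_setup <pf> <num_vfs> <mac_prefix>
# Each VF i gets the MAC "<mac_prefix>:0i" (two-digit index).
gen_vf_setup() {
  pf="$1"; num="$2"; prefix="$3"
  echo "echo $num > /sys/class/net/$pf/device/sriov_numvfs"
  i=0
  while [ "$i" -lt "$num" ]; do
    printf 'ip link set %s vf %d mac %s:%02d\n' "$pf" "$i" "$prefix" "$i"
    i=$((i + 1))
  done
}

# Example: regenerate the rc.local body for both ports
# gen_vf_setup ens2f0 1 00:11:22:33:44
# gen_vf_setup ens2f1 1 00:11:22:33:45
```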
2-4.Install the Mellanox driver (OFED) : Worker
You can download the iso file from the Mellanox download site.
Please save the downloaded iso file to /root/tmp/.
dnf -y install tcl tk unbound && \
mount -t iso9660 -o loop /root/tmp/MLNX_OFED_LINUX-5.3-1.0.0.1-rhel8.3-x86_64.iso /mnt && \
/mnt/mlnxofedinstall --upstream-libs --dpdk --ovs-dpdk --with-mft --with-mstflint
After the installation is complete, reboot.
reboot
After the reboot is complete, check the HugePage.
cat /proc/meminfo | grep Huge
grep hugetlbfs /proc/mounts

[root@c83g152 ~]# cat /proc/meminfo | grep Huge
AnonHugePages:    452608 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      16
HugePages_Free:       16
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        16777216 kB
[root@c83g152 ~]# grep hugetlbfs /proc/mounts
nodev /dev/hugepages hugetlbfs rw,relatime,pagesize=1024M 0 0
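As a sanity check on the numbers above: Hugetlb should equal HugePages_Total × Hugepagesize (16 × 1048576 kB = 16777216 kB). A small sketch that automates this consistency check (the helper name hugepage_consistent is ours):

```shell
# Verify that Hugetlb equals HugePages_Total * Hugepagesize, given the
# text of /proc/meminfo. All sizes are in kB, as /proc/meminfo reports them.
hugepage_consistent() {
  meminfo="$1"
  total=$(echo "$meminfo"   | awk '/^HugePages_Total:/ {print $2}')
  size=$(echo "$meminfo"    | awk '/^Hugepagesize:/    {print $2}')
  hugetlb=$(echo "$meminfo" | awk '/^Hugetlb:/         {print $2}')
  [ "$((total * size))" -eq "$hugetlb" ]
}

# On the Worker:
# hugepage_consistent "$(cat /proc/meminfo)" && echo "HugePage accounting OK"
```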
3.Kernel update : Worker
As of June 11, 2021, the vDPA-related modules are being updated frequently, so install the latest kernel.
3-1.Installing elrepo
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
dnf -y install https://www.elrepo.org/elrepo-release-8.el8.elrepo.noarch.rpm
3-2.Installation of Kernel
dnf list installed | grep kernel
dnf -y --enablerepo=elrepo-kernel install kernel-ml kernel-ml-devel
dnf list installed | grep kernel
reboot
Check the currently installed Kernel.
Install kernel-ml and kernel-ml-devel *5
Check the installed Kernel.
Reboot
3-3.Install Kernel headers, etc.
uname -r
dnf -y swap --enablerepo=elrepo-kernel kernel-headers -- kernel-ml-headers && \
dnf -y remove kernel-tools kernel-tools-libs && \
dnf -y --enablerepo=elrepo-kernel install kernel-ml-tools kernel-ml-tools-libs
dnf list installed | grep kernel
Check the currently running Kernel Version.
Install kernel-headers.
Remove the existing kernel-tools and kernel-tools-libs.
Install kernel-ml-tools and kernel-ml-tools-libs.
Check the installed Kernel.
If you get the following output, you are good to go.
[root@c83g152 ~]# dnf list installed | grep kernel
kernel.x86_64                     4.18.0-240.el8                  @anaconda
kernel-core.x86_64                4.18.0-240.el8                  @anaconda
kernel-devel.x86_64               4.18.0-240.el8                  @anaconda
kernel-ml.x86_64                  5.12.7-1.el8.elrepo             @elrepo-kernel
kernel-ml-core.x86_64             5.12.7-1.el8.elrepo             @elrepo-kernel
kernel-ml-devel.x86_64            5.12.7-1.el8.elrepo             @elrepo-kernel
kernel-ml-headers.x86_64          5.12.7-1.el8.elrepo             @elrepo-kernel
kernel-ml-modules.x86_64          5.12.7-1.el8.elrepo             @elrepo-kernel
kernel-ml-tools.x86_64            5.12.7-1.el8.elrepo             @elrepo-kernel
kernel-ml-tools-libs.x86_64       5.12.7-1.el8.elrepo             @elrepo-kernel
kernel-modules.x86_64             4.18.0-240.el8                  @anaconda
kmod-kernel-mft-mlnx.x86_64       4.16.1-1.rhel8u3                @System
kmod-mlnx-ofa_kernel.x86_64       5.2-OFED.5.2.2.2.0.1.rhel8u3    @System
mlnx-ofa_kernel.x86_64            5.2-OFED.5.2.2.2.0.1.rhel8u3    @System
mlnx-ofa_kernel-devel.x86_64      5.2-OFED.5.2.2.2.0.1.rhel8u3    @System
4.Build k8s Cluster & flannel
4-1.Install Docker : CP&Worker
dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo && \
dnf -y install --allowerasing docker-ce docker-ce-cli containerd.io && \
systemctl start docker && systemctl enable docker
4-2.Configuring the k8s repository : CP&Worker
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
4-3.Install k8s : CP&Worker
CP
dnf -y install kubeadm kubectl

Worker
dnf -y install kubeadm

CP&Worker
systemctl start kubelet.service && \
systemctl enable kubelet.service
4-4.Configuring Docker : CP&Worker
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
systemctl daemon-reload && \
systemctl restart docker
4-5.Building the k8sCluster : CP
kubeadm init --apiserver-advertise-address=192.168.11.151 --pod-network-cidr=10.244.0.0/16
Output Example
At the end you will see the following output; copy the final kubeadm join command.
This will be used when the Worker joins the CP.
To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.11.151:6443 --token 0gfh5j.vgu76alcycb2tc2e \
        --discovery-token-ca-cert-hash sha256:edcb1a3856838586a6ea7c99200daafa4fbb639e822838f4df81ce09d2faaac3
4-6.Configuration after building k8s Cluster : CP
Copy the config file
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Command completion settings
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
4-7.Install flannel : CP
cd /usr/src && \
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml && \
kubectl apply -f kube-flannel.yml
kubectl get nodes
Output Example
Wait until the status becomes Ready.
[root@c83g151 ~]# kubectl get nodes
NAME            STATUS   ROLES                  AGE   VERSION
c83g151.md.jp   Ready    control-plane,master   44s   v1.21.1
4-8.Joining a Worker : Worker
kubeadm join 192.168.11.151:6443 --token 0gfh5j.vgu76alcycb2tc2e \
    --discovery-token-ca-cert-hash sha256:edcb1a3856838586a6ea7c99200daafa4fbb639e822838f4df81ce09d2faaac3
Output Example
Workers also wait until the status becomes Ready.
[root@c83g151 ~]# kubectl get nodes
NAME            STATUS   ROLES                  AGE    VERSION
c83g151.md.jp   Ready    control-plane,master   5m2s   v1.21.1
c83g152.md.jp   Ready    <none>                 44s    v1.21.1
4-9.Enabling the vhost_vdpa module : Worker
In section "5. Build and deploy vDPA (SR-IOV) related", we will build and deploy sriov-cni.
If the vhost_vdpa module is not enabled at that point, the VFs will not be recognized as Worker resources, so enable it beforehand.
modprobe vhost_vdpa
lsmod |grep vd
ls -Fal /dev
ls -Fal /sys/bus/vdpa/drivers/vhost_vdpa

[root@c83g152 ~]# lsmod |grep vd
vhost_vdpa             24576  0
vhost                  57344  1 vhost_vdpa
mlx5_vdpa              45056  0
vhost_iotlb            16384  3 vhost_vdpa,vhost,mlx5_vdpa
vdpa                   16384  2 vhost_vdpa,mlx5_vdpa
irqbypass              16384  2 vhost_vdpa,kvm
mlx5_core            1216512  2 mlx5_vdpa,mlx5_ib
[root@c83g152 ~]# ls -Fal /dev
total 0
drwxr-xr-x  22 root root          3660 Apr  8 00:02 ./
dr-xr-xr-x. 17 root root           244 Apr  7 20:30 ../
crw-r--r--   1 root root       10, 235 Apr  7 23:28 autofs
drwxr-xr-x   2 root root           160 Apr  7 23:28 block/
drwxr-xr-x   2 root root           100 Apr  7 23:28 bsg/
============ s n i p ============
drwxr-xr-x   2 root root            60 Apr  7 23:28 vfio/
crw-------   1 root root       10, 127 Apr  7 23:28 vga_arbiter
crw-------   1 root root       10, 137 Apr  7 23:28 vhci
crw-------   1 root root       10, 238 Apr  7 23:28 vhost-net
crw-------   1 root root        240, 0 Apr  8 00:06 vhost-vdpa-0
crw-------   1 root root        240, 1 Apr  8 00:06 vhost-vdpa-1
crw-------   1 root root       10, 241 Apr  7 23:28 vhost-vsock
[root@c83g152 ~]# ls -Fal /sys/bus/vdpa/drivers/vhost_vdpa
total 0
drwxr-xr-x 2 root root    0 Apr  8 00:06 ./
drwxr-xr-x 3 root root    0 Apr  7 23:49 ../
--w------- 1 root root 4096 Apr  8 00:07 bind
lrwxrwxrwx 1 root root    0 Apr  8 00:07 module -> ../../../../module/vhost_vdpa/
--w------- 1 root root 4096 Apr  8 00:06 uevent
--w------- 1 root root 4096 Apr  8 00:07 unbind
lrwxrwxrwx 1 root root    0 Apr  8 00:07 vdpa0 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:00.2/vdpa0/
lrwxrwxrwx 1 root root    0 Apr  8 00:07 vdpa1 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:01.2/vdpa1/
From the above output, we can confirm the following:
- /dev/vhost-vdpa-0 and /dev/vhost-vdpa-1 are recognized as vhost_vdpa devices
- 0000:07:00.2/vdpa0 and 0000:07:01.2/vdpa1 are controlled by the vhost_vdpa driver
Also, configure the module so that it is enabled at OS startup. *6
The modprobe vhost_vdpa lines below are the additions to the rc.local created in 2-3.
vi /etc/rc.local
echo 1 > /sys/class/net/ens2f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens2f1/device/sriov_numvfs
sleep 1
ip link set ens2f0 vf 0 mac 00:11:22:33:44:00
ip link set ens2f1 vf 0 mac 00:11:22:33:44:10
sleep 1
modprobe vhost_vdpa
sleep 1
exit 0
Reboot the CP and Worker again.
5.Build and deploy vDPA (SR-IOV) related
5-2.Building the Docker image : CP
In this section, we will build the following three images. *7
- multus
- sriov-cni
- sriov-dp
cd /usr/src && \
git clone https://github.com/redhat-nfvpe/vdpa-deployment.git
cd /usr/src/vdpa-deployment && \
make multus && \
make sriov-cni && \
make sriov-dp
5-3.Copy the Docker image : CP
Copy the built Docker image to the Worker.
/usr/src/vdpa-deployment/scripts/load-image.sh nfvpe/sriov-device-plugin root@192.168.11.152 && \
/usr/src/vdpa-deployment/scripts/load-image.sh nfvpe/sriov-cni root@192.168.11.152 && \
/usr/src/vdpa-deployment/scripts/load-image.sh nfvpe/multus root@192.168.11.152
Output Example
[root@c83g151 vdpa-deployment]# /usr/src/vdpa-deployment/scripts/load-image.sh nfvpe/sriov-device-plugin root@192.168.11.152 && \
> /usr/src/vdpa-deployment/scripts/load-image.sh nfvpe/sriov-cni root@192.168.11.152 && \
> /usr/src/vdpa-deployment/scripts/load-image.sh nfvpe/multus root@192.168.11.152
+ IMAGE=nfvpe/sriov-device-plugin
+ NODE=root@192.168.11.152
++ mktemp -d
+ temp=/tmp/tmp.Lh8BaezUtC
+ dest=/tmp/tmp.Lh8BaezUtC/image.tar
+ save nfvpe/sriov-device-plugin /tmp/tmp.Lh8BaezUtC/image.tar
+ local image=nfvpe/sriov-device-plugin
+ local dest=/tmp/tmp.Lh8BaezUtC/image.tar
+ echo 'Saving nfvpe/sriov-device-plugin into /tmp/tmp.Lh8BaezUtC/image.tar'
Saving nfvpe/sriov-device-plugin into /tmp/tmp.Lh8BaezUtC/image.tar
+ docker save -o /tmp/tmp.Lh8BaezUtC/image.tar nfvpe/sriov-device-plugin
============ s n i p ============
+ echo 'Loading /tmp/tmp.Z6emF9eiAs/image.tar into root@192.168.11.152'
Loading /tmp/tmp.Z6emF9eiAs/image.tar into root@192.168.11.152
+ ssh root@192.168.11.152 'docker load'
Loaded image: nfvpe/multus:latest
+ rm -r /tmp/tmp.Z6emF9eiAs
Checking the Docker Image : CP&Worker
[root@c83g151 vdpa-deployment]# docker images
REPOSITORY                           TAG          IMAGE ID       CREATED          SIZE
nfvpe/sriov-cni                      latest       521ab1f3a5a1   16 minutes ago   9.47MB
<none>                               <none>       88062f2c13d4   16 minutes ago   561MB
nfvpe/multus                         latest       aa8d9becca0f   17 minutes ago   331MB
<none>                               <none>       e75c422aef6e   17 minutes ago   1.34GB
nfvpe/sriov-device-plugin            latest       0dd5f325c600   18 minutes ago   42.7MB
<none>                               <none>       3deb8b5405fa   18 minutes ago   1.26GB
quay.io/coreos/flannel               v0.14.0      8522d622299c   2 weeks ago      67.9MB
k8s.gcr.io/kube-apiserver            v1.21.1      771ffcf9ca63   3 weeks ago      126MB
k8s.gcr.io/kube-proxy                v1.21.1      4359e752b596   3 weeks ago      131MB
k8s.gcr.io/kube-controller-manager   v1.21.1      e16544fd47b0   3 weeks ago      120MB
k8s.gcr.io/kube-scheduler            v1.21.1      a4183b88f6e6   3 weeks ago      50.6MB
golang                               alpine3.12   24d827672eae   3 weeks ago      301MB
golang                               alpine       722a834ff95b   3 weeks ago      301MB
fedora                               32           c451de0d2441   5 weeks ago      202MB
alpine                               3.12         13621d1b12d4   7 weeks ago      5.58MB
alpine                               3            6dbb9cc54074   7 weeks ago      5.61MB
k8s.gcr.io/pause                     3.4.1        0f8457a4c2ec   4 months ago     683kB
k8s.gcr.io/coredns/coredns           v1.8.0       296a6d5035e2   7 months ago     42.5MB
k8s.gcr.io/etcd                      3.4.13-0     0369cf4303ff   9 months ago     253MB
5-4.Deploying yaml files : CP
Deploy the following four files.
- /usr/src/vdpa-deployment/deployment/multus-daemonset.yaml
- /usr/src/vdpa-deployment/deployment/sriovcni-vdpa-daemonset.yaml
- /usr/src/vdpa-deployment/deployment/sriovdp-vdpa-daemonset.yaml
- /usr/src/vdpa-deployment/deployment/configMap-vdpa.yaml
cd /usr/src/vdpa-deployment && \
make deploy && \
kubectl apply -f /usr/src/vdpa-deployment/deployment/configMap-vdpa.yaml
Output Example
[root@c83g151 vdpa-deployment]# cd /usr/src/vdpa-deployment && \
> make deploy && \
> kubectl create -f /usr/src/vdpa-deployment/deployment/configMap-vdpa.yaml
serviceaccount/sriov-device-plugin created
daemonset.apps/kube-sriov-device-plugin-amd64 created
customresourcedefinition.apiextensions.k8s.io/network-attachment-definitions.k8s.cni.cncf.io created
clusterrole.rbac.authorization.k8s.io/multus created
clusterrolebinding.rbac.authorization.k8s.io/multus created
serviceaccount/multus created
configmap/multus-cni-config created
daemonset.apps/kube-multus-ds-amd64 created
daemonset.apps/kube-sriov-cni-ds-amd64 created
configmap/sriovdp-config created
5-5.Checking DaemonSet & ConfigMap & Pod : CP
Check the DaemonSet and ConfigMap that you deployed in 5-4.
kubectl -n kube-system get ds
kubectl -n kube-system get cm
kubectl -n kube-system get pod
Output Example
[root@c83g151 vdpa-deployment]# kubectl -n kube-system get ds
NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
kube-flannel-ds                  2         2         2       2            2           <none>                          3h4m
kube-multus-ds-amd64             2         2         2       2            2           kubernetes.io/arch=amd64        6m52s
kube-proxy                       2         2         2       2            2           kubernetes.io/os=linux          3h4m
kube-sriov-cni-ds-amd64          2         2         2       2            2           beta.kubernetes.io/arch=amd64   6m52s
kube-sriov-device-plugin-amd64   2         2         2       2            2           beta.kubernetes.io/arch=amd64   6m52s
[root@c83g151 vdpa-deployment]# kubectl -n kube-system get cm
NAME                                 DATA   AGE
coredns                              1      3h4m
extension-apiserver-authentication   6      3h4m
kube-flannel-cfg                     2      3h4m
kube-proxy                           2      3h4m
kube-root-ca.crt                     1      3h4m
kubeadm-config                       2      3h4m
kubelet-config-1.21                  1      3h4m
multus-cni-config                    1      6m52s
sriovdp-config                       1      6m52s
[root@c83g151 vdpa-deployment]# kubectl -n kube-system get pod
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-558bd4d5db-7kql2                1/1     Running   2          178m
coredns-558bd4d5db-nq8k7                1/1     Running   2          178m
etcd-c83g151.md.jp                      1/1     Running   2          178m
kube-apiserver-c83g151.md.jp            1/1     Running   2          178m
kube-controller-manager-c83g151.md.jp   1/1     Running   2          178m
kube-flannel-ds-89v57                   1/1     Running   2          174m
kube-flannel-ds-zwd7n                   1/1     Running   2          177m
kube-multus-ds-amd64-75rbf              1/1     Running   0          33s
kube-multus-ds-amd64-zk6w9              1/1     Running   0          33s
kube-proxy-fdv9r                        1/1     Running   2          174m
kube-proxy-l6t7h                        1/1     Running   2          178m
kube-scheduler-c83g151.md.jp            1/1     Running   2          178m
kube-sriov-cni-ds-amd64-2xfxw           1/1     Running   0          33s
kube-sriov-cni-ds-amd64-ndmmr           1/1     Running   0          33s
kube-sriov-device-plugin-amd64-4lt4p    1/1     Running   0          33s
kube-sriov-device-plugin-amd64-gbplp    1/1     Running   0          33s
5-6.Checking the details of ConfigMap : CP
In "6. Deploy Pod", the resource vdpa_mlx_vhost is referenced by both the NetworkAttachmentDefinition and the Pod configuration, so please check it beforehand.
cat /usr/src/vdpa-deployment/deployment/configMap-vdpa.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
          "resourceName": "vdpa_ifcvf_vhost",
          "selectors": {
            "vendors": ["1af4"],
            "devices": ["1041"],
            "drivers": ["ifcvf"],
            "vdpaType": "vhost"
          }
        },
        {
          "resourceName": "vdpa_mlx_vhost",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["mlx5_core"],
            "vdpaType": "vhost"
          }
        },
        {
          "resourceName": "vdpa_mlx_virtio",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["mlx5_core"],
            "vdpaType": "virtio"
          }
        }
      ]
    }
Note
resourceName | This can be any name. You specify this name explicitly in section 6-1. |
vendors | This is the vendor identifier for the PCI Device ID. 15b3 indicates that it is a Mellanox product.*8 |
devices | This is the device identifier for the PCI Device ID. 101e indicates that it is a VF of ConnectX-6 Dx. |
drivers | Specifies the mlx5_core driver. |
vdpaType | Specifies the vhost. This option has been extended from the regular SR-IOV Device plug-in. |
The PCI Device ID can be checked with the following command.
[root@c83g152 ~]# lspci -nn |grep Mellanox
07:00.0 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
07:00.1 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
07:00.2 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
07:01.2 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
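To pick out exactly the functions that the ConfigMap selectors will match, you can filter the lspci output by the [vendor:device] pair. A minimal sketch (the helper name pci_addrs_for_id is ours):

```shell
# Print the PCI addresses of all functions whose [vendor:device] ID matches
# the given selector, from `lspci -nn` output on stdin.
# Example ID: 15b3:101e (ConnectX Family mlx5Gen Virtual Function).
pci_addrs_for_id() {
  grep -F "[$1]" | awk '{print $1}'
}

# Usage on the Worker:
# lspci -nn | pci_addrs_for_id 15b3:101e
```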
5-7.Checking Worker Resources : CP
Check that vdpa_mlx_vhost is recognized as a Worker resource.
kubectl get node c83g152.md.jp -o json | jq '.status.allocatable'
Output Example
[root@c83g151 vdpa-deployment]# kubectl get node c83g152.md.jp -o json | jq '.status.allocatable'
{
  "cpu": "16",
  "ephemeral-storage": "127203802926",
  "hugepages-1Gi": "16Gi",
  "hugepages-2Mi": "0",
  "intel.com/vdpa_mlx_vhost": "2",
  "memory": "148123456Ki",
  "pods": "110"
}
"2" is the number of VFs that were recognized.
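If you want to script this resource check, the count can be pulled out of the JSON with sed alone, without requiring jq on the machine running the check. A sketch (the helper name allocatable_count is ours):

```shell
# Extract the allocatable count for a named extended resource from the
# JSON shown above. Uses "|" as the sed delimiter because resource names
# such as intel.com/vdpa_mlx_vhost contain "/".
allocatable_count() {
  sed -n "s|.*\"$1\": \"\([0-9]*\)\".*|\1|p"
}

# Usage on the CP:
# kubectl get node c83g152.md.jp -o json \
#   | allocatable_count intel.com/vdpa_mlx_vhost
```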
Note
When this value is "0", or the intel.com/vdpa_mlx_vhost line is not displayed at all, delete the sriov-device-plugin Pod once.
Since sriov-device-plugin is deployed as a DaemonSet, the Pod will be recreated automatically after deletion.
The recreated sriov-device-plugin will then attempt to re-register vdpa_mlx_vhost.
[root@c83g151 vdpa-deployment]# kubectl -n kube-system get pod -o wide |grep 152
kube-flannel-ds-89v57                  1/1   Running   2   4h8m   192.168.11.152   c83g152.md.jp
kube-multus-ds-amd64-75rbf             1/1   Running   0   74m    192.168.11.152   c83g152.md.jp
kube-proxy-fdv9r                       1/1   Running   2   4h8m   192.168.11.152   c83g152.md.jp
kube-sriov-cni-ds-amd64-2xfxw          1/1   Running   0   74m    192.168.11.152   c83g152.md.jp
kube-sriov-device-plugin-amd64-rg8hm   1/1   Running   0   73m    192.168.11.152   c83g152.md.jp
[root@c83g151 vdpa-deployment]# kubectl -n kube-system delete pod kube-sriov-device-plugin-amd64-rg8hm
pod "kube-sriov-device-plugin-amd64-rg8hm" deleted
[root@c83g151 vdpa-deployment]# kubectl -n kube-system get pod -o wide |grep 152
kube-flannel-ds-89v57                  1/1   Running   2   4h9m   192.168.11.152   c83g152.md.jp
kube-multus-ds-amd64-75rbf             1/1   Running   0   76m    192.168.11.152   c83g152.md.jp
kube-proxy-fdv9r                       1/1   Running   2   4h9m   192.168.11.152   c83g152.md.jp
kube-sriov-cni-ds-amd64-2xfxw          1/1   Running   0   76m    192.168.11.152   c83g152.md.jp
kube-sriov-device-plugin-amd64-kwc5z   1/1   Running   0   3s     192.168.11.152   c83g152.md.jp
Again, check the Worker resources.
kubectl get node c83g152.md.jp -o json | jq '.status.allocatable'
If it is still not recognized, please refer to section 4-9.
The two points are as follows.
- /dev/vhost-vdpa-0 and /dev/vhost-vdpa-1 must be recognized as vhost_vdpa devices.
- 0000:07:00.2/vdpa0 and 0000:07:01.2/vdpa1 must be controlled by the vhost_vdpa driver.
Unless the above two conditions are met, vdpa_mlx_vhost will not be recognized as a Worker resource.
6.Deploy Pod
The NetworkAttachmentDefinition CRD is defined in "/usr/src/vdpa-deployment/deployment/multus-daemonset.yaml"; here we create a resource of that type.
In this section, we will configure vlan, spoofchk, ipam, and so on.
For more details, please refer to the following website.
sriov-cni/configuration-reference.md at rfe/vdpa · amorenoz/sriov-cni · GitHub
6-1.Configuring the NetworkAttachmentDefinition : CP
"vdpa-mlx-vhost-net30" is specified in the annotations of the Pod.
vi 96nA-vdpa30.yaml

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: vdpa-mlx-vhost-net30
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/vdpa_mlx_vhost
spec:
  config: '{
    "type": "sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-vdpa30",
    "vlan": 30,
    "trust": "on",
    "spoofchk": "off",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.30.0/24",
      "rangeStart": "192.168.30.64",
      "rangeEnd": "192.168.30.127"
    }
  }'

kubectl apply -f 96nA-vdpa30.yaml
kubectl get network-attachment-definitions.k8s.cni.cncf.io
Output Example
[root@c83g151 vdpa-deployment]# kubectl get network-attachment-definitions.k8s.cni.cncf.io
NAME                   AGE
vdpa-mlx-vhost-net30   14m
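One thing worth verifying before applying a NetworkAttachmentDefinition: the host-local rangeStart/rangeEnd must lie inside the declared subnet, or IPAM will fail when the Pod is created. A sketch that validates this (the helpers ip2int and range_in_subnet are ours, not part of the deployment):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed only if [start, end] is a valid range inside <net>/<prefix>.
# Usage: range_in_subnet <net> <prefix> <start> <end>
range_in_subnet() {
  net="$1"; prefix="$2"; start="$3"; end="$4"
  mask=$(( (0xffffffff << (32 - prefix)) & 0xffffffff ))
  n=$(ip2int "$net"); s=$(ip2int "$start"); e=$(ip2int "$end")
  [ $(( s & mask )) -eq $(( n & mask )) ] &&
  [ $(( e & mask )) -eq $(( n & mask )) ] &&
  [ "$s" -le "$e" ]
}

# The range used in 6-1:
# range_in_subnet 192.168.30.0 24 192.168.30.64 192.168.30.127 && echo "range OK"
```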
6-2.Deploying a Pod : CP
The key point is that the values defined in the Network Attachment Definition are specified for "annotations" and "resources".
vi 16vdpa.yaml

apiVersion: v1
kind: Pod
metadata:
  name: vdpa-pod01
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "vdpa-mlx-vhost-net30", "mac": "CA:FE:C0:FF:EE:11" }
    ]'
spec:
  nodeName: c83g152.md.jp
  containers:
  - name: vdpa-single01
    image: centos:latest
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    resources:
      requests:
        intel.com/vdpa_mlx_vhost: '1'
      limits:
        intel.com/vdpa_mlx_vhost: '1'
    command: ["sleep"]
    args: ["infinity"]
---
apiVersion: v1
kind: Pod
metadata:
  name: vdpa-pod02
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "vdpa-mlx-vhost-net30", "mac": "CA:FE:C0:FF:EE:12" }
    ]'
spec:
  nodeName: c83g152.md.jp
  containers:
  - name: vdpa-single02
    image: centos:latest
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    resources:
      requests:
        intel.com/vdpa_mlx_vhost: '1'
      limits:
        intel.com/vdpa_mlx_vhost: '1'
    command: ["sleep"]
    args: ["infinity"]

kubectl apply -f 16vdpa.yaml
kubectl get pod
Output Example
[root@c83g151 vdpa-deployment]# kubectl get pod
NAME         READY   STATUS    RESTARTS   AGE
vdpa-pod01   1/1     Running   0          16m
vdpa-pod02   1/1     Running   0          16m
6-3.Check Pod details : CP
In this section, we will check the details of the Pod status by using the kubectl describe command.
kubectl describe pod vdpa-pod01
fig.1
Make sure that the device-info parameters (/dev/vhost-vdpa-0, 0000:07:00.2) match those shown in fig.1.
Output Example
[root@c83g151 vdpa-deployment]# kubectl describe pod vdpa-pod01
Name:         vdpa-pod01
Namespace:    default
Priority:     0
Node:         c83g152.md.jp/192.168.11.152
Start Time:   Thu, 03 Jun 2021 20:54:49 +0900
Labels:       <none>
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{ "name": "", "interface": "eth0", "ips": [ "10.244.1.9" ], "mac": "26:9a:87:2a:70:70", "default": true, "dns": {} },{ "name": "default/vdpa-mlx-vhost-net30", "interface": "net1", "ips": [ "192.168.30.71" ], "mac": "CA:FE:C0:FF:EE:11", "dns": {}, "device-info": { "type": "vdpa", "version": "1.0.0", "vdpa": { "parent-device": "vdpa0", "driver": "vhost", "path": "/dev/vhost-vdpa-0", "pci-address": "0000:07:00.2" } } }]
              k8s.v1.cni.cncf.io/networks:
                [ { "name": "vdpa-mlx-vhost-net30", "mac": "CA:FE:C0:FF:EE:11" } ]
              k8s.v1.cni.cncf.io/networks-status:
                [{ "name": "", "interface": "eth0", "ips": [ "10.244.1.9" ], "mac": "26:9a:87:2a:70:70", "default": true, "dns": {} },{ "name": "default/vdpa-mlx-vhost-net30", "interface": "net1", "ips": [ "192.168.30.71" ], "mac": "CA:FE:C0:FF:EE:11", "dns": {}, "device-info": { "type": "vdpa", "version": "1.0.0", "vdpa": { "parent-device": "vdpa0", "driver": "vhost", "path": "/dev/vhost-vdpa-0", "pci-address": "0000:07:00.2" } } }]
Status:       Running
IP:           10.244.1.9
IPs:
  IP:  10.244.1.9
Containers:
  vdpa-single01:
    Container ID:  docker://cf57569807eb2de3d4901ff2ade55b845682d2c7a37ee88c7f6536498fd0b63e
    Image:         centos:latest
    Image ID:      docker-pullable://centos@sha256:5528e8b1b1719d34604c87e11dcd1c0a20bedf46e83b5632cdeac91b8c04efc1
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
    Args:
      infinity
    State:          Running
      Started:      Thu, 03 Jun 2021 20:54:51 +0900
    Ready:          True
    Restart Count:  0
    Limits:
      intel.com/vdpa_mlx_vhost:  1
    Requests:
      intel.com/vdpa_mlx_vhost:  1
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9kqtb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-9kqtb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age   From     Message
  ----    ------          ----  ----     -------
  Normal  AddedInterface  26s   multus   Add eth0 [10.244.1.9/24]
  Normal  AddedInterface  25s   multus   Add net1 [192.168.30.71/24] from default/vdpa-mlx-vhost-net30
  Normal  Pulled          25s   kubelet  Container image "centos:latest" already present on machine
  Normal  Created         25s   kubelet  Created container vdpa-single01
  Normal  Started         25s   kubelet  Started container vdpa-single01
6-4.Check the Mac address of the Worker : Worker
Check that the MAC address specified in the annotations of the Pod is reflected in the VF of the Worker.
ip link show ens2f0
Output Example
[root@c83g152 ~]# ip link show ens2f0
8: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 94:40:c9:7e:1f:10 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether ca:fe:c0:ff:ee:11 brd ff:ff:ff:ff:ff:ff, vlan 30, spoof checking off, link-state auto, trust on, query_rss off
7.Operation check : CP
7-1.Check the IP address of the Pod.
Since the IP address of the Pod is dynamically assigned, check it in advance.
kubectl describe pod vdpa-pod01 |grep Add
kubectl describe pod vdpa-pod02 |grep Add
Output Example
[root@c83g151 vdpa-deployment]# kubectl describe pod vdpa-pod01 |grep Add
  Normal  AddedInterface  26m   multus   Add eth0 [10.244.1.9/24]
  Normal  AddedInterface  26m   multus   Add net1 [192.168.30.71/24] from default/vdpa-mlx-vhost-net30
[root@c83g151 vdpa-deployment]# kubectl describe pod vdpa-pod02 |grep Add
  Normal  AddedInterface  26m   multus   Add eth0 [10.244.1.8/24]
  Normal  AddedInterface  26m   multus   Add net1 [192.168.30.70/24] from default/vdpa-mlx-vhost-net30
7-2.communication check
vdpa-pod01 | 192.168.30.71/24 |
vdpa-pod02 | 192.168.30.70/24 |
Execute a ping from vdpa-pod01(192.168.30.71) to 192.168.30.70.
kubectl exec -it vdpa-pod01 -- ping 192.168.30.70
Output Example
[root@c83g151 vdpa-deployment]# kubectl exec -it vdpa-pod01 -- ping 192.168.30.70
PING 192.168.30.70 (192.168.30.70) 56(84) bytes of data.
64 bytes from 192.168.30.70: icmp_seq=1 ttl=64 time=0.447 ms
64 bytes from 192.168.30.70: icmp_seq=2 ttl=64 time=0.166 ms
64 bytes from 192.168.30.70: icmp_seq=3 ttl=64 time=0.227 ms
64 bytes from 192.168.30.70: icmp_seq=4 ttl=64 time=0.217 ms
^C
--- 192.168.30.70 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 91ms
rtt min/avg/max/mdev = 0.166/0.264/0.447/0.108 ms
We were able to confirm that the Pods communicate with each other via the DAC.
That's it for the configuration.
In the next section, we will describe a configuration example for more detailed settings.
8.Advanced configuration
Depending on how you configure the ConfigMap and NetworkAttachmentDefinition, you can also build a configuration like the one shown in fig.3.
fig.3
The yaml files largely speak for themselves, so rather than explaining them line by line, I will only note the points that need attention.
Please increase the total number of SR-IOV VFs to 4 (2 per PF) in advance.
echo 2 > /sys/class/net/ens2f0/device/sriov_numvfs
echo 2 > /sys/class/net/ens2f1/device/sriov_numvfs
8-1.Configuring ConfigMap : CP
vi 83cm-vdpa.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
          "resourceName": "vdpa_ifcvf_vhost",
          "selectors": {
            "vendors": ["1af4"],
            "devices": ["1041"],
            "drivers": ["ifcvf"],
            "vdpaType": "vhost"
          }
        },
        {
          "resourceName": "vdpa_mlx_vhost11",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["mlx5_core"],
            "pciAddresses": ["0000:07:00.2"],
            "vdpaType": "vhost"
          }
        },
        {
          "resourceName": "vdpa_mlx_vhost12",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["mlx5_core"],
            "pciAddresses": ["0000:07:00.3"],
            "vdpaType": "vhost"
          }
        },
        {
          "resourceName": "vdpa_mlx_vhost21",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["mlx5_core"],
            "pciAddresses": ["0000:07:01.2"],
            "vdpaType": "vhost"
          }
        },
        {
          "resourceName": "vdpa_mlx_vhost22",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["mlx5_core"],
            "pciAddresses": ["0000:07:01.3"],
            "vdpaType": "vhost"
          }
        },
        {
          "resourceName": "vdpa_mlx_virtio",
          "selectors": {
            "vendors": ["15b3"],
            "devices": ["101e"],
            "drivers": ["mlx5_core"],
            "vdpaType": "virtio"
          }
        }
      ]
    }

kubectl apply -f 83cm-vdpa.yaml
8-2.Configuring NetworkAttachmentDefinition : CP
vi 93nA-vdpa11-22.yaml

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: vdpa-mlx-vhost-net11
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/vdpa_mlx_vhost11
spec:
  config: '{
    "type": "sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-vdpa",
    "vlan": 100,
    "trust": "on",
    "spoofchk": "off",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.100.0/24",
      "rangeStart": "192.168.100.64",
      "rangeEnd": "192.168.100.127"
    }
  }'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: vdpa-mlx-vhost-net12
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/vdpa_mlx_vhost12
spec:
  config: '{
    "type": "sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-vdpa",
    "vlan": 200,
    "trust": "on",
    "spoofchk": "off",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.200.0/24",
      "rangeStart": "192.168.200.64",
      "rangeEnd": "192.168.200.127"
    }
  }'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: vdpa-mlx-vhost-net21
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/vdpa_mlx_vhost21
spec:
  config: '{
    "type": "sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-vdpa",
    "vlan": 100,
    "trust": "on",
    "spoofchk": "off",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.100.0/24",
      "rangeStart": "192.168.100.128",
      "rangeEnd": "192.168.100.191"
    }
  }'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: vdpa-mlx-vhost-net22
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/vdpa_mlx_vhost22
spec:
  config: '{
    "type": "sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-vdpa",
    "vlan": 200,
    "trust": "on",
    "spoofchk": "off",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.200.0/24",
      "rangeStart": "192.168.200.128",
      "rangeEnd": "192.168.200.191"
    }
  }'

kubectl apply -f 93nA-vdpa11-22.yaml
8-3.Configuring Pod : CP
vi 13vdpa.yaml

apiVersion: v1
kind: Pod
metadata:
  name: vdpa-pod01
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "vdpa-mlx-vhost-net11", "mac": "0C:FE:C0:FF:EE:11" },
      { "name": "vdpa-mlx-vhost-net12", "mac": "0C:FE:C0:FF:EE:12" }
    ]'
spec:
  nodeName: c83g152.md.jp
  containers:
  - name: vdpa-single01
    image: centos:latest
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    resources:
      requests:
        intel.com/vdpa_mlx_vhost11: '1'
        intel.com/vdpa_mlx_vhost12: '1'
      limits:
        intel.com/vdpa_mlx_vhost11: '1'
        intel.com/vdpa_mlx_vhost12: '1'
    command: ["sleep"]
    args: ["infinity"]
---
apiVersion: v1
kind: Pod
metadata:
  name: vdpa-pod02
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "vdpa-mlx-vhost-net21", "mac": "0C:FE:C0:FF:EE:21" },
      { "name": "vdpa-mlx-vhost-net22", "mac": "0C:FE:C0:FF:EE:22" }
    ]'
spec:
  nodeName: c83g152.md.jp
  containers:
  - name: vdpa-single02
    image: centos:latest
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    resources:
      requests:
        intel.com/vdpa_mlx_vhost21: '1'
        intel.com/vdpa_mlx_vhost22: '1'
      limits:
        intel.com/vdpa_mlx_vhost21: '1'
        intel.com/vdpa_mlx_vhost22: '1'
    command: ["sleep"]
    args: ["infinity"]

kubectl apply -f 13vdpa.yaml
8-4.Check Pod details : CP
[root@c83g151 vdpa-deployment]# kubectl describe pod vdpa-pod01
Name:         vdpa-pod01
Namespace:    default
Priority:     0
Node:         c83g152.md.jp/192.168.11.152
Start Time:   Thu, 03 Jun 2021 22:23:33 +0900
Labels:       <none>
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.244.1.12"
                    ],
                    "mac": "c6:01:75:97:75:91",
                    "default": true,
                    "dns": {}
                },{
                    "name": "default/vdpa-mlx-vhost-net11",
                    "interface": "net1",
                    "ips": [
                        "192.168.100.64"
                    ],
                    "mac": "0C:FE:C0:FF:EE:11",
                    "dns": {},
                    "device-info": {
                        "type": "vdpa",
                        "version": "1.0.0",
                        "vdpa": {
                            "parent-device": "vdpa0",
                            "driver": "vhost",
                            "path": "/dev/vhost-vdpa-0",
                            "pci-address": "0000:07:00.2"
                        }
                    }
                },{
                    "name": "default/vdpa-mlx-vhost-net12",
                    "interface": "net2",
                    "ips": [
                        "192.168.200.64"
                    ],
                    "mac": "0C:FE:C0:FF:EE:12",
                    "dns": {},
                    "device-info": {
                        "type": "vdpa",
                        "version": "1.0.0",
                        "vdpa": {
                            "parent-device": "vdpa1",
                            "driver": "vhost",
                            "path": "/dev/vhost-vdpa-1",
                            "pci-address": "0000:07:00.3"
                        }
                    }
                }]
              k8s.v1.cni.cncf.io/networks:
                [ { "name": "vdpa-mlx-vhost-net11", "mac": "0C:FE:C0:FF:EE:11" },
                  { "name": "vdpa-mlx-vhost-net12", "mac": "0C:FE:C0:FF:EE:12" } ]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.244.1.12"
                    ],
                    "mac": "c6:01:75:97:75:91",
                    "default": true,
                    "dns": {}
                },{
                    "name": "default/vdpa-mlx-vhost-net11",
                    "interface": "net1",
                    "ips": [
                        "192.168.100.64"
                    ],
                    "mac": "0C:FE:C0:FF:EE:11",
                    "dns": {},
                    "device-info": {
                        "type": "vdpa",
                        "version": "1.0.0",
                        "vdpa": {
                            "parent-device": "vdpa0",
                            "driver": "vhost",
                            "path": "/dev/vhost-vdpa-0",
                            "pci-address": "0000:07:00.2"
                        }
                    }
                },{
                    "name": "default/vdpa-mlx-vhost-net12",
                    "interface": "net2",
                    "ips": [
                        "192.168.200.64"
                    ],
                    "mac": "0C:FE:C0:FF:EE:12",
                    "dns": {},
                    "device-info": {
                        "type": "vdpa",
                        "version": "1.0.0",
                        "vdpa": {
                            "parent-device": "vdpa1",
                            "driver": "vhost",
                            "path": "/dev/vhost-vdpa-1",
                            "pci-address": "0000:07:00.3"
                        }
                    }
                }]
Status:       Running
IP:           10.244.1.12
IPs:
  IP:  10.244.1.12
Containers:
  vdpa-single01:
    Container ID:  docker://df698de4764a4209703f9df3a641167cdf49222860d1e41f1c85de7ba1bb5146
    Image:         centos:latest
    Image ID:      docker-pullable://centos@sha256:5528e8b1b1719d34604c87e11dcd1c0a20bedf46e83b5632cdeac91b8c04efc1
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
    Args:
      infinity
    State:          Running
      Started:      Thu, 03 Jun 2021 22:23:36 +0900
    Ready:          True
    Restart Count:  0
    Limits:
      intel.com/vdpa_mlx_vhost11:  1
      intel.com/vdpa_mlx_vhost12:  1
    Requests:
      intel.com/vdpa_mlx_vhost11:  1
      intel.com/vdpa_mlx_vhost12:  1
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xst8f (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-xst8f:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age    From     Message
  ----    ------          ----   ----     -------
  Normal  AddedInterface  8m3s   multus   Add eth0 [10.244.1.12/24]
  Normal  AddedInterface  8m2s   multus   Add net1 [192.168.100.64/24] from default/vdpa-mlx-vhost-net11
  Normal  AddedInterface  8m2s   multus   Add net2 [192.168.200.64/24] from default/vdpa-mlx-vhost-net12
  Normal  Pulled          8m2s   kubelet  Container image "centos:latest" already present on machine
  Normal  Created         8m2s   kubelet  Created container vdpa-single01
  Normal  Started         8m1s   kubelet  Started container vdpa-single01
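The `device-info` block in the network-status annotation is the key output here: it tells you which /dev/vhost-vdpa-N device backs each pod interface. A small sketch of extracting that mapping programmatically; the JSON below is an abbreviated copy of the annotation shown above, and /tmp/net-status.json is our own scratch path. On a live cluster you could feed it from something like `kubectl get pod vdpa-pod01 -o json | jq -r '.metadata.annotations["k8s.v1.cni.cncf.io/network-status"]'`.

```shell
# Abbreviated copy of the pod's network-status annotation from above.
cat > /tmp/net-status.json <<'EOF'
[
  { "name": "", "interface": "eth0", "default": true },
  { "name": "default/vdpa-mlx-vhost-net11", "interface": "net1",
    "device-info": {
      "type": "vdpa",
      "vdpa": { "parent-device": "vdpa0", "driver": "vhost",
                "path": "/dev/vhost-vdpa-0", "pci-address": "0000:07:00.2" } } }
]
EOF

# Print "<interface> <vhost-vdpa device> <VF PCI address>" for each
# vDPA-backed interface, skipping the default (flannel) interface.
python3 - <<'EOF'
import json
for net in json.load(open("/tmp/net-status.json")):
    info = net.get("device-info", {})
    if info.get("type") == "vdpa":
        v = info["vdpa"]
        print(net["interface"], v["path"], v["pci-address"])
EOF
```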
8-5.Check Worker details : Worker
[root@c83g152 ~]# ip link show ens2f0
8: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 94:40:c9:7e:1f:10 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 0c:fe:c0:ff:ee:11 brd ff:ff:ff:ff:ff:ff, vlan 100, spoof checking off, link-state auto, trust on, query_rss off
    vf 1     link/ether 0c:fe:c0:ff:ee:12 brd ff:ff:ff:ff:ff:ff, vlan 200, spoof checking off, link-state auto, trust on, query_rss off
[root@c83g152 ~]# ip link show ens2f1
9: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 94:40:c9:7e:1f:11 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 0c:fe:c0:ff:ee:21 brd ff:ff:ff:ff:ff:ff, vlan 100, spoof checking off, link-state auto, trust on, query_rss off
    vf 1     link/ether 0c:fe:c0:ff:ee:22 brd ff:ff:ff:ff:ff:ff, vlan 200, spoof checking off, link-state auto, trust on, query_rss off
[root@c83g152 ~]# ls -Fal /sys/bus/vdpa/drivers/vhost_vdpa
total 0
drwxr-xr-x 2 root root    0 Jun  3 22:07 ./
drwxr-xr-x 3 root root    0 Jun  3 22:07 ../
--w------- 1 root root 4096 Jun  3 22:28 bind
lrwxrwxrwx 1 root root    0 Jun  3 22:28 module -> ../../../../module/vhost_vdpa/
--w------- 1 root root 4096 Jun  3 22:07 uevent
--w------- 1 root root 4096 Jun  3 22:28 unbind
lrwxrwxrwx 1 root root    0 Jun  3 22:28 vdpa0 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:00.2/vdpa0/
lrwxrwxrwx 1 root root    0 Jun  3 22:28 vdpa1 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:00.3/vdpa1/
lrwxrwxrwx 1 root root    0 Jun  3 22:28 vdpa2 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:01.2/vdpa2/
lrwxrwxrwx 1 root root    0 Jun  3 22:28 vdpa3 -> ../../../../devices/pci0000:00/0000:00:03.0/0000:07:01.3/vdpa3/
[root@c83g152 ~]# lshw -businfo -c network
Bus info          Device   Class    Description
=======================================================
pci@0000:04:00.0  ens1f0   network  82599ES 10-Gigabit SFI/SFP+ Network Connection
pci@0000:04:00.1  ens1f1   network  82599ES 10-Gigabit SFI/SFP+ Network Connection
pci@0000:03:00.0  eno1     network  NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.1  eno2     network  NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.2  eno3     network  NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:03:00.3  eno4     network  NetXtreme BCM5719 Gigabit Ethernet PCIe
pci@0000:07:00.0  ens2f0   network  MT2892 Family [ConnectX-6 Dx]
pci@0000:07:00.1  ens2f1   network  MT2892 Family [ConnectX-6 Dx]
pci@0000:07:00.2           network  ConnectX Family mlx5Gen Virtual Function
pci@0000:07:00.3           network  ConnectX Family mlx5Gen Virtual Function
pci@0000:07:01.2           network  ConnectX Family mlx5Gen Virtual Function
pci@0000:07:01.3           network  ConnectX Family mlx5Gen Virtual Function
[root@c83g152 ~]# ls /dev
autofs           full          lp3     ptp4     shm       vcs2    vcsu5
block            fuse          mapper  ptp5     snapshot  vcs3    vcsu6
bsg              hpet          mcelog  ptp6     snd       vcs4    vfio
btrfs-control    hpilo         mem     ptp7     stderr    vcs5    vga_arbiter
bus              hugepages     mqueue  ptp8     stdin     vcs6    vhci
char             hwrng         net     ptp9     stdout    vcsa    vhost-net
cl               infiniband    null    pts      tty       vcsa1   vhost-vdpa-0
console          initctl       nvram   random   tty0      vcsa2   vhost-vdpa-1
core             input         port    raw      tty1      vcsa3   vhost-vdpa-2
cpu              ipmi0         ppp     rfkill   tty10     vcsa4   vhost-vdpa-3
cpu_dma_latency  kmsg          ptmx    rtc      tty11     vcsa5   vhost-vsock
disk             kvm           ptp0    rtc0     tty12     vcsa6   watchdog
dm-0             log           ptp1    sda      tty13     vcsu    watchdog0
dm-1             loop-control  ptp10   sda1     tty14     vcsu1   zero
dri              lp0           ptp11   sda2     tty15     vcsu2
fb0              lp1           ptp2    sg0      tty16     vcsu3
fd               lp2           ptp3    sg1      tty17     vcsu4
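A quick way to cross-check this output is to parse the VF lines and compare the MAC/VLAN pairs against what the NADs and Pod annotations requested. The following is a minimal sketch that runs against an embedded excerpt of the `ip link show ens2f0` output above (the /tmp path is our own choice); on the Worker you would pipe `ip link show ens2f0` directly into the same awk command.

```shell
# Excerpt of the `ip link show ens2f0` VF lines from above.
cat > /tmp/ens2f0.txt <<'EOF'
    vf 0     link/ether 0c:fe:c0:ff:ee:11 brd ff:ff:ff:ff:ff:ff, vlan 100, spoof checking off, link-state auto, trust on, query_rss off
    vf 1     link/ether 0c:fe:c0:ff:ee:12 brd ff:ff:ff:ff:ff:ff, vlan 200, spoof checking off, link-state auto, trust on, query_rss off
EOF

# Print "<vf index> <mac> vlan <id>" for each VF line; these values
# should match the MACs in the Pod annotation and the VLANs in the NADs.
awk '/^ *vf /{gsub(",", "", $8); print $2, $4, "vlan", $8}' /tmp/ens2f0.txt
```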
9.Finally
We referred to the following websites.
https://github.com/redhat-nfvpe/vdpa-deployment
https://docs.google.com/document/d/1DgZuksLVIVD5ZpNUNH7zPUr-8t6GKKQICDLqIwQv-FA/edit
This time we used the legacy mode of SR-IOV; next time we would like to use accelerated-bridge-cni and verify the same setup in switchdev mode.
GitHub - k8snetworkplumbingwg/accelerated-bridge-cni
Also, we considered how many combinations of vDPA environments exist, and there appear to be at least 16.
Nos. 2 and 8 were described in previous articles; No. 9 is covered in this article.
No | vm(qemu)/k8s | k8s Pod/VMI | vDPA Framework | vDPA Type | SR-IOV mode | Related Articles |
1 | vm | - | kernel | vhost | legacy | Not started |
2 | vm | - | kernel | vhost | switchdev | How to set up vDPA with vhost_vdpa for VMs - Metonymical Deflection |
3 | vm | - | kernel | virtio | legacy | Not started |
4 | vm | - | kernel | virtio | switchdev | Not started |
5 | vm | - | dpdk | vhost | legacy | Not started |
6 | vm | - | dpdk | vhost | switchdev | Not started |
7 | vm | - | dpdk | virtio | legacy | Not started |
8 | vm | - | dpdk | virtio | switchdev | How to set up vDPA with virtio_vdpa for VMs - Metonymical Deflection |
9 | k8s | pod | kernel | vhost | legacy | How to set up vDPA with vhost_vdpa for Kubernetes - Metonymical Deflection (this article) |
10 | k8s | pod | kernel | vhost | switchdev | How to set up vDPA with vhost_vdpa for Kubernetes + Accelerated Bridge CNI - Metonymical Deflection |
11 | k8s | pod | kernel | virtio | legacy | Not started |
12 | k8s | pod | kernel | virtio | switchdev | Not started |
13 | k8s | pod | dpdk | client | legacy | Not started |
14 | k8s | pod | dpdk | client | switchdev | Not started |
15 | k8s | pod | dpdk | server | legacy | Not started |
16 | k8s | pod | dpdk | server | switchdev | Not started |
Verifying all of these would be inefficient, so we will prioritize the combinations that are likely to be used frequently in real use cases.
If you have understood the following article, the content of this one should not be too difficult; working through it reminded us how important it is to build up the basics.
How to set up vDPA with vhost_vdpa for VMs - Metonymical Deflection
When you encounter a vDPA environment in the future, it will be important to be able to identify which of these combinations it is built from.
Going cloud-native implies increasing abstraction, and as engineers we believe we must avoid ending up in a state where we no longer understand what lies beneath those abstractions.
*1:After checking various documents, we found that the "v" in vDPA is expanded in three different ways: virtual, vhost, and virtio, but they all appear to refer to the same thing. In this article, we have followed Introduction to vDPA kernel framework and used virtio.
*2:A loop connection is used in case a 100Gbps switch or a server with 100G NICs is not available. However, since we believe it is important that packets generated by the Pod are physically sent outside the host, we use the configuration shown in fig.1.
*3:As for switchdev mode, there is "accelerated-bridge-cni" provided by Mellanox, which we plan to verify in the future. GitHub - k8snetworkplumbingwg/accelerated-bridge-cni
*4:In the case of VMs, setting the MAC address separately was mandatory, but for k8s Pods no separate setting is needed because the MAC address is specified in the Pod YAML annotation.
*5:The kernel core and modules will be installed at the same time.
*6:If you build the kernel from source and have enabled vhost_vdpa via make menuconfig, etc., this setting is not necessary.
*7:In addition, you can also build dpdk-devel and dpdk-app. However, since we could not communicate with the external network that way, we will not run DPDK in the Pod this time; instead, we use a normal CentOS Pod for the communication test.