vhost-user VM setup over OVS-DPDK

1. Background
2. Prerequisites
3. Host Setup
    3.1 CPU core mask
    3.2 NUMA
    3.3 Huge pages
    3.4 VFIO
4. OVS+DPDK
    4.1 DPDK setup
    4.2 OVS setup
    4.3 Setup for VM
5. VM setup
    5.1 QEMU command line
    5.2 Libvirt usage
    5.3 Guest VM setup
6. Common issues
7. References

1. Background

DPDK (Data Plane Development Kit) is a set of libraries that accelerates packet processing by running it in user space. It also relies on a number of other optimizations, such as CPU affinity, NUMA awareness, and huge pages.

This document is a tutorial that guides you through installing DPDK and Open vSwitch from packages, then setting up vhost-user ports for a VM started either from the QEMU command line or through libvirt. It also records some common issues.

2. Prerequisites

There are a few things to check before running DPDK:

2.1 Make sure your NIC supports DPDK first:
https://core.dpdk.org/supported/

2.2 CPU pinning, NUMA, and huge page support

2.3 Software requirements

kernel >= 3.2
glibc >= 2.7

3. Host Setup

3.1 CPU core mask
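
A minimal sketch, assuming cores 1-3 should be dedicated to DPDK (adjust to your own topology): isolate them from the kernel scheduler via the host kernel command line, then pin the OVS PMD threads to a subset of them with a hex core mask (0x6 = cores 1 and 2, see section 4.2):

GRUB_CMDLINE_LINUX_DEFAULT="... isolcpus=1,2,3"

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6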

3.2 NUMA

This is not strictly required; a UMA system works as well.

We want to make sure to run the VM on the same NUMA node as ovs-vswitchd and as the backing NIC.

To determine which NUMA node a PCI device (NIC) is on, cat /sys/class/net/eth<#>/device/numa_node; it prints the node number (or -1 if the device has no NUMA affinity).
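
For example (eth0 is a placeholder for your interface name):

cat /sys/class/net/eth0/device/numa_node
0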

3.3 Huge page setup

3.3.1 Grub command line

Add the following to the kernel command line:

default_hugepagesz=1G hugepagesz=1G hugepages=4

Then regenerate the grub config:

grub2-mkconfig -o /boot/grub2/grub.cfg

3.3.2 Dynamic allocation

echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

On a NUMA machine, pages should be allocated explicitly on separate nodes:
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

1 GiB pages can be allocated per node the same way (runtime allocation of 1 GiB pages requires kernel support and enough contiguous memory), e.g.:
echo 4 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

3.3.3 Mount huge pages before use

mkdir /mnt/huge
mount -t hugetlbfs nodev /mnt/huge

Separate mount points can be created per page size:

mkdir /dev/hugepages1G
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G

mkdir /dev/hugepages2M
mount -t hugetlbfs -o pagesize=2M none /dev/hugepages2M

This can be made permanent via /etc/fstab:

nodev /mnt/huge hugetlbfs defaults 0 0

For a 1 GiB page size:

nodev /mnt/huge_1GB hugetlbfs pagesize=1GB 0 0
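
To verify that the pages were actually allocated:

grep Huge /proc/meminfo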

Make sure to restart libvirtd afterwards.

3.4 VFIO

Make sure your CPU supports VT-d (or AMD-Vi) and that the IOMMU is enabled first.
VFIO is preferred for recent DPDK releases because it has better performance than UIO.

Once your hardware supports it, add the following to the grub command line (use amd_iommu=on on AMD systems):

intel_iommu=on iommu=pt
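
After a reboot, you can confirm from the kernel log that the IOMMU is active, e.g.:

dmesg | grep -e DMAR -e IOMMU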

4. OVS+DPDK

DPDK works as a library for OVS. The current SLES build disables it, while openSUSE ships OVS with DPDK enabled. The reason SLE disables it still needs to be discussed with the network team.

To verify, you can run:

linux-3txe:~ # ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.10.1
DPDK 18.02.2    

4.1 DPDK setup

Bind the device; make sure the module is already loaded into the kernel:

modprobe igb_uio

dpdk-devbind.py --status
dpdk-devbind.py --bind=igb_uio 0000:02:00.0
dpdk-devbind.py --unbind 0000:02:00.0

For vfio-pci

modprobe vfio
modprobe vfio_pci
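
Then bind the NIC to vfio-pci the same way (the PCI address is an example):

dpdk-devbind.py --bind=vfio-pci 0000:02:00.0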

Sometimes you need to bind a vendor-specific driver directly.

4.2 OVS setup

For an OVS build that does not include DPDK, you need to build it from source.

For builds that include DPDK, initialize it as follows:

ovs-vsctl --no-wait init
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x6
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=1024
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
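
After restarting ovs-vswitchd, you can confirm that DPDK initialization succeeded (the dpdk_initialized field is available in recent OVS releases):

ovs-vsctl get Open_vSwitch . dpdk_initialized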

4.3 Setup for VM

4.3.1 Add a bridge

ovs-vsctl add-br ovs-br0 -- set bridge ovs-br0 datapath_type=netdev

4.3.2 Add a dpdk port

ovs-vsctl add-port ovs-br0 dpdk-p0 \
   -- set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:01:00.0

Some NICs (e.g. Mellanox ConnectX-3) have only one PCI address associated with multiple ports, so specifying a PCI device as above won't work. Instead, select the port by its MAC address:

ovs-vsctl add-port ovs-br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
    options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
4.3.3 Add a vhost-user port (requires QEMU >= 2.2)

Open vSwitch provides two types of vHost User ports:

  • vhost-user (dpdkvhostuser)
  • vhost-user-client (dpdkvhostuserclient)

vHost User uses a client-server model. The server manages the vHost User sockets, and the client connects to the server. Depending on which port type you use, dpdkvhostuser or dpdkvhostuserclient, a different configuration of the client-server model is used.

For vhost-user ports, Open vSwitch acts as the server and QEMU as the client. This means that if OVS dies, all VMs must be restarted. For vhost-user-client ports, on the other hand, OVS acts as the client and QEMU as the server. This means OVS can die and be restarted without issue, and it is also possible to restart an instance itself. For this reason, vhost-user-client ports are the preferred type for all known use cases; the only limitation is that they require QEMU version 2.7 or newer. Ports of type vhost-user are currently deprecated and will be removed in a future release.

For vhost-user:

ovs-vsctl add-port ovs-br0 vhost-user1 \
    -- set Interface vhost-user1 type=dpdkvhostuser

For vhost-user-client:

ovs-vsctl add-port ovs-br0 dpdkvhostclient0 \
    -- set Interface dpdkvhostclient0 type=dpdkvhostuserclient \
       options:vhost-server-path=/tmp/dpdkvhostclient0
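
With vhost-user-client ports, QEMU must create the socket as the server; a minimal sketch of the matching QEMU options (the path must equal vhost-server-path above; newer QEMU versions spell the flag server=on):

-chardev socket,id=char1,path=/tmp/dpdkvhostclient0,server \
-netdev type=vhost-user,id=mynet1,chardev=char1 \
-device virtio-net-pci,netdev=mynet1,mac=00:00:00:00:00:02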

5. VM setup

5.1 QEMU command line

qemu-system-x86_64 -name KVM-VPX -cpu host -enable-kvm -m 4096M \
-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem \
-mem-prealloc -smp sockets=1,cores=2 -drive file=<absolute-path-to-disc-image-file>,if=none,id=drive-ide0-0-0,format=<disc-image-format> \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-netdev type=tap,id=hostnet0,script=no,downscript=no,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3c:d1:ae,bus=pci.0,addr=0x3 \
-chardev socket,id=char0,path=</usr/local/var/run/openvswitch/vhost-user1> \
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on \
-nographic

5.2 Libvirt usage

Two main parts need to be added to the libvirt domain XML.

5.2.1 Set up huge pages

<memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
  </memoryBacking>
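
vhost-user also requires the guest's memory to be shareable with the vhost backend in OVS. If the domain defines guest NUMA cells, one way (a sketch; the cell values are placeholders for a 4 GiB, 2-vCPU guest) is to mark the cell memory as shared:

<cpu>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>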

5.2.2 Insert a vhost-user interface

<interface type='vhostuser'>
      <mac address='52:54:00:55:55:56'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>

5.2.3 Other optimizations:

<vcpu placement='static'>4</vcpu>
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='6'/>
    <emulatorpin cpuset='0,2,4,6'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>

5.3 Guest VM setup

Besides the vhost-user port itself, the guest also needs CPU isolation and huge pages.

Here is the VM kernel command line, set in /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=2 isolcpus=1,2,3"
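
After rebooting the guest, a quick sanity check that the parameters took effect (the isolated file exists on recent kernels):

cat /proc/cmdline
cat /sys/devices/system/cpu/isolated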

6. Common issues:

6.1 Huge pages must be set up and mounted before starting OVS and the VM.
6.2 A dpdk port is necessary on the OVS bridge so traffic can reach the physical NIC.
6.3 OVS must be built with the DPDK library included.
6.4 The rte PMD driver for your NIC should be included in the DPDK build:
mlx, ixgbe, ...
6.5 vhost socket permission issues.
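
For 6.5, the QEMU process must be able to read and write the socket. One possible workaround, assuming libvirt runs QEMU as user qemu and the socket is in the default directory, is to adjust the ownership (or change user/group in /etc/libvirt/qemu.conf):

chown qemu:qemu /var/run/openvswitch/vhost-user1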

7. References:

https://wiki.qemu.org/Documentation/vhost-user-ovs-dpdk
http://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/
https://github.com/qemu/qemu/blob/master/tests/vhost-user-test.c
https://github.com/openvswitch/ovs/blob/master/Documentation/intro/install/dpdk.rst