OVS-DPDK: A Practical (and Kernel-Aware) Setup Guide

1. Background

DPDK (Data Plane Development Kit) is a set of user-space libraries and drivers designed for high-throughput, low-latency packet processing. Instead of relying on the kernel networking stack for every packet, DPDK enables polling-mode drivers (PMDs) in user space and uses optimizations such as:

  • CPU pinning and isolation
  • NUMA-aware memory placement
  • Hugepages for stable TLB behavior and DMA mappings
  • VFIO for secure, high-performance device assignment

This document walks through installing DPDK + Open vSwitch (OVS) from packages (when available), enabling OVS-DPDK, creating vhost-user ports, and attaching VMs via QEMU command line or libvirt. It also lists common issues you will likely hit in real deployments.


2. Prerequisites

2.1 NIC must be supported by DPDK

Check the supported NIC list:

https://core.dpdk.org/supported/

2.2 Platform prerequisites

  • CPU pinning (isolated cores for PMDs and VM vCPUs)
  • NUMA awareness (recommended; not strictly required)
  • Hugepages (required for most DPDK deployments)
  • IOMMU/VFIO support (strongly recommended vs UIO)

2.3 Software baseline

  • Kernel: >= 3.2 (in practice you want something much newer for VFIO/IOMMU stability)
  • glibc: >= 2.7
  • QEMU: vhost-user works well with QEMU >= 2.2; vhost-user-client requires QEMU >= 2.7

3. Host Setup

3.1 CPU pinning and isolation

DPDK PMDs (in ovs-vswitchd) are polling threads and should run on dedicated cores. Similarly, VM vCPUs and the QEMU emulator thread should be pinned to avoid scheduler noise.

Recommended approach:

  • Reserve a set of isolated cores for DPDK PMDs
  • Reserve a separate set for VM vCPUs
  • Pin QEMU emulator thread to the same NUMA node

Kernel boot hint (example):

GRUB_CMDLINE_LINUX_DEFAULT="quiet isolcpus=1,2,3 nohz_full=1,2,3 rcu_nocbs=1,2,3"

3.2 NUMA awareness

NUMA is not strictly required, but performance can collapse if OVS-DPDK, VMs, and NIC DMA are spread across nodes.

To identify the NUMA node of a NIC:

cat /sys/class/net/ethX/device/numa_node

Goal: place ovs-vswitchd PMD threads, hugepage memory, and VM vCPUs on the same NUMA node as the physical NIC whenever possible.
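The per-interface check above can be looped over every NIC. A minimal sketch (the sysfs root is a parameter only so the function is easy to exercise; on a real host call it with /sys/class/net). A numa_node value of -1 means the kernel could not associate the device with any node:

```shell
# Print "<iface>: node <N>" for every interface under the given sysfs net root.
# A numa_node of -1 means the device is not tied to a specific NUMA node.
nic_numa_nodes() {
    root="$1"
    for dev in "$root"/*; do
        [ -r "$dev/device/numa_node" ] || continue
        printf '%s: node %s\n' "$(basename "$dev")" "$(cat "$dev/device/numa_node")"
    done
}

# Usage on a real host:
#   nic_numa_nodes /sys/class/net
```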


3.3 Hugepages

3.3.1 Static (GRUB)

Example for 1G hugepages:

default_hugepagesz=1G hugepagesz=1G hugepages=4

Regenerate the GRUB config and reboot (the command and path vary by distro):

grub2-mkconfig -o /boot/grub2/grub.cfg    # Debian/Ubuntu: update-grub

3.3.2 Dynamic allocation

Example for 2MB pages:

echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

On NUMA systems, allocate per-node:

echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

Example for 1G pages on a specific node (note: 1G pages often cannot be allocated at runtime once memory has fragmented, so boot-time reservation is more reliable):

echo 4 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
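Whichever method you use, it is worth confirming the allocation actually took effect:

```shell
# Show hugepage counters (total, free, reserved) and the default page size.
grep -i '^huge' /proc/meminfo

# Per-node counts live under:
#   /sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/nr_hugepages
```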

3.3.3 Mount hugepage filesystem

mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge

For 1G hugepages (optional mountpoint):

mkdir -p /mnt/huge_1GB
mount -t hugetlbfs -o pagesize=1G nodev /mnt/huge_1GB

Persist via /etc/fstab:

nodev /mnt/huge     hugetlbfs defaults        0 0
nodev /mnt/huge_1GB hugetlbfs pagesize=1G     0 0

Note: If libvirt is used, restart it after hugepage changes:

systemctl restart libvirtd

3.4 VFIO / IOMMU

VFIO is preferred over UIO because it provides proper IOMMU isolation and typically better security and stability. Ensure VT-d/AMD-Vi is enabled in BIOS.

Add IOMMU flags via GRUB (Intel example):

intel_iommu=on iommu=pt

Rebuild grub and reboot, then validate:

dmesg | grep -Ei "DMAR|IOMMU|VFIO"
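If the IOMMU is active, devices show up under /sys/kernel/iommu_groups. A small sketch that lists each group and its devices (the root is a parameter only for testability; on a real host pass /sys/kernel/iommu_groups):

```shell
# Print "group <N>: <pci-address>" for every device in every IOMMU group.
# No output at all usually means the IOMMU is disabled or not enabled in the kernel.
list_iommu_groups() {
    root="$1"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue
        group=$(basename "$(dirname "$(dirname "$dev")")")
        printf 'group %s: %s\n' "$group" "$(basename "$dev")"
    done
}

# Usage on a real host:
#   list_iommu_groups /sys/kernel/iommu_groups
```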

4. OVS + DPDK

On some distros, OVS is built with DPDK enabled; on others you may need to build from source. Verify support by checking the OVS version output:

ovs-vswitchd --version

Example (shows DPDK enabled):

ovs-vswitchd (Open vSwitch) 2.10.1
DPDK 18.02.2
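This check can be scripted. A minimal sketch that inspects the version text, with the output passed in as an argument so the helper can be exercised without OVS installed (the function name is ours, for illustration):

```shell
# Succeed if the given `ovs-vswitchd --version` output reports a DPDK build.
has_dpdk() {
    printf '%s\n' "$1" | grep -q '^DPDK '
}

# Usage on a real host:
#   has_dpdk "$(ovs-vswitchd --version)" && echo "DPDK-enabled build"
```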

4.1 DPDK: bind the NIC to a DPDK-capable driver

Modern recommendation: use vfio-pci.

modprobe vfio
modprobe vfio_pci

Then use the DPDK devbind tool (path varies by distro; common names include dpdk-devbind.py):

dpdk-devbind.py --status
dpdk-devbind.py --bind=vfio-pci 0000:01:00.0
dpdk-devbind.py --unbind 0000:01:00.0

Legacy (less preferred) option: igb_uio (requires UIO modules):

modprobe uio
modprobe igb_uio
dpdk-devbind.py --bind=igb_uio 0000:01:00.0

Some NICs require vendor drivers or specific firmware/PMD combos; always validate with DPDK docs for your NIC family (mlx/ixgbe/i40e, etc.).


4.2 OVS-DPDK initialization

Initialize OVS and enable DPDK:

ovs-vsctl --no-wait init
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x6
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0x8
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=1024

Note: dpdk-lcore-mask pins OVS's non-datapath DPDK threads, while the polling PMD threads are placed via pmd-cpu-mask; set both on dedicated cores. On multi-socket hosts, dpdk-socket-mem takes a comma-separated per-NUMA-node list (e.g. 1024,1024).
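These masks are CPU bitmaps: bit N set selects core N, so 0x6 selects cores 1 and 2. A small helper to build such a mask from a core list (the function is ours, shown for illustration):

```shell
# Convert a comma-separated list of core IDs into a hex CPU mask
# suitable for dpdk-lcore-mask / pmd-cpu-mask.
cores_to_mask() {
    mask=0
    for core in $(printf '%s' "$1" | tr ',' ' '); do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

cores_to_mask 1,2   # -> 0x6
```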

Restart OVS (the service unit name depends on the distro, e.g. openvswitch or openvswitch-switch):

systemctl restart openvswitch

4.3 Bridge and ports (netdev datapath)

4.3.1 Create a DPDK bridge

ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

4.3.2 Add a DPDK physical port

ovs-vsctl add-port br0 dpdk-p0 \
  -- set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:01:00.0

Some NICs (e.g., ConnectX-3) may map multiple ports under one PCI function; use class/mac selection:

ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
  options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"

4.3.3 Add vhost-user ports

OVS supports:

  • dpdkvhostuser (OVS is server, QEMU is client) — older, less flexible
  • dpdkvhostuserclient (OVS is client, QEMU is server) — preferred; supports OVS restart without forcing VM restart

Recommendation: use dpdkvhostuserclient when possible (requires QEMU >= 2.7).

vhost-user (OVS server):

ovs-vsctl add-port br0 vhost-user1 \
  -- set Interface vhost-user1 type=dpdkvhostuser

vhost-user-client (OVS client):

ovs-vsctl add-port br0 vhostclient0 \
  -- set Interface vhostclient0 type=dpdkvhostuserclient \
  options:vhost-server-path=/var/run/openvswitch/vhostclient0

5. VM Setup

5.1 QEMU command line example

Key points:

  • Back guest memory with hugepages
  • Use vhost-user netdev + virtio-net-pci
  • Pin vCPUs and emulator thread (not shown here: use taskset/cgroups/libvirt)

qemu-system-x86_64 \
  -name vm-dpdk \
  -enable-kvm -cpu host \
  -m 4096 \
  -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
  -numa node,memdev=mem \
  -mem-prealloc \
  -smp sockets=1,cores=2 \
  -drive file=/path/to/disk.img,if=virtio,format=qcow2 \
  -chardev socket,id=char0,path=/var/run/openvswitch/vhostclient0,server=on,wait=off \
  -netdev type=vhost-user,id=net0,chardev=char0,vhostforce \
  -device virtio-net-pci,netdev=net0,mac=52:54:00:3c:d1:ae,mrg_rxbuf=on \
  -nographic

Notes:

  • For dpdkvhostuserclient, QEMU should act as server for the socket (hence server=on in many setups).
  • For dpdkvhostuser, QEMU acts as client and connects to OVS-managed socket.

5.2 Libvirt configuration

5.2.1 Hugepage backing

<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB'/>
  </hugepages>
</memoryBacking>

5.2.2 vhost-user interface

For dpdkvhostuserclient ports (recommended), OVS connects as the client, so QEMU must own the socket: set mode='server' on the libvirt side. For the older dpdkvhostuser type it is the reverse: OVS owns the socket and libvirt uses mode='client'. Always match the mode with your OVS port type.

<interface type='vhostuser'>
  <mac address='52:54:00:55:55:56'/>
  <source type='unix' path='/var/run/openvswitch/vhostclient0' mode='server'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>

5.2.3 CPU/NUMA tuning (typical example)

<vcpu placement='static'>4</vcpu>
<cputune>
  <shares>4096</shares>
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='2'/>
  <vcpupin vcpu='2' cpuset='4'/>
  <vcpupin vcpu='3' cpuset='6'/>
  <emulatorpin cpuset='0,2,4,6'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>

5.3 Guest OS: hugepages + CPU isolation (optional)

If your guest runs packet processing too, also configure hugepages and isolate vCPUs inside the guest:

GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=2 isolcpus=1,2,3"

6. Common issues

  • Hugepages not mounted: OVS-DPDK starts but PMD/alloc fails; VM memory-backend-file fails.
  • OVS not built with DPDK: ovs-vswitchd --version shows no DPDK; vhost-user types unavailable.
  • NIC not bound correctly: DPDK PMD cannot claim device; check dpdk-devbind.py --status.
  • Wrong NUMA placement: huge latency/jitter; ensure NIC/PMD/VM are on same node.
  • vhost socket permission: QEMU/libvirt cannot open /var/run/openvswitch/...; fix ownership/SELinux/AppArmor as applicable.
  • Port mode mismatch: dpdkvhostuser vs dpdkvhostuserclient requires different socket ownership (server/client).
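Several of these can be checked mechanically. A sketch that tests for a hugetlbfs mount in a mount table (the table is passed in as text so the check is easy to exercise; on a real host feed it /proc/mounts):

```shell
# Succeed if the given mount table (in /proc/mounts format) contains
# at least one hugetlbfs mount.
hugetlbfs_mounted() {
    printf '%s\n' "$1" | awk '$3 == "hugetlbfs" { found = 1 } END { exit !found }'
}

# Usage on a real host:
#   hugetlbfs_mounted "$(cat /proc/mounts)" || echo "hugepages not mounted"
```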

7. Reference

  • https://wiki.qemu.org/Documentation/vhost-user-ovs-dpdk
  • http://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/
  • https://github.com/qemu/qemu/blob/master/tests/vhost-user-test.c
  • https://github.com/openvswitch/ovs/blob/master/Documentation/intro/install/dpdk.rst