-Using Open vSwitch with DPDK
-============================
+OVS DPDK INSTALL GUIDE
+================================
-Open vSwitch can use Intel(R) DPDK lib to operate entirely in
-userspace. This file explains how to install and use Open vSwitch in
-such a mode.
+## Contents
-The DPDK support of Open vSwitch is considered experimental.
-It has not been thoroughly tested.
+1. [Overview](#overview)
+2. [Building and Installation](#build)
+3. [Setup OVS DPDK datapath](#ovssetup)
+4. [DPDK in the VM](#builddpdk)
+5. [OVS Testcases](#ovstc)
+6. [Limitations](#ovslimits)
-This version of Open vSwitch should be built manually with `configure`
-and `make`.
+## <a name="overview"></a> 1. Overview
-OVS needs a system with 1GB hugepages support.
+Open vSwitch can use the DPDK library to operate entirely in userspace.
+This file provides information on installation and use of Open vSwitch
+using the DPDK datapath. This version of Open vSwitch should be built
+manually with `configure` and `make`.
-Building and Installing:
-------------------------
+The DPDK support of Open vSwitch is considered 'experimental'.
-Required: DPDK 2.1
-Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
-on Debian/Ubuntu)
+### Prerequisites
-1. Configure build & install DPDK:
- 1. Set `$DPDK_DIR`
+* Required: DPDK 16.04, libnuma
+* Hardware: [DPDK Supported NICs] when physical ports are in use
- ```
- export DPDK_DIR=/usr/src/dpdk-2.1
- cd $DPDK_DIR
- ```
-
- 2. Update `config/common_linuxapp` so that DPDK generate single lib file.
- (modification also required for IVSHMEM build)
-
- `CONFIG_RTE_BUILD_COMBINE_LIBS=y`
-
- Then run `make install` to build and install the library.
- For default install without IVSHMEM:
-
- `make install T=x86_64-native-linuxapp-gcc`
-
- To include IVSHMEM (shared memory):
-
- `make install T=x86_64-ivshmem-linuxapp-gcc`
-
- For further details refer to http://dpdk.org/
-
-2. Configure & build the Linux kernel:
-
- Refer to intel-dpdk-getting-started-guide.pdf for understanding
- DPDK kernel requirement.
-
-3. Configure & build OVS:
-
- * Non IVSHMEM:
-
- `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/`
-
- * IVSHMEM:
-
- `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/`
-
- ```
- cd $(OVS_DIR)/openvswitch
- ./boot.sh
- ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"]
- make
- ```
-
- Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress DPDK cast-align warnings.
-
-To have better performance one can enable aggressive compiler optimizations and
-use the special instructions(popcnt, crc32) that may not be available on all
-machines. Instead of typing `make`, type:
-
-`make CFLAGS='-O3 -march=native'`
-
-Refer to [INSTALL.userspace.md] for general requirements of building userspace OVS.
-
-Using the DPDK with ovs-vswitchd:
----------------------------------
-
-1. Setup system boot
- Add the following options to the kernel bootline:
-
- `default_hugepagesz=1GB hugepagesz=1G hugepages=1`
-
-2. Setup DPDK devices:
-
- DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO
- modules. UIO requires inserting an out of tree driver igb_uio.ko that is
- available in DPDK. Setup for both methods are described below.
-
- * UIO:
- 1. insert uio.ko: `modprobe uio`
- 2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
- 3. Bind network device to igb_uio:
- `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`
-
- * VFIO:
-
- VFIO needs to be supported in the kernel and the BIOS. More information
- can be found in the [DPDK Linux GSG].
-
- 1. Insert vfio-pci.ko: `modprobe vfio-pci`
- 2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x /dev/vfio`
- and: `sudo /usr/bin/chmod 0666 /dev/vfio/*`
- 3. Bind network device to vfio-pci:
- `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`
-
-3. Mount the hugetable filesystem
-
- `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`
-
- Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
-
-4. Follow the instructions in [INSTALL.md] to install only the
- userspace daemons and utilities (via 'make install').
- 1. First time only db creation (or clearing):
-
- ```
- mkdir -p /usr/local/etc/openvswitch
- mkdir -p /usr/local/var/run/openvswitch
- rm /usr/local/etc/openvswitch/conf.db
- ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
- /usr/local/share/openvswitch/vswitch.ovsschema
- ```
-
- 2. Start ovsdb-server
-
- ```
- ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
- --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
- --private-key=db:Open_vSwitch,SSL,private_key \
- --certificate=Open_vSwitch,SSL,certificate \
- --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
- ```
-
- 3. First time after db creation, initialize:
-
- ```
- ovs-vsctl --no-wait init
- ```
-
-5. Start vswitchd:
-
- DPDK configuration arguments can be passed to vswitchd via `--dpdk`
- argument. This needs to be first argument passed to vswitchd process.
- dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
- for dpdk initialization.
-
- ```
- export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
- ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
- ```
-
- If allocated more than one GB hugepage (as for IVSHMEM), set amount and
- use NUMA node 0 memory:
-
- ```
- ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
- -- unix:$DB_SOCK --pidfile --detach
- ```
-
-6. Add bridge & ports
-
- To use ovs-vswitchd with DPDK, create a bridge with datapath_type
- "netdev" in the configuration database. For example:
-
- `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`
-
- Now you can add dpdk devices. OVS expects DPDK device names to start with
- "dpdk" and end with a portid. vswitchd should print (in the log file) the
- number of dpdk devices found.
-
- ```
- ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
- ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
- ```
-
- Once first DPDK port is added to vswitchd, it creates a Polling thread and
- polls dpdk device in continuous loop. Therefore CPU utilization
- for that thread is always 100%.
-
- Note: creating bonds of DPDK interfaces is slightly different to creating
- bonds of system interfaces. For DPDK, the interface type must be explicitly
- set, for example:
-
- ```
- ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 type=dpdk -- set Interface dpdk1 type=dpdk
- ```
-
-7. Add test flows
-
- Test flow script across NICs (assuming ovs in /usr/src/ovs):
- Execute script:
-
- ```
- #! /bin/sh
- # Move to command directory
- cd /usr/src/ovs/utilities/
-
- # Clear current flows
- ./ovs-ofctl del-flows br0
-
- # Add flows between port 1 (dpdk0) to port 2 (dpdk1)
- ./ovs-ofctl add-flow br0 in_port=1,action=output:2
- ./ovs-ofctl add-flow br0 in_port=2,action=output:1
- ```
-
-Performance Tuning:
--------------------
-
- 1. PMD affinitization
-
- A poll mode driver (pmd) thread handles the I/O of all DPDK
- interfaces assigned to it. A pmd thread will busy loop through
- the assigned port/rxq's polling for packets, switch the packets
- and send to a tx port if required. Typically, it is found that
- a pmd thread is CPU bound, meaning that the greater the CPU
- occupancy the pmd thread can get, the better the performance. To
- that end, it is good practice to ensure that a pmd thread has as
- many cycles on a core available to it as possible. This can be
- achieved by affinitizing the pmd thread with a core that has no
- other workload. See section 7 below for a description of how to
- isolate cores for this purpose also.
-
- The following command can be used to specify the affinity of the
- pmd thread(s).
-
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
-
- By setting a bit in the mask, a pmd thread is created and pinned
- to the corresponding CPU core. e.g. to run a pmd thread on core 1
-
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2`
-
- For more information, please refer to the Open_vSwitch TABLE section in
-
- `man ovs-vswitchd.conf.db`
-
- Note, that a pmd thread on a NUMA node is only created if there is
- at least one DPDK interface from that NUMA node added to OVS.
-
- 2. Multiple poll mode driver threads
-
- With pmd multi-threading support, OVS creates one pmd thread
- for each NUMA node by default. However, it can be seen that in cases
- where there are multiple ports/rxq's producing traffic, performance
- can be improved by creating multiple pmd threads running on separate
- cores. These pmd threads can then share the workload by each being
- responsible for different ports/rxq's. Assignment of ports/rxq's to
- pmd threads is done automatically.
-
- The following command can be used to specify the affinity of the
- pmd threads.
-
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
-
- A set bit in the mask means a pmd thread is created and pinned
- to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
-
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
-
- For more information, please refer to the Open_vSwitch TABLE section in
-
- `man ovs-vswitchd.conf.db`
-
- For example, when using dpdk and dpdkvhostuser ports in a bi-directional
- VM loopback as shown below, spreading the workload over 2 or 4 pmd
- threads shows significant improvements as there will be more total CPU
- occupancy available.
+## <a name="build"></a> 2. Building and Installation
- NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
+### 2.1 Configure & build the Linux kernel
- The OVS log can be checked to confirm that the port/rxq assignment to
- pmd threads is as required. This can also be checked with the following
- commands:
+On Linux distributions running kernel version >= 3.0, a kernel rebuild is not
+required; only the grub cmdline needs to be updated to enable IOMMU (refer to
+the VFIO support steps in section 3.2). For older kernels, check that the kernel
+is built with UIO, HUGETLBFS, PROC_PAGE_MONITOR, HPET and HPET_MMAP support.
- ```
- top -H
- taskset -p <pid_of_pmd>
- ```
+Detailed system requirements can be found at [DPDK requirements]; also refer to
+the advanced install guide [INSTALL.DPDK-ADVANCED.md].
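+
+As a quick check on an existing kernel, the config options above can be inspected
+in the kernel config; a sketch, assuming the config file lives under /boot (its
+location may differ per distro):
+
+```
+grep -E 'HUGETLBFS|PROC_PAGE_MONITOR|HPET|UIO' /boot/config-$(uname -r)
+```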
- To understand where most of the pmd thread time is spent and whether the
- caches are being utilized, these commands can be used:
+### 2.2 Install DPDK
+ 1. [Download DPDK] and extract the file, for example into /usr/src,
+    and set DPDK_DIR
- ```
- # Clear previous stats
- ovs-appctl dpif-netdev/pmd-stats-clear
-
- # Check current stats
- ovs-appctl dpif-netdev/pmd-stats-show
- ```
-
- 3. DPDK port Rx Queues
-
- `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>`
-
- The command above sets the number of rx queues for each DPDK interface.
- The rx queues are assigned to pmd threads on the same NUMA node in a
- round-robin fashion. For more information, please refer to the
- Open_vSwitch TABLE section in
-
- `man ovs-vswitchd.conf.db`
-
- 4. Exact Match Cache
-
- Each pmd thread contains one EMC. After initial flow setup in the
- datapath, the EMC contains a single table and provides the lowest level
- (fastest) switching for DPDK ports. If there is a miss in the EMC then
- the next level where switching will occur is the datapath classifier.
- Missing in the EMC and looking up in the datapath classifier incurs a
- significant performance penalty. If lookup misses occur in the EMC
- because it is too small to handle the number of flows, its size can
- be increased. The EMC size can be modified by editing the define
- EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.
-
- As mentioned above an EMC is per pmd thread. So an alternative way of
- increasing the aggregate amount of possible flow entries in EMC and
- avoiding datapath classifier lookups is to have multiple pmd threads
- running. This can be done as described in section 2.
-
- 5. Compiler options
-
- The default compiler optimization level is '-O2'. Changing this to
- more aggressive compiler optimizations such as '-O3' or
- '-Ofast -march=native' with gcc can produce performance gains.
-
- 6. Simultaneous Multithreading (SMT)
-
- With SMT enabled, one physical core appears as two logical cores
- which can improve performance.
-
- SMT can be utilized to add additional pmd threads without consuming
- additional physical cores. Additional pmd threads may be added in the
- same manner as described in section 2. If trying to minimize the use
- of physical cores for pmd threads, care must be taken to set the
- correct bits in the pmd-cpu-mask to ensure that the pmd threads are
- pinned to SMT siblings.
-
- For example, when using 2x 10 core processors in a dual socket system
- with HT enabled, /proc/cpuinfo will report 40 logical cores. To use
- two logical cores which share the same physical core for pmd threads,
- the following command can be used to identify a pair of logical cores.
-
- `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`
-
- where N is the logical core number. In this example, it would show that
- cores 1 and 21 share the same physical core. The pmd-cpu-mask to enable
- two pmd threads running on these two logical cores (one physical core)
- is.
-
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=100002`
-
- Note that SMT is enabled by the Hyper-Threading section in the
- BIOS, and as such will apply to the whole system. So the impact of
- enabling/disabling it for the whole system should be considered
- e.g. If workloads on the system can scale across multiple cores,
- SMT may very beneficial. However, if they do not and perform best
- on a single physical core, SMT may not be beneficial.
-
- 7. The isolcpus kernel boot parameter
-
- isolcpus can be used on the kernel bootline to isolate cores from the
- kernel scheduler and hence dedicate them to OVS or other packet
- forwarding related workloads. For example a Linux kernel boot-line
- could be:
-
- 'GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=4 default_hugepagesz=1G 'intel_iommu=off' isolcpus=1-19"'
-
- 8. NUMA/Cluster On Die
-
- Ideally inter NUMA datapaths should be avoided where possible as packets
- will go across QPI and there may be a slight performance penalty when
- compared with intra NUMA datapaths. On Intel Xeon Processor E5 v3,
- Cluster On Die is introduced on models that have 10 cores or more.
- This makes it possible to logically split a socket into two NUMA regions
- and again it is preferred where possible to keep critical datapaths
- within the one cluster.
+ ```
+ cd /usr/src/
+ wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.04.zip
+ unzip dpdk-16.04.zip
- It is good practice to ensure that threads that are in the datapath are
- pinned to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs
- responsible for forwarding.
+ export DPDK_DIR=/usr/src/dpdk-16.04
+ cd $DPDK_DIR
+ ```
- 9. Rx Mergeable buffers
+ 2. Configure and Install DPDK
- Rx Mergeable buffers is a virtio feature that allows chaining of multiple
- virtio descriptors to handle large packet sizes. As such, large packets
- are handled by reserving and chaining multiple free descriptors
- together. Mergeable buffer support is negotiated between the virtio
- driver and virtio device and is supported by the DPDK vhost library.
- This behavior is typically supported and enabled by default, however
- in the case where the user knows that rx mergeable buffers are not needed
- i.e. jumbo frames are not needed, it can be forced off by adding
- rx_mrgbuf=off to the QEMU command line options. By not reserving multiple
- chains of descriptors it will make more individual virtio descriptors
- available for rx to the guest using dpdkvhost ports and this can improve
- performance.
+ Build and install the DPDK library.
- 10. Packet processing in the guest
+ ```
+ export DPDK_TARGET=x86_64-native-linuxapp-gcc
+ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
+ make install T=$DPDK_TARGET DESTDIR=install
+ ```
- It is good practice whether simply forwarding packets from one
- interface to another or more complex packet processing in the guest,
- to ensure that the thread performing this work has as much CPU
- occupancy as possible. For example when the DPDK sample application
- `testpmd` is used to forward packets in the guest, multiple QEMU vCPU
- threads can be created. Taskset can then be used to affinitize the
- vCPU thread responsible for forwarding to a dedicated core not used
- for other general processing on the host system.
+  Note: For IVSHMEM, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
- 11. DPDK virtio pmd in the guest
+### 2.3 Install OVS
+  OVS can be installed using different methods. For OVS to use the DPDK datapath,
+  it has to be configured with DPDK support, which is done with './configure --with-dpdk'.
+  This section focuses on a generic recipe that suits most cases; for distribution
+  specific instructions, refer to [INSTALL.Fedora.md], [INSTALL.RHEL.md] and
+  [INSTALL.Debian.md].
- dpdkvhostcuse or dpdkvhostuser ports can be used to accelerate the path
- to the guest using the DPDK vhost library. This library is compatible with
- virtio-net drivers in the guest but significantly better performance can
- be observed when using the DPDK virtio pmd driver in the guest. The DPDK
- `testpmd` application can be used in the guest as an example application
- that forwards packet from one DPDK vhost port to another. An example of
- running `testpmd` in the guest can be seen here.
+  The OVS sources can be downloaded in different ways; skip this section if you
+  already have the correct sources. Otherwise, download the desired version using
+  one of the methods suggested below and follow the documentation of that specific
+  version.
- `./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i --txqflags=0xf00 --disable-hw-vlan --forward-mode=io --auto-start`
+ - OVS stable releases can be downloaded in compressed format from [Download OVS]
- See below information on dpdkvhostcuse and dpdkvhostuser ports.
- See [DPDK Docs] for more information on `testpmd`.
+ ```
+ cd /usr/src
+ wget http://openvswitch.org/releases/openvswitch-<version>.tar.gz
+ tar -zxvf openvswitch-<version>.tar.gz
+ export OVS_DIR=/usr/src/openvswitch-<version>
+ ```
+  - The current OVS development tree can be cloned using the 'git' tool
+ ```
+ cd /usr/src/
+ git clone https://github.com/openvswitch/ovs.git
+ export OVS_DIR=/usr/src/ovs
+ ```
-DPDK Rings :
-------------
+ - Install OVS dependencies
-Following the steps above to create a bridge, you can now add dpdk rings
-as a port to the vswitch. OVS will expect the DPDK ring device name to
-start with dpdkr and end with a portid.
+    Mandatory: GNU make, GCC 4.x (or) Clang 3.4, libnuma
+    Optional: libssl, libcap-ng, Python 2.7
+    More information can be found at [Build Requirements].
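+
+    For example, on a Debian/Ubuntu host the build dependencies could be installed
+    as below (package names are illustrative and may differ per distro):
+
+    ```
+    apt-get install -y build-essential autoconf automake libtool \
+                       libnuma-dev libssl-dev libcap-ng-dev python
+    ```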
-`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr`
+ - Configure, Install OVS
-DPDK rings client test application
+ ```
+ cd $OVS_DIR
+ ./boot.sh
+ ./configure --with-dpdk=$DPDK_BUILD
+ make install
+ ```
-Included in the test directory is a sample DPDK application for testing
-the rings. This is from the base dpdk directory and modified to work
-with the ring naming used within ovs.
+  Note: Passing DPDK_BUILD can be skipped if the DPDK library is installed in a
+  standard location, i.e. `./configure --with-dpdk` should suffice.
-location tests/ovs_client
+ Additional information can be found in [INSTALL.md].
-To run the client :
+## <a name="ovssetup"></a> 3. Setup OVS with DPDK datapath
-```
-cd /usr/src/ovs/tests/
-ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
-```
+### 3.1 Setup Hugepages
-In the case of the dpdkr example above the "port id you gave dpdkr" is 0.
+  Allocate and mount 2MB hugepages:
-It is essential to have --proc-type=secondary
+ - For persistent allocation of huge pages, write to hugepages.conf file
+ in /etc/sysctl.d
-The application simply receives an mbuf on the receive queue of the
-ethernet ring and then places that same mbuf on the transmit ring of
-the ethernet ring. It is a trivial loopback application.
+ `echo 'vm.nr_hugepages=2048' > /etc/sysctl.d/hugepages.conf`
-DPDK rings in VM (IVSHMEM shared memory communications)
--------------------------------------------------------
+ - For run-time allocation of huge pages
-In addition to executing the client in the host, you can execute it within
-a guest VM. To do so you will need a patched qemu. You can download the
-patch and getting started guide at :
+    `sysctl -w vm.nr_hugepages=N` where N is the number of 2MB hugepages to allocate
-https://01.org/packet-processing/downloads
+ - To verify hugepage configuration
-A general rule of thumb for better performance is that the client
-application should not be assigned the same dpdk core mask "-c" as
-the vswitchd.
+ `grep HugePages_ /proc/meminfo`
-DPDK vhost:
------------
+ - Mount hugepages
-DPDK 2.1 supports two types of vhost:
+ `mount -t hugetlbfs none /dev/hugepages`
-1. vhost-user
-2. vhost-cuse
+ Note: Mount hugepages if not already mounted by default.
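+
+  1GB hugepages (used, for example, by IVSHMEM) have to be reserved at boot time;
+  a sketch of the kernel boot line addition and the corresponding mount:
+
+  ```
+  # Add to the kernel boot line (GRUB) and reboot:
+  #   default_hugepagesz=1G hugepagesz=1G hugepages=4
+  mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
+  ```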
-Whatever type of vhost is enabled in the DPDK build specified, is the type
-that will be enabled in OVS. By default, vhost-user is enabled in DPDK.
-Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports
-will be enabled in OVS.
-Please note that support for vhost-cuse is intended to be deprecated in OVS
-in a future release.
+### 3.2 Setup DPDK devices using VFIO
-DPDK vhost-user:
-----------------
+ - Supported with kernel version >= 3.6
+ - VFIO needs support from BIOS and kernel.
+ - BIOS changes:
-The following sections describe the use of vhost-user 'dpdkvhostuser' ports
-with OVS.
+     Enable VT-d; this can be verified from the `dmesg | grep -e DMAR -e IOMMU` output
-DPDK vhost-user Prerequisites:
--------------------------
+ - GRUB bootline:
-1. DPDK 2.1 with vhost support enabled as documented in the "Building and
- Installing section"
+     Add `iommu=pt intel_iommu=on`; this can be verified from the `cat /proc/cmdline`
+     output. A sketch for making this change persistent follows at the end of this section.
-2. QEMU version v2.1.0+
+ - Load modules and bind the NIC to VFIO driver
- QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if providing
- your VM with memory greater than 1GB due to potential issues with memory
- mapping larger areas.
+ ```
+ modprobe vfio-pci
+ sudo /usr/bin/chmod a+x /dev/vfio
+ sudo /usr/bin/chmod 0666 /dev/vfio/*
+ $DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1
+ $DPDK_DIR/tools/dpdk_nic_bind.py --status
+ ```
-Adding DPDK vhost-user ports to the Switch:
---------------------------------------
+  Note: If running a kernel < 3.6, UIO drivers have to be used instead; please
+  check the "Setup Huge pages and DPDK devices using UIO" steps in [DPDK in the VM].
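+
+  To make the GRUB bootline change persistent, one approach (a sketch; file
+  locations and the grub config command differ per distro) is:
+
+  ```
+  # /etc/default/grub:
+  #   GRUB_CMDLINE_LINUX="... iommu=pt intel_iommu=on"
+  grub2-mkconfig -o /boot/grub2/grub.cfg   # or 'update-grub' on Debian/Ubuntu
+  reboot
+  ```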
-Following the steps above to create a bridge, you can now add DPDK vhost-user
-as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can
-have arbitrary names.
+### 3.3 Setup OVS
- - For vhost-user, the name of the port type is `dpdkvhostuser`
+ 1. DB creation (One time step)
```
- ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
- type=dpdkvhostuser
+ mkdir -p /usr/local/etc/openvswitch
+ mkdir -p /usr/local/var/run/openvswitch
+ rm /usr/local/etc/openvswitch/conf.db
+ ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
+ /usr/local/share/openvswitch/vswitch.ovsschema
```
- This action creates a socket located at
- `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
- to your VM on the QEMU command line. More instructions on this can be
- found in the next section "DPDK vhost-user VM configuration"
- Note: If you wish for the vhost-user sockets to be created in a
- directory other than `/usr/local/var/run/openvswitch`, you may specify
- another location on the ovs-vswitchd command line like so:
-
- `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
+ 2. Start ovsdb-server
-DPDK vhost-user VM configuration:
----------------------------------
-Follow the steps below to attach vhost-user port(s) to a VM.
+ No SSL support
-1. Configure sockets.
- Pass the following parameters to QEMU to attach a vhost-user device:
-
- ```
- -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
- -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
- -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
- ```
+ ```
+ ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
+ --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
+ --pidfile --detach
+ ```
- ...where vhost-user-1 is the name of the vhost-user port added
- to the switch.
- Repeat the above parameters for multiple devices, changing the
- chardev path and id as necessary. Note that a separate and different
- chardev path needs to be specified for each vhost-user device. For
- example you have a second vhost-user port named 'vhost-user-2', you
- append your QEMU command line with an additional set of parameters:
+ SSL support
- ```
- -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
- -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
- -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
- ```
+ ```
+ ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
+ --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
+ --private-key=db:Open_vSwitch,SSL,private_key \
+ --certificate=Open_vSwitch,SSL,certificate \
+ --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
+ ```
-2. Configure huge pages.
- QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
- a virtio-net device's virtual rings and packet buffers mapping the VM's
- physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
- memory into their process address space, pass the following paramters
- to QEMU:
+ 3. Initialize DB (One time step)
- ```
- -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
- share=on
- -numa node,memdev=mem -mem-prealloc
- ```
+ ```
+ ovs-vsctl --no-wait init
+ ```
-DPDK vhost-cuse:
-----------------
+ 4. Start vswitchd
-The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
-with OVS.
+     DPDK configuration arguments can be passed to vswitchd via the Open_vSwitch
+     'other_config' column. The important configuration options are listed below;
+     an example of setting them follows the list. Defaults will be provided for
+     all values not explicitly set. Refer to ovs-vswitchd.conf.db(5) for additional
+     information on configuration options.
-DPDK vhost-cuse Prerequisites:
--------------------------
+ * dpdk-init
+ Specifies whether OVS should initialize and support DPDK ports. This is
+ a boolean, and defaults to false.
-1. DPDK 2.1 with vhost support enabled as documented in the "Building and
- Installing section"
- As an additional step, you must enable vhost-cuse in DPDK by setting the
- following additional flag in `config/common_linuxapp`:
+ * dpdk-lcore-mask
+       Specifies the CPU cores on which dpdk lcore threads should be spawned and
+       expects a hex string (e.g. '0x123').
- `CONFIG_RTE_LIBRTE_VHOST_USER=n`
+ * dpdk-socket-mem
+       Comma-separated list of the amount of memory (in MB) to pre-allocate from
+       hugepages on specific sockets.
- Following this, rebuild DPDK as per the instructions in the "Building and
- Installing" section. Finally, rebuild OVS as per step 3 in the "Building
- and Installing" section - OVS will detect that DPDK has vhost-cuse libraries
- compiled and in turn will enable support for it in the switch and disable
- vhost-user support.
+ * dpdk-hugepage-dir
+ Directory where hugetlbfs is mounted
-2. Insert the Cuse module:
+ * vhost-sock-dir
+ Option to set the path to the vhost_user unix socket files.
- `modprobe cuse`
+ NOTE: Changing any of these options requires restarting the ovs-vswitchd
+ application.
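+
+     For example, to pin the DPDK lcore threads to core 0 and point DPDK at the
+     hugepage mount from section 3.1 (the values here are illustrative):
+
+     ```
+     ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
+     ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-hugepage-dir=/dev/hugepages
+     ```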
-3. Build and insert the `eventfd_link` module:
+ Open vSwitch can be started as normal. DPDK will be initialized as long
+ as the dpdk-init option has been set to 'true'.
```
- cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
- make
- insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
+ export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
+ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
+ ovs-vswitchd unix:$DB_SOCK --pidfile --detach
```
-4. QEMU version v2.1.0+
-
- vhost-cuse will work with QEMU v2.1.0 and above, however it is recommended to
- use v2.2.0 if providing your VM with memory greater than 1GB due to potential
- issues with memory mapping larger areas.
- Note: QEMU v1.6.2 will also work, with slightly different command line parameters,
- which are specified later in this document.
-
-Adding DPDK vhost-cuse ports to the Switch:
---------------------------------------
-
-Following the steps above to create a bridge, you can now add DPDK vhost-cuse
-as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports can have
-arbitrary names.
-
- - For vhost-cuse, the name of the port type is `dpdkvhostcuse`
+     If more than one GB of hugepage memory is allocated (as for IVSHMEM), set the
+     amount to pre-allocate and use NUMA node 0 memory. For details on using
+     IVSHMEM with DPDK, refer to [OVS Testcases].
```
- ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
- type=dpdkvhostcuse
+ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0"
+ ovs-vswitchd unix:$DB_SOCK --pidfile --detach
```
- When attaching vhost-cuse ports to QEMU, the name provided during the
- add-port operation must match the ifname parameter on the QEMU command
- line. More instructions on this can be found in the next section.
+     To better scale the workloads across cores, multiple pmd threads can be
+     created and pinned to CPU cores by explicitly specifying pmd-cpu-mask,
+     e.g. to spawn 2 pmd threads and pin them to cores 1 and 2:
-DPDK vhost-cuse VM configuration:
----------------------------------
+ ```
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
+ ```
- vhost-cuse ports use a Linux* character device to communicate with QEMU.
- By default it is set to `/dev/vhost-net`. It is possible to reuse this
- standard device for DPDK vhost, which makes setup a little simpler but it
- is better practice to specify an alternative character device in order to
- avoid any conflicts if kernel vhost is to be used in parallel.
+ 5. Create bridge & add DPDK devices
-1. This step is only needed if using an alternative character device.
+     Create a bridge with datapath_type "netdev" in the configuration database:
- The new character device filename must be specified on the vswitchd
- commandline:
+ `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`
- `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`
+ Now you can add DPDK devices. OVS expects DPDK device names to start with
+ "dpdk" and end with a portid. vswitchd should print (in the log file) the
+ number of dpdk devices found.
- Note that the `--cuse_dev_name` argument and associated string must be the first
- arguments after `--dpdk` and come before the EAL arguments. In the example
- above, the character device to be used will be `/dev/my-vhost-net`.
+ ```
+ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
+ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
+ ```
-2. This step is only needed if reusing the standard character device. It will
- conflict with the kernel vhost character device so the user must first
- remove it.
+     After the DPDK ports are added to the switch, a polling thread continuously
+     polls the DPDK devices and consumes 100% of its core, as can be checked with
+     the 'top' and 'ps' commands.
- `rm -rf /dev/vhost-net`
+ ```
+ top -H
+ ps -eLo pid,psr,comm | grep pmd
+ ```
-3a. Configure virtio-net adaptors:
- The following parameters must be passed to the QEMU binary:
+ Note: creating bonds of DPDK interfaces is slightly different to creating
+ bonds of system interfaces. For DPDK, the interface type must be explicitly
+ set, for example:
```
- -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
- -device virtio-net-pci,netdev=net1,mac=<mac>
+ ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 type=dpdk -- set Interface dpdk1 type=dpdk
```
- Repeat the above parameters for multiple devices.
+ 6. PMD thread statistics
- The DPDK vhost library will negiotiate its own features, so they
- need not be passed in as command line params. Note that as offloads are
- disabled this is the equivalent of setting:
+ ```
+ # Check current stats
+ ovs-appctl dpif-netdev/pmd-stats-show
- `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`
+ # Show port/rxq assignment
+ ovs-appctl dpif-netdev/pmd-rxq-show
-3b. If using an alternative character device. It must be also explicitly
- passed to QEMU using the `vhostfd` argument:
+ # Clear previous stats
+ ovs-appctl dpif-netdev/pmd-stats-clear
+ ```
+
+ 7. Stop vswitchd & Delete bridge
```
- -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
- vhostfd=<open_fd>
- -device virtio-net-pci,netdev=net1,mac=<mac>
+ ovs-appctl -t ovs-vswitchd exit
+ ovs-appctl -t ovsdb-server exit
+ ovs-vsctl del-br br0
```
- The open file descriptor must be passed to QEMU running as a child
- process. This could be done with a simple python script.
+## <a name="builddpdk"></a> 4. DPDK in the VM
- ```
- #!/usr/bin/python
- fd = os.open("/dev/usvhost", os.O_RDWR)
- subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,\
- vhost=on,vhostfd=" + fd +"...", shell=True)
+The DPDK 'testpmd' application can be run in the Guest VM for high speed
+packet forwarding between vhostuser ports. DPDK and the testpmd application
+have to be compiled on the guest VM. Below are the steps for setting up the
+testpmd application in the VM. More information on the vhostuser ports
+can be found in the [Vhost Walkthrough].
- Alternatively the `qemu-wrap.py` script can be used to automate the
- requirements specified above and can be used in conjunction with libvirt if
- desired. See the "DPDK vhost VM configuration with QEMU wrapper" section
- below.
+ * Instantiate the Guest
-4. Configure huge pages:
- QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
- virtio-net device's virtual rings and packet buffers mapping the VM's
- physical memory on hugetlbfs. To enable vhost-ports to map the VM's
- memory into their process address space, pass the following parameters
- to QEMU:
+  Note: Qemu version >= 2.2.0 is required.
+
+  ```
- `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
- share=on -numa node,memdev=mem -mem-prealloc`
+ export VM_NAME=Centos-vm
+ export GUEST_MEM=3072M
+ export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
+ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
- Note: For use with an earlier QEMU version such as v1.6.2, use the
- following to configure hugepages instead:
+ qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m $GUEST_MEM -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive file=$QCOW2_IMAGE -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic -snapshot
+ ```
- `-mem-path /dev/hugepages -mem-prealloc`
+  * Download the DPDK sources to the VM and build DPDK
-DPDK vhost-cuse VM configuration with QEMU wrapper:
----------------------------------------------------
-The QEMU wrapper script automatically detects and calls QEMU with the
-necessary parameters. It performs the following actions:
+ ```
+ cd /root/dpdk/
+ wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.04.zip
+ unzip dpdk-16.04.zip
+ export DPDK_DIR=/root/dpdk/dpdk-16.04
+ export DPDK_TARGET=x86_64-native-linuxapp-gcc
+ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
+ cd $DPDK_DIR
+ make install T=$DPDK_TARGET DESTDIR=install
+ ```
- * Automatically detects the location of the hugetlbfs and inserts this
- into the command line parameters.
- * Automatically open file descriptors for each virtio-net device and
- inserts this into the command line parameters.
- * Calls QEMU passing both the command line parameters passed to the
- script itself and those it has auto-detected.
+ * Build the test-pmd application
-Before use, you **must** edit the configuration parameters section of the
-script to point to the correct emulator location and set additional
-settings. Of these settings, `emul_path` and `us_vhost_path` **must** be
-set. All other settings are optional.
+ ```
+ cd app/test-pmd
+ export RTE_SDK=$DPDK_DIR
+ export RTE_TARGET=$DPDK_TARGET
+ make
+ ```
-To use directly from the command line simply pass the wrapper some of the
-QEMU parameters: it will configure the rest. For example:
+ * Setup Huge pages and DPDK devices using UIO
-```
-qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
- --enable-kvm -nographic -vnc none -net none -netdev tap,id=net1,
- script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci,
- netdev=net1,mac=00:00:00:00:00:01
-```
+ ```
+ sysctl vm.nr_hugepages=1024
+ mkdir -p /dev/hugepages
+ mount -t hugetlbfs hugetlbfs /dev/hugepages (only if not already mounted)
+ modprobe uio
+ insmod $DPDK_BUILD/kmod/igb_uio.ko
+ $DPDK_DIR/tools/dpdk_nic_bind.py --status
+ $DPDK_DIR/tools/dpdk_nic_bind.py -b igb_uio 00:03.0 00:04.0
+ ```
-DPDK vhost-cuse VM configuration with libvirt:
-----------------------------------------------
+  The PCI IDs of the vhost ports can be retrieved using the `lspci | grep Ethernet` command.
-If you are using libvirt, you must enable libvirt to access the character
-device by adding it to controllers cgroup for libvirtd using the following
-steps.
+## <a name="ovstc"></a> 5. OVS Testcases
- 1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
+  Below are a few testcases and the list of steps to be followed for each.
- ```
- 1) clear_emulator_capabilities = 0
- 2) user = "root"
- 3) group = "root"
- 4) cgroup_device_acl = [
- "/dev/null", "/dev/full", "/dev/zero",
- "/dev/random", "/dev/urandom",
- "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
- "/dev/rtc", "/dev/hpet", "/dev/net/tun",
- "/dev/<my-vhost-device>",
- "/dev/hugepages"]
- ```
+### 5.1 PHY-PHY
- <my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
- device. If you have specificed a different name on the ovs-vswitchd
- commandline using the "--cuse_dev_name" parameter, please specify that
- filename instead.
+  Steps 1-5 in section 3.3 will create and initialize the DB, start vswitchd and
+  add DPDK devices to bridge 'br0'.
- 2. Disable SELinux or set to permissive mode
+  1. Add test flows to forward packets between DPDK port 0 and port 1
- 3. Restart the libvirtd process
- For example, on Fedora:
+ ```
+ # Clear current flows
+ ovs-ofctl del-flows br0
- `systemctl restart libvirtd.service`
+ # Add flows between port 1 (dpdk0) to port 2 (dpdk1)
+ ovs-ofctl add-flow br0 in_port=1,action=output:2
+ ovs-ofctl add-flow br0 in_port=2,action=output:1
+ ```
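+
+  Port statistics can then be used to verify that traffic is being forwarded,
+  for example:
+
+  ```
+  ovs-ofctl dump-ports br0
+  ```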
-After successfully editing the configuration, you may launch your
-vhost-enabled VM. The XML describing the VM can be configured like so
-within the <qemu:commandline> section:
+### 5.2 PHY-VM-PHY [VHOST LOOPBACK]
- 1. Set up shared hugepages:
+  Steps 1-5 in section 3.3 will create and initialize the DB, start vswitchd and
+  add DPDK devices to bridge 'br0'.
- ```
- <qemu:arg value='-object'/>
- <qemu:arg value='memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on'/>
- <qemu:arg value='-numa'/>
- <qemu:arg value='node,memdev=mem'/>
- <qemu:arg value='-mem-prealloc'/>
- ```
+ 1. Add dpdkvhostuser ports to bridge 'br0'. More information on the dpdkvhostuser ports
+ can be found in [Vhost Walkthrough].
- 2. Set up your tap devices:
+ ```
+ ovs-vsctl add-port br0 dpdkvhostuser0 -- set Interface dpdkvhostuser0 type=dpdkvhostuser
+ ovs-vsctl add-port br0 dpdkvhostuser1 -- set Interface dpdkvhostuser1 type=dpdkvhostuser
+ ```
- ```
- <qemu:arg value='-netdev'/>
- <qemu:arg value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'/>
- <qemu:arg value='-device'/>
- <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/>
- ```
+  2. Add test flows to forward packets between the DPDK devices and VM ports
- Repeat for as many devices as are desired, modifying the id, ifname
- and mac as necessary.
+ ```
+ # Clear current flows
+ ovs-ofctl del-flows br0
- Again, if you are using an alternative character device (other than
- `/dev/vhost-net`), please specify the file descriptor like so:
+ # Add flows
+ ovs-ofctl add-flow br0 in_port=1,action=output:3
+ ovs-ofctl add-flow br0 in_port=3,action=output:1
+ ovs-ofctl add-flow br0 in_port=4,action=output:2
+ ovs-ofctl add-flow br0 in_port=2,action=output:4
- `<qemu:arg value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,vhostfd=<open_fd>'/>`
+ # Dump flows
+ ovs-ofctl dump-flows br0
+ ```
- Where <open_fd> refers to the open file descriptor of the character device.
- Instructions of how to retrieve the file descriptor can be found in the
- "DPDK vhost VM configuration" section.
- Alternatively, the process is automated with the qemu-wrap.py script,
- detailed in the next section.
+  3. Instantiate Guest VM using the QEMU command line
-Now you may launch your VM using virt-manager, or like so:
+ Guest Configuration
- `virsh create my_vhost_vm.xml`
+ ```
+ | configuration | values | comments
+ |----------------------|--------|-----------------
+ | qemu version | 2.2.0 |
+ | qemu thread affinity | core 5 | taskset 0x20
+ | memory | 4GB | -
+ | cores | 2 | -
+ | Qcow2 image | CentOS7| -
+ | mrg_rxbuf | off | -
+ ```
-DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
-----------------------------------------------------------
+ Instantiate Guest
-To use the qemu-wrapper script in conjuntion with libvirt, follow the
-steps in the previous section before proceeding with the following steps:
+ ```
+ export VM_NAME=vhost-vm
+ export GUEST_MEM=3072M
+ export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
+ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
- 1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH)
- Ideally in the same directory that the QEMU binary is located.
+ taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m $GUEST_MEM -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive file=$QCOW2_IMAGE -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic -snapshot
+ ```
- 2. Ensure that the script has the same owner/group and file permissions
- as the QEMU binary.
+ 4. Guest VM using libvirt
- 3. Update the VM xml file using "virsh edit VM.xml"
+     Below is a simple xml configuration of a 'demovm' guest that can be instantiated
+     using 'virsh' (the command is shown after the xml). The guest uses a pair of
+     vhostuser ports and boots with 4GB RAM and 2 cores. More information can be
+     found in the [Vhost Walkthrough].
- 1. Set the VM to use the launch script.
- Set the emulator path contained in the `<emulator><emulator/>` tags.
- For example, replace:
+ ```
+ <domain type='kvm'>
+ <name>demovm</name>
+ <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid>
+ <memory unit='KiB'>4194304</memory>
+ <currentMemory unit='KiB'>4194304</currentMemory>
+ <memoryBacking>
+ <hugepages>
+ <page size='2' unit='M' nodeset='0'/>
+ </hugepages>
+ </memoryBacking>
+ <vcpu placement='static'>2</vcpu>
+ <cputune>
+ <shares>4096</shares>
+ <vcpupin vcpu='0' cpuset='4'/>
+ <vcpupin vcpu='1' cpuset='5'/>
+ <emulatorpin cpuset='4,5'/>
+ </cputune>
+ <os>
+ <type arch='x86_64' machine='pc'>hvm</type>
+ <boot dev='hd'/>
+ </os>
+ <features>
+ <acpi/>
+ <apic/>
+ </features>
+ <cpu mode='host-model'>
+ <model fallback='allow'/>
+ <topology sockets='2' cores='1' threads='1'/>
+ <numa>
+ <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
+ </numa>
+ </cpu>
+ <on_poweroff>destroy</on_poweroff>
+ <on_reboot>restart</on_reboot>
+ <on_crash>destroy</on_crash>
+ <devices>
+ <emulator>/usr/bin/qemu-kvm</emulator>
+ <disk type='file' device='disk'>
+ <driver name='qemu' type='qcow2' cache='none'/>
+ <source file='/root/CentOS7_x86_64.qcow2'/>
+ <target dev='vda' bus='virtio'/>
+ </disk>
+ <disk type='dir' device='disk'>
+ <driver name='qemu' type='fat'/>
+ <source dir='/usr/src/dpdk-16.04'/>
+ <target dev='vdb' bus='virtio'/>
+ <readonly/>
+ </disk>
+ <interface type='vhostuser'>
+ <mac address='00:00:00:00:00:01'/>
+ <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser0' mode='client'/>
+ <model type='virtio'/>
+ <driver queues='2'>
+ <host mrg_rxbuf='off'/>
+ </driver>
+ </interface>
+ <interface type='vhostuser'>
+ <mac address='00:00:00:00:00:02'/>
+ <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser1' mode='client'/>
+ <model type='virtio'/>
+ <driver queues='2'>
+ <host mrg_rxbuf='off'/>
+ </driver>
+ </interface>
+ <serial type='pty'>
+ <target port='0'/>
+ </serial>
+ <console type='pty'>
+ <target type='serial' port='0'/>
+ </console>
+ </devices>
+ </domain>
+ ```
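+
+     Assuming the xml above is saved as, say, demovm.xml (the filename is
+     illustrative), the guest can then be started with:
+
+     ```
+     virsh create demovm.xml
+     ```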
- `<emulator>/usr/bin/qemu-kvm<emulator/>`
+ 5. DPDK Packet forwarding in Guest VM
- with:
+     To accomplish this, DPDK and the testpmd application have to be compiled
+     on the VM first; the steps are listed in [DPDK in the VM].
- `<emulator>/usr/bin/qemu-wrap.py<emulator/>`
+ * Run test-pmd application
- 4. Edit the Configuration Parameters section of the script to point to
- the correct emulator location and set any additional options. If you are
- using a alternative character device name, please set "us_vhost_path" to the
- location of that device. The script will automatically detect and insert
- the correct "vhostfd" value in the QEMU command line arguments.
+ ```
+ cd $DPDK_DIR/app/test-pmd;
+ ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- --burst=64 -i --txqflags=0xf00 --disable-hw-vlan
+ set fwd mac_retry
+ start
+ ```
- 5. Use virt-manager to launch the VM
+  * Bind the vNICs back to the kernel once the test is complete.
-Running ovs-vswitchd with DPDK backend inside a VM
---------------------------------------------------
+ ```
+ $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:03.0
+ $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:04.0
+ ```
+  Note: Pass the appropriate PCI IDs in the above example. The PCI IDs can be
+  retrieved using the '$DPDK_DIR/tools/dpdk_nic_bind.py --status' command.
-Please note that additional configuration is required if you want to run
-ovs-vswitchd with DPDK backend inside a QEMU virtual machine. Ovs-vswitchd
-creates separate DPDK TX queues for each CPU core available. This operation
-fails inside QEMU virtual machine because, by default, VirtIO NIC provided
-to the guest is configured to support only single TX queue and single RX
-queue. To change this behavior, you need to turn on 'mq' (multiqueue)
-property of all virtio-net-pci devices emulated by QEMU and used by DPDK.
-You may do it manually (by changing QEMU command line) or, if you use Libvirt,
-by adding the following string:
+### 5.3 PHY-VM-PHY [IVSHMEM]
-`<driver name='vhost' queues='N'/>`
+  The steps for setting up IVSHMEM are covered in section 5.2 (PVP - IVSHMEM)
+  of [OVS Testcases] in the advanced install guide.
-to <interface> sections of all network devices used by DPDK. Parameter 'N'
-determines how many queues can be used by the guest.
+## <a name="ovslimits"></a> 6. Limitations
-Restrictions:
--------------
+  - Only an MTU of 1500 is supported; MTU configuration for DPDK netdevs will come in a future OVS release.
+  - Currently DPDK ports do not use HW offload functionality.
+ - Network Interface Firmware requirements:
+ Each release of DPDK is validated against a specific firmware version for
+ a supported Network Interface. New firmware versions introduce bug fixes,
+ performance improvements and new functionality that DPDK leverages. The
+ validated firmware versions are available as part of the release notes for
+ DPDK. It is recommended that users update Network Interface firmware to
+ match what has been validated for the DPDK release.
- - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
- - Currently DPDK port does not make use any offload functionality.
- - DPDK-vHost support works with 1G huge pages.
+ For DPDK 16.04, the list of validated firmware versions can be found at:
- ivshmem:
- - If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be
- unable to share any rings or mempools with a virtual machine.
- This is because the current implementation of ivshmem works by sharing
- a single 1GB huge page from the host operating system to any guest
- operating system through the Qemu ivshmem device. When using smaller
- page sizes, multiple pages may be required to hold the ring descriptors
- and buffer pools. The Qemu ivshmem device does not allow you to share
- multiple file descriptors to the guest operating system. However, if you
- want to share dpdkr rings with other processes on the host, you can do
- this with smaller page sizes.
+ http://dpdk.org/doc/guides/rel_notes/release_16_04.html
- Platform and Network Interface:
- - Currently it is not possible to use an Intel XL710 Network Interface as a
- DPDK port type on a platform with more than 64 logical cores. This is
- related to how DPDK reports the number of TX queues that may be used by
- a DPDK application with an XL710. The maximum number of TX queues supported
- by a DPDK application for an XL710 is 64. If a user attempts to add an
- XL710 interface as a DPDK port type to a system as described above the
- port addition will fail as OVS will attempt to initialize a TX queue greater
- than 64. This issue is expected to be resolved in a future DPDK release.
- As a workaround a user can disable hyper-threading to reduce the overall
- core count of the system to be less than or equal to 64 when using an XL710
- interface with DPDK.
Bug Reporting:
--------------
Please report problems to bugs@openvswitch.org.
-[INSTALL.userspace.md]:INSTALL.userspace.md
+
+[DPDK requirements]: http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html
+[Download DPDK]: http://dpdk.org/browse/dpdk/refs/
+[Download OVS]: http://openvswitch.org/releases/
+[DPDK Supported NICs]: http://dpdk.org/doc/nics
+[Build Requirements]: https://github.com/openvswitch/ovs/blob/master/INSTALL.md#build-requirements
+[INSTALL.DPDK-ADVANCED.md]: INSTALL.DPDK-ADVANCED.md
+[OVS Testcases]: INSTALL.DPDK-ADVANCED.md#ovstc
+[Vhost Walkthrough]: INSTALL.DPDK-ADVANCED.md#vhost
+[DPDK in the VM]: INSTALL.DPDK.md#builddpdk
[INSTALL.md]:INSTALL.md
-[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
-[DPDK Docs]: http://dpdk.org/doc
+[INSTALL.Fedora.md]:INSTALL.Fedora.md
+[INSTALL.RHEL.md]:INSTALL.RHEL.md
+[INSTALL.Debian.md]:INSTALL.Debian.md