X-Git-Url: http://git.cascardo.eti.br/?a=blobdiff_plain;f=INSTALL.DPDK.md;h=54077947852509f61bb2f27cc516d55f0ce37fab;hb=refs%2Fheads%2Fnetdev;hp=a5b3494349df9500c98af63c7c9396ae353308c8;hpb=dbde55e7fa21881af18a48502c91168be269482a;p=cascardo%2Fovs.git diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index a5b349434..540779478 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -1,571 +1,583 @@ -Using Open vSwitch with DPDK -============================ +OVS DPDK INSTALL GUIDE +================================ -Open vSwitch can use Intel(R) DPDK lib to operate entirely in -userspace. This file explains how to install and use Open vSwitch in -such a mode. +## Contents -The DPDK support of Open vSwitch is considered experimental. -It has not been thoroughly tested. +1. [Overview](#overview) +2. [Building and Installation](#build) +3. [Setup OVS DPDK datapath](#ovssetup) +4. [DPDK in the VM](#builddpdk) +5. [OVS Testcases](#ovstc) +6. [Limitations ](#ovslimits) -This version of Open vSwitch should be built manually with `configure` -and `make`. +## 1. Overview -OVS needs a system with 1GB hugepages support. +Open vSwitch can use DPDK lib to operate entirely in userspace. +This file provides information on installation and use of Open vSwitch +using DPDK datapath. This version of Open vSwitch should be built manually +with `configure` and `make`. -Building and Installing: ------------------------- +The DPDK support of Open vSwitch is considered 'experimental'. -Required DPDK 1.8.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu) +### Prerequisites -1. Configure build & install DPDK: - 1. Set `$DPDK_DIR` +* Required: DPDK 16.04, libnuma +* Hardware: [DPDK Supported NICs] when physical ports in use - ``` - export DPDK_DIR=/usr/src/dpdk-1.8.0 - cd $DPDK_DIR - ``` - - 2. Update `config/common_linuxapp` so that DPDK generate single lib file. - (modification also required for IVSHMEM build) - - `CONFIG_RTE_BUILD_COMBINE_LIBS=y` - - Update `config/common_linuxapp` so that DPDK is built with vhost - libraries: - - `CONFIG_RTE_LIBRTE_VHOST=y` - - Then run `make install` to build and install the library. - For default install without IVSHMEM: - - `make install T=x86_64-native-linuxapp-gcc` - - To include IVSHMEM (shared memory): - - `make install T=x86_64-ivshmem-linuxapp-gcc` - - For further details refer to http://dpdk.org/ - -2. Configure & build the Linux kernel: - - Refer to intel-dpdk-getting-started-guide.pdf for understanding - DPDK kernel requirement. - -3. Configure & build OVS: - - * Non IVSHMEM: - - `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` - - * IVSHMEM: - - `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` - - ``` - cd $(OVS_DIR)/openvswitch - ./boot.sh - ./configure --with-dpdk=$DPDK_BUILD - make - ``` - -To have better performance one can enable aggressive compiler optimizations and -use the special instructions(popcnt, crc32) that may not be available on all -machines. Instead of typing `make`, type: - -`make CFLAGS='-O3 -march=native'` +## 2. Building and Installation -Refer to [INSTALL.userspace.md] for general requirements of building userspace OVS. +### 2.1 Configure & build the Linux kernel -Using the DPDK with ovs-vswitchd: ---------------------------------- +On Linux Distros running kernel version >= 3.0, kernel rebuild is not required +and only grub cmdline needs to be updated for enabling IOMMU [VFIO support - 3.2]. +For older kernels, check if kernel is built with UIO, HUGETLBFS, PROC_PAGE_MONITOR, +HPET, HPET_MMAP support. -1. 
Setup system boot
-   Add the following options to the kernel bootline:
-
-   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`
+Detailed system requirements can be found at [DPDK requirements]; also refer to
+the advanced install guide [INSTALL.DPDK-ADVANCED.md].

-2. Setup DPDK devices:
+### 2.2 Install DPDK
+  1. [Download DPDK] and extract the file, for example into /usr/src,
+     and set DPDK_DIR

-   DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO
-   modules. UIO requires inserting an out of tree driver igb_uio.ko that is
-   available in DPDK. Setup for both methods are described below.
-
-   * UIO:
-     1. insert uio.ko: `modprobe uio`
-     2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
-     3. Bind network device to igb_uio:
-        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`
-
-   * VFIO:
-
-     VFIO needs to be supported in the kernel and the BIOS. More information
-     can be found in the [DPDK Linux GSG].
-
-     1. Insert vfio-pci.ko: `modprobe vfio-pci`
-     2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x /dev/vfio`
-        and: `sudo /usr/bin/chmod 0666 /dev/vfio/*`
-     3. Bind network device to vfio-pci:
-        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`
-
-3. Mount the hugetable filsystem
-
-   `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`
-
-   Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
+     ```
+     cd /usr/src/
+     wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.04.zip
+     unzip dpdk-16.04.zip

-4. Follow the instructions in [INSTALL.md] to install only the
-   userspace daemons and utilities (via 'make install').
-   1. First time only db creation (or clearing):
+     export DPDK_DIR=/usr/src/dpdk-16.04
+     cd $DPDK_DIR
+     ```

-      ```
-      mkdir -p /usr/local/etc/openvswitch
-      mkdir -p /usr/local/var/run/openvswitch
-      rm /usr/local/etc/openvswitch/conf.db
-      ovsdb-tool create /usr/local/etc/openvswitch/conf.db  \
-      /usr/local/share/openvswitch/vswitch.ovsschema
-      ```
+  2. Configure and Install DPDK

-   2. Start ovsdb-server
+     Build and install the DPDK library.

-      ```
-      ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
-          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
-          --private-key=db:Open_vSwitch,SSL,private_key \
-          --certificate=Open_vSwitch,SSL,certificate \
-          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
-      ```
+     ```
+     export DPDK_TARGET=x86_64-native-linuxapp-gcc
+     export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
+     make install T=$DPDK_TARGET DESTDIR=install
+     ```

-   3. First time after db creation, initialize:
+     Note: For IVSHMEM, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`

-      ```
-      ovs-vsctl --no-wait init
-      ```
+### 2.3 Install OVS
+  OVS can be installed using different methods. For OVS to use the DPDK datapath,
+  it has to be configured with DPDK support by passing './configure --with-dpdk'.
+  This section focuses on a generic recipe that suits most cases; for
+  distribution-specific instructions, refer to [INSTALL.Fedora.md],
+  [INSTALL.RHEL.md] and [INSTALL.Debian.md].

-5. Start vswitchd:
+  The OVS sources can be downloaded in different ways; skip this section if you
+  already have the correct sources. Otherwise download the correct version using
+  one of the methods suggested below and follow the documentation of that
+  specific version.

-   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
-   argument. This needs to be first argument passed to vswitchd process.
-   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
-   for dpdk initialization.
+   - OVS stable releases can be downloaded in compressed format from [Download OVS]

-   ```
-   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
-   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
-   ```
+     ```
+     cd /usr/src
+     wget http://openvswitch.org/releases/openvswitch-<version>.tar.gz
+     tar -zxvf openvswitch-<version>.tar.gz
+     export OVS_DIR=/usr/src/openvswitch-<version>
+     ```

-   If allocated more than one GB hugepage (as for IVSHMEM), set amount and
-   use NUMA node 0 memory:
+   - OVS current development can be cloned using the 'git' tool

-   ```
-   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
-   -- unix:$DB_SOCK --pidfile --detach
-   ```
+     ```
+     cd /usr/src/
+     git clone https://github.com/openvswitch/ovs.git
+     export OVS_DIR=/usr/src/ovs
+     ```

-6. Add bridge & ports
+   - Install OVS dependencies

-   To use ovs-vswitchd with DPDK, create a bridge with datapath_type
-   "netdev" in the configuration database.  For example:
+     GNU make, GCC 4.x (or) Clang 3.4, libnuma (Mandatory)
+     libssl, libcap-ng, Python 2.7 (Optional)
+     More information can be found at [Build Requirements]

-   `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`
+   - Configure, Install OVS

-   Now you can add dpdk devices. OVS expect DPDK device name start with dpdk
-   and end with portid. vswitchd should print (in the log file) the number
-   of dpdk devices found.
+     ```
+     cd $OVS_DIR
+     ./boot.sh
+     ./configure --with-dpdk=$DPDK_BUILD
+     make install
+     ```

-   ```
-   ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
-   ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
-   ```
+     Note: Passing DPDK_BUILD can be skipped if the DPDK library is installed in
+     standard locations, i.e. `./configure --with-dpdk` should suffice.

-   Once first DPDK port is added to vswitchd, it creates a Polling thread and
-   polls dpdk device in continuous loop. Therefore CPU utilization
-   for that thread is always 100%.
+  Additional information can be found in [INSTALL.md].

-7. Add test flows
+## 3. Setup OVS with DPDK datapath

-   Test flow script across NICs (assuming ovs in /usr/src/ovs):
-   Execute script:
+### 3.1 Setup Hugepages

-   ```
-   #! /bin/sh
-   # Move to command directory
-   cd /usr/src/ovs/utilities/
+  Allocate and mount 2M Huge pages:

-   # Clear current flows
-   ./ovs-ofctl del-flows br0
+  - For persistent allocation of huge pages, write to hugepages.conf file
+    in /etc/sysctl.d

-   # Add flows between port 1 (dpdk0) to port 2 (dpdk1)
-   ./ovs-ofctl add-flow br0 in_port=1,action=output:2
-   ./ovs-ofctl add-flow br0 in_port=2,action=output:1
-   ```
+    `echo 'vm.nr_hugepages=2048' > /etc/sysctl.d/hugepages.conf`

-8. Performance tuning
+  - For run-time allocation of huge pages

-   With pmd multi-threading support, OVS creates one pmd thread for each
-   numa node as default. The pmd thread handles the I/O of all DPDK
-   interfaces on the same numa node. The following two commands can be used
-   to configure the multi-threading behavior.
+    `sysctl -w vm.nr_hugepages=N` where N = number of 2M huge pages allocated

-   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=`
+  - To verify hugepage configuration

-   The command above asks for a CPU mask for setting the affinity of pmd
-   threads. A set bit in the mask means a pmd thread is created and pinned
-   to the corresponding CPU core. For more information, please refer to
-   `man ovs-vswitchd.conf.db`
+    `grep HugePages_ /proc/meminfo`

-   `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=`
+  - Mount hugepages

-   The command above sets the number of rx queues of each DPDK interface. 
The - rx queues are assigned to pmd threads on the same numa node in round-robin - fashion. For more information, please refer to `man ovs-vswitchd.conf.db` + `mount -t hugetlbfs none /dev/hugepages` - Ideally for maximum throughput, the pmd thread should not be scheduled out - which temporarily halts its execution. The following affinitization methods - can help. + Note: Mount hugepages if not already mounted by default. - Lets pick core 4,6,8,10 for pmd threads to run on. Also assume a dual 8 core - sandy bridge system with hyperthreading enabled where CPU1 has cores 0,...,7 - and 16,...,23 & CPU2 cores 8,...,15 & 24,...,31. (A different cpu - configuration could have different core mask requirements). +### 3.2 Setup DPDK devices using VFIO - To kernel bootline add core isolation list for cores and associated hype cores - (e.g. isolcpus=4,20,6,22,8,24,10,26,). Reboot system for isolation to take - effect, restart everything. + - Supported with kernel version >= 3.6 + - VFIO needs support from BIOS and kernel. + - BIOS changes: - Configure pmd threads on core 4,6,8,10 using 'pmd-cpu-mask': + Enable VT-d, can be verified from `dmesg | grep -e DMAR -e IOMMU` output - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550` + - GRUB bootline: - You should be able to check that pmd threads are pinned to the correct cores - via: + Add `iommu=pt intel_iommu=on`, can be verified from `cat /proc/cmdline` output - ``` - top -p `pidof ovs-vswitchd` -H -d1 - ``` + - Load modules and bind the NIC to VFIO driver - Note, the pmd threads on a numa node are only created if there is at least - one DPDK interface from the numa node that has been added to OVS. + ``` + modprobe vfio-pci + sudo /usr/bin/chmod a+x /dev/vfio + sudo /usr/bin/chmod 0666 /dev/vfio/* + $DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1 + $DPDK_DIR/tools/dpdk_nic_bind.py --status + ``` - Note, core 0 is always reserved from non-pmd threads and should never be set - in the cpu mask. + Note: If running kernels < 3.6 UIO drivers to be used, + please check [DPDK in the VM], DPDK devices using UIO section for the steps. - To understand where most of the time is spent and whether the caches are - effective, these commands can be used: +### 3.3 Setup OVS - ``` - ovs-appctl dpif-netdev/pmd-stats-clear #To reset statistics - ovs-appctl dpif-netdev/pmd-stats-show - ``` + 1. DB creation (One time step) -DPDK Rings : ------------- + ``` + mkdir -p /usr/local/etc/openvswitch + mkdir -p /usr/local/var/run/openvswitch + rm /usr/local/etc/openvswitch/conf.db + ovsdb-tool create /usr/local/etc/openvswitch/conf.db \ + /usr/local/share/openvswitch/vswitch.ovsschema + ``` -Following the steps above to create a bridge, you can now add dpdk rings -as a port to the vswitch. OVS will expect the DPDK ring device name to -start with dpdkr and end with a portid. + 2. Start ovsdb-server -`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr` + No SSL support -DPDK rings client test application + ``` + ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ + --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ + --pidfile --detach + ``` -Included in the test directory is a sample DPDK application for testing -the rings. This is from the base dpdk directory and modified to work -with the ring naming used within ovs. 
+ SSL support -location tests/ovs_client + ``` + ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ + --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ + --private-key=db:Open_vSwitch,SSL,private_key \ + --certificate=Open_vSwitch,SSL,certificate \ + --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach + ``` -To run the client : + 3. Initialize DB (One time step) -``` -cd /usr/src/ovs/tests/ -ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr" -``` + ``` + ovs-vsctl --no-wait init + ``` -In the case of the dpdkr example above the "port id you gave dpdkr" is 0. + 4. Start vswitchd -It is essential to have --proc-type=secondary + DPDK configuration arguments can be passed to vswitchd via Open_vSwitch + 'other_config' column. The important configuration options are listed below. + Defaults will be provided for all values not explicitly set. Refer + ovs-vswitchd.conf.db(5) for additional information on configuration options. -The application simply receives an mbuf on the receive queue of the -ethernet ring and then places that same mbuf on the transmit ring of -the ethernet ring. It is a trivial loopback application. + * dpdk-init + Specifies whether OVS should initialize and support DPDK ports. This is + a boolean, and defaults to false. -DPDK rings in VM (IVSHMEM shared memory communications) -------------------------------------------------------- + * dpdk-lcore-mask + Specifies the CPU cores on which dpdk lcore threads should be spawned and + expects hex string (eg '0x123'). -In addition to executing the client in the host, you can execute it within -a guest VM. To do so you will need a patched qemu. You can download the -patch and getting started guide at : + * dpdk-socket-mem + Comma separated list of memory to pre-allocate from hugepages on specific + sockets. -https://01.org/packet-processing/downloads + * dpdk-hugepage-dir + Directory where hugetlbfs is mounted -A general rule of thumb for better performance is that the client -application should not be assigned the same dpdk core mask "-c" as -the vswitchd. + * vhost-sock-dir + Option to set the path to the vhost_user unix socket files. -DPDK vhost: ------------ + NOTE: Changing any of these options requires restarting the ovs-vswitchd + application. -vhost-cuse is only supported at present i.e. not using the standard QEMU -vhost-user interface. It is intended that vhost-user support will be added -in future releases when supported in DPDK and that vhost-cuse will eventually -be deprecated. See [DPDK Docs] for more info on vhost. + Open vSwitch can be started as normal. DPDK will be initialized as long + as the dpdk-init option has been set to 'true'. -Prerequisites: -1. Insert the Cuse module: + ``` + export DB_SOCK=/usr/local/var/run/openvswitch/db.sock + ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true + ovs-vswitchd unix:$DB_SOCK --pidfile --detach + ``` - `modprobe cuse` + If allocated more than one GB hugepage (as for IVSHMEM), set amount and + use NUMA node 0 memory. For details on using ivshmem with DPDK, refer to + [OVS Testcases]. -2. Build and insert the `eventfd_link` module: + ``` + ovs-vsctl --no-wait set Open_vSwitch . 
other_config:dpdk-socket-mem="1024,0"
+    ovs-vswitchd unix:$DB_SOCK --pidfile --detach
+    ```

-    `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
-    `make`
-    `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
+    To better scale the workloads across cores, multiple pmd threads can be
+    created and pinned to CPU cores by explicitly specifying pmd-cpu-mask,
+    e.g. to spawn 2 pmd threads and pin them to cores 1 and 2:

-Following the steps above to create a bridge, you can now add DPDK vhost
-as a port to the vswitch.
+    ```
+    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
+    ```

-`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
+  5. Create bridge & add DPDK devices

-Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
+     Create a bridge with datapath_type "netdev" in the configuration database:

-`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
+     `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`

-However, please note that when attaching userspace devices to QEMU, the
-name provided during the add-port operation must match the ifname parameter
-on the QEMU command line.
+     Now you can add DPDK devices. OVS expects DPDK device names to start with
+     "dpdk" and end with a portid. vswitchd should print (in the log file) the
+     number of dpdk devices found.

+     ```
+     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
+     ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
+     ```

-DPDK vhost VM configuration:
-----------------------------
+     After the DPDK ports are added to the switch, a polling thread continuously
+     polls the DPDK devices and consumes 100% of its core, as can be seen with
+     the 'top' and 'ps' commands.

-   vhost ports use a Linux* character device to communicate with QEMU.
-   By default it is set to `/dev/vhost-net`. It is possible to reuse this
-   standard device for DPDK vhost, which makes setup a little simpler but it
-   is better practice to specify an alternative character device in order to
-   avoid any conflicts if kernel vhost is to be used in parallel.
+     ```
+     top -H
+     ps -eLo pid,psr,comm | grep pmd
+     ```

-1. This step is only needed if using an alternative character device.
+     Note: creating bonds of DPDK interfaces is slightly different from creating
+     bonds of system interfaces. For DPDK, the interface type must be explicitly
+     set, for example:

-   The new character device filename must be specified on the vswitchd
-   commandline:
+     ```
+     ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 type=dpdk -- set Interface dpdk1 type=dpdk
+     ```

-   `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`
+  6. PMD thread statistics

-   Note that the `--cuse_dev_name` argument and associated string must be the first
-   arguments after `--dpdk` and come before the EAL arguments. In the example
-   above, the character device to be used will be `/dev/my-vhost-net`.
+     ```
+     # Check current stats
+     ovs-appctl dpif-netdev/pmd-stats-show

-2. This step is only needed if reusing the standard character device. It will
-   conflict with the kernel vhost character device so the user must first
-   remove it.
+     # Show port/rxq assignment
+     ovs-appctl dpif-netdev/pmd-rxq-show

-   `rm -rf /dev/vhost-net`
+     # Clear previous stats
+     ovs-appctl dpif-netdev/pmd-stats-clear
+     ```

-3a. Configure virtio-net adaptors:
-    The following parameters must be passed to the QEMU binary:
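+
+     The pmd-cpu-mask used in step 4 above is simply a hex bitmap with one bit
+     set per CPU core. As a minimal bash sketch (the CORES/MASK/HEXMASK variable
+     names are only illustrative), the mask for an arbitrary core list can be
+     derived, applied and then verified with the same 'ps' check shown above:
+
+     ```
+     CORES="1 2"                      # cores that should run pmd threads
+     MASK=0
+     for c in $CORES; do
+         MASK=$(( MASK | (1 << c) ))  # set one bit per requested core
+     done
+     printf -v HEXMASK '%x' "$MASK"   # gives 6 for cores 1 and 2
+
+     # Apply the mask and confirm where the pmd threads run (psr column)
+     ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=$HEXMASK
+     ps -eLo pid,psr,comm | grep pmd
+     ```
+
+  7. 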
Stop vswitchd & Delete bridge ``` - -netdev tap,id=,script=no,downscript=no,ifname=,vhost=on - -device virtio-net-pci,netdev=net1,mac= + ovs-appctl -t ovs-vswitchd exit + ovs-appctl -t ovsdb-server exit + ovs-vsctl del-br br0 ``` - Repeat the above parameters for multiple devices. +## 4. DPDK in the VM - The DPDK vhost library will negiotiate its own features, so they - need not be passed in as command line params. Note that as offloads are - disabled this is the equivalent of setting: +DPDK 'testpmd' application can be run in the Guest VM for high speed +packet forwarding between vhostuser ports. DPDK and testpmd application +has to be compiled on the guest VM. Below are the steps for setting up +the testpmd application in the VM. More information on the vhostuser ports +can be found in [Vhost Walkthrough]. - `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off` + * Instantiate the Guest -3b. If using an alternative character device. It must be also explicitly - passed to QEMU using the `vhostfd` argument: + ``` + Qemu version >= 2.2.0 - ``` - -netdev tap,id=,script=no,downscript=no,ifname=,vhost=on, - vhostfd= - -device virtio-net-pci,netdev=net1,mac= - ``` + export VM_NAME=Centos-vm + export GUEST_MEM=3072M + export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2 + export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch - The open file descriptor must be passed to QEMU running as a child - process. This could be done with a simple python script. + qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m $GUEST_MEM -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive file=$QCOW2_IMAGE -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic -snapshot + ``` - ``` - #!/usr/bin/python - fd = os.open("/dev/usvhost", os.O_RDWR) - subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,\ - vhost=on,vhostfd=" + fd +"...", shell=True) + * Download the DPDK Srcs to VM and build DPDK - Alternatively the the `qemu-wrap.py` script can be used to automate the - requirements specified above and can be used in conjunction with libvirt if - desired. See the "DPDK vhost VM configuration with QEMU wrapper" section - below. + ``` + cd /root/dpdk/ + wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.04.zip + unzip dpdk-16.04.zip + export DPDK_DIR=/root/dpdk/dpdk-16.04 + export DPDK_TARGET=x86_64-native-linuxapp-gcc + export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET + cd $DPDK_DIR + make install T=$DPDK_TARGET DESTDIR=install + ``` -4. Configure huge pages: - QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a - virtio-net device's virtual rings and packet buffers mapping the VM's - physical memory on hugetlbfs. 
To enable vhost-ports to map the VM's - memory into their process address space, pass the following paramters - to QEMU: + * Build the test-pmd application - `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages, - share=on -numa node,memdev=mem -mem-prealloc` + ``` + cd app/test-pmd + export RTE_SDK=$DPDK_DIR + export RTE_TARGET=$DPDK_TARGET + make + ``` + * Setup Huge pages and DPDK devices using UIO -DPDK vhost VM configuration with QEMU wrapper: ----------------------------------------------- + ``` + sysctl vm.nr_hugepages=1024 + mkdir -p /dev/hugepages + mount -t hugetlbfs hugetlbfs /dev/hugepages (only if not already mounted) + modprobe uio + insmod $DPDK_BUILD/kmod/igb_uio.ko + $DPDK_DIR/tools/dpdk_nic_bind.py --status + $DPDK_DIR/tools/dpdk_nic_bind.py -b igb_uio 00:03.0 00:04.0 + ``` -The QEMU wrapper script automatically detects and calls QEMU with the -necessary parameters. It performs the following actions: + vhost ports pci ids can be retrieved using `lspci | grep Ethernet` cmd. - * Automatically detects the location of the hugetlbfs and inserts this - into the command line parameters. - * Automatically open file descriptors for each virtio-net device and - inserts this into the command line parameters. - * Calls QEMU passing both the command line parameters passed to the - script itself and those it has auto-detected. +## 5. OVS Testcases -Before use, you **must** edit the configuration parameters section of the -script to point to the correct emulator location and set additional -settings. Of these settings, `emul_path` and `us_vhost_path` **must** be -set. All other settings are optional. + Below are few testcases and the list of steps to be followed. -To use directly from the command line simply pass the wrapper some of the -QEMU parameters: it will configure the rest. For example: +### 5.1 PHY-PHY -``` -qemu-wrap.py -cpu host -boot c -hda -m 4096 -smp 4 - --enable-kvm -nographic -vnc none -net none -netdev tap,id=net1, - script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci, - netdev=net1,mac=00:00:00:00:00:01 -``` + The steps (1-5) in 3.3 section will create & initialize DB, start vswitchd and also + add DPDK devices to bridge 'br0'. -DPDK vhost VM configuration with libvirt: ------------------------------------------ + 1. Add Test flows to forward packets betwen DPDK port 0 and port 1 -If you are using libvirt, you must enable libvirt to access the character -device by adding it to controllers cgroup for libvirtd using the following -steps. + ``` + # Clear current flows + ovs-ofctl del-flows br0 - 1. In `/etc/libvirt/qemu.conf` add/edit the following lines: + # Add flows between port 1 (dpdk0) to port 2 (dpdk1) + ovs-ofctl add-flow br0 in_port=1,action=output:2 + ovs-ofctl add-flow br0 in_port=2,action=output:1 + ``` - ``` - 1) clear_emulator_capabilities = 0 - 2) user = "root" - 3) group = "root" - 4) cgroup_device_acl = [ - "/dev/null", "/dev/full", "/dev/zero", - "/dev/random", "/dev/urandom", - "/dev/ptmx", "/dev/kvm", "/dev/kqemu", - "/dev/rtc", "/dev/hpet", "/dev/net/tun", - "/dev/", - "/dev/hugepages"] - ``` +### 5.2 PHY-VM-PHY [VHOST LOOPBACK] - refers to "vhost-net" if using the `/dev/vhost-net` - device. If you have specificed a different name on the ovs-vswitchd - commandline using the "--cuse_dev_name" parameter, please specify that - filename instead. + The steps (1-5) in 3.3 section will create & initialize DB, start vswitchd and also + add DPDK devices to bridge 'br0'. - 2. Disable SELinux or set to permissive mode + 1. 
Add dpdkvhostuser ports to bridge 'br0'. More information on the dpdkvhostuser ports + can be found in [Vhost Walkthrough]. - 3. Restart the libvirtd process - For example, on Fedora: + ``` + ovs-vsctl add-port br0 dpdkvhostuser0 -- set Interface dpdkvhostuser0 type=dpdkvhostuser + ovs-vsctl add-port br0 dpdkvhostuser1 -- set Interface dpdkvhostuser1 type=dpdkvhostuser + ``` - `systemctl restart libvirtd.service` + 2. Add Test flows to forward packets betwen DPDK devices and VM ports -After successfully editing the configuration, you may launch your -vhost-enabled VM. The XML describing the VM can be configured like so -within the section: + ``` + # Clear current flows + ovs-ofctl del-flows br0 - 1. Set up shared hugepages: + # Add flows + ovs-ofctl add-flow br0 in_port=1,action=output:3 + ovs-ofctl add-flow br0 in_port=3,action=output:1 + ovs-ofctl add-flow br0 in_port=4,action=output:2 + ovs-ofctl add-flow br0 in_port=2,action=output:4 - ``` - - - - - - ``` + # Dump flows + ovs-ofctl dump-flows br0 + ``` - 2. Set up your tap devices: + 3. Instantiate Guest VM using Qemu cmdline - ``` - - - - - ``` + Guest Configuration - Repeat for as many devices as are desired, modifying the id, ifname - and mac as necessary. + ``` + | configuration | values | comments + |----------------------|--------|----------------- + | qemu version | 2.2.0 | + | qemu thread affinity | core 5 | taskset 0x20 + | memory | 4GB | - + | cores | 2 | - + | Qcow2 image | CentOS7| - + | mrg_rxbuf | off | - + ``` - Again, if you are using an alternative character device (other than - `/dev/vhost-net`), please specify the file descriptor like so: + Instantiate Guest - `` + ``` + export VM_NAME=vhost-vm + export GUEST_MEM=3072M + export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2 + export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch - Where refers to the open file descriptor of the character device. - Instructions of how to retrieve the file descriptor can be found in the - "DPDK vhost VM configuration" section. - Alternatively, the process is automated with the qemu-wrap.py script, - detailed in the next section. + taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m $GUEST_MEM -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive file=$QCOW2_IMAGE -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic -snapshot + ``` -Now you may launch your VM using virt-manager, or like so: + 4. Guest VM using libvirt - `virsh create my_vhost_vm.xml` + The below is a simple xml configuration of 'demovm' guest that can be instantiated + using 'virsh'. The guest uses a pair of vhostuser port and boots with 4GB RAM and 2 cores. + More information can be found in [Vhost Walkthrough]. 
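+
+     Assuming the XML below is saved to a file, e.g. demovm.xml (the file name
+     is only illustrative), the guest can be defined and started with the
+     standard virsh commands:
+
+     ```
+     virsh define demovm.xml     # register the domain with libvirt
+     virsh start demovm          # boot the guest
+     virsh console demovm        # attach to its serial console
+     ```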
-DPDK vhost VM configuration with libvirt and QEMU wrapper:
-----------------------------------------------------------
+     ```
+     <domain type='kvm'>
+       <name>demovm</name>
+       <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid>
+       <memory unit='KiB'>4194304</memory>
+       <currentMemory unit='KiB'>4194304</currentMemory>
+       <memoryBacking>
+         <hugepages>
+           <page size='2' unit='M' nodeset='0'/>
+         </hugepages>
+       </memoryBacking>
+       <vcpu placement='static'>2</vcpu>
+       <cputune>
+         <shares>4096</shares>
+         <vcpupin vcpu='0' cpuset='4'/>
+         <vcpupin vcpu='1' cpuset='5'/>
+         <emulatorpin cpuset='4,5'/>
+       </cputune>
+       <os>
+         <type arch='x86_64' machine='pc'>hvm</type>
+         <boot dev='hd'/>
+       </os>
+       <features>
+         <acpi/>
+         <apic/>
+       </features>
+       <cpu mode='host-model'>
+         <model fallback='allow'/>
+         <topology sockets='2' cores='1' threads='1'/>
+         <numa>
+           <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
+         </numa>
+       </cpu>
+       <on_poweroff>destroy</on_poweroff>
+       <on_reboot>restart</on_reboot>
+       <on_crash>destroy</on_crash>
+       <devices>
+         <emulator>/usr/bin/qemu-kvm</emulator>
+         <disk type='file' device='disk'>
+           <driver name='qemu' type='qcow2' cache='none'/>
+           <source file='/root/CentOS7_x86_64.qcow2'/>
+           <target dev='vda' bus='virtio'/>
+         </disk>
+         <interface type='vhostuser'>
+           <mac address='00:00:00:00:00:01'/>
+           <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser0' mode='client'/>
+           <model type='virtio'/>
+           <driver>
+             <host mrg_rxbuf='off'/>
+           </driver>
+         </interface>
+         <interface type='vhostuser'>
+           <mac address='00:00:00:00:00:02'/>
+           <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser1' mode='client'/>
+           <model type='virtio'/>
+           <driver>
+             <host mrg_rxbuf='off'/>
+           </driver>
+         </interface>
+         <serial type='pty'>
+           <target port='0'/>
+         </serial>
+         <console type='pty'>
+           <target type='serial' port='0'/>
+         </console>
+       </devices>
+     </domain>
+     ```

-To use the qemu-wrapper script in conjuntion with libvirt, follow the
-steps in the previous section before proceeding with the following steps:
+  5. DPDK Packet forwarding in Guest VM

-  1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH)
-     Ideally in the same directory that the QEMU binary is located.
+     To accomplish this, the DPDK and testpmd applications have to be compiled
+     on the VM first; the steps are listed in [DPDK in the VM].

-  2. Ensure that the script has the same owner/group and file permissions
-     as the QEMU binary.
+     * Run test-pmd application

-  3. Update the VM xml file using "virsh edit VM.xml"
+     ```
+     cd $DPDK_DIR/app/test-pmd;
+     ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- --burst=64 -i --txqflags=0xf00 --disable-hw-vlan
+     set fwd mac_retry
+     start
+     ```

-     1. Set the VM to use the launch script.
-        Set the emulator path contained in the `` tags.
-        For example, replace:
+     * Bind vNIC back to kernel once the test is completed.

-        `/usr/bin/qemu-kvm`
+     ```
+     $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:03.0
+     $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:04.0
+     ```
-        with:
+     Note: Pass the appropriate PCI IDs in the above example. The PCI IDs can be
+     retrieved using the '$DPDK_DIR/tools/dpdk_nic_bind.py --status' command.

-        `/usr/bin/qemu-wrap.py`
+### 5.3 PHY-VM-PHY [IVSHMEM]

-  4. Edit the Configuration Parameters section of the script to point to
-  the correct emulator location and set any additional options. If you are
-  using a alternative character device name, please set "us_vhost_path" to the
-  location of that device. The script will automatically detect and insert
-  the correct "vhostfd" value in the QEMU command line arguements.
+  The steps for setup of IVSHMEM are covered in section 5.2 (PVP - IVSHMEM)
+  of [OVS Testcases] in the advanced install guide.

-  5. Use virt-manager to launch the VM
+## 6. Limitations

-Restrictions:
--------------
+  - Only an MTU of 1500 is supported; MTU configuration for DPDK netdevs will
+    be added in a future OVS release.
+  - Currently DPDK ports do not use HW offload functionality.
+  - Network Interface Firmware requirements:

-  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
-  - Currently DPDK port does not make use any offload functionality.
-  - DPDK-vHost support works with 1G huge pages.
+    Each release of DPDK is validated against a specific firmware version for
+    a supported Network Interface. New firmware versions introduce bug fixes,
+    performance improvements and new functionality that DPDK leverages. The
+    validated firmware versions are available as part of the release notes for
+    DPDK. It is recommended that users update Network Interface firmware to
+    match what has been validated for the DPDK release (see the ethtool sketch
+    below).

-  ivshmem:
-  - The shared memory is currently restricted to the use of a 1GB
-    huge pages.
+    For DPDK 16.04, the list of validated firmware versions can be found at:

-  - All huge pages are shared amongst the host, clients, virtual
-    machines etc.
+    http://dpdk.org/doc/guides/rel_notes/release_16_04.html

 Bug Reporting:
 --------------

 Please report problems to bugs@openvswitch.org. 
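+
+As a minimal sketch for the firmware recommendation above (assuming the NIC is
+still visible to the kernel as eth1, i.e. before it is bound to a DPDK driver as
+in section 3.2), the running driver and firmware version can be checked with
+ethtool and compared against the DPDK release notes:
+
+```
+# Query driver and firmware details while the kernel driver still owns the NIC
+ethtool -i eth1 | grep -E 'driver|firmware-version'
+```
+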
-[INSTALL.userspace.md]:INSTALL.userspace.md + +[DPDK requirements]: http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html +[Download DPDK]: http://dpdk.org/browse/dpdk/refs/ +[Download OVS]: http://openvswitch.org/releases/ +[DPDK Supported NICs]: http://dpdk.org/doc/nics +[Build Requirements]: https://github.com/openvswitch/ovs/blob/master/INSTALL.md#build-requirements +[INSTALL.DPDK-ADVANCED.md]: INSTALL.DPDK-ADVANCED.md +[OVS Testcases]: INSTALL.DPDK-ADVANCED.md#ovstc +[Vhost Walkthrough]: INSTALL.DPDK-ADVANCED.md#vhost +[DPDK in the VM]: INSTALL.DPDK.md#builddpdk [INSTALL.md]:INSTALL.md -[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules -[DPDK Docs]: http://dpdk.org/doc +[INSTALL.Fedora.md]:INSTALL.Fedora.md +[INSTALL.RHEL.md]:INSTALL.RHEL.md +[INSTALL.Debian.md]:INSTALL.Debian.md