X-Git-Url: http://git.cascardo.eti.br/?a=blobdiff_plain;f=INSTALL.DPDK.md;h=cdef6cfcb6f870912116c7809569319916ee5404;hb=d54ac8032cd7c46a965ecd48fbf07fa430ce826d;hp=60889d01d51e5d322023e4886dc5e4fbb5cca03d;hpb=5568661cbff1095b595c6020f2a311d8743dc47f;p=cascardo%2Fovs.git

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 60889d01d..cdef6cfcb 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -16,13 +16,15 @@ OVS needs a system with 1GB hugepages support.
 Building and Installing:
 ------------------------
 
-Required DPDK 1.8.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
+Required: DPDK 2.0
+Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
+on Debian/Ubuntu)
 
 1. Configure build & install DPDK:
   1. Set `$DPDK_DIR`
 
      ```
-     export DPDK_DIR=/usr/src/dpdk-1.8.0
+     export DPDK_DIR=/usr/src/dpdk-2.0
      cd $DPDK_DIR
      ```
 
@@ -32,7 +34,7 @@ Required DPDK 1.8.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
      `CONFIG_RTE_BUILD_COMBINE_LIBS=y`
 
      Update `config/common_linuxapp` so that DPDK is built with vhost
-     libraries:
+     libraries.
 
      `CONFIG_RTE_LIBRTE_VHOST=y`
 
@@ -65,10 +67,12 @@ Required DPDK 1.8.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
   ```
   cd $(OVS_DIR)/openvswitch
   ./boot.sh
-  ./configure --with-dpdk=$DPDK_BUILD
+  ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"]
   make
   ```
 
+  Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress DPDK cast-align warnings.
+
   To have better performance, one can enable aggressive compiler optimizations and
   use the special instructions (popcnt, crc32) that may not be available on all
   machines. Instead of typing `make`, type:
@@ -95,7 +99,7 @@ Using the DPDK with ovs-vswitchd:
    1. insert uio.ko: `modprobe uio`
    2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
    3. Bind network device to igb_uio:
-      `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`
+      `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`
 
   * VFIO:
 
@@ -106,7 +110,7 @@ Using the DPDK with ovs-vswitchd:
    2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x /dev/vfio`
      and: `sudo /usr/bin/chmod 0666 /dev/vfio/*`
    3. Bind network device to vfio-pci:
-      `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`
+      `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`
 
 3. Mount the hugetlbfs filesystem
 
@@ -182,6 +186,14 @@ Using the DPDK with ovs-vswitchd:
    polls dpdk device in continuous loop. Therefore CPU utilization
    for that thread is always 100%.
 
+   Note: creating bonds of DPDK interfaces is slightly different from creating
+   bonds of system interfaces. For DPDK, the interface type must be explicitly
+   set, for example:
+
+   ```
+   ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 type=dpdk -- set Interface dpdk1 type=dpdk
+   ```
+
 7. Add test flows
 
    Test flow script across NICs (assuming ovs in /usr/src/ovs):
@@ -247,8 +259,13 @@ Using the DPDK with ovs-vswitchd:
    Note, the pmd threads on a numa node are only created if there is at least
    one DPDK interface from the numa node that has been added to OVS.
 
-   Note, core 0 is always reserved from non-pmd threads and should never be set
-   in the cpu mask.
+   To understand where most of the time is spent and whether the caches are
+   effective, these commands can be used:
+
+   ```
+   ovs-appctl dpif-netdev/pmd-stats-clear #To reset statistics
+   ovs-appctl dpif-netdev/pmd-stats-show
+   ```
 
 DPDK Rings :
 ------------
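For quick reference, ring ports follow the same pattern as the physical and
bonded DPDK ports shown above: the name carries the ring number and the
interface type must be set explicitly. A minimal sketch (the bridge name
`br0` and ring number 0 are illustrative assumptions):

```
ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr
```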
@@ -298,40 +315,164 @@ the vswitchd.
 
 DPDK vhost:
 -----------
 
-vhost-cuse is only supported at present i.e. not using the standard QEMU
-vhost-user interface. It is intended that vhost-user support will be added
-in future releases when supported in DPDK and that vhost-cuse will eventually
-be deprecated. See [DPDK Docs] for more info on vhost.
+DPDK 2.0 supports two types of vhost:
+
+1. vhost-user
+2. vhost-cuse
+
+Whichever vhost type is enabled in the DPDK build is the type that will be
+enabled in OVS. By default, vhost-user is enabled in DPDK. Therefore, unless
+vhost-cuse has been enabled in DPDK, vhost-user ports will be enabled in OVS.
+Please note that support for vhost-cuse is intended to be deprecated in OVS
+in a future release.
+
+DPDK vhost-user:
+----------------
+
+The following sections describe the use of vhost-user 'dpdkvhostuser' ports
+with OVS.
+
+DPDK vhost-user Prerequisites:
+------------------------------
+
+1. DPDK 2.0 with vhost support enabled as documented in the "Building and
+   Installing" section
+
+2. QEMU version v2.1.0+
 
-Prerequisites:
-1. Insert the Cuse module:
+   QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if providing
+   your VM with memory greater than 1GB due to potential issues with memory
+   mapping larger areas.
 
-   `modprobe cuse`
+Adding DPDK vhost-user ports to the Switch:
+-------------------------------------------
 
-2. Build and insert the `eventfd_link` module:
+Following the steps above to create a bridge, you can now add DPDK vhost-user
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can
+have arbitrary names.
 
-   `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
-   `make`
-   `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
+  - For vhost-user, the name of the port type is `dpdkvhostuser`
 
-Following the steps above to create a bridge, you can now add DPDK vhost
-as a port to the vswitch.
+    ```
+    ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
+    type=dpdkvhostuser
+    ```
+
+    This action creates a socket located at
+    `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
+    to your VM on the QEMU command line. More instructions on this can be
+    found in the next section "DPDK vhost-user VM configuration".
+    Note: If you wish for the vhost-user sockets to be created in a
+    directory other than `/usr/local/var/run/openvswitch`, you may specify
+    another location on the ovs-vswitchd command line like so:
+
+    `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
 
-`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
+DPDK vhost-user VM configuration:
+---------------------------------
+Follow the steps below to attach vhost-user port(s) to a VM.
 
-Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
+1. Configure sockets.
+   Pass the following parameters to QEMU to attach a vhost-user device:
 
-`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
+   ```
+   -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
+   -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+   -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+   ```
 
-However, please note that when attaching userspace devices to QEMU, the
-name provided during the add-port operation must match the ifname parameter
-on the QEMU command line.
+   ...where vhost-user-1 is the name of the vhost-user port added
+   to the switch.
+   Repeat the above parameters for multiple devices, changing the
+   chardev path and id as necessary. Note that a separate and different
+   chardev path needs to be specified for each vhost-user device. For
+   example, if you have a second vhost-user port named 'vhost-user-2', you
+   append your QEMU command line with an additional set of parameters:
+
+   ```
+   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
+   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
+   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
+   ```
+
+2. Configure huge pages.
+   QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
+   a virtio-net device's virtual rings and packet buffers mapping the VM's
+   physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
+   memory into their process address space, pass the following parameters
+   to QEMU:
+
+   ```
+   -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
+   share=on
+   -numa node,memdev=mem -mem-prealloc
+   ```
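Putting steps 1 and 2 together, a complete QEMU invocation might look like the
sketch below; the disk image path, memory size, CPU count and MAC address are
placeholders rather than values mandated by OVS:

```
qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4096M \
  -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
  -numa node,memdev=mem -mem-prealloc \
  -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 \
  -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce \
  -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 \
  -drive file=/path/to/guest.img
```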
 
-DPDK vhost VM configuration:
-----------------------------
+DPDK vhost-cuse:
+----------------
 
- vhost ports use a Linux* character device to communicate with QEMU.
+The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
+with OVS.
+
+DPDK vhost-cuse Prerequisites:
+------------------------------
+
+1. DPDK 2.0 with vhost support enabled as documented in the "Building and
+   Installing" section
+   As an additional step, you must enable vhost-cuse in DPDK by setting the
+   following additional flag in `config/common_linuxapp`:
+
+   `CONFIG_RTE_LIBRTE_VHOST_USER=n`
+
+   Following this, rebuild DPDK as per the instructions in the "Building and
+   Installing" section. Finally, rebuild OVS as per step 3 in the "Building
+   and Installing" section; OVS will detect that DPDK has vhost-cuse libraries
+   compiled and in turn will enable support for it in the switch and disable
+   vhost-user support.
+
+2. Insert the Cuse module:
+
+   `modprobe cuse`
+
+3. Build and insert the `eventfd_link` module:
+
+   ```
+   cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
+   make
+   insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
+   ```
+
+4. QEMU version v2.1.0+
+
+   vhost-cuse will work with QEMU v2.1.0 and above; however, it is recommended
+   to use v2.2.0 if providing your VM with memory greater than 1GB due to
+   potential issues with memory mapping larger areas.
+   Note: QEMU v1.6.2 will also work, with slightly different command line
+   parameters, which are specified later in this document.
+
+Adding DPDK vhost-cuse ports to the Switch:
+-------------------------------------------
+
+Following the steps above to create a bridge, you can now add DPDK vhost-cuse
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports can
+have arbitrary names.
+
+  - For vhost-cuse, the name of the port type is `dpdkvhostcuse`
+
+    ```
+    ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
+    type=dpdkvhostcuse
+    ```
+
+    When attaching vhost-cuse ports to QEMU, the name provided during the
+    add-port operation must match the ifname parameter on the QEMU command
+    line. More instructions on this can be found in the next section.
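As an illustration of the ifname requirement, a port added as `vhost-cuse-1`
would be referenced from QEMU roughly as follows; the tap-style netdev shown
here is an assumption based on the configuration described in the next
section, and the id and MAC values are placeholders:

```
-netdev type=tap,id=net1,script=no,downscript=no,ifname=vhost-cuse-1,vhost=on
-device virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01
```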
+DPDK vhost-cuse VM configuration:
+---------------------------------
+
+  vhost-cuse ports use a Linux* character device to communicate with QEMU.
   By default it is set to `/dev/vhost-net`. It is possible to reuse this
   standard device for DPDK vhost, which makes setup a little simpler but it
   is better practice to specify an alternative character device in order to
@@ -397,16 +538,19 @@ DPDK vhost VM configuration:
 
   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
   virtio-net device's virtual rings and packet buffers mapping the VM's
   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
-  memory into their process address space, pass the following paramters
+  memory into their process address space, pass the following parameters
   to QEMU:
 
   `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
   share=on -numa node,memdev=mem -mem-prealloc`
 
+  Note: For use with an earlier QEMU version such as v1.6.2, use the
+  following to configure hugepages instead:
 
-DPDK vhost VM configuration with QEMU wrapper:
-----------------------------------------------
+  `-mem-path /dev/hugepages -mem-prealloc`
 
+DPDK vhost-cuse VM configuration with QEMU wrapper:
+---------------------------------------------------
 The QEMU wrapper script automatically detects and calls QEMU with the
 necessary parameters. It performs the following actions:
 
@@ -432,8 +576,8 @@ qemu-wrap.py -cpu host -boot c -hda -m 4096 -smp 4
 netdev=net1,mac=00:00:00:00:00:01
 ```
 
-DPDK vhost VM configuration with libvirt:
------------------------------------------
+DPDK vhost-cuse VM configuration with libvirt:
+----------------------------------------------
 
 If you are using libvirt, you must enable libvirt to access the character
 device by adding it to the controllers cgroup for libvirtd using the following
@@ -507,7 +651,7 @@ Now you may launch your VM using virt-manager, or like so:
 
 `virsh create my_vhost_vm.xml`
 
-DPDK vhost VM configuration with libvirt and QEMU wrapper:
+DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
 ----------------------------------------------------------
 
 To use the qemu-wrapper script in conjunction with libvirt, follow the
@@ -535,10 +679,28 @@ steps in the previous section before proceeding with the following steps:
   the correct emulator location and set any additional options. If you are
   using an alternative character device name, please set "us_vhost_path" to the
   location of that device. The script will automatically detect and insert
-  the correct "vhostfd" value in the QEMU command line arguements.
+  the correct "vhostfd" value in the QEMU command line arguments.
 
 5. Use virt-manager to launch the VM
 
+Running ovs-vswitchd with DPDK backend inside a VM
+--------------------------------------------------
+
+Please note that additional configuration is required if you want to run
+ovs-vswitchd with DPDK backend inside a QEMU virtual machine. Ovs-vswitchd
+creates separate DPDK TX queues for each CPU core available. This operation
+fails inside a QEMU virtual machine because, by default, the VirtIO NIC
+provided to the guest is configured to support only a single TX queue and a
+single RX queue. To change this behavior, you need to turn on the 'mq'
+(multiqueue) property of all virtio-net-pci devices emulated by QEMU and used
+by DPDK. You may do it manually (by changing the QEMU command line) or, if
+you use Libvirt, by adding the following string:
+
+`<driver name='vhost' queues='N'/>`
+
+to `<interface>` sections of all network devices used by DPDK. Parameter 'N'
+determines how many queues can be used by the guest.
+
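For the manual route, the relevant QEMU options might look like the following
sketch; the tap backend, the queue count of 4 and the derived vectors value
(2*queues+2) are illustrative assumptions rather than requirements:

```
-netdev type=tap,id=net1,vhost=on,queues=4
-device virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01,mq=on,vectors=10
```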
 Restrictions:
 -------------
 
@@ -547,10 +709,16 @@ Restrictions:
 
   - DPDK-vHost support works with 1G huge pages.
 
 ivshmem:
-  - The shared memory is currently restricted to the use of a 1GB
-    huge pages.
-  - All huge pages are shared amongst the host, clients, virtual
-    machines etc.
+  - If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be
+    unable to share any rings or mempools with a virtual machine.
+    This is because the current implementation of ivshmem works by sharing
+    a single 1GB huge page from the host operating system to any guest
+    operating system through the QEMU ivshmem device. When using smaller
+    page sizes, multiple pages may be required to hold the ring descriptors
+    and buffer pools. The QEMU ivshmem device does not allow you to share
+    multiple file descriptors to the guest operating system. However, if you
+    want to share dpdkr rings with other processes on the host, you can do
+    this with smaller page sizes.
 
 Bug Reporting:
 --------------