X-Git-Url: http://git.cascardo.eti.br/?a=blobdiff_plain;f=INSTALL.DPDK;h=d9a77c98f1f6c19b717a75c326e185b507c9866a;hb=29a5b29f09b06205a4b5f2d3701cbd18a70ae226;hp=4551f4cd9062d89ac144283986bfb74bc77abc2c;hpb=d1279464ccfc6321075174f04b9df522b24cb674;p=cascardo%2Fovs.git

diff --git a/INSTALL.DPDK b/INSTALL.DPDK
index 4551f4cd9..d9a77c98f 100644
--- a/INSTALL.DPDK
+++ b/INSTALL.DPDK
@@ -14,15 +14,19 @@ and "make".
 Building and Installing:
 ------------------------

-Recommended to use DPDK 1.6.
+DPDK 1.7 is required.

 DPDK:
-Set dir i.g.:   export DPDK_DIR=/usr/src/dpdk-1.6.0r2
+Set dir, e.g.:  export DPDK_DIR=/usr/src/dpdk-1.7.0
 cd $DPDK_DIR
-update config/defconfig_x86_64-default-linuxapp-gcc so that dpdk generate single lib file.
+Update config/common_linuxapp so that DPDK generates a single lib file
+(this modification is also required for the IVSHMEM build):
 CONFIG_RTE_BUILD_COMBINE_LIBS=y
-make install T=x86_64-default-linuxapp-gcc
+For a default install without IVSHMEM:
+make install T=x86_64-native-linuxapp-gcc
+To include IVSHMEM (shared memory):
+make install T=x86_64-ivshmem-linuxapp-gcc
 For details refer to http://dpdk.org/

 Linux kernel:
@@ -30,9 +34,13 @@ Refer to intel-dpdk-getting-started-guide.pdf for understanding
 DPDK kernel requirement.

 OVS:
+Non-IVSHMEM:
+export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/
+IVSHMEM:
+export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/
+
 cd $(OVS_DIR)/openvswitch
 ./boot.sh
-export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc
 ./configure --with-dpdk=$DPDK_BUILD
 make

@@ -49,9 +57,9 @@ First setup DPDK devices:
   - insert uio.ko
     e.g. modprobe uio
   - insert igb_uio.ko
-    e.g. insmod DPDK/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
-  - Bind network device to ibg_uio.
-    e.g. DPDK/tools/pci_unbind.py --bind=igb_uio eth1
+    e.g. insmod $DPDK_BUILD/kmod/igb_uio.ko
+  - Bind network device to igb_uio.
+    e.g. $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1
   Alternate binding method:
    Find target Ethernet devices
     lspci -nn|grep Ethernet
@@ -73,7 +81,7 @@ First setup DPDK devices:

 Prepare system:
   - mount hugetlbfs
-    e.g. mount -t hugetlbfs -o pagesize=1G none /mnt/huge/
+    e.g. mount -t hugetlbfs -o pagesize=1G none /dev/hugepages

 Refer to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
@@ -91,7 +99,7 @@ Start ovsdb-server as discussed in INSTALL doc:
   ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
       --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
       --private-key=db:Open_vSwitch,SSL,private_key \
-     --certificate=dbitch,SSL,certificate \
+     --certificate=db:Open_vSwitch,SSL,certificate \
      --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
 First time after db creation, initialize:
   cd $OVS_DIR
@@ -105,12 +113,13 @@ for dpdk initialization.

   e.g.
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
-  ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
+  ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach

-If allocated more than 1 GB huge pages, set amount and use NUMA node 0 memory:
+If you have allocated more than one GB hugepage (as for IVSHMEM), set the
+amount and use NUMA node 0 memory:

   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
-  -- unix:$DB_SOCK --pidfile --detach
+  -- unix:$DB_SOCK --pidfile --detach

 To use ovs-vswitchd with DPDK, create a bridge with datapath_type
 "netdev" in the configuration database.  For example:
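
A rough sketch of the example referred to above, using illustrative names (br0
for the bridge, dpdk0/dpdk1 for the DPDK ports, which are expected to be named
dpdk<port id>):

    ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
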
@@ -136,9 +145,7 @@ Test flow script across NICs (assuming ovs in /usr/src/ovs):
 ############################# Script:

 #! /bin/sh
-# Move to command directory
-cd /usr/src/ovs/utilities/

 # Clear current flows
@@ -152,42 +159,51 @@ nw_dst=1.1.1.1,idle_timeout=0,action=output:1
 ######################################

-Ideally for maximum throughput, the 100% task should not be scheduled out
-which temporarily halts the process. The following affinitization methods will
-help.
+With pmd multi-threading support, OVS creates one pmd thread for each
+numa node by default. The pmd thread handles the I/O of all DPDK
+interfaces on the same numa node. The following two commands can be used
+to configure the multi-threading behavior.
+
+   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=
+
+The command above takes a CPU mask that sets the affinity of the pmd threads.
+A set bit in the mask means a pmd thread is created and pinned to the
+corresponding CPU core. For more information, please refer to
+`man ovs-vswitchd.conf.db`.

-At this time all ovs-vswitchd tasks end up being affinitized to cpu core 0
-but this may change. Lets pick a target core for 100% task to run on, i.e. core 7.
-Also assume a dual 8 core sandy bridge system with hyperthreading enabled.
-(A different cpu configuration will have different core mask requirements).
+   ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=

-To give better ownership of 100%, isolation maybe useful.
-To kernel bootline add core isolation list for core 7 and associated hype core 23
- e.g. isolcpus=7,23
-Reboot system for isolation to take effect, restart everything
+The command above sets the number of rx queues for each DPDK interface. The
+rx queues are assigned to pmd threads on the same numa node in round-robin
+fashion. For more information, please refer to `man ovs-vswitchd.conf.db`.
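+
+For example, to ask for two rx queues per DPDK interface (the value 2 is only
+illustrative; a suitable value depends on the traffic and the available cores):
+
+   ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2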

-List threads (and their pid) of ovs-vswitchd
- top -p `pidof ovs-vswitchd` -H -d1
+Ideally for maximum throughput, the pmd thread should not be scheduled out,
+which temporarily halts its execution. The following affinitization methods
+can help.

-Look for pmd* thread which is polling dpdk devices, this will be the 100% CPU
-bound task. Using this thread pid, affinitize to core 7 (mask 0x080),
-example pid 1762
+Let's pick cores 4,6,8,10 for the pmd threads to run on. Also assume a dual
+8 core sandy bridge system with hyperthreading enabled, where CPU1 has cores
+0,...,7 and 16,...,23 and CPU2 has cores 8,...,15 and 24,...,31. (A different
+cpu configuration could have different core mask requirements.)

-taskset -p 080 1762
- pid 1762's current affinity mask: 1
- pid 1762's new affinity mask: 80
+To the kernel bootline add a core isolation list for these cores and their
+associated hyperthread siblings (e.g. isolcpus=4,20,6,22,8,24,10,26). Reboot
+the system for the isolation to take effect, then restart everything.

-Assume that all other ovs-vswitchd threads to be on other socket 0 cores.
-Affinitize the rest of the ovs-vswitchd thread ids to 0x0FF007F
+Configure the pmd threads on cores 4,6,8,10 using 'pmd-cpu-mask':

-taskset -p 0x0FF007F {thread pid, e.g 1738}
- pid 1738's current affinity mask: 1
- pid 1738's new affinity mask: ff007f
-. . .
+   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550

-The core 23 is left idle, which allows core 7 to run at full rate.
+You should be able to check that the pmd threads are pinned to the correct
+cores via:

-Future changes may change the need for cpu core affinitization.
+   top -p `pidof ovs-vswitchd` -H -d1
+
+Note: the pmd threads on a numa node are only created if there is at least
+one DPDK interface from that numa node added to OVS.
+
+Note: core 0 is always reserved for non-pmd threads and should never be set
+in the cpu mask.

 DPDK Rings :
 ------------
@@ -207,8 +223,8 @@ with the ring naming used within ovs.

 location tests/ovs_client

 To run the client:
-
-  ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
+  cd /usr/src/ovs/tests/
+  ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"

 In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.
@@ -218,6 +234,9 @@ The application simply receives an mbuf on the receive queue of the ethernet
 ring and then places that same mbuf on the transmit ring of the ethernet
 ring. It is a trivial loopback application.

+DPDK rings in VM (IVSHMEM shared memory communications)
+-------------------------------------------------------
+
 In addition to executing the client in the host, you can execute it within
 a guest VM. To do so you will need a patched qemu. You can download the
 patch and getting started guide at:
@@ -232,12 +251,6 @@ Restrictions:
 -------------

   - This support is for physical NICs; it has been tested with Intel NICs only.
-  - vswitchd userspace datapath does affine polling thread but it is
-    assumed that devices are on numa node 0. Therefore if device is
-    attached to non zero numa node switching performance would be
-    suboptimal.
-  - There are fixed number of polling thread and fixed number of per
-    device queues configured.
   - Works with 1500 MTU only; needs a few changes in the DPDK lib to fix this.
   - Currently the DPDK port does not make use of any offload functionality.

 ivshmem