Using Open vSwitch with DPDK
============================

Open vSwitch can use Intel(R) DPDK lib to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with "configure"
and "make".

Building and Installing:
------------------------

Set dir e.g.: export DPDK_DIR=/usr/src/dpdk-1.7.0

Update config/common_linuxapp so that DPDK generates a single lib file
(this modification is also required for the IVSHMEM build):

    CONFIG_RTE_BUILD_COMBINE_LIBS=y
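
One way to make this change (assuming the option is currently set to "=n"
in your DPDK tree) is:

    sed -i 's/CONFIG_RTE_BUILD_COMBINE_LIBS=n/CONFIG_RTE_BUILD_COMBINE_LIBS=y/' \
        $DPDK_DIR/config/common_linuxapp
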
For default install without IVSHMEM:
    make install T=x86_64-native-linuxapp-gcc
To include IVSHMEM (shared memory):
    make install T=x86_64-ivshmem-linuxapp-gcc
For details refer to http://dpdk.org/

Refer to intel-dpdk-getting-started-guide.pdf for understanding
DPDK kernel requirements.

For a build without IVSHMEM:
    export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/
For a build with IVSHMEM:
    export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/

cd $OVS_DIR/openvswitch
./configure --with-dpdk=$DPDK_BUILD
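
After configure completes, build OVS in the usual way described in the INSTALL
doc (the examples below run the binaries straight from the build tree):

    make
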
Refer to INSTALL.userspace for general requirements of building
userspace OVS.

Using the DPDK with ovs-vswitchd:
---------------------------------

To the kernel bootline, add:
    default_hugepagesz=1GB hugepagesz=1G hugepages=1
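
Once the system has been rebooted with these options, you can confirm that
the 1GB hugepages were allocated with, for example:

    grep -i huge /proc/meminfo
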
First setup DPDK devices:
  - Insert uio.ko
    e.g. modprobe uio
  - Insert igb_uio.ko
    e.g. insmod $DPDK_BUILD/kmod/igb_uio.ko
  - Bind network device to igb_uio.
    e.g. $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1
  Alternate binding method:
    Find target Ethernet devices
      lspci -nn|grep Ethernet
    Bring down the target devices (e.g. eth2, eth3)
      ifconfig eth2 down
      ifconfig eth3 down
    Look at current devices (e.g. ixgbe devices)
      ls /sys/bus/pci/drivers/ixgbe/
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind
    Unbind target pci devices from current driver (e.g. 02:00.0 ...)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
    Bind to target driver (e.g. igb_uio)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/igb_uio/bind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/igb_uio/bind
    Check binding for listed devices
      ls /sys/bus/pci/drivers/igb_uio
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind
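
  Whichever binding method you use, the result can be verified with the status
  option of the DPDK binding script, e.g.:
      $DPDK_DIR/tools/dpdk_nic_bind.py --status
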
Mount the hugetlbfs filesystem:
    e.g. mount -t hugetlbfs -o pagesize=1G none /dev/hugepages

Refer to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.

Start ovsdb-server as discussed in the INSTALL doc:
  First time only, create (or clear) the database:
    mkdir -p /usr/local/etc/openvswitch
    mkdir -p /usr/local/var/run/openvswitch
    rm /usr/local/etc/openvswitch/conf.db
    ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
        ./vswitchd/vswitch.ovsschema

  Then start ovsdb-server:
    ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
        --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
        --private-key=db:Open_vSwitch,SSL,private_key \
        --certificate=db:Open_vSwitch,SSL,certificate \
        --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach

  First time after db creation, initialize:
    ./utilities/ovs-vsctl --no-wait init
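
  As an optional sanity check that ovsdb-server is responding on the socket
  (ovs-vswitchd is not running yet, so --no-wait is needed), you can run:
    ./utilities/ovs-vsctl --no-wait show
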
DPDK configuration arguments can be passed to ovs-vswitchd via the `--dpdk`
argument. This must be the first argument passed to the ovs-vswitchd process.
The DPDK -c (coremask) argument is ignored by ovs-dpdk, but it is a required
parameter for DPDK initialization.

    export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
    ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach

If you have allocated more than one GB hugepage (as for IVSHMEM), set the
amount and use NUMA node 0 memory:

    ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
        -- unix:$DB_SOCK --pidfile --detach

To use ovs-vswitchd with DPDK, create a bridge with datapath_type
"netdev" in the configuration database. For example:

    ovs-vsctl add-br br0
    ovs-vsctl set bridge br0 datapath_type=netdev

Now you can add dpdk devices. OVS expects DPDK device names to start with
"dpdk" and end with a port id. vswitchd should print the number of dpdk
devices found.

    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
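
You can check that the bridge and ports were created as expected with, for
example:

    ovs-vsctl show
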
Once the first DPDK port is added to vswitchd, it creates a polling thread
and polls the dpdk device in a continuous loop. Therefore the CPU utilization
for that thread is always 100%.

Test flow script across NICs (assuming ovs in /usr/src/ovs):
Assume 1.1.1.1 on NIC port 1 (dpdk0)
Assume 1.1.1.2 on NIC port 2 (dpdk1)

############################# Script:

# Move to command directory
cd /usr/src/ovs/utilities/

# Clear current flows
./ovs-ofctl del-flows br0

# Add flows between port 1 (dpdk0) and port 2 (dpdk1)
./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
nw_dst=1.1.1.2,idle_timeout=0,action=output:2
./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
nw_dst=1.1.1.1,idle_timeout=0,action=output:1

######################################
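
To confirm the flows were installed, you can dump them afterwards, e.g.:

    ./ovs-ofctl dump-flows br0
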
Ideally for maximum throughput, the 100% task should not be scheduled out,
which temporarily halts the process. The following affinitization methods
will help.

At this time all ovs-vswitchd tasks end up being affinitized to cpu core 0,
but this may change. Let's pick a target core for the 100% task to run on,
e.g. core 7. Also assume a dual 8-core Sandy Bridge system with hyperthreading
enabled, where CPU1 has cores 0,...,7 and 16,...,23 and CPU2 has cores
8,...,15 and 24,...,31. (A different cpu configuration will have different
core mask requirements.)
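
The core/socket layout on your own system (which determines the masks used
below) can be inspected with, for example:

    lscpu
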
To give better ownership of the 100% task, isolation may be useful.
To the kernel bootline, add a core isolation list for core 7 and the
associated hyperthread core 23, e.g.:
    isolcpus=7,23
Reboot the system for isolation to take effect, and restart everything.

List the threads (and their pids) of ovs-vswitchd:
    top -p `pidof ovs-vswitchd` -H -d1

Look for the pmd* thread, which is polling the dpdk devices; this will be the
100% CPU bound task. Using this thread pid, affinitize it to core 7
(mask 0x080), e.g.:

    taskset -p 0x080 {pmd thread pid, e.g. 1762}
      pid 1762's current affinity mask: 1
      pid 1762's new affinity mask: 80

Assume that all other ovs-vswitchd threads are to be on the other socket 0
cores. Affinitize the rest of the ovs-vswitchd thread ids to 0x07F007F:

    taskset -p 0x07F007F {thread pid, e.g. 1738}
      pid 1738's current affinity mask: 1
      pid 1738's new affinity mask: 7f007f
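
To review the resulting affinity of every ovs-vswitchd thread, a small sketch
(assuming a single ovs-vswitchd instance) is:

    for tid in /proc/`pidof ovs-vswitchd`/task/*; do
        taskset -p ${tid##*/}        # print current affinity mask of each thread
    done
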
Core 23 is left idle, which allows core 7 to run at full rate.

Future changes may alter the need for cpu core affinitization.

DPDK Rings:
-----------

Following the steps above to create a bridge, you can now add dpdk rings
as a port to the vswitch. OVS will expect the DPDK ring device name to
start with dpdkr and end with a portid.

    ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. It comes from the base dpdk directory and has been modified to
work with the ring naming used within ovs.

Location: tests/ovs_client

To run the client:

    cd /usr/src/ovs/tests/
    ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"

In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.

It is essential to have --proc-type=secondary.
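
So a concrete invocation for the dpdkr0 port added above (reusing the EAL
arguments from the example) would be:

    ovsclient -c 1 -n 4 --proc-type=secondary -- -n 0
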
The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit queue of
the same ring. It is a trivial loopback application.

DPDK rings in VM (IVSHMEM shared memory communications)
-------------------------------------------------------

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu. You can download the
patch and getting started guide at:

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask "-c" as
the vswitchd.

Restrictions:
-------------

  - This support is for physical NICs; it has been tested with Intel NICs
    only.
  - The vswitchd userspace datapath does affinitize the polling thread, but
    it is assumed that devices are on numa node 0. Therefore if a device is
    attached to a non-zero numa node, switching performance will be
    suboptimal.
  - There are a fixed number of polling threads and a fixed number of
    per-device queues configured.
  - Works with 1500 MTU; a few changes are needed in the DPDK lib to fix
    this issue.
  - Currently the DPDK port does not make use of any offload functionality.

  IVSHMEM:
  - The shared memory is currently restricted to the use of 1GB huge pages.
  - All huge pages are shared amongst the host, clients, virtual machines etc.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.