Building and Installing:
------------------------
-Required: DPDK 2.2
+Required: DPDK 16.04, libnuma
Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
on Debian/Ubuntu)
1. Set `$DPDK_DIR`
```
- export DPDK_DIR=/usr/src/dpdk-2.2
+ export DPDK_DIR=/usr/src/dpdk-16.04
cd $DPDK_DIR
```
- 2. Update `config/common_linuxapp` so that DPDK generate single lib file.
- (modification also required for IVSHMEM build)
-
- `CONFIG_RTE_BUILD_COMBINE_LIBS=y`
-
- Then run `make install` to build and install the library.
+ 2. Then run `make install` to build and install the library.
For default install without IVSHMEM:
- `make install T=x86_64-native-linuxapp-gcc`
+ `make install T=x86_64-native-linuxapp-gcc DESTDIR=install`
To include IVSHMEM (shared memory):
- `make install T=x86_64-ivshmem-linuxapp-gcc`
+ `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install`
For further details refer to http://dpdk.org/
5. Start vswitchd:
- DPDK configuration arguments can be passed to vswitchd via `--dpdk`
- argument. This needs to be first argument passed to vswitchd process.
- dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
- for dpdk initialization.
+ DPDK configuration arguments can be passed to vswitchd via Open_vSwitch
+ other_config column. The recognized configuration options are listed
+ below. Defaults will be provided for any values not explicitly set.
+
+ * dpdk-init
+ Specifies whether OVS should initialize and support DPDK ports. This is
+ a boolean, and defaults to false.
+
+ * dpdk-lcore-mask
+ Specifies the CPU cores on which dpdk lcore threads should be spawned.
+ The DPDK lcore threads are used for DPDK library tasks, such as
+ library internal message processing, logging, etc. The value should be
+ a hex string (e.g. '0x123'), similar to the 'taskset' mask input.
+ If not specified, the value will be determined by choosing the lowest
+ CPU core from the initial CPU affinity list. Otherwise, the value will be
+ passed directly to the DPDK library.
+ For performance reasons, it is best to set this to a single core on
+ the system, rather than allow lcore threads to float.
+
+ * dpdk-alloc-mem
+ This sets the total memory to preallocate from hugepages regardless of
+ processor socket. It is recommended to use dpdk-socket-mem instead.
+
+ * dpdk-socket-mem
+ Comma-separated list of amounts of memory to pre-allocate from hugepages
+ on specific sockets.
+
+ * dpdk-hugepage-dir
+ Directory where hugetlbfs is mounted.
+
+ * dpdk-extra
+ Extra arguments to pass to the DPDK EAL, as they would previously have
+ been specified on the command line. Do not pass '--no-huge' to the
+ system in this way; running the system without hugepages is not
+ supported.
+
+ * cuse-dev-name
+ Option to set the vhost_cuse character device name.
+
+ * vhost-sock-dir
+ Option to set the path to the vhost_user unix socket files.
+
+ NOTE: Changing any of these options requires restarting the ovs-vswitchd
+ application.
+
+ Open vSwitch can be started as normal. DPDK will be initialized as long
+ as the dpdk-init option has been set to 'true'.
+
```
export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
- ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
+ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
+ ovs-vswitchd unix:$DB_SOCK --pidfile --detach
```
If more than one GB of hugepages is allocated (as for IVSHMEM), set the
amount and use NUMA node 0 memory:
```
- ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
- -- unix:$DB_SOCK --pidfile --detach
+ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0"
+ ovs-vswitchd unix:$DB_SOCK --pidfile --detach
```
6. Add bridge & ports
./ovs-ofctl add-flow br0 in_port=2,action=output:1
```
-Performance Tuning:
--------------------
+8. QoS usage example
+
+ Assuming you have a vhost-user port transmitting traffic consisting of
+ packets of size 64 bytes, the following command would limit the egress
+ transmission rate of the port to ~1,000,000 packets per second:
+
+ `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create qos
+ type=egress-policer other-config:cir=46000000 other-config:cbs=2048`
+
+ To examine the QoS configuration of the port:
+
+ `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
+
+ To clear the QoS configuration from the port and ovsdb use the following:
+
+ `ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos`
+
+ For more details regarding egress-policer parameters please refer to the
+ vswitch.xml.
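
The cir value above can be derived from the target packet rate. A quick
sketch of that arithmetic, assuming the policer accounts each 64-byte frame
minus the 14-byte Ethernet header and 4-byte CRC (an assumption; see
vswitch.xml for the authoritative definition of cir):

```shell
# Sketch: committed information rate (bytes/sec) for a 1 Mpps target,
# counting 46 bytes per 64-byte frame (64 - 14 byte header - 4 byte CRC).
echo $(( 1000000 * (64 - 14 - 4) ))   # 46000000
```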
+
+9. Ingress policing example
+
+ Assuming you have a vhost-user port receiving traffic consisting of
+ packets of size 64 bytes, the following command would limit the reception
+ rate of the port to ~1,000,000 packets per second:
- 1. PMD affinitization
+ `ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000
+ ingress_policing_burst=1000`
- A poll mode driver (pmd) thread handles the I/O of all DPDK
- interfaces assigned to it. A pmd thread will busy loop through
- the assigned port/rxq's polling for packets, switch the packets
- and send to a tx port if required. Typically, it is found that
- a pmd thread is CPU bound, meaning that the greater the CPU
- occupancy the pmd thread can get, the better the performance. To
- that end, it is good practice to ensure that a pmd thread has as
- many cycles on a core available to it as possible. This can be
- achieved by affinitizing the pmd thread with a core that has no
- other workload. See section 7 below for a description of how to
- isolate cores for this purpose also.
+ To examine the ingress policer configuration of the port:
- The following command can be used to specify the affinity of the
- pmd thread(s).
+ `ovs-vsctl list interface vhost-user0`
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
+ To clear the ingress policer configuration from the port use the following:
- By setting a bit in the mask, a pmd thread is created and pinned
- to the corresponding CPU core. e.g. to run a pmd thread on core 1
+ `ovs-vsctl set interface vhost-user0 ingress_policing_rate=0`
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2`
+ For more details regarding ingress-policer see the vswitch.xml.
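
The ingress_policing_rate value above can be checked with similar
arithmetic; this sketch assumes the rate is expressed in kbps and that 46
bytes of each 64-byte frame are counted (frame minus the 14-byte Ethernet
header and 4-byte CRC), per vswitch.xml:

```shell
# Sketch: ingress rate in kbps for a 1 Mpps target of 64-byte frames,
# counted as 46 bytes each.
echo $(( 1000000 * (64 - 14 - 4) * 8 / 1000 ))   # 368000
```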
- For more information, please refer to the Open_vSwitch TABLE section in
+Performance Tuning:
+-------------------
+
+1. PMD affinitization
- `man ovs-vswitchd.conf.db`
+ A poll mode driver (pmd) thread handles the I/O of all DPDK
+ interfaces assigned to it. A pmd thread will busy loop through
+ the assigned port/rxq's polling for packets, switch the packets
+ and send to a tx port if required. Typically, it is found that
+ a pmd thread is CPU bound, meaning that the greater the CPU
+ occupancy the pmd thread can get, the better the performance. To
+ that end, it is good practice to ensure that a pmd thread has as
+ many cycles on a core available to it as possible. This can be
+ achieved by affinitizing the pmd thread with a core that has no
+ other workload. See section 7 below for a description of how to
+ isolate cores for this purpose also.
- Note, that a pmd thread on a NUMA node is only created if there is
- at least one DPDK interface from that NUMA node added to OVS.
+ The following command can be used to specify the affinity of the
+ pmd thread(s).
- 2. Multiple poll mode driver threads
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
- With pmd multi-threading support, OVS creates one pmd thread
- for each NUMA node by default. However, it can be seen that in cases
- where there are multiple ports/rxq's producing traffic, performance
- can be improved by creating multiple pmd threads running on separate
- cores. These pmd threads can then share the workload by each being
- responsible for different ports/rxq's. Assignment of ports/rxq's to
- pmd threads is done automatically.
+ By setting a bit in the mask, a pmd thread is created and pinned
+ to the corresponding CPU core. e.g. to run a pmd thread on core 1
- The following command can be used to specify the affinity of the
- pmd threads.
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2`
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
+ For more information, please refer to the Open_vSwitch TABLE section in
- A set bit in the mask means a pmd thread is created and pinned
- to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
+ `man ovs-vswitchd.conf.db`
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
+ Note that a pmd thread on a NUMA node is only created if there is
+ at least one DPDK interface from that NUMA node added to OVS.
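
The hex mask maps one bit per CPU core. An illustrative sketch of how such
a mask is formed (not part of OVS itself):

```shell
# Illustrative: one bit set per core on which a pmd thread should run.
printf '%x\n' $(( 1 << 1 ))               # 2 -> pmd thread on core 1
printf '%x\n' $(( (1 << 1) | (1 << 2) ))  # 6 -> pmd threads on cores 1 and 2
```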
- For more information, please refer to the Open_vSwitch TABLE section in
+2. Multiple poll mode driver threads
- `man ovs-vswitchd.conf.db`
+ With pmd multi-threading support, OVS creates one pmd thread
+ for each NUMA node by default. However, in cases where multiple
+ ports/rxq's produce traffic, performance can be improved by creating
+ multiple pmd threads running on separate
+ cores. These pmd threads can then share the workload by each being
+ responsible for different ports/rxq's. Assignment of ports/rxq's to
+ pmd threads is done automatically.
- For example, when using dpdk and dpdkvhostuser ports in a bi-directional
- VM loopback as shown below, spreading the workload over 2 or 4 pmd
- threads shows significant improvements as there will be more total CPU
- occupancy available.
+ The following command can be used to specify the affinity of the
+ pmd threads.
- NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
- The OVS log can be checked to confirm that the port/rxq assignment to
- pmd threads is as required. This can also be checked with the following
- commands:
+ A set bit in the mask means a pmd thread is created and pinned
+ to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
- ```
- top -H
- taskset -p <pid_of_pmd>
- ```
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
- To understand where most of the pmd thread time is spent and whether the
- caches are being utilized, these commands can be used:
+ For more information, please refer to the Open_vSwitch TABLE section in
- ```
- # Clear previous stats
- ovs-appctl dpif-netdev/pmd-stats-clear
+ `man ovs-vswitchd.conf.db`
- # Check current stats
- ovs-appctl dpif-netdev/pmd-stats-show
- ```
+ For example, when using dpdk and dpdkvhostuser ports in a bi-directional
+ VM loopback as shown below, spreading the workload over 2 or 4 pmd
+ threads shows significant improvements as there will be more total CPU
+ occupancy available.
- 3. DPDK port Rx Queues
+ NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
- `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>`
+ The following command can be used to confirm that the port/rxq assignment
+ to pmd threads is as required:
- The command above sets the number of rx queues for each DPDK interface.
- The rx queues are assigned to pmd threads on the same NUMA node in a
- round-robin fashion. For more information, please refer to the
- Open_vSwitch TABLE section in
+ `ovs-appctl dpif-netdev/pmd-rxq-show`
- `man ovs-vswitchd.conf.db`
+ This can also be checked with:
+
+ ```
+ top -H
+ taskset -p <pid_of_pmd>
+ ```
- 4. Exact Match Cache
+ To understand where most of the pmd thread time is spent and whether the
+ caches are being utilized, these commands can be used:
- Each pmd thread contains one EMC. After initial flow setup in the
- datapath, the EMC contains a single table and provides the lowest level
- (fastest) switching for DPDK ports. If there is a miss in the EMC then
- the next level where switching will occur is the datapath classifier.
- Missing in the EMC and looking up in the datapath classifier incurs a
- significant performance penalty. If lookup misses occur in the EMC
- because it is too small to handle the number of flows, its size can
- be increased. The EMC size can be modified by editing the define
- EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.
+ ```
+ # Clear previous stats
+ ovs-appctl dpif-netdev/pmd-stats-clear
- As mentioned above an EMC is per pmd thread. So an alternative way of
- increasing the aggregate amount of possible flow entries in EMC and
- avoiding datapath classifier lookups is to have multiple pmd threads
- running. This can be done as described in section 2.
+ # Check current stats
+ ovs-appctl dpif-netdev/pmd-stats-show
+ ```
- 5. Compiler options
+3. DPDK port Rx Queues
- The default compiler optimization level is '-O2'. Changing this to
- more aggressive compiler optimizations such as '-O3' or
- '-Ofast -march=native' with gcc can produce performance gains.
+ `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`
- 6. Simultaneous Multithreading (SMT)
+ The command above sets the number of rx queues for the specified DPDK
+ interface.
+ The rx queues are assigned to pmd threads on the same NUMA node in a
+ round-robin fashion. For more information, please refer to the
+ Open_vSwitch TABLE section in
- With SMT enabled, one physical core appears as two logical cores
- which can improve performance.
+ `man ovs-vswitchd.conf.db`
- SMT can be utilized to add additional pmd threads without consuming
- additional physical cores. Additional pmd threads may be added in the
- same manner as described in section 2. If trying to minimize the use
- of physical cores for pmd threads, care must be taken to set the
- correct bits in the pmd-cpu-mask to ensure that the pmd threads are
- pinned to SMT siblings.
+4. Exact Match Cache
- For example, when using 2x 10 core processors in a dual socket system
- with HT enabled, /proc/cpuinfo will report 40 logical cores. To use
- two logical cores which share the same physical core for pmd threads,
- the following command can be used to identify a pair of logical cores.
+ Each pmd thread contains one EMC. After initial flow setup in the
+ datapath, the EMC contains a single table and provides the lowest level
+ (fastest) switching for DPDK ports. If there is a miss in the EMC then
+ the next level where switching will occur is the datapath classifier.
+ Missing in the EMC and looking up in the datapath classifier incurs a
+ significant performance penalty. If lookup misses occur in the EMC
+ because it is too small to handle the number of flows, its size can
+ be increased. The EMC size can be modified by editing the define
+ EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.
- `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`
+ As mentioned above an EMC is per pmd thread. So an alternative way of
+ increasing the aggregate amount of possible flow entries in EMC and
+ avoiding datapath classifier lookups is to have multiple pmd threads
+ running. This can be done as described in section 2.
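
The relationship between the define and the EMC capacity can be sketched
as follows; the shift value of 13 is an assumed default, so verify it
against lib/dpif-netdev.c in your tree:

```shell
# Illustrative: EMC entries per pmd thread = 1 << EM_FLOW_HASH_SHIFT.
# A shift of 13 (assumed default) gives:
echo $(( 1 << 13 ))   # 8192
```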
- where N is the logical core number. In this example, it would show that
- cores 1 and 21 share the same physical core. The pmd-cpu-mask to enable
- two pmd threads running on these two logical cores (one physical core)
- is.
+5. Compiler options
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=100002`
+ The default compiler optimization level is '-O2'. Changing this to
+ more aggressive compiler optimizations such as '-O3' or
+ '-Ofast -march=native' with gcc can produce performance gains.
- Note that SMT is enabled by the Hyper-Threading section in the
- BIOS, and as such will apply to the whole system. So the impact of
- enabling/disabling it for the whole system should be considered
- e.g. If workloads on the system can scale across multiple cores,
- SMT may very beneficial. However, if they do not and perform best
- on a single physical core, SMT may not be beneficial.
+6. Simultaneous Multithreading (SMT)
- 7. The isolcpus kernel boot parameter
+ With SMT enabled, one physical core appears as two logical cores
+ which can improve performance.
- isolcpus can be used on the kernel bootline to isolate cores from the
- kernel scheduler and hence dedicate them to OVS or other packet
- forwarding related workloads. For example a Linux kernel boot-line
- could be:
+ SMT can be utilized to add additional pmd threads without consuming
+ additional physical cores. Additional pmd threads may be added in the
+ same manner as described in section 2. If trying to minimize the use
+ of physical cores for pmd threads, care must be taken to set the
+ correct bits in the pmd-cpu-mask to ensure that the pmd threads are
+ pinned to SMT siblings.
- 'GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=4 default_hugepagesz=1G 'intel_iommu=off' isolcpus=1-19"'
+ For example, when using 2x 10 core processors in a dual socket system
+ with HT enabled, /proc/cpuinfo will report 40 logical cores. To use
+ two logical cores which share the same physical core for pmd threads,
+ the following command can be used to identify a pair of logical cores.
- 8. NUMA/Cluster On Die
+ `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`
- Ideally inter NUMA datapaths should be avoided where possible as packets
- will go across QPI and there may be a slight performance penalty when
- compared with intra NUMA datapaths. On Intel Xeon Processor E5 v3,
- Cluster On Die is introduced on models that have 10 cores or more.
- This makes it possible to logically split a socket into two NUMA regions
- and again it is preferred where possible to keep critical datapaths
- within the one cluster.
+ where N is the logical core number. In this example, it would show that
+ cores 1 and 21 share the same physical core. The pmd-cpu-mask to enable
+ two pmd threads running on these two logical cores (one physical core)
+ is:
- It is good practice to ensure that threads that are in the datapath are
- pinned to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs
- responsible for forwarding.
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=200002`
- 9. Rx Mergeable buffers
+ Note that SMT is enabled by the Hyper-Threading section in the
+ BIOS, and as such will apply to the whole system. So the impact of
+ enabling/disabling it for the whole system should be considered
+ e.g. if workloads on the system can scale across multiple cores,
+ SMT may be very beneficial. However, if they do not and perform best
+ on a single physical core, SMT may not be beneficial.
- Rx Mergeable buffers is a virtio feature that allows chaining of multiple
- virtio descriptors to handle large packet sizes. As such, large packets
- are handled by reserving and chaining multiple free descriptors
- together. Mergeable buffer support is negotiated between the virtio
- driver and virtio device and is supported by the DPDK vhost library.
- This behavior is typically supported and enabled by default, however
- in the case where the user knows that rx mergeable buffers are not needed
- i.e. jumbo frames are not needed, it can be forced off by adding
- rx_mrgbuf=off to the QEMU command line options. By not reserving multiple
- chains of descriptors it will make more individual virtio descriptors
- available for rx to the guest using dpdkvhost ports and this can improve
- performance.
-
- 10. Packet processing in the guest
+7. The isolcpus kernel boot parameter
- It is good practice whether simply forwarding packets from one
- interface to another or more complex packet processing in the guest,
- to ensure that the thread performing this work has as much CPU
- occupancy as possible. For example when the DPDK sample application
- `testpmd` is used to forward packets in the guest, multiple QEMU vCPU
- threads can be created. Taskset can then be used to affinitize the
- vCPU thread responsible for forwarding to a dedicated core not used
- for other general processing on the host system.
-
- 11. DPDK virtio pmd in the guest
-
- dpdkvhostcuse or dpdkvhostuser ports can be used to accelerate the path
- to the guest using the DPDK vhost library. This library is compatible with
- virtio-net drivers in the guest but significantly better performance can
- be observed when using the DPDK virtio pmd driver in the guest. The DPDK
- `testpmd` application can be used in the guest as an example application
- that forwards packet from one DPDK vhost port to another. An example of
- running `testpmd` in the guest can be seen here.
+ isolcpus can be used on the kernel bootline to isolate cores from the
+ kernel scheduler and hence dedicate them to OVS or other packet
+ forwarding related workloads. For example a Linux kernel boot-line
+ could be:
- `./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i --txqflags=0xf00 --disable-hw-vlan --forward-mode=io --auto-start`
+ ```
+ GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=4
+ default_hugepagesz=1G intel_iommu=off isolcpus=1-19"
+ ```
- See below information on dpdkvhostcuse and dpdkvhostuser ports.
- See [DPDK Docs] for more information on `testpmd`.
+8. NUMA/Cluster On Die
+
+ Ideally inter NUMA datapaths should be avoided where possible as packets
+ will go across QPI and there may be a slight performance penalty when
+ compared with intra NUMA datapaths. On Intel Xeon Processor E5 v3,
+ Cluster On Die is introduced on models that have 10 cores or more.
+ This makes it possible to logically split a socket into two NUMA regions
+ and again it is preferred where possible to keep critical datapaths
+ within the one cluster.
+
+ It is good practice to ensure that threads that are in the datapath are
+ pinned to cores in the same NUMA area, e.g. pmd threads and QEMU vCPUs
+ responsible for forwarding. If DPDK is built with
+ CONFIG_RTE_LIBRTE_VHOST_NUMA=y, vHost User ports automatically
+ detect the NUMA socket of the QEMU vCPUs and will be serviced by a PMD
+ from the same node provided a core on this node is enabled in the
+ pmd-cpu-mask.
+
+9. Rx Mergeable buffers
+
+ Rx Mergeable buffers is a virtio feature that allows chaining of multiple
+ virtio descriptors to handle large packet sizes. As such, large packets
+ are handled by reserving and chaining multiple free descriptors
+ together. Mergeable buffer support is negotiated between the virtio
+ driver and virtio device and is supported by the DPDK vhost library.
+ This behavior is typically supported and enabled by default, however
+ in the case where the user knows that rx mergeable buffers are not needed
+ i.e. jumbo frames are not needed, it can be forced off by adding
+ mrg_rxbuf=off to the QEMU command line options. By not reserving multiple
+ chains of descriptors it will make more individual virtio descriptors
+ available for rx to the guest using dpdkvhost ports and this can improve
+ performance.
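
For reference, disabling mergeable buffers might look like the following
fragment of a QEMU command line (the MAC address and netdev id are
placeholders for illustration):

```shell
-device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet1,mrg_rxbuf=off
```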
+
+10. Packet processing in the guest
+
+ It is good practice whether simply forwarding packets from one
+ interface to another or more complex packet processing in the guest,
+ to ensure that the thread performing this work has as much CPU
+ occupancy as possible. For example when the DPDK sample application
+ `testpmd` is used to forward packets in the guest, multiple QEMU vCPU
+ threads can be created. Taskset can then be used to affinitize the
+ vCPU thread responsible for forwarding to a dedicated core not used
+ for other general processing on the host system.
+
+11. DPDK virtio pmd in the guest
+
+ dpdkvhostcuse or dpdkvhostuser ports can be used to accelerate the path
+ to the guest using the DPDK vhost library. This library is compatible with
+ virtio-net drivers in the guest but significantly better performance can
+ be observed when using the DPDK virtio pmd driver in the guest. The DPDK
+ `testpmd` application can be used in the guest as an example application
+ that forwards packets from one DPDK vhost port to another. An example of
+ running `testpmd` in the guest can be seen below.
+ ```
+ ./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i --txqflags=0xf00
+ --disable-hw-vlan --forward-mode=io --auto-start
+ ```
+ See below for information on dpdkvhostcuse and dpdkvhostuser ports.
+ See [DPDK Docs] for more information on `testpmd`.
DPDK Rings :
------------
DPDK vhost:
-----------
-DPDK 2.2 supports two types of vhost:
+DPDK 16.04 supports two types of vhost:
1. vhost-user
2. vhost-cuse
DPDK vhost-user Prerequisites:
------------------------------
-1. DPDK 2.2 with vhost support enabled as documented in the "Building and
+1. DPDK 16.04 with vhost support enabled as documented in the "Building and
Installing section"
2. QEMU version v2.1.0+
Following the steps above to create a bridge, you can now add DPDK vhost-user
as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can
-have arbitrary names.
+have arbitrary names, except that forward and backward slashes are prohibited
+in the names.
- For vhost-user, the name of the port type is `dpdkvhostuser`
`/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
to your VM on the QEMU command line. More instructions on this can be
found in the next section "DPDK vhost-user VM configuration"
- Note: If you wish for the vhost-user sockets to be created in a
- directory other than `/usr/local/var/run/openvswitch`, you may specify
- another location on the ovs-vswitchd command line like so:
+ - If you wish for the vhost-user sockets to be created in a sub-directory of
+ `/usr/local/var/run/openvswitch`, you may specify this directory in the
+ ovsdb like so:
- `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
+ `./utilities/ovs-vsctl --no-wait \
+ set Open_vSwitch . other_config:vhost-sock-dir=subdir`
DPDK vhost-user VM configuration:
---------------------------------
```
3. Optional: Enable multiqueue support
- QEMU needs to be configured with multiple queues and the number queues
- must be less or equal to Open vSwitch other_config:n-dpdk-rxqs.
- The $q below is the number of queues.
+ The vhost-user interface must be configured in Open vSwitch with the
+ desired number of queues with:
+
+ ```
+ ovs-vsctl set Interface vhost-user-2 options:n_rxq=<requested queues>
+ ```
+
+ QEMU needs to be configured as well.
+ The $q below should match the number of queues requested in OVS (if $q
+ is larger, packets will not be received).
The $v is the number of vectors, which is '$q x 2 + 2'.
```
-device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
```
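As a quick check of the '$q x 2 + 2' arithmetic (a sketch, not QEMU
syntax):

```shell
# Sketch: vectors needed by the virtio-net-pci device for $q queues.
q=2
echo $(( q * 2 + 2 ))   # 6 vectors for 2 queues
```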
+ If one wishes to use multiple queues for an interface in the guest, the
+ driver in the guest operating system must be configured to do so. It is
+ recommended that the number of queues configured be equal to '$q'.
+
+ For example, this can be done for the Linux kernel virtio-net driver with:
+
+ ```
+ ethtool -L <DEV> combined <$q>
+ ```
+
+ A note on the command above:
+
+ `-L`: Changes the number of channels of the specified network device.
+
+ `combined`: Changes the number of multi-purpose channels.
+
DPDK vhost-cuse:
----------------
DPDK vhost-cuse Prerequisites:
------------------------------
-1. DPDK 2.2 with vhost support enabled as documented in the "Building and
+1. DPDK 16.04 with vhost support enabled as documented in the "Building and
Installing section"
As an additional step, you must enable vhost-cuse in DPDK by setting the
- following additional flag in `config/common_linuxapp`:
+ following additional flag in `config/common_base`:
`CONFIG_RTE_LIBRTE_VHOST_USER=n`
1. This step is only needed if using an alternative character device.
- The new character device filename must be specified on the vswitchd
- commandline:
+ The new character device filename must be specified in the ovsdb:
- `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`
+ `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \
+ other_config:cuse-dev-name=my-vhost-net`
- Note that the `--cuse_dev_name` argument and associated string must be the first
- arguments after `--dpdk` and come before the EAL arguments. In the example
- above, the character device to be used will be `/dev/my-vhost-net`.
+ In the example above, the character device to be used will be
+ `/dev/my-vhost-net`.
2. This step is only needed if reusing the standard character device. It will
conflict with the kernel vhost character device so the user must first
```
<my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
- device. If you have specificed a different name on the ovs-vswitchd
- commandline using the "--cuse_dev_name" parameter, please specify that
+ device. If you have specified a different name in the database
+ using the "other_config:cuse-dev-name" parameter, please specify that
filename instead.
2. Disable SELinux or set to permissive mode
this with smaller page sizes.
Platform and Network Interface:
- - Currently it is not possible to use an Intel XL710 Network Interface as a
- DPDK port type on a platform with more than 64 logical cores. This is
- related to how DPDK reports the number of TX queues that may be used by
- a DPDK application with an XL710. The maximum number of TX queues supported
- by a DPDK application for an XL710 is 64. If a user attempts to add an
- XL710 interface as a DPDK port type to a system as described above the
- port addition will fail as OVS will attempt to initialize a TX queue greater
- than 64. This issue is expected to be resolved in a future DPDK release.
- As a workaround a user can disable hyper-threading to reduce the overall
- core count of the system to be less than or equal to 64 when using an XL710
- interface with DPDK.
-
- vHost and QEMU v2.4.0+:
- - For versions of QEMU v2.4.0 and later, it is currently not possible to
- unbind more than one dpdkvhostuser port from the guest kernel driver
- without causing the ovs-vswitchd process to crash. If this is a requirement
- for your use case, it is recommended either to use a version of QEMU
- between v2.2.0 and v2.3.1 (inclusive), or alternatively, to apply the
- following patch to DPDK and rebuild:
- http://dpdk.org/dev/patchwork/patch/7736/
- This problem will likely be resolved in Open vSwitch at a later date, when
- the next release of DPDK (which includes the above patch) is available and
- integrated into OVS.
+ - By default with DPDK 16.04, a maximum of 64 TX queues can be used with an
+ Intel XL710 Network Interface on a platform with more than 64 logical
+ cores. If a user attempts to add an XL710 interface as a DPDK port type to
+ a system as described above, an error will be reported that initialization
+ failed for the 65th queue. OVS will then roll back to the previous
+ successful queue initialization and use that value as the total number of
+ TX queues available with queue locking. If a user wishes to use more than
+ 64 queues and avoid locking, then the
+ `CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config parameter in DPDK must be
+ increased to the desired number of queues. Both DPDK and OVS must be
+ recompiled for this change to take effect.
Bug Reporting:
--------------