From: Ciara Loftus Date: Thu, 4 Jun 2015 13:51:40 +0000 (-0700) Subject: netdev-dpdk: add dpdk vhost-user ports X-Git-Tag: v2.4.0~94 X-Git-Url: http://git.cascardo.eti.br/?p=cascardo%2Fovs.git;a=commitdiff_plain;h=7d1ced01772de541d6692c7d5604210e274bcd37 netdev-dpdk: add dpdk vhost-user ports This patch adds support for a new port type to the userspace datapath called dpdkvhostuser. A new dpdkvhostuser port will create a unix domain socket which when provided to QEMU is used to facilitate communication between the virtio-net device on the VM and the OVS port on the host. vhost-cuse ('dpdkvhost') ports are still available as 'dpdkvhostcuse' ports and will be enabled if vhost-cuse support is detected in the DPDK build specified during compilation of the switch. Otherwise, vhost-user ports are enabled. Signed-off-by: Ciara Loftus Acked-by: Flavio Leitner Signed-off-by: Pravin B Shelar --- diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 462ba0e4c..cdef6cfcb 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -16,7 +16,9 @@ OVS needs a system with 1GB hugepages support. Building and Installing: ------------------------ -Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu) +Required: DPDK 2.0 +Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev` +on Debian/Ubuntu) 1. Configure build & install DPDK: 1. Set `$DPDK_DIR` @@ -32,12 +34,9 @@ Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu) `CONFIG_RTE_BUILD_COMBINE_LIBS=y` Update `config/common_linuxapp` so that DPDK is built with vhost - libraries; currently, OVS only supports vhost-cuse, so DPDK vhost-user - libraries should be explicitly turned off (they are enabled by default - in DPDK 2.0). + libraries. `CONFIG_RTE_LIBRTE_VHOST=y` - `CONFIG_RTE_LIBRTE_VHOST_USER=n` Then run `make install` to build and install the library. For default install without IVSHMEM: @@ -316,40 +315,164 @@ the vswitchd. 
DPDK vhost: ----------- -vhost-cuse is only supported at present i.e. not using the standard QEMU -vhost-user interface. It is intended that vhost-user support will be added -in future releases when supported in DPDK and that vhost-cuse will eventually -be deprecated. See [DPDK Docs] for more info on vhost. +DPDK 2.0 supports two types of vhost: -Prerequisites: -1. Insert the Cuse module: +1. vhost-user +2. vhost-cuse - `modprobe cuse` +Whichever type of vhost is enabled in the specified DPDK build is the type +that will be enabled in OVS. By default, vhost-user is enabled in DPDK. +Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports +will be enabled in OVS. +Please note that support for vhost-cuse is intended to be deprecated in OVS +in a future release. -2. Build and insert the `eventfd_link` module: +DPDK vhost-user: +---------------- - `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/` - `make` - `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko` +The following sections describe the use of vhost-user 'dpdkvhostuser' ports +with OVS. -Following the steps above to create a bridge, you can now add DPDK vhost -as a port to the vswitch. +DPDK vhost-user Prerequisites: +------------------------- -`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost` +1. DPDK 2.0 with vhost support enabled as documented in the "Building and + Installing" section -Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names: +2. QEMU version v2.1.0+ -`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost` + QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if providing + your VM with memory greater than 1GB due to potential issues with memory + mapping larger areas. -However, please note that when attaching userspace devices to QEMU, the -name provided during the add-port operation must match the ifname parameter -on the QEMU command line.
+Adding DPDK vhost-user ports to the Switch: +-------------------------------------- +Following the steps above to create a bridge, you can now add DPDK vhost-user +as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can +have arbitrary names. -DPDK vhost VM configuration: ----------------------------- + - For vhost-user, the name of the port type is `dpdkvhostuser` - vhost ports use a Linux* character device to communicate with QEMU. + ``` + ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 + type=dpdkvhostuser + ``` + + This action creates a socket located at + `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide + to your VM on the QEMU command line. More instructions on this can be + found in the next section "DPDK vhost-user VM configuration". + Note: If you wish for the vhost-user sockets to be created in a + directory other than `/usr/local/var/run/openvswitch`, you may specify + another location on the ovs-vswitchd command line like so: + + `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...` + +DPDK vhost-user VM configuration: +--------------------------------- +Follow the steps below to attach vhost-user port(s) to a VM. + +1. Configure sockets. + Pass the following parameters to QEMU to attach a vhost-user device: + + ``` + -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 + -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 + ``` + + ...where vhost-user-1 is the name of the vhost-user port added + to the switch. + Repeat the above parameters for multiple devices, changing the + chardev path and id as necessary. Note that a separate and different + chardev path needs to be specified for each vhost-user device.
For + example, if you have a second vhost-user port named 'vhost-user-2', you + append an additional set of parameters to your QEMU command line: + + ``` + -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2 + -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce + -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2 + ``` + +2. Configure huge pages. + QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access + a virtio-net device's virtual rings and packet buffers by mapping the VM's + physical memory on hugetlbfs. To enable vhost-user ports to map the VM's + memory into their process address space, pass the following parameters + to QEMU: + + ``` + -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages, + share=on + -numa node,memdev=mem -mem-prealloc + ``` + +DPDK vhost-cuse: +---------------- + +The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports +with OVS. + +DPDK vhost-cuse Prerequisites: +------------------------- + +1. DPDK 2.0 with vhost support enabled as documented in the "Building and + Installing" section + As an additional step, you must enable vhost-cuse in DPDK by setting the + following additional flag in `config/common_linuxapp`: + + `CONFIG_RTE_LIBRTE_VHOST_USER=n` + + Following this, rebuild DPDK as per the instructions in the "Building and + Installing" section. Finally, rebuild OVS as per step 3 in the "Building + and Installing" section - OVS will detect that DPDK has vhost-cuse libraries + compiled and in turn will enable support for it in the switch and disable + vhost-user support. + +2. Insert the Cuse module: + + `modprobe cuse` + +3. Build and insert the `eventfd_link` module: + + ``` + cd $DPDK_DIR/lib/librte_vhost/eventfd_link/ + make + insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko + ``` + +4.
QEMU version v2.1.0+ + + vhost-cuse will work with QEMU v2.1.0 and above, however it is recommended to + use v2.2.0 if providing your VM with memory greater than 1GB due to potential + issues with memory mapping larger areas. + Note: QEMU v1.6.2 will also work, with slightly different command line parameters, + which are specified later in this document. + +Adding DPDK vhost-cuse ports to the Switch: +-------------------------------------- + +Following the steps above to create a bridge, you can now add DPDK vhost-cuse +as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports can have +arbitrary names. + + - For vhost-cuse, the name of the port type is `dpdkvhostcuse` + + ``` + ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1 + type=dpdkvhostcuse + ``` + + When attaching vhost-cuse ports to QEMU, the name provided during the + add-port operation must match the ifname parameter on the QEMU command + line. More instructions on this can be found in the next section. + +DPDK vhost-cuse VM configuration: +--------------------------------- + + vhost-cuse ports use a Linux* character device to communicate with QEMU. By default it is set to `/dev/vhost-net`. It is possible to reuse this standard device for DPDK vhost, which makes setup a little simpler but it is better practice to specify an alternative character device in order to @@ -415,16 +538,19 @@ DPDK vhost VM configuration: QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a virtio-net device's virtual rings and packet buffers mapping the VM's physical memory on hugetlbfs.
To enable vhost-ports to map the VM's - memory into their process address space, pass the following paramters + memory into their process address space, pass the following parameters to QEMU: `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages, share=on -numa node,memdev=mem -mem-prealloc` + Note: For use with an earlier QEMU version such as v1.6.2, use the + following to configure hugepages instead: -DPDK vhost VM configuration with QEMU wrapper: ----------------------------------------------- + `-mem-path /dev/hugepages -mem-prealloc` +DPDK vhost-cuse VM configuration with QEMU wrapper: +--------------------------------------------------- The QEMU wrapper script automatically detects and calls QEMU with the necessary parameters. It performs the following actions: @@ -450,8 +576,8 @@ qemu-wrap.py -cpu host -boot c -hda -m 4096 -smp 4 netdev=net1,mac=00:00:00:00:00:01 ``` -DPDK vhost VM configuration with libvirt: ------------------------------------------ +DPDK vhost-cuse VM configuration with libvirt: +---------------------------------------------- If you are using libvirt, you must enable libvirt to access the character device by adding it to controllers cgroup for libvirtd using the following @@ -525,7 +651,7 @@ Now you may launch your VM using virt-manager, or like so: `virsh create my_vhost_vm.xml` -DPDK vhost VM configuration with libvirt and QEMU wrapper: +DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper: ---------------------------------------------------------- To use the qemu-wrapper script in conjuntion with libvirt, follow the @@ -553,7 +679,7 @@ steps in the previous section before proceeding with the following steps: the correct emulator location and set any additional options. If you are using a alternative character device name, please set "us_vhost_path" to the location of that device. The script will automatically detect and insert - the correct "vhostfd" value in the QEMU command line arguements. 
+ the correct "vhostfd" value in the QEMU command line arguments. 5. Use virt-manager to launch the VM diff --git a/acinclude.m4 b/acinclude.m4 index d09a73fc1..20391eca6 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -220,6 +220,9 @@ AC_DEFUN([OVS_CHECK_DPDK], [ DPDK_vswitchd_LDFLAGS=-Wl,--whole-archive,$DPDK_LIB,--no-whole-archive AC_SUBST([DPDK_vswitchd_LDFLAGS]) AC_DEFINE([DPDK_NETDEV], [1], [System uses the DPDK module.]) + + OVS_GREP_IFELSE([$RTE_SDK/include/rte_config.h], [define RTE_LIBRTE_VHOST_USER 1], + [], [AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled, vhost-user disabled.])]) else RTE_SDK= fi diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 63243d816..3af1ee788 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -16,7 +16,6 @@ #include -#include #include #include #include @@ -26,8 +25,12 @@ #include #include #include +#include #include +#include +#include +#include "dirs.h" #include "dp-packet.h" #include "dpif-netdev.h" #include "list.h" @@ -90,8 +93,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF)) #define NIC_PORT_RX_Q_SIZE 2048 /* Size of Physical NIC RX Queue, Max (n+32<=4096)*/ #define NIC_PORT_TX_Q_SIZE 2048 /* Size of Physical NIC TX Queue, Max (n+32<=4096)*/ -/* Character device cuse_dev_name. */ -static char *cuse_dev_name = NULL; +char *cuse_dev_name = NULL; /* Character device cuse_dev_name. */ +char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */ /* * Maximum amount of time in micro seconds to try and enqueue to vhost. @@ -126,7 +129,7 @@ enum { DRAIN_TSC = 200000ULL }; enum dpdk_dev_type { DPDK_DEV_ETH = 0, - DPDK_DEV_VHOST = 1 + DPDK_DEV_VHOST = 1, }; static int rte_eal_init_ret = ENODEV; @@ -221,6 +224,9 @@ struct netdev_dpdk { /* virtio-net structure for vhost device */ OVSRCU_TYPE(struct virtio_net *) virtio_dev; + /* Identifier used to distinguish vhost devices from each other */ + char vhost_id[PATH_MAX]; + /* In dpdk_list. 
*/ struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); }; @@ -594,21 +600,51 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[], } static int -netdev_dpdk_vhost_construct(struct netdev *netdev_) +vhost_construct_helper(struct netdev *netdev_) { struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); - int err; if (rte_eal_init_ret) { return rte_eal_init_ret; } + rte_spinlock_init(&netdev->vhost_tx_lock); + return netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST); +} + +static int +netdev_dpdk_vhost_cuse_construct(struct netdev *netdev_) +{ + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); + int err; + ovs_mutex_lock(&dpdk_mutex); - err = netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST); + strncpy(netdev->vhost_id, netdev->up.name, sizeof(netdev->vhost_id)); + err = vhost_construct_helper(netdev_); ovs_mutex_unlock(&dpdk_mutex); + return err; +} - rte_spinlock_init(&netdev->vhost_tx_lock); +static int +netdev_dpdk_vhost_user_construct(struct netdev *netdev_) +{ + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); + int err; + ovs_mutex_lock(&dpdk_mutex); + /* Take the name of the vhost-user port and append it to the location where + * the socket is to be created, then register the socket. + */ + snprintf(netdev->vhost_id, sizeof(netdev->vhost_id), "%s/%s", + vhost_sock_dir, netdev_->name); + err = rte_vhost_driver_register(netdev->vhost_id); + if (err) { + VLOG_ERR("vhost-user socket device setup failure for socket %s\n", + netdev->vhost_id); + } + VLOG_INFO("Socket %s created for vhost-user port %s\n", netdev->vhost_id, netdev_->name); + err = vhost_construct_helper(netdev_); + ovs_mutex_unlock(&dpdk_mutex); return err; } @@ -1607,7 +1643,7 @@ new_device(struct virtio_net *dev) ovs_mutex_lock(&dpdk_mutex); /* Add device to the vhost port with the same name as that passed down. 
*/ LIST_FOR_EACH(netdev, list_node, &dpdk_list) { - if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) { + if (strncmp(dev->ifname, netdev->vhost_id, IF_NAME_SZ) == 0) { ovs_mutex_lock(&netdev->mutex); ovsrcu_set(&netdev->virtio_dev, dev); ovs_mutex_unlock(&netdev->mutex); @@ -1687,7 +1723,7 @@ static const struct virtio_net_device_ops virtio_net_device_ops = }; static void * -start_cuse_session_loop(void *dummy OVS_UNUSED) +start_vhost_loop(void *dummy OVS_UNUSED) { pthread_detach(pthread_self()); /* Put the cuse thread into quiescent state. */ @@ -1698,10 +1734,17 @@ start_cuse_session_loop(void *dummy OVS_UNUSED) static int dpdk_vhost_class_init(void) +{ + rte_vhost_driver_callback_register(&virtio_net_device_ops); + ovs_thread_create("vhost_thread", start_vhost_loop, NULL); + return 0; +} + +static int +dpdk_vhost_cuse_class_init(void) { int err = -1; - rte_vhost_driver_callback_register(&virtio_net_device_ops); /* Register CUSE device to handle IOCTLs. * Unless otherwise specified on the vswitchd command line, cuse_dev_name @@ -1714,7 +1757,14 @@ dpdk_vhost_class_init(void) return -1; } - ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL); + dpdk_vhost_class_init(); + return 0; +} + +static int +dpdk_vhost_user_class_init(void) +{ + dpdk_vhost_class_init(); return 0; } @@ -1923,6 +1973,33 @@ unlock_dpdk: NULL, /* rxq_drain */ \ } +static int +process_vhost_flags(char *flag, char *default_val, int size, + char **argv, char **new_val) +{ + int changed = 0; + + /* Depending on which version of vhost is in use, process the vhost-specific + * flag if it is provided on the vswitchd command line, otherwise resort to + * a default value. + * + * For vhost-user: Process "-vhost_sock_dir" to set the custom location of + * the vhost-user socket(s). + * For vhost-cuse: Process "-cuse_dev_name" to set the custom name of the + * vhost-cuse character device.
+ */ + if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) { + changed = 1; + *new_val = strdup(argv[2]); + VLOG_INFO("User-provided %s in use: %s", flag, *new_val); + } else { + VLOG_INFO("No %s provided - defaulting to %s", flag, default_val); + *new_val = default_val; + } + + return changed; +} + int dpdk_init(int argc, char **argv) { @@ -1937,27 +2014,29 @@ dpdk_init(int argc, char **argv) argc--; argv++; - /* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to - * this string if it meets the correct criteria. Otherwise, set it to the - * default (vhost-net). - */ - if (!strcmp(argv[1], "--cuse_dev_name") && - (strlen(argv[2]) <= NAME_MAX)) { - - cuse_dev_name = strdup(argv[2]); +#ifdef VHOST_CUSE + if (process_vhost_flags("-cuse_dev_name", strdup("vhost-net"), + PATH_MAX, argv, &cuse_dev_name)) { +#else + if (process_vhost_flags("-vhost_sock_dir", strdup(ovs_rundir()), + NAME_MAX, argv, &vhost_sock_dir)) { + struct stat s; + int err; - /* Remove the cuse_dev_name configuration parameters from the argument + err = stat(vhost_sock_dir, &s); + if (err) { + VLOG_ERR("vHostUser socket DIR '%s' does not exist.", + vhost_sock_dir); + return err; + } +#endif + /* Remove the vhost flag configuration parameters from the argument * list, so that the correct elements are passed to the DPDK * initialization function */ argc -= 2; - argv += 2; /* Increment by two to bypass the cuse_dev_name arguments */ + argv += 2; /* Increment by two to bypass the vhost flag arguments */ base = 2; - - VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name); - } else { - cuse_dev_name = "vhost-net"; - VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net"); } /* Keep the program name argument as this is needed for call to @@ -2012,11 +2091,25 @@ static const struct netdev_class dpdk_ring_class = netdev_dpdk_get_status, netdev_dpdk_rxq_recv); -static const struct netdev_class dpdk_vhost_class = +static const struct netdev_class 
dpdk_vhost_cuse_class = NETDEV_DPDK_CLASS( - "dpdkvhost", - dpdk_vhost_class_init, - netdev_dpdk_vhost_construct, + "dpdkvhostcuse", + dpdk_vhost_cuse_class_init, + netdev_dpdk_vhost_cuse_construct, + netdev_dpdk_vhost_destruct, + netdev_dpdk_vhost_set_multiq, + netdev_dpdk_vhost_send, + netdev_dpdk_vhost_get_carrier, + netdev_dpdk_vhost_get_stats, + NULL, + NULL, + netdev_dpdk_vhost_rxq_recv); + +const struct netdev_class dpdk_vhost_user_class = + NETDEV_DPDK_CLASS( + "dpdkvhostuser", + dpdk_vhost_user_class_init, + netdev_dpdk_vhost_user_construct, netdev_dpdk_vhost_destruct, netdev_dpdk_vhost_set_multiq, netdev_dpdk_vhost_send, @@ -2039,7 +2132,11 @@ netdev_dpdk_register(void) dpdk_common_init(); netdev_register_provider(&dpdk_class); netdev_register_provider(&dpdk_ring_class); - netdev_register_provider(&dpdk_vhost_class); +#ifdef VHOST_CUSE + netdev_register_provider(&dpdk_vhost_cuse_class); +#else + netdev_register_provider(&dpdk_vhost_user_class); +#endif ovsthread_once_done(&once); } } diff --git a/lib/netdev.c b/lib/netdev.c index 03a754979..186c1e2e4 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -111,7 +111,8 @@ netdev_is_pmd(const struct netdev *netdev) { return (!strcmp(netdev->netdev_class->type, "dpdk") || !strcmp(netdev->netdev_class->type, "dpdkr") || - !strcmp(netdev->netdev_class->type, "dpdkvhost")); + !strcmp(netdev->netdev_class->type, "dpdkvhostcuse") || + !strcmp(netdev->netdev_class->type, "dpdkvhostuser")); } static void diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c index a1b33dad9..96bb1d8ca 100644 --- a/vswitchd/ovs-vswitchd.c +++ b/vswitchd/ovs-vswitchd.c @@ -72,6 +72,10 @@ main(int argc, char *argv[]) set_program_name(argv[0]); retval = dpdk_init(argc,argv); + if (retval < 0) { + return retval; + } + argc -= retval; argv += retval; @@ -252,9 +256,18 @@ usage(void) daemon_usage(); vlog_usage(); printf("\nDPDK options:\n" - " --dpdk options Initialize DPDK datapath.\n" - " --cuse_dev_name BASENAME override default 
character device name\n" - " for use with userspace vHost.\n"); + " --dpdk [VHOST] [DPDK] Initialize DPDK datapath.\n" + " where DPDK are options for initializing DPDK lib and VHOST is\n" +#ifdef VHOST_CUSE + " option to override default character device name used\n" + " for use with userspace vHost.\n" + " -cuse_dev_name NAME\n" +#else + " option to override default directory where vhost-user\n" + " sockets are created.\n" + " -vhost_sock_dir DIR\n" +#endif + ); printf("\nOther options:\n" " --unixctl=SOCKET override default control socket name\n" " -h, --help display this help message\n"
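
The vhost flag handling that `dpdk_init()` delegates to `process_vhost_flags()` in this patch can be sketched in isolation as follows. This is a minimal illustration, not the code in the patch: `vhost_flag_value()` and its `argc` and `consumed` parameters are hypothetical names introduced here. It assumes, as in the patch, that `argv[0]` is `--dpdk` and that the optional vhost flag, if present, is `argv[1]` with its value in `argv[2]`. Unlike the patch, the sketch checks `argc` before reading those entries and returns the chosen string instead of duplicating it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.
 *
 * Returns the value following 'flag' when 'flag' is the first argument after
 * argv[0] and the value is at most 'max_len' characters long. Otherwise
 * returns 'default_val'. '*consumed' is set to the number of argv entries
 * taken up by the flag and its value: 2 if the flag was accepted, else 0. */
static const char *
vhost_flag_value(int argc, char **argv, const char *flag,
                 const char *default_val, size_t max_len, int *consumed)
{
    assert(argv != NULL && flag != NULL && consumed != NULL);

    *consumed = 0;

    /* Assumption of this sketch: verify that argv[1] and argv[2] exist
     * before comparing them. */
    if (argc >= 3 && !strcmp(argv[1], flag) && strlen(argv[2]) <= max_len) {
        *consumed = 2;
        return argv[2];
    }

    return default_val;
}
```

A caller in the style of `dpdk_init()` would then skip `consumed` entries of `argv` before handing the remaining arguments to the DPDK initialization function, which is what the patch does with `argc -= 2; argv += 2;`.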