cascardo/ovs.git
9 years agoFix remaining "void function returning a value" warning by MSVC.
Gurucharan Shetty [Mon, 15 Sep 2014 17:04:32 +0000 (10:04 -0700)]
Fix remaining "void function returning a value" warning by MSVC.

MSVC complains about a void function returning a value if there is a
statement of the form - 'return foo()' even if foo() has a void return
type.
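
A minimal sketch of the pattern and the usual workaround. The functions log_event() and handle_event() are hypothetical and not taken from the OVS tree:

```c
#include <stdio.h>

static int n_events;            /* Hypothetical counter, for illustration. */

static void
log_event(int code)
{
    n_events++;
    printf("event %d\n", code);
}

/* ISO C does not allow 'return expr;' in a void function, even when 'expr'
 * itself has type void.  gcc and clang tolerate it as an extension, while
 * MSVC warns.  Calling the function and returning separately is accepted by
 * all of them. */
static void
handle_event(int code)
{
    /* Was: return log_event(code); */
    log_event(code);
    return;
}
```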

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-atomic-msvc: Disable a compiler warning.
Gurucharan Shetty [Mon, 15 Sep 2014 15:41:14 +0000 (08:41 -0700)]
ovs-atomic-msvc: Disable a compiler warning.

MSVC does not support C11-style atomics for the C compiler.
Windows has different Interlocked* functions for different data
sizes.  ovs-atomic-msvc.h maps the API in ovs-atomic.h (which is similar
to C11 atomics) to the available atomic functions in Windows. In some
cases, this causes compiler warnings about mismatched data sizes because
the generated code has 'if else' conditions on different data sizes and
proper casting is not possible.

In the current OVS code base, we get one compiler warning through ovs-rcu.h
which says "‘void *’ differs in levels of indirection from LONGLONG."
This comes from the following in ovs-atomic-msvc.h for atomic_read64():
*(DST) = InterlockedOr64((int64_t volatile *) (SRC), 0);
when *DST is a void pointer (because InterlockedOr64 returns LONGLONG).
But this code path is only ever hit for 64-bit data, so it should be safe to
disable the warning. (Any real bugs in API calls would hopefully be caught
while compiling on Linux using gcc/clang.)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Eitan Eliahu <eliahue@vmware.com>
9 years agonetdev-dpdk: Fix thread-safety breach.
Alex Wang [Mon, 15 Sep 2014 20:15:38 +0000 (13:15 -0700)]
netdev-dpdk: Fix thread-safety breach.

dpdk_eth_dev_init() must be called with dpdk_mutex held.  However,
netdev_dpdk_set_multiq() fails to follow this rule.  This commit
fixes this breach.
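
The locking rule can be sketched with plain pthreads. All names below (dev_mutex, dev_init_locked(), dev_set_multiq()) are hypothetical stand-ins, not the netdev-dpdk code:

```c
#include <pthread.h>

static pthread_mutex_t dev_mutex = PTHREAD_MUTEX_INITIALIZER;
static int n_inits;             /* Protected by dev_mutex. */

/* Caller must hold dev_mutex. */
static int
dev_init_locked(int n_queues)
{
    n_inits++;
    return n_queues > 0 ? 0 : -1;
}

/* Takes the mutex itself, so the rule on dev_init_locked() is respected. */
static int
dev_set_multiq(int n_queues)
{
    int err;

    pthread_mutex_lock(&dev_mutex);
    err = dev_init_locked(n_queues);
    pthread_mutex_unlock(&dev_mutex);
    return err;
}
```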

Found by clang.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
9 years agonetdev-dpdk: Make get_config() report correct queue info.
Alex Wang [Mon, 15 Sep 2014 20:01:12 +0000 (13:01 -0700)]
netdev-dpdk: Make get_config() report correct queue info.

With the separation of tx queue and rx queue configuration
in the netdev-dpdk module, netdev_dpdk_get_config() can no
longer report 'n_rxq' as the tx queue configuration.

This commit fixes the above issue.

Reported-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
9 years agodpif-netdev: Create multiple pmd threads by default.
Alex Wang [Fri, 5 Sep 2014 21:14:20 +0000 (14:14 -0700)]
dpif-netdev: Create multiple pmd threads by default.

With this commit, ovs by default will create one pmd thread
for each numa node and pin the pmd thread to an available cpu
core on the numa node.

NON_PMD_CORE_ID (currently 0) is used to reserve a particular
cpu core for the I/O of all non-pmd threads.  No pmd thread
can be pinned to this reserved core.
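
A rough sketch of the core selection described above, assuming a hypothetical 'struct core' table and a helper pick_pmd_core(); only NON_PMD_CORE_ID and its value come from this message:

```c
#include <stddef.h>

#define NON_PMD_CORE_ID 0       /* Reserved core, as stated above. */
#define CORE_NONE -1            /* Hypothetical "no core available" value. */

struct core {                   /* Hypothetical description of one cpu core. */
    int core_id;
    int numa_id;
    int pinned;                 /* Nonzero once a pmd thread is pinned here. */
};

/* Returns the id of an unpinned core on 'numa_id', never the reserved one,
 * and marks it as pinned.  Returns CORE_NONE if no such core exists. */
static int
pick_pmd_core(struct core *cores, size_t n_cores, int numa_id)
{
    for (size_t i = 0; i < n_cores; i++) {
        struct core *c = &cores[i];

        if (c->numa_id == numa_id && !c->pinned
            && c->core_id != NON_PMD_CORE_ID) {
            c->pinned = 1;
            return c->core_id;
        }
    }
    return CORE_NONE;
}
```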

As side-effects of this commit:

- a pmd thread will not be created if no dpdk interface from the
  corresponding numa node has been added to ovs.

- the exact-match cache for non-pmd threads is removed from
  'struct dp_netdev'.  Instead, all non-pmd threads will use
  the exact-match cache defined in the 'struct dp_netdev_pmd_thread'
  for NON_PMD_CORE_ID.

- the rx packet processing functions are refactored to use
  'struct dp_netdev_pmd_thread' as input.

- the 'netdev_send()' function will be called with the proper
  queue id.

- both pmd and non-pmd threads can call dpif_netdev_execute(),
  so a per-thread key is used to help recognize the calling thread.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Remove the tx queue spinlock.
Alex Wang [Fri, 5 Sep 2014 17:56:18 +0000 (10:56 -0700)]
netdev-dpdk: Remove the tx queue spinlock.

The previous commit makes OVS create one tx queue for each
cpu core, so each pmd thread will use a separate tx queue.
Also, tx of non-pmd threads on a dpdk interface all goes through
'NON_PMD_THREAD_TX_QUEUE', protected by the 'nonpmd_mempool_mutex'.
Therefore, the spinlock is no longer needed, and this commit
removes it from 'struct dpdk_tx_queue'.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Add indicator for flushing tx queue.
Alex Wang [Thu, 4 Sep 2014 20:09:22 +0000 (13:09 -0700)]
netdev-dpdk: Add indicator for flushing tx queue.

The previous commit makes OVS create one tx queue for each cpu
core.  An upcoming patch will allow multiple pmd threads to be
created and pinned to cpu cores.  So each pmd thread will use
the tx queue corresponding to its core id.

Moreover, the pmd threads running on a different numa node than
the dpdk interface (called non-local pmd threads) will not
handle the rx of the interface.  Consequently, there needs to
be a way to flush the tx queues of the non-local pmd threads.

To address the queue flushing issue, this commit introduces a
new flag 'flush_tx' in the 'struct dpdk_tx_queue' which is
set if the queue is to be used by a non-local pmd thread.
Then, when enqueueing the tx pkts, if the flag is set, the tx
queue will always be flushed immediately after the enqueue.
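
A simplified sketch of the enqueue path with such a flag. The struct layout, TXQ_CAPACITY and the helper names are assumptions for illustration; only the 'flush_tx' flag comes from this message:

```c
#include <stdbool.h>
#include <stddef.h>

#define TXQ_CAPACITY 4          /* Hypothetical, small for illustration. */

struct tx_queue {
    int pkts[TXQ_CAPACITY];
    size_t count;               /* Packets waiting in the queue. */
    bool flush_tx;              /* Set if used by a non-local pmd thread. */
    size_t n_sent;              /* Packets handed to the (imaginary) NIC. */
};

static void
txq_flush(struct tx_queue *q)
{
    q->n_sent += q->count;
    q->count = 0;
}

static void
txq_enqueue(struct tx_queue *q, int pkt)
{
    if (q->count == TXQ_CAPACITY) {
        txq_flush(q);           /* Make room first. */
    }
    q->pkts[q->count++] = pkt;
    if (q->flush_tx) {
        txq_flush(q);           /* Nobody else will flush this queue later. */
    }
}
```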

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodpif-netdev: Create multiple tx/rx queues when adding dpdk interface.
Alex Wang [Tue, 17 Jun 2014 17:52:20 +0000 (10:52 -0700)]
dpif-netdev: Create multiple tx/rx queues when adding dpdk interface.

Before this commit, ovs creates one tx and one rx queue for
each dpdk interface and uses only one poll thread for handling
I/O of all dpdk interfaces.  An upcoming patch will allow multiple
poll threads to be created.  As a preparation, this commit changes
dpif-netdev to create multiple tx/rx queues when the dpdk
interface is added.

Specifically, the number of rx queues will still be one per dpdk
interface for this commit, but upcoming work will allow the user
to create multiple rx queues.  The number of tx queues will be the
number of cpu cores on the machine.  Although not all the tx queues
will be used, each poll thread will have its own queue for
transmission on the dpdk interface.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev: Add function for configuring tx and rx queues.
Alex Wang [Mon, 8 Sep 2014 21:52:54 +0000 (14:52 -0700)]
netdev: Add function for configuring tx and rx queues.

This commit adds a new API to 'struct netdev_class' which
allows the user to configure the number of tx queues and rx queues
of a 'netdev'.  Upcoming patches will use this function to set
multiple tx/rx queues when adding the netdev to dpif-netdev.

Currently, only netdev-dpdk module implements this function.
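
A reduced sketch of an optional per-class callback of this kind. The struct contents, the 'set_multiq' signature and the EOPNOTSUPP fallback are assumptions made for illustration and should not be read as the actual OVS definitions:

```c
#include <errno.h>
#include <stddef.h>

struct netdev;

struct netdev_class {           /* Heavily reduced; not the real OVS struct. */
    const char *type;
    /* Optional.  NULL if the class cannot configure multiple queues. */
    int (*set_multiq)(struct netdev *, unsigned int n_txq,
                      unsigned int n_rxq);
};

struct netdev {
    const struct netdev_class *class_;
    unsigned int n_txq, n_rxq;
};

/* Hypothetical implementation for a class that supports multiple queues. */
static int
example_set_multiq(struct netdev *dev, unsigned int n_txq,
                   unsigned int n_rxq)
{
    if (!n_txq || !n_rxq) {
        return EINVAL;
    }
    dev->n_txq = n_txq;
    dev->n_rxq = n_rxq;
    return 0;
}

/* Generic wrapper: dispatch to the class if it implements the callback. */
static int
netdev_set_multiq_sketch(struct netdev *dev, unsigned int n_txq,
                         unsigned int n_rxq)
{
    return dev->class_->set_multiq
           ? dev->class_->set_multiq(dev, n_txq, n_rxq)
           : EOPNOTSUPP;
}
```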

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoofproto: Do not update stats on fake bond interface.
Pravin B Shelar [Fri, 12 Sep 2014 23:00:50 +0000 (16:00 -0700)]
ofproto: Do not update stats on fake bond interface.

There are a couple of reasons to remove this support:
*   It is only used in a very old OVS use case. It is much better
    to read stats directly from OVS.
*   A forthcoming commit will remove support for setting stats
    for vports. The stats update depends on stats-set.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Improve robustness of this_cpu_ptr definition in compat layer
Andy Zhou [Wed, 10 Sep 2014 22:36:06 +0000 (15:36 -0700)]
datapath: Improve robustness of this_cpu_ptr definition in compat layer

The current autoconfig detection logic for HAVE_PER_CPU_PTR is not robust.
Depending on the Linux kernel version, the definition can be in either
linux/percpu.h or asm/percpu.h.

It turns out to be simpler and safer to handle missing percpu.h
definitions in linux/percpu.h rather than asm/percpu.h. With this
change, there is no need for the autoconfig detection logic above.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-dev.py: do not pass --enable-dummy to ovsdb
Daniele Di Proietto [Sat, 13 Sep 2014 01:35:10 +0000 (01:35 +0000)]
ovs-dev.py: do not pass --enable-dummy to ovsdb

--enable-dummy was useless anyway for ovsdb-server. Now it is an error to pass
it.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agoofproto: Increase default datapath max_idle time.
Joe Stringer [Fri, 12 Sep 2014 06:03:56 +0000 (06:03 +0000)]
ofproto: Increase default datapath max_idle time.

The datapath max_idle value determines how long to wait before deleting
an idle datapath flow when operating below the flow_limit. This patch
increases the max_idle to 10 seconds, which allows datapath flows to
remain cached even if they are used less consistently, and provides a
small improvement in the supported number of flows when operating around
the flow_limit.
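
As a rough illustration of the idle check, assuming millisecond timestamps and a hypothetical helper flow_is_idle() (the real revalidator logic is more involved):

```c
#include <stdbool.h>

#define MAX_IDLE_MS 10000       /* 10 seconds, per this commit message. */

/* Returns true if a flow last used at 'used_ms' has been idle for at least
 * 'max_idle_ms' at time 'now_ms'.  The '>=' boundary is an assumption. */
static bool
flow_is_idle(long long now_ms, long long used_ms, long long max_idle_ms)
{
    return now_ms - used_ms >= max_idle_ms;
}
```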

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
9 years agodatapath: Add IS_ERR_OR_NULL for backward compatibility.
Pravin B Shelar [Fri, 12 Sep 2014 23:03:34 +0000 (16:03 -0700)]
datapath: Add IS_ERR_OR_NULL for backward compatibility.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoopenvswitch: rename ->sync to ->syncp
WANG Cong [Fri, 12 Sep 2014 21:12:24 +0000 (14:12 -0700)]
openvswitch: rename ->sync to ->syncp

Openvswitch defines u64_stats_sync as ->sync rather than ->syncp,
so fails to compile with netdev_alloc_pcpu_stats(). So just rename it to ->syncp.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 1c213bd24ad04f4430031 (net: introduce netdev_alloc_pcpu_stats() for drivers)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: introduce netdev_alloc_pcpu_stats() for drivers
WANG Cong [Fri, 12 Sep 2014 21:05:11 +0000 (14:05 -0700)]
datapath: introduce netdev_alloc_pcpu_stats() for drivers

There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Use IS_ERR_OR_NULL
Himangi Saraogi [Fri, 12 Sep 2014 18:34:04 +0000 (11:34 -0700)]
datapath: Use IS_ERR_OR_NULL

This patch introduces the use of the macro IS_ERR_OR_NULL in place of
tests for NULL and IS_ERR.

The following Coccinelle semantic patch was used for making the change:

@@
expression e;
@@

- e == NULL || IS_ERR(e)
+ IS_ERR_OR_NULL(e)
 || ...
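
For readers without the kernel headers at hand, a userspace re-implementation of the helpers involved might look as follows. This is a sketch that mirrors the kernel's error-pointer convention (the top MAX_ERRNO values of the address space encode errors); it is not the kernel's own err.h:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encodes a negative errno value as a pointer. */
static inline void *
ERR_PTR(long error)
{
    return (void *) (intptr_t) error;
}

/* True if 'ptr' encodes an error, i.e. lies in the last MAX_ERRNO values. */
static inline bool
IS_ERR(const void *ptr)
{
    return (uintptr_t) ptr >= (uintptr_t) -MAX_ERRNO;
}

/* Equivalent to 'ptr == NULL || IS_ERR(ptr)'. */
static inline bool
IS_ERR_OR_NULL(const void *ptr)
{
    return !ptr || IS_ERR(ptr);
}
```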

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodatapath: fix duplicate #include headers
Jean Sacren [Fri, 12 Sep 2014 18:31:27 +0000 (11:31 -0700)]
datapath: fix duplicate #include headers

The #include headers net/genetlink.h and linux/genetlink.h were both
included twice, so delete each duplicate.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Replace rcu_dereference() with rcu_access_pointer()
Andreea-Cristina Bernat [Fri, 12 Sep 2014 18:26:01 +0000 (11:26 -0700)]
datapath: Replace rcu_dereference() with rcu_access_pointer()

The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev: Add n_txq to 'struct netdev'.
Alex Wang [Wed, 3 Sep 2014 21:37:35 +0000 (14:37 -0700)]
netdev: Add n_txq to 'struct netdev'.

This commit adds new variable n_txq to 'struct netdev' for recording
the number of tx queues.  Correspondingly, the send_*() functions are
extended to accept queue id as input argument.

All 'netdev-*' implementations will ignore the queue id since having
multiple tx queues is not supported.  Upcoming patches will start
using it and create multiple tx queues for dpdk netdev.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev: Add function for getting the numa node id of netdev.
Alex Wang [Wed, 11 Jun 2014 23:33:08 +0000 (16:33 -0700)]
netdev: Add function for getting the numa node id of netdev.

This commit adds a new API to 'struct netdev_class' which
allows the user to query the id of the numa node the 'netdev' is on.

Currently, only netdev-dpdk module implements this function.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-rcu: Make ovsrcu_quiesce() flush the callback event set.
Alex Wang [Tue, 9 Sep 2014 18:01:52 +0000 (11:01 -0700)]
ovs-rcu: Make ovsrcu_quiesce() flush the callback event set.

On current master, the per-thread callback event set is flushed
when ovsrcu_quiesce_start() is called or when the callback
event set is full.  For threads that only call 'ovsrcu_quiesce()'
to indicate a quiescent state, their callback event set will not
be flushed for execution until the set is full, and this could
take a very long time.

Theoretically, this should not be an issue, since rcu postponed
callback events should only free the old version of objects.
However, current ovs does not follow this rule, and some callback
events include other activities like unregistering the netdev
from the global name-netdev map.  The delay in unregistering the netdev
(by threads that only call ovsrcu_quiesce()) will prevent the
re-creation of the same netdev indefinitely.

As a short-term workaround, this commit makes every call to
ovsrcu_quiesce() flush the callback event set.  In the long run,
there will be a refactor of the use of ovs-rcu module, in which all
callback events only free the old version of objects.
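
The workaround can be pictured with a toy callback set. Everything here (struct cbset, CBSET_MAX, postpone(), quiesce()) is a simplified, single-threaded stand-in rather than the ovs-rcu implementation, and the "flush" runs callbacks directly instead of handing them to a grace-period mechanism:

```c
#include <stddef.h>

#define CBSET_MAX 16            /* Hypothetical capacity. */

struct cb {
    void (*function)(void *aux);
    void *aux;
};

struct cbset {                  /* Per-thread set of postponed callbacks. */
    struct cb cbs[CBSET_MAX];
    size_t n;
};

/* Stand-in for passing the set on for execution after the grace period. */
static void
cbset_flush(struct cbset *set)
{
    for (size_t i = 0; i < set->n; i++) {
        set->cbs[i].function(set->cbs[i].aux);
    }
    set->n = 0;
}

static void
postpone(struct cbset *set, void (*function)(void *), void *aux)
{
    set->cbs[set->n].function = function;
    set->cbs[set->n].aux = aux;
    if (++set->n == CBSET_MAX) {
        cbset_flush(set);       /* Previously the only flush for threads
                                 * that just call quiesce(). */
    }
}

static void
quiesce(struct cbset *set)
{
    if (set->n) {
        cbset_flush(set);       /* The workaround: flush on every call. */
    }
    /* ... then announce the quiescent state ... */
}

/* Example callback: increments the int that 'aux' points to. */
static void
count_cb(void *aux)
{
    ++*(int *) aux;
}
```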

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoNetlink_socket.c Join/Unjoin an MC group for event subscription
Eitan Eliahu [Thu, 11 Sep 2014 17:01:02 +0000 (10:01 -0700)]
Netlink_socket.c Join/Unjoin an MC group for event subscription

Use a specific out of band device control to subscribe/unsubscribe a socket
to the driver event queue for notification.

Signed-off-by: Eitan Eliahu <eliahue@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath-windows/Netlink: Nested attributes put/parse.
Ankur Sharma [Thu, 11 Sep 2014 00:36:22 +0000 (17:36 -0700)]
datapath-windows/Netlink: Nested attributes put/parse.

Added APIs for creating and parsing nested netlink attributes.
The APIs follow the same lines as the userspace netlink code.

Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath-windows/NetlinkBuf.h: Added NlBufSize
Ankur Sharma [Wed, 10 Sep 2014 23:20:16 +0000 (16:20 -0700)]
datapath-windows/NetlinkBuf.h: Added NlBufSize

Added an inline function to return used size in the buffer.

Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Acked-by: Samuel Ghinet <sghinet@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodebian: Don't depend on $RUNLEVEL at startup to create bridges.
Gurucharan Shetty [Thu, 11 Sep 2014 16:35:10 +0000 (09:35 -0700)]
debian: Don't depend on $RUNLEVEL at startup to create bridges.

Commit b2a0daa5bd (debian: Don't recreate bridges during manual restart.)
added a check on $RUNLEVEL to only create bridges and ports when the
system starts up. This fix does not work with systemd.

This commit uses a different approach to solve the same problem.

Reported-at: https://bugs.debian.org/686518
Reported-by: Philipp S. Schmidt <phils@in-panik.de>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Tested-by: Philipp S. Schmidt <phils@in-panik.de>
9 years agoAvoid uninitialized variable warnings with OBJECT_OFFSETOF() in MSVC.
Gurucharan Shetty [Tue, 9 Sep 2014 21:23:07 +0000 (14:23 -0700)]
Avoid uninitialized variable warnings with OBJECT_OFFSETOF() in MSVC.

The implementation of OBJECT_OFFSETOF() for non-GNUC compilers like MSVC
causes "uninitialized variable" warnings. Since OBJECT_OFFSETOF() is
indirectly used by all the *_FOR_EACH() macros (through ASSIGN_CONTAINER()
and OBJECT_CONTAINING()), the OVS build
on Windows gets littered with "uninitialized variable" warnings.
This patch attempts to work around the problem.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agounixctl: Make command description all lowercase.
Alex Wang [Fri, 22 Aug 2014 23:27:22 +0000 (16:27 -0700)]
unixctl: Make command description all lowercase.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-server: Remove the 'enable-dummy' option.
Alex Wang [Thu, 21 Aug 2014 20:54:58 +0000 (13:54 -0700)]
ovsdb-server: Remove the 'enable-dummy' option.

There is no use case for this option in ovsdb-server.  Also,
it causes the dpif-dummy and netdev-dummy modules to register unrelated
unixctl commands.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif: Probe for userdata after backer is fully operational.
Jarno Rajahalme [Thu, 11 Sep 2014 20:27:29 +0000 (13:27 -0700)]
ofproto-dpif: Probe for userdata after backer is fully operational.

When probing for variable length userdata before handler threads are
set, the pid included in the userspace action will be 0, which is
flagged as an error by the linux kernel datapath.  As a result the
feature probe will produce an unnecessary log message.  By probing for
variable length userdata later the probe works as intended and the
unnecessary log message is avoided.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agohash.h: Avoid compiler warnings with MSVC.
Gurucharan Shetty [Tue, 9 Sep 2014 21:16:16 +0000 (14:16 -0700)]
hash.h: Avoid compiler warnings with MSVC.

The lack of 'const' in function declaration causes MSVC to complain
because the function definition uses it.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-ofctl: Workaround a compiler warning on MSVC.
Gurucharan Shetty [Tue, 9 Sep 2014 21:12:57 +0000 (14:12 -0700)]
ovs-ofctl: Workaround a compiler warning on MSVC.

MSVC complains about a void function returning a value if there is a
statement of the form - 'return foo()' even if foo() has a void return
type.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agotravis: Fix DPDK build and treat bad-function-cast warning as non-error
Thomas Graf [Thu, 11 Sep 2014 19:34:22 +0000 (21:34 +0200)]
travis: Fix DPDK build and treat bad-function-cast warning as non-error

A missing " prevented the DPDK build in the matrix from functioning
so far. This patch enables the DPDK build by properly building DPDK
as a single library and by pointing the OVS build to the corresponding
build directory. Also removes the 'make install' as it is not required
and only slows down the build.

Due to incorrect casts in the DPDK headers, we have to stop treating
the bad-function-cast and cast-align warnings as errors
for now.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Co-authored-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agobuild: Respect CFLAGS and LDFLAGS passed to make
Thomas Graf [Thu, 11 Sep 2014 19:34:21 +0000 (21:34 +0200)]
build: Respect CFLAGS and LDFLAGS passed to make

configure cannot expect that the user will not pass additional CFLAGS
and LDFLAGS at make time [0]. Use OVS_CFLAGS and OVS_LDFLAGS instead to
collect compiler and linker flags and substitute in Makefile.am.

This allows for:
./configure --with-dpdk=[...]
make CFLAGS=-Wno-error=foo

[0] http://www.gnu.org/software/automake/manual/html_node/Flag-Variables-Ordering.html

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Add this_cpu_{read, inc, dec} APIs for backward compatibility
Andy Zhou [Wed, 10 Sep 2014 20:22:08 +0000 (13:22 -0700)]
datapath: Add this_cpu_{read, inc, dec} APIs for backward compatibility

The upstream module uses the this_cpu_xxx APIs. Add those functions for
older kernels (<3.0.0) that do not provide them.

VMware-BZ: #1319082

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-socket: Convert from error number to string correctly.
Gurucharan Shetty [Tue, 9 Sep 2014 18:55:45 +0000 (11:55 -0700)]
netlink-socket: Convert from error number to string correctly.

As mentioned in the comment above the function ovs_strerror(), it
should not be used to convert WINAPI error numbers to string.
Use ovs_lasterror_to_string() instead.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
9 years agodatapath: Backport __ip_select_ident() function
Pravin B Shelar [Wed, 25 Sep 2013 01:42:43 +0000 (18:42 -0700)]
datapath: Backport __ip_select_ident() function

The definition of __ip_select_ident() changed in newer kernels and
the change was backported to stable kernels. Therefore, add a configure
check to detect the new function.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoopenvswitch.h: Fix the type of struct ovs_key_nd nd_target field.
Jarno Rajahalme [Wed, 10 Sep 2014 20:02:46 +0000 (13:02 -0700)]
openvswitch.h: Fix the type of struct ovs_key_nd nd_target field.

Should be the same as other IPv6 address fields.

Current master produces sparse warnings without this change.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-vtep: Handle physical ports with '-' in its name.
Gurucharan Shetty [Wed, 10 Sep 2014 17:32:26 +0000 (10:32 -0700)]
ovs-vtep: Handle physical ports with '-' in its name.

As of now, if a physical port has a '-' in its name, ovs-vtep
throws a ValueError exception. This patch fixes the problem.

Reported-by: Mark Maglana <mmaglana@gmail.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agolib/rstp-common: Remove double spaces.
Daniele Venturino [Wed, 10 Sep 2014 16:28:03 +0000 (16:28 +0000)]
lib/rstp-common: Remove double spaces.

Signed-off-by: Daniele Venturino <daniele.venturino@m3s.it>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agolib/rstp: Use ovs_refcount_unref_relaxed.
Daniele Venturino [Wed, 10 Sep 2014 16:28:01 +0000 (16:28 +0000)]
lib/rstp: Use ovs_refcount_unref_relaxed.

Access to RSTP and RSTP port objects is protected by a mutex, so the
refcount unref operation can have relaxed memory order semantics (See
commit 24f8381214966e90819bf4a9ecabf076cbfc1b08).

Signed-off-by: Daniele Venturino <daniele.venturino@m3s.it>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agolib/rstp: Use RSTP_OPER_P2P_MAC_STATE_ENABLED instead of 1.
Daniele Venturino [Wed, 10 Sep 2014 16:28:00 +0000 (16:28 +0000)]
lib/rstp: Use RSTP_OPER_P2P_MAC_STATE_ENABLED instead of 1.

Signed-off-by: Daniele Venturino <daniele.venturino@m3s.it>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath-windows: update CodingStyle guideline for variable names
Nithin Raju [Tue, 9 Sep 2014 16:02:36 +0000 (09:02 -0700)]
datapath-windows: update CodingStyle guideline for variable names

During a review, it seemed that some of the conventions were not clear.
Fixing them in this patch.

Signed-off-by: Nithin Raju <nithin@vmware.com>
Reported-by: Samuel Ghinet <sghinet@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Samuel Ghinet <sghinet@cloudbasesolutions.com>
9 years agorconn: Prevent redefinition of 'MAX_MONITORS' in Windows.
Gurucharan Shetty [Tue, 9 Sep 2014 20:19:22 +0000 (13:19 -0700)]
rconn: Prevent redefinition of 'MAX_MONITORS' in Windows.

Windows already has a MAX_MONITORS defined in ddeml.h.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agoofproto-dpif-xlate: Work around Linux netdev_max_backlog limit.
Ben Pfaff [Tue, 9 Sep 2014 22:06:52 +0000 (15:06 -0700)]
ofproto-dpif-xlate: Work around Linux netdev_max_backlog limit.

Linux has an internal queue that temporarily holds packets transmitted to
certain network devices.  If too many packets are transmitted to such
network devices within a single list of actions, then packets tend to get
dropped.  Broadcast or flooded or multicast packets on bridges with
thousands of ports are examples of how this can occur.

This commit avoids the problem by implementing a flow in userspace when it
outputs its packet more times than the maximum length of the queue.
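
The decision can be sketched as a simple counter. The struct and function names are hypothetical, and 'max_backlog' stands for whatever queue length the caller has determined:

```c
#include <stdbool.h>

struct xlate_sketch {           /* Hypothetical per-translation state. */
    int n_outputs;              /* Outputs seen so far in this action list. */
    int max_backlog;            /* Maximum length of the kernel queue. */
    bool slow_path;             /* If set, handle the flow in userspace. */
};

/* Records one more output action.  Once the flow outputs its packet more
 * times than the queue can hold, mark it for userspace handling. */
static void
note_output(struct xlate_sketch *x)
{
    if (++x->n_outputs > x->max_backlog) {
        x->slow_path = true;
    }
}
```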

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Tested-by: Flavio Leitner <fbl@redhat.com>
9 years agolib/odp-util: Reduce duplicated code.
Jarno Rajahalme [Fri, 5 Sep 2014 22:44:20 +0000 (15:44 -0700)]
lib/odp-util: Reduce duplicated code.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/odp-util: Fix mapping to Netlink frag mask.
Jarno Rajahalme [Fri, 5 Sep 2014 22:44:20 +0000 (15:44 -0700)]
lib/odp-util: Fix mapping to Netlink frag mask.

The frag member in the Netlink interface is a uint8_t enumeration
type, not a bitfield, so it should always be either fully masked or
not masked at all.
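
A sketch of such an all-or-nothing mapping. The function name is made up, and the choice to treat any set bit in the internal mask as "fully masked" is an assumption for illustration:

```c
#include <stdint.h>

/* Maps an internal, bit-oriented frag mask to a mask for an 8-bit
 * enumeration field: all ones if the field is matched at all, else zero. */
static uint8_t
frag_mask_to_netlink(uint8_t internal_mask)
{
    return internal_mask ? UINT8_MAX : 0;
}
```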

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/odp: Use masked set actions.
Jarno Rajahalme [Fri, 5 Sep 2014 23:00:49 +0000 (16:00 -0700)]
lib/odp: Use masked set actions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto: Probe for masked set action support.
Jarno Rajahalme [Fri, 5 Sep 2014 22:44:20 +0000 (15:44 -0700)]
ofproto: Probe for masked set action support.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/odp-util: Skip ignored fields when parsing and formatting.
Jarno Rajahalme [Tue, 9 Sep 2014 21:50:36 +0000 (14:50 -0700)]
lib/odp-util: Skip ignored fields when parsing and formatting.

When a whole field of a key value is ignored, skip it when formatting
the key, and allow it to be left out when parsing the key from a
string.  However, when the 'verbose' formatting is requested those are
still formatted, as it may help in debugging.

Now the named key fields can also be given in arbitrary order.
Duplicate field values are not checked for, so the last one will
remain in effect.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/odp-execute: Use dpif_packet_set_dp_hash() instead of ->dp_hash
Daniele Di Proietto [Tue, 9 Sep 2014 21:21:41 +0000 (14:21 -0700)]
lib/odp-execute: Use dpif_packet_set_dp_hash() instead of ->dp_hash

When building with DPDK support, 'struct dpif_packet' won't have a 'dp_hash'
member. dpif_packet_set_dp_hash() and dpif_packet_get_dp_hash() should be used.

Furthermore, the masked set action shouldn't read 'md->dp_hash' (which is
shared in a batch), but should use dpif_packet_get_dp_hash() to get each
packet's private hash.

This commit fixes the build with DPDK.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agonetlink-socket: remove local variable in do_lookup_genl_family.
Nithin Raju [Tue, 9 Sep 2014 20:50:57 +0000 (13:50 -0700)]
netlink-socket: remove local variable in do_lookup_genl_family.

'sock' is not initialized and hence should not be un-initialized
as well in the failure path.

Reported-by: Gurucharan Shetty <shettyg@nicira.com>
Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agodatapath-windows: refactor code to setup dump start state
Nithin Raju [Tue, 9 Sep 2014 20:14:31 +0000 (13:14 -0700)]
datapath-windows: refactor code to setup dump start state

Per review comment, in this patch we refactor the code to create an
OvsSetupDumpStart() which can be leveraged by dump functions in the
future. I have not refactored the code that continues the dump
operation primarily since it is not final yet. Once the netlink set
APIs are in place, we can refactor that too.

Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Samuel Ghinet <sghinet@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-socket: Add support for async notification on Windows.
Eitan Eliahu [Tue, 9 Sep 2014 03:08:12 +0000 (20:08 -0700)]
netlink-socket: Add support for async notification on Windows.

We keep an outstanding, out-of-band I/O request in the driver at all times.
Once an event is generated, the driver queues the event message, completes the
pending I/O and unblocks the calling thread by setting the event in the
overlapped structure in the NL socket. The thread will read all event
messages synchronously by calling nl_sock_recv().

Signed-off-by: Eitan Eliahu <eliahue@vmware.com>
Acked-by: Samuel Ghinet <sghinet@cloudbasesolutions.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean at cloudbasesolutions.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoNEWS: Mention RSTP.
Jarno Rajahalme [Tue, 9 Sep 2014 18:21:36 +0000 (11:21 -0700)]
NEWS: Mention RSTP.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agolib/rstp: Use hmap instead of a list for ports.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:36 +0000 (09:01 -0700)]
lib/rstp: Use hmap instead of a list for ports.

Finding a given port is faster.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: Eliminate ports_count.
Jarno Rajahalme [Tue, 9 Sep 2014 18:13:26 +0000 (11:13 -0700)]
lib/rstp: Eliminate ports_count.

It was only used to guard against an uninitialized list.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: Simplify priority vector comparison.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:36 +0000 (09:01 -0700)]
lib/rstp: Simplify priority vector comparison.

Testing for sameness first makes the logic simpler to follow.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: CodingStyle fixes.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:35 +0000 (09:01 -0700)]
lib/rstp: CodingStyle fixes.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: Remove lock recursion.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:35 +0000 (09:01 -0700)]
lib/rstp: Remove lock recursion.

Change the RSTP send_bpdu interface so that a recursive mutex is not
needed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: More robust thread safety.
Jarno Rajahalme [Tue, 9 Sep 2014 18:11:18 +0000 (11:11 -0700)]
lib/rstp: More robust thread safety.

Current code expects there to be a single thread that is responsible
for creating rstp and creating and deleting rstp_port objects.  rstp
objects are also deleted from other threads, as managed by reference
counting.

rstp port objects are not reference counted, which means that
references to rstp ports may only be held while holding the rstp
mutex, or by the thread that creates and deletes them.

This patch adds reference counting to RSTP ports, which allows ports
to be passed from ofproto-dpif to ofproto-dpif-xlate without using the
RSTP port number.  This simplifies RSTP port reconfiguration, as the
port need not be resynchronized with xlate if just the port number
changes.  This also avoids lookups on the processing of RSTP BPDUs.

This patch also:

1. Exposes the rstp mutex so that related thread safety annotations
   can be used also within rstp-state-machines.c.

2. Internal variants of most setter and getter functions are defined,
   suffixed with two underscores.  These are annotated to be callable
   only when the mutex is held.

3. Port setters were only called in a specific pattern.  The new external
   port setter combines them in a single rstp_port_set() function.
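
The reference counting and the double-underscore convention described above
can be sketched as follows.  This is a minimal illustration using plain
pthreads; every name prefixed with demo_ is hypothetical and is not the
lib/rstp API, and the real code also carries thread safety annotations that
are omitted here.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the lib/rstp API. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

struct demo_port {
    int ref_cnt;                /* Protected by demo_mutex. */
    int priority;               /* Protected by demo_mutex. */
};

/* Internal variant: the caller must already hold demo_mutex. */
static void
demo_port_set_priority__(struct demo_port *p, int priority)
{
    p->priority = priority;
}

/* External variant: takes the mutex, then calls the internal variant. */
void
demo_port_set_priority(struct demo_port *p, int priority)
{
    pthread_mutex_lock(&demo_mutex);
    demo_port_set_priority__(p, priority);
    pthread_mutex_unlock(&demo_mutex);
}

/* Creates a port that starts with one reference, or returns NULL. */
struct demo_port *
demo_port_create(void)
{
    struct demo_port *p = calloc(1, sizeof *p);

    if (p) {
        p->ref_cnt = 1;
    }
    return p;
}

/* Takes an additional reference so another module can keep the pointer. */
struct demo_port *
demo_port_ref(struct demo_port *p)
{
    pthread_mutex_lock(&demo_mutex);
    p->ref_cnt++;
    pthread_mutex_unlock(&demo_mutex);
    return p;
}

/* Drops one reference.  Returns 1 if this freed the port, otherwise 0. */
int
demo_port_unref(struct demo_port *p)
{
    int last;

    pthread_mutex_lock(&demo_mutex);
    last = --p->ref_cnt == 0;
    pthread_mutex_unlock(&demo_mutex);

    if (last) {
        free(p);
    }
    return last;
}
```

With this shape, a module that receives a port pointer calls demo_port_ref()
once and can then hold the pointer without holding the mutex; the port is
freed only when the last holder calls demo_port_unref().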

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: Inline trivial predicate functions.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:35 +0000 (09:01 -0700)]
lib/rstp: Inline trivial predicate functions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: CodingStyle changes.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:35 +0000 (09:01 -0700)]
lib/rstp: CodingStyle changes.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: Refactor port initialization.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:35 +0000 (09:01 -0700)]
lib/rstp: Refactor port initialization.

Prior to this patch the default values for ports were set in three
different places.  This refactors them all to one helper function.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: Refactor port number allocation.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:35 +0000 (09:01 -0700)]
lib/rstp: Refactor port number allocation.

Port number allocation was O(N^3); this refactoring makes it O(N^2).

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: Refactor priority vector recalculation.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:35 +0000 (09:01 -0700)]
lib/rstp: Refactor priority vector recalculation.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/rstp: Better debug messages, style fixes.
Jarno Rajahalme [Tue, 9 Sep 2014 16:01:16 +0000 (09:01 -0700)]
lib/rstp: Better debug messages, style fixes.

Remove unused struct rstp_priority_vector4 definition, fix coding
style, fix sparse warnings.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/stp: Some debugging support.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:34 +0000 (09:01 -0700)]
lib/stp: Some debugging support.

Set the stp port name before enabling it, so that debugging messages
have the name to print out.

Do not treat the first state initialization as a state change.  Zero
is not a valid state, so changing from zero to STP_DISABLED is not a
state change.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/stp,rstp: Add more unit tests.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:34 +0000 (09:01 -0700)]
lib/stp,rstp: Add more unit tests.

Existing STP and RSTP test cases only test the protocols with test
utilities.  These tests test them as part of OVS using the
netdev-dummy device.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agovswitch.xml: Fix RSTP configuration documentation.
Jarno Rajahalme [Fri, 22 Aug 2014 16:01:35 +0000 (09:01 -0700)]
vswitch.xml: Fix RSTP configuration documentation.

Move port's configuration options where they belong, add typing, and
correct errors.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agoRapid Spanning Tree Protocol (IEEE 802.1D).
Daniele Venturino [Fri, 22 Aug 2014 16:01:34 +0000 (09:01 -0700)]
Rapid Spanning Tree Protocol (IEEE 802.1D).

This is the v5 from June 12th, 2014, rebased to OVS master, further
changes in following patches.

Signed-off-by: Daniele Venturino <daniele.venturino@m3s.it>
Signed-off-by: Martino Fornasa <mf@fornasa.it>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
9 years agolib/dpif-netdev: Make emc_mutex recursive.
Jarno Rajahalme [Mon, 8 Sep 2014 22:33:00 +0000 (15:33 -0700)]
lib/dpif-netdev: Make emc_mutex recursive.

dpif_netdev_execute may be called while doing upcall processing.
Since the context of the input port is not tracked up to this point, we
use the shared dp->emc_cache for packet execution, where the emc_cache
is needed for recirculation.

While recursive mutexes can make thread safety analysis hard, for now
we change emc_mutex to be recursive.  Forthcoming new unit tests will
fail with the current non-recursive mutex.  Later improvements may
remove the need for this recursion.
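
For reference, a recursive mutex can be created with the standard POSIX
attribute API as sketched below.  The demo_ function names are hypothetical,
and OVS uses its own mutex wrappers rather than raw pthreads, so this only
illustrates the underlying mechanism.

```c
#define _XOPEN_SOURCE 700       /* For PTHREAD_MUTEX_RECURSIVE. */
#include <pthread.h>

/* Initializes 'mutex' as a recursive mutex.  Returns 0 on success,
 * otherwise a positive errno value. */
int
demo_recursive_mutex_init(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int error = pthread_mutexattr_init(&attr);

    if (error) {
        return error;
    }
    error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (!error) {
        error = pthread_mutex_init(mutex, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return error;
}

/* Stand-in for a nested call path: the same thread locks 'mutex' twice.
 * Returns 0 if both acquisitions succeed. */
int
demo_nested_lock(pthread_mutex_t *mutex)
{
    int error = pthread_mutex_lock(mutex);      /* Outer acquisition. */

    if (error) {
        return error;
    }
    error = pthread_mutex_lock(mutex);          /* Same thread, again. */
    if (!error) {
        pthread_mutex_unlock(mutex);
    }
    pthread_mutex_unlock(mutex);
    return error;
}
```

With a default (non-recursive) mutex the second lock in demo_nested_lock()
would deadlock or fail, depending on the mutex type.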

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
9 years agolib/odp-util: Add tunnel tp_src, tp_dst parsing and formatting.
Jarno Rajahalme [Fri, 5 Sep 2014 22:44:20 +0000 (15:44 -0700)]
lib/odp-util: Add tunnel tp_src, tp_dst parsing and formatting.

tp_src and tp_dst fields were recently added to struct flow_tnl, but
parsing and printing was missing.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib: Unify flags parsing and formatting.
Jarno Rajahalme [Fri, 5 Sep 2014 22:44:20 +0000 (15:44 -0700)]
lib: Unify flags parsing and formatting.

Use the "+-" syntax more uniformly when printing masked flags, and use
the syntax of delimited 1-flags also for formatting fully masked TCP
flags.

The "+-" syntax only deals with masked flags, but if there are many of
those, the printout becomes long and confusing.  Typically there are
many flags only when flags are fully masked, but even then most of
them are zeros, so it makes sense to print the flags that are set
(ones) and omit the zero flags.
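
A minimal sketch of the two output forms, assuming an 8-bit flags field, a
hypothetical name table, and '|' as the delimiter for the fully masked case
(the delimiter and the "0" output for no set flags are assumptions, not taken
from this commit):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag names for an 8-bit flags field, bit 0 first. */
static const char *const demo_flag_names[8] = {
    "fin", "syn", "rst", "psh", "ack", "urg", "ece", "cwr"
};

/* Appends 'prefix' and 'name' at offset 'len'; returns the new length. */
static size_t
demo_append(char *buf, size_t size, size_t len,
            const char *prefix, const char *name)
{
    if (len < size) {
        int n = snprintf(buf + len, size - len, "%s%s", prefix, name);

        if (n > 0) {
            len += (size_t) n;
        }
    }
    return len;
}

/* Formats 'flags' under 'mask' into 'buf'.  A fully masked value prints
 * only the flags that are set, delimited by '|'.  A partial mask prints
 * "+name" for masked bits that are set and "-name" for masked bits that
 * are clear. */
void
demo_format_flags(char *buf, size_t size, uint8_t flags, uint8_t mask)
{
    size_t len = 0;
    int i;

    if (!size) {
        return;
    }
    buf[0] = '\0';

    if (mask == UINT8_MAX) {
        for (i = 0; i < 8; i++) {
            if (flags & (1u << i)) {
                len = demo_append(buf, size, len, len ? "|" : "",
                                  demo_flag_names[i]);
            }
        }
        if (!len) {
            demo_append(buf, size, 0, "", "0");
        }
        return;
    }

    for (i = 0; i < 8; i++) {
        unsigned int bit = 1u << i;

        if (mask & bit) {
            len = demo_append(buf, size, len, (flags & bit) ? "+" : "-",
                              demo_flag_names[i]);
        }
    }
}
```

For example, flags 0x12 with mask 0xff formats as "syn|ack", while flags 0x02
with mask 0x12 formats as "+syn-ack".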

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/odp-util: Refine odp_mask_attr_is_exact().
Jarno Rajahalme [Fri, 5 Sep 2014 22:44:20 +0000 (15:44 -0700)]
lib/odp-util: Refine odp_mask_attr_is_exact().

Some attributes are exact matches even when all bits are not ones.
Make odp_mask_attr_is_exact() return true if the mask is set for
all the bits we actually care about.
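
The idea can be sketched as a check against the set of significant bits
rather than against all-ones.  The helper name and the 20-bit example (a
field that occupies only the low 20 bits of a 32-bit word, in host byte
order) are illustrative assumptions, not the actual odp-util code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits that carry meaning in a hypothetical 32-bit field of which only
 * the low 20 bits are used. */
#define DEMO_CARE_BITS UINT32_C(0x000fffff)

/* Returns true if 'mask' covers every bit in 'care', even if 'mask' is
 * not all-ones. */
bool
demo_mask_is_exact(uint32_t mask, uint32_t care)
{
    return (mask & care) == care;
}
```

Here a mask of 0x000fffff counts as exact for DEMO_CARE_BITS, while
0x0000ffff does not.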

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/util: Change is_all_zeros and is_all_ones to take a void *.
Jarno Rajahalme [Fri, 5 Sep 2014 22:44:19 +0000 (15:44 -0700)]
lib/util: Change is_all_zeros and is_all_ones to take a void *.

is_all_zeros() and is_all_ones() operate on bytes, but just like with
memset, it is easier to use if the first argument is a void *.
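
A minimal sketch of byte-wise helpers that take a void pointer, so callers
can pass the address of any object without a cast.  The demo_ names are
hypothetical stand-ins and the bodies are not copied from lib/util:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if all 'n' bytes starting at 'p_' are 0x00. */
bool
demo_is_all_zeros(const void *p_, size_t n)
{
    const uint8_t *p = p_;
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] != 0x00) {
            return false;
        }
    }
    return true;
}

/* Returns true if all 'n' bytes starting at 'p_' are 0xff. */
bool
demo_is_all_ones(const void *p_, size_t n)
{
    const uint8_t *p = p_;
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] != 0xff) {
            return false;
        }
    }
    return true;
}
```

A caller can then write demo_is_all_zeros(&mask, sizeof mask) for a struct or
integer 'mask' directly.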

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/odp: Masked set action execution and printing.
Jarno Rajahalme [Fri, 5 Sep 2014 22:44:19 +0000 (15:44 -0700)]
lib/odp: Masked set action execution and printing.

Add a new action type OVS_ACTION_ATTR_SET_MASKED, and support for
parsing, printing, and committing them.

Masked set actions add a mask, immediately following the netlink
attribute data, within the netlink attribute itself.  Thus the key
attribute size for a masked set action is exactly double of the
non-masked set action.
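
The layout can be illustrated with a small sketch that treats the attribute
payload as a value followed by a mask of the same length.  The function name
is hypothetical, and the update rule (masked bits come from the value,
unmasked bits are preserved) is the usual meaning of a masked set, stated
here as an assumption rather than quoted from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Applies a masked set to 'field', which is 'key_len' bytes long.
 * 'payload' holds 'key_len' bytes of value immediately followed by
 * 'key_len' bytes of mask, so 'payload_len' must be exactly twice
 * 'key_len'.  Returns 0 on success, -1 if the length is wrong. */
int
demo_apply_masked_set(uint8_t *field, const uint8_t *payload,
                      size_t payload_len, size_t key_len)
{
    const uint8_t *value = payload;
    const uint8_t *mask = payload + key_len;
    size_t i;

    if (payload_len != 2 * key_len) {
        return -1;
    }

    for (i = 0; i < key_len; i++) {
        /* Bits set in the mask come from the value; others are kept. */
        field[i] = (uint8_t) ((value[i] & mask[i]) | (field[i] & ~mask[i]));
    }
    return 0;
}
```

For a 2-byte field {0xab, 0xcd} and payload {0x12, 0x34, 0xf0, 0x0f}, the
result is {0x1b, 0xc4}.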

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Remove unused dp parameter.
Pravin B Shelar [Mon, 8 Sep 2014 19:46:03 +0000 (12:46 -0700)]
datapath: Remove unused dp parameter.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoofproto-dpif-upcall: Fix a free of uninitialized memory.
Alex Wang [Mon, 8 Sep 2014 17:41:36 +0000 (10:41 -0700)]
ofproto-dpif-upcall: Fix a free of uninitialized memory.

On current master, when 'upcall_receive()' returns an error, the
ofpbuf 'upcall->put_actions' is left uninitialized.  In some use cases,
the caller then uninitializes 'upcall->put_actions' anyway and frees an
uninitialized pointer.

This commit fixes the issue by making the caller skip the
uninitialization of 'upcall' when there is an error.

Found by inspection.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-numa: Fix a missing initialization.
Alex Wang [Mon, 8 Sep 2014 15:24:15 +0000 (08:24 -0700)]
ovs-numa: Fix a missing initialization.

This commit updates the pointer to 'struct numa_node'
when initializing the 'struct cpu_core'.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Set packet egress_tun_info.
Pravin B Shelar [Sun, 7 Sep 2014 22:18:07 +0000 (15:18 -0700)]
datapath: Set packet egress_tun_info.

Packet execute is setting egress_tun_info in skb->cb, rather
than packet->cb. skb is the netlink msg skb. This corrupts the
netlink skb state stored in skb->cb (NETLINK_CB), which
results in the following deadlock in netlink code.

=============================================
[ INFO: possible recursive locking detected ]
3.2.62 #2
---------------------------------------------
handler55/22851 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20

but task is already holding lock:
 (genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(genl_mutex);
  lock(genl_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by handler55/22851:
 #0:  (genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20

stack backtrace:
Pid: 22851, comm: handler55 Tainted: G           O 3.2.62 #2
Call Trace:
 [<ffffffff81097bb2>] print_deadlock_bug+0xf2/0x100
 [<ffffffff81099b99>] validate_chain+0x579/0x860
 [<ffffffff8109a17c>] __lock_acquire+0x2fc/0x4f0
 [<ffffffff8109aab0>] lock_acquire+0xa0/0x180
 [<ffffffff81519070>] __mutex_lock_common+0x60/0x420
 [<ffffffff8151959a>] mutex_lock_nested+0x4a/0x60
 [<ffffffff81471ad7>] genl_lock+0x17/0x20
 [<ffffffff81471af6>] genl_rcv+0x16/0x40
 [<ffffffff8146ff72>] netlink_unicast+0x2f2/0x310
 [<ffffffff81470159>] netlink_ack+0x109/0x1f0
 [<ffffffff8147030b>] netlink_rcv_skb+0xcb/0xd0
 [<ffffffff81471b05>] genl_rcv+0x25/0x40
 [<ffffffff8146ff72>] netlink_unicast+0x2f2/0x310
 [<ffffffff8147134c>] netlink_sendmsg+0x28c/0x3d0
 [<ffffffff8143375f>] sock_sendmsg+0xef/0x120
 [<ffffffff81435766>] ___sys_sendmsg+0x416/0x430
 [<ffffffff81435949>] __sys_sendmsg+0x49/0x90
 [<ffffffff814359a9>] sys_sendmsg+0x19/0x20
 [<ffffffff8152432b>] system_call_fastpath+0x16/0x1b

Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agodatapath: distinguish between the dropped and consumed skb
Li RongQing [Sun, 7 Sep 2014 21:49:02 +0000 (14:49 -0700)]
datapath: distinguish between the dropped and consumed skb

Distinguish between dropped and consumed skbs; do not assume the skb
is always consumed.

Cc: Thomas Graf <tgraf@noironetworks.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: fix panic with multiple vlan headers
Jiri Benc [Sun, 7 Sep 2014 21:36:01 +0000 (14:36 -0700)]
datapath: fix panic with multiple vlan headers

When there are multiple vlan headers present in a received frame, the
first one is put into vlan_tci and protocol is set to ETH_P_8021Q.
Anything in the skb beyond the VLAN TPID may be still non-linear,
including the inner TCI and ethertype. While ovs_flow_extract takes
care of IP and IPv6 headers, it does nothing with ETH_P_8021Q. Later,
if OVS_ACTION_ATTR_POP_VLAN is executed, __pop_vlan_tci pulls the
next vlan header into vlan_tci.

This leads to two things:

1. Part of the resulting ethernet header is in the non-linear part of
   the skb. When eth_type_trans is called later as the result of
   OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also,
   __pop_vlan_tci is in fact accessing random data when it reads
   past the TPID.

2. network_header points into the ethernet header instead of behind it.
   mac_len is set to a wrong value (10), too.

Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
I have dropped the second change, since it assumes the inner mac header
is of ETH_HLEN length, which is not always true.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Implement recirc action without recursion
Andy Zhou [Mon, 11 Aug 2014 07:14:05 +0000 (00:14 -0700)]
datapath: Implement recirc action without recursion

Since the kernel stack is limited in size, it is not wise to use
recursive functions with large stack frames.

This patch provides an alternative implementation of recirc action
without using recursion.

A per-CPU, fixed-size 'deferred action FIFO' is used to store either
recirc or sample actions encountered during execution of an action
list. Rather than executing recirc or sample actions in place, executing
them later as 'deferred actions' avoids recursion.

Deferred actions are only executed after all other actions have been
executed, including the ones triggered by loopback from the kernel
network stack.

The size of the private FIFO, currently set to 20, limits the number
of total 'deferred actions' any one packet can accumulate.
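
The mechanism can be sketched in plain C as below.  All names are
hypothetical and the structure is simplified (a single FIFO instead of one
per CPU, and a trivial action type); it only illustrates how queuing work
replaces recursion and how the fixed size bounds the total.

```c
#include <stddef.h>

#define DEMO_FIFO_SIZE 20       /* Size mentioned in the text above. */

enum demo_kind { DEMO_OUTPUT, DEMO_RECIRC };

struct demo_deferred_action {
    enum demo_kind kind;
    int arg;                    /* For DEMO_RECIRC: recirculations left. */
};

struct demo_action_fifo {
    int head;                   /* Next free slot. */
    int tail;                   /* Next slot to execute. */
    struct demo_deferred_action fifo[DEMO_FIFO_SIZE];
};

void
demo_fifo_init(struct demo_action_fifo *f)
{
    f->head = f->tail = 0;
}

/* Reserves a slot for a deferred action, or returns NULL when full. */
struct demo_deferred_action *
demo_fifo_put(struct demo_action_fifo *f)
{
    return f->head < DEMO_FIFO_SIZE ? &f->fifo[f->head++] : NULL;
}

/* Returns the next queued action, or NULL when the FIFO is drained. */
struct demo_deferred_action *
demo_fifo_get(struct demo_action_fifo *f)
{
    return f->tail < f->head ? &f->fifo[f->tail++] : NULL;
}

/* Runs queued actions in a loop.  A recirc action that needs another pass
 * queues a new entry instead of calling back into this function.  Returns
 * the number of actions executed. */
int
demo_process_deferred(struct demo_action_fifo *f)
{
    struct demo_deferred_action *a;
    int executed = 0;

    while ((a = demo_fifo_get(f)) != NULL) {
        executed++;
        if (a->kind == DEMO_RECIRC && a->arg > 0) {
            struct demo_deferred_action *next = demo_fifo_put(f);

            if (next) {         /* Silently dropped when the FIFO is full. */
                next->kind = DEMO_RECIRC;
                next->arg = a->arg - 1;
            }
        }
    }
    demo_fifo_init(f);          /* Reset for the next packet. */
    return executed;
}
```

Because the head index is only reset after the FIFO drains, one packet can
never accumulate more than DEMO_FIFO_SIZE deferred actions, however deeply
its recirculations would otherwise nest.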

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Remove recirc stack depth limit check
Andy Zhou [Fri, 15 Aug 2014 08:53:30 +0000 (01:53 -0700)]
datapath: Remove recirc stack depth limit check

Future patches will change the recirc action implementation to not
use recursion. The stack depth detection is no longer necessary.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-numa: Add module description.
Alex Wang [Fri, 5 Sep 2014 06:17:34 +0000 (06:17 +0000)]
ovs-numa: Add module description.

Add a short description of the module and its assumption.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-numa: Add function for getting numa node id from core id.
Alex Wang [Fri, 5 Sep 2014 06:17:33 +0000 (06:17 +0000)]
ovs-numa: Add function for getting numa node id from core id.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-numa: Relax the ovs_numa_*() input argument check.
Alex Wang [Fri, 5 Sep 2014 06:17:32 +0000 (06:17 +0000)]
ovs-numa: Relax the ovs_numa_*() input argument check.

Many of the ovs_numa_*() functions abort the program when the
input cpu socket or core id is invalid.  This commit relaxes
the input check and makes these functions return OVS_*_UNSPEC
when the check fails.
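
A minimal sketch of the relaxed behavior, using a hypothetical static
core-to-node table and a hypothetical DEMO_NUMA_UNSPEC sentinel in place of
the real OVS_*_UNSPEC constants:

```c
#include <limits.h>

#define DEMO_NUMA_UNSPEC INT_MAX        /* Hypothetical sentinel value. */
#define DEMO_N_CORES 4

/* Hypothetical topology: cores 0-1 on node 0, cores 2-3 on node 1. */
static const int demo_core_to_numa[DEMO_N_CORES] = { 0, 0, 1, 1 };

/* Returns the numa node id of 'core_id', or DEMO_NUMA_UNSPEC if 'core_id'
 * is invalid.  The caller decides how to handle the error instead of the
 * program aborting here. */
int
demo_get_numa_id(int core_id)
{
    if (core_id < 0 || core_id >= DEMO_N_CORES) {
        return DEMO_NUMA_UNSPEC;
    }
    return demo_core_to_numa[core_id];
}
```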

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-numa: Replace name 'cpu_socket' with 'numa_node'.
Alex Wang [Fri, 5 Sep 2014 06:17:31 +0000 (06:17 +0000)]
ovs-numa: Replace name 'cpu_socket' with 'numa_node'.

'numa' and 'socket' are currently used interchangeably in ovs-numa.
But they are not always equivalent as some platforms can have multiple
sockets on a numa node.  To avoid confusion, this commit renames all
the 'cpu_socket' to 'numa_node'.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agocccl: Ability to enable compiler optimization.
Gurucharan Shetty [Thu, 28 Aug 2014 16:25:56 +0000 (09:25 -0700)]
cccl: Ability to enable compiler optimization.

MSVC has a '-O2' compiler optimization flag which makes code run
fast and is the recommended option for released code. For example,
running "./tests/ovstest.exe test-cmap benchmark 1000000 3 1"
shows a 3x improvement for some cmap micro-benchmarks.

In the Visual Studio world, there is a concept of "release" build
(fast code, harder to debug) and a "debug" build (easier to debug).
The IDE provides this option and the IDE users expect something similar
for command line build.

So this commit introduces a "--with-debug" configure option for Windows
and does not use '-O2' as a compiler option when specified. This can
be extended further if there are more compiler options that distinguish
a "release" build vs "debug" build.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
9 years agocccl: Enable ability to parallel build.
Gurucharan Shetty [Thu, 28 Aug 2014 16:20:21 +0000 (09:20 -0700)]
cccl: Enable ability to parallel build.

The /FS option allows serial access to PDB file creation letting
parallel builds succeed with mingw32-make (with some tricks). The
'make' that comes with MSYS has a bug that causes hangs with
parallel builds which supposedly has been fixed in the upcoming
1.0.19 release.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
9 years agoovs-atomics: Add atomic support Windows.
Gurucharan Shetty [Thu, 21 Aug 2014 20:57:37 +0000 (13:57 -0700)]
ovs-atomics: Add atomic support Windows.

Before this change (i.e., with pthread locks for atomics on Windows),
the benchmark for cmap and hmap was as follows:

$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert:  61070 ms
cmap iterate:  2750 ms
cmap search:  14238 ms
cmap destroy:  8354 ms

hmap insert:   1701 ms
hmap iterate:   985 ms
hmap search:   3755 ms
hmap destroy:  1052 ms

After this change, the benchmark is as follows:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert:   3666 ms
cmap iterate:   365 ms
cmap search:   2016 ms
cmap destroy:  1331 ms

hmap insert:   1495 ms
hmap iterate:  1026 ms
hmap search:   4167 ms
hmap destroy:  1046 ms

So there is clearly a big improvement for cmap.

But the corresponding test on Linux (with gcc 4.6) yields the following:

./tests/ovstest test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert:   3917 ms
cmap iterate:   355 ms
cmap search:    871 ms
cmap destroy:  1158 ms

hmap insert:   1988 ms
hmap iterate:  1005 ms
hmap search:   5428 ms
hmap destroy:   980 ms

So for this particular test, except for "cmap search", Windows and
Linux have similar performance. Windows is around 2.3x slower in "cmap search"
(2016 ms vs. 871 ms) compared to Linux. This has to be investigated.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
[With a lot of inputs and help from Jarno]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoAUTHORS: Add Ariel Tubaltsev to AUTHORS.
Gurucharan Shetty [Thu, 4 Sep 2014 22:55:56 +0000 (15:55 -0700)]
AUTHORS: Add Ariel Tubaltsev to AUTHORS.

I missed it while adding commit 6ee1400bbff(vtep: additions to BFD
configuration and status reporting)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agodatapath-windows: add support for GET_DP command to dump datapaths
Nithin Raju [Fri, 29 Aug 2014 22:48:10 +0000 (15:48 -0700)]
datapath-windows: add support for GET_DP command to dump datapaths

In this patch, we add support for the GET_DP netlink command to dump
the datapaths. The userspace workflow to get this to work is the same
as on Linux. dpif-linux.c initiates a dump start by writing a netlink
message, and after that continues to read data from the kernel while
the kernel has data. The state is maintained in the kernel, and not in
userspace. This approach was taken since there was no great benefit
in maintaining state in userspace, and also to avoid userspace changes
specific to Windows.
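
The workflow can be modeled with a small, self-contained sketch in which a
struct stands in for the per-dump state kept by the kernel.  All names and
the fixed datapath list are hypothetical; no netlink or driver calls are
involved.

```c
#include <stddef.h>

/* Stand-in for the dump state that lives on the kernel side. */
struct demo_dump_state {
    size_t next;                /* Index of the next entry to return. */
    int active;                 /* Nonzero while a dump is in progress. */
};

/* Hypothetical datapaths known to the "kernel". */
static const char *const demo_datapaths[] = { "dp0", "dp1", "dp2" };
#define DEMO_N_DATAPATHS (sizeof demo_datapaths / sizeof demo_datapaths[0])

/* Models the write that starts a dump: the state is reset. */
void
demo_dump_start(struct demo_dump_state *s)
{
    s->next = 0;
    s->active = 1;
}

/* Models one read: returns the next entry, or NULL once there is no more
 * data, at which point the dump ends. */
const char *
demo_dump_next(struct demo_dump_state *s)
{
    if (!s->active) {
        return NULL;
    }
    if (s->next >= DEMO_N_DATAPATHS) {
        s->active = 0;
        return NULL;
    }
    return demo_datapaths[s->next++];
}

/* Userspace side: start the dump, then read until the data runs out.
 * Stores up to 'max' names in 'names' and returns how many were stored. */
size_t
demo_enumerate(struct demo_dump_state *s, const char **names, size_t max)
{
    const char *name;
    size_t n = 0;

    demo_dump_start(s);
    while (n < max && (name = demo_dump_next(s)) != NULL) {
        names[n++] = name;
    }
    return n;
}
```

Note that the userspace loop keeps no cursor of its own; it only asks for
more data, which mirrors the choice of keeping the state in the kernel.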

This hopefully serves as a template to base the other dump commands on.

validation:
- With a hacked up dpif-linux.c to work on Windows,
  dpif_linux_enumerate() successfully enumerated the datapaths in the
  kernel.

Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath-windows: add a context structure for user parameters
Nithin Raju [Fri, 29 Aug 2014 22:47:49 +0000 (15:47 -0700)]
datapath-windows: add a context structure for user parameters

In this patch we add a context structure for collecting all the parameters
passed from userspace in one place. The idea is to reduce the number of
parameters being passed to the netlink command handler functions.

It can be argued that not all functions require all the arguments, but this
approach keeps the code clean, IMO.

Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath-windows: make NL version a UINT8 and add a validateDp arg
Nithin Raju [Fri, 29 Aug 2014 22:47:37 +0000 (15:47 -0700)]
datapath-windows: make NL version a UINT8 and add a validateDp arg

I didn't realize earlier that version in a netlink message was a
UINT8. So, fixing that here.

Also, some of the commands don't pass a valid DP value. Hence adding
a field to identify such commands.

Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath-windows: Data structures and functions for dump state
Nithin Raju [Fri, 29 Aug 2014 22:47:21 +0000 (15:47 -0700)]
datapath-windows: Data structures and functions for dump state

Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoofp-errors: Migrate EXT-444 errors to ONF experimenter ID.
Jean Tourrilhes [Thu, 21 Aug 2014 17:40:51 +0000 (10:40 -0700)]
ofp-errors: Migrate EXT-444 errors to ONF experimenter ID.

Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
[blp@nicira.com removed the definitions of these errors in OF1.1 and OF1.2]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoofp-errors: Fix bugs in treatment of OpenFlow experimenter errors.
Ben Pfaff [Thu, 21 Aug 2014 17:38:15 +0000 (10:38 -0700)]
ofp-errors: Fix bugs in treatment of OpenFlow experimenter errors.

OpenFlow 1.2 and later have "experimenter errors".  The OVS implementation
was buggy in a few ways.  First, a bug in extract-ofp-errors prevented
OF1.2+ experimenter errors from being properly decoded.  Second,
OF1.2+ experimenter errors have only a type, not a code, whereas all other
types of errors (standard errors, OF1.0/1.1 Nicira extension errors) have
both, but extract-ofp-errors didn't properly enforce that.

This commit fixes both problems and improves existing tests to verify that
encoding and decoding of experimenter errors now works properly.

This commit also fixes the definition of OFPBIC_DUP_INST.  It claimed to
have an OF1.1 experimenter error value although OF1.1 didn't have
experimenter errors.  This commit changes it to use a Nicira extension
error in OF1.1 instead.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agonx-match: Serialize standard xregs instead of Nicira registers, in OF1.5.
Ben Pfaff [Thu, 21 Aug 2014 03:59:43 +0000 (20:59 -0700)]
nx-match: Serialize standard xregs instead of Nicira registers, in OF1.5.

Commit 79fe0f4611b60 (meta-flow: Add 64-bit registers.) added support for
the OpenFlow 1.5 (draft) standardized registers, but neglected to cause
them to be serialized when Open vSwitch composes flow matches.  This meant
that they were always sent to a controller as pairs of Nicira extension
registers.  This commit fixes the problem.
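
The relationship between one 64-bit register and a pair of 32-bit registers
can be sketched as below.  The function names are hypothetical, and the
assumption that the lower-numbered register of the pair holds the most
significant 32 bits is stated here for illustration rather than taken from
this commit.

```c
#include <stdint.h>

/* Combines a pair of 32-bit registers into one 64-bit register value.
 * Assumption: 'hi' is the lower-numbered register of the pair. */
uint64_t
demo_xreg_from_regs(uint32_t hi, uint32_t lo)
{
    return ((uint64_t) hi << 32) | lo;
}

/* Splits a 64-bit register value back into its two 32-bit halves. */
void
demo_xreg_to_regs(uint64_t xreg, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t) (xreg >> 32);
    *lo = (uint32_t) xreg;
}
```

Serializing the 64-bit form sends one match field where the pair form sends
two, which is the difference a controller would observe.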

Found by inspection.

ONF-JIRA: EXT-244
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>