cascardo/ovs.git
9 years agodatapath: Refactor action alloc and copy api.
Pravin B Shelar [Thu, 7 Aug 2014 19:51:14 +0000 (12:51 -0700)]
datapath: Refactor action alloc and copy api.

There are two separate API to allocate and copy actions list. Anytime
OVS needs to copy action list, it needs to call both functions.
Following patch moves action allocation to copy function to avoid
code duplication.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath: Update comments about 'OVS_KEY_ATTR_8021Q'.
Justin Pettit [Wed, 6 Aug 2014 21:15:52 +0000 (14:15 -0700)]
datapath: Update comments about 'OVS_KEY_ATTR_8021Q'.

Commit fea393b1 (datapath: Describe policy for extending flow key,
implement needed changes.) changed the key 'OVS_KEY_ATTR_8021Q' to
'OVS_KEY_ATTR_VLAN' and the size of the attribute structure.  A couple
of comments were missed, so this commit updates them.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Use tun_info only for egress tunnel path.
Pravin B Shelar [Tue, 5 Aug 2014 20:49:57 +0000 (13:49 -0700)]
datapath: Use tun_info only for egress tunnel path.

Currently tun_info is used for passing tunnel information
on ingress and egress path, this cause confusion.  Following
patch removes its use on ingress path make it egress only parameter.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: Optimize Flow mask cache hash collision case.
Pravin B Shelar [Tue, 5 Aug 2014 17:47:23 +0000 (10:47 -0700)]
datapath: Optimize Flow mask cache hash collision case.

In case hash collision on mask cache, OVS does extra flow lookup.
Following patch avoid it.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath: Avoid using wrong metadata for recic action.
Pravin B Shelar [Tue, 5 Aug 2014 20:43:45 +0000 (13:43 -0700)]
datapath: Avoid using wrong metadata for recic action.

Recirc action needs to extract flow key from packet, it uses tun_info
from OVS_CB for setting tunnel meta data in flow key. But tun_info
can be overwritten by tunnel send action. This would result in wrong
flow key for the recirculation.
Following patch copies flow-key meta data from OVS_CB packet key
itself thus avoids this bug.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: refactor ovs flow extract API.
Pravin B Shelar [Tue, 29 Jul 2014 22:25:02 +0000 (15:25 -0700)]
datapath: refactor ovs flow extract API.

OVS flow extract is called on packet receive or packet
execute code path.  Following patch defines separate API
for extracting flow-key in packet execute code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agotest-atomic: Fix warnings for GCC4.9 and sparse
Daniele Di Proietto [Thu, 7 Aug 2014 01:35:11 +0000 (01:35 +0000)]
test-atomic: Fix warnings for GCC4.9 and sparse

There's no reason for the local variable 'old_count' to be atomic. In fact, if
it is atomic it triggers a GCC4.9 warning (Wunused-value)
The global variables 'a' and 'paux' could be static, according to sparse.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath-windows: Update ovsext.sln to properly build under "Win8 Debug".
Alin Serdean [Wed, 6 Aug 2014 01:28:08 +0000 (01:28 +0000)]
datapath-windows: Update ovsext.sln to properly build under "Win8 Debug".

"Win8 Debug" solution configuration  currently builds in fact under
"Win8.1 Debug".

This patch updates the configuration to properly build under "Win8 Debug".

Reported-at: https://github.com/openvswitch/ovs-issues/issues/7
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Nithin Raju <nithin@vmware.com>
9 years agodatapath: Correct comment about 'tun_info' member in ovs_skb_cb.
Justin Pettit [Mon, 4 Aug 2014 23:00:40 +0000 (16:00 -0700)]
datapath: Correct comment about 'tun_info' member in ovs_skb_cb.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoclassifier: classifier_lookup_miniflow_batch() indicate failures.
Ethan Jackson [Mon, 28 Jul 2014 00:51:48 +0000 (17:51 -0700)]
classifier: classifier_lookup_miniflow_batch() indicate failures.

This patch causes classifier_lookup_miniflow_batch() to return a
boolean indicating whether any rules could not be successfully looked
up.  Used in future patches.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif-netdev: Avoid useless flow copy in dp_netdev_flow_add().
Ethan Jackson [Mon, 28 Jul 2014 01:56:45 +0000 (18:56 -0700)]
dpif-netdev: Avoid useless flow copy in dp_netdev_flow_add().

This patch gives dp_netdev_flow_add() a match with which it can
initialize the classifier rule.  This prevents it from needing to copy
a flow and flow_wildcards into the match first.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-dev.py: Support check-valgrind in the Makefile.
Ethan Jackson [Sat, 2 Aug 2014 01:15:14 +0000 (18:15 -0700)]
ovs-dev.py: Support check-valgrind in the Makefile.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agostyle: Replace TODO with XXX.
Ethan Jackson [Mon, 28 Jul 2014 20:37:24 +0000 (13:37 -0700)]
style: Replace TODO with XXX.

In accordance with CodingStyle.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic: Native support for 32-bit 586 with GCC.
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic: Native support for 32-bit 586 with GCC.

XenServer runs OVS in dom0, which is a 32-bit VM.  As the build
environment lacks support for atomics, locked pthread atomics were
used with considerable performance hit.

This patch adds native support for ovs-atomic with 32-bit Pentium and
higher CPUs, when compiled with an older GCC.  We use inline asm with
the cmpxchg8b instruction, which was a new instruction to Intel
Pentium processors.  We do not expect anyone to run OVS on 486 or older
processor.

cmap benchmark before the patch on 32-bit XenServer build (uses
ovs-atomic-pthread):

$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:   8835 ms
cmap iterate:   379 ms
cmap search:   6242 ms
cmap destroy:  1145 ms

After:

$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:    711 ms
cmap iterate:    68 ms
cmap search:    353 ms
cmap destroy:   209 ms

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic: Native support for x86_64 with GCC.
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic: Native support for x86_64 with GCC.

Some supported XenServer build environments lack compiler support for
atomic operations.  This patch provides native support for x86_64 on
GCC, which covers possible future 64-bit builds on XenServer.

Since this implementation is faster than the existing support prior to
GCC 4.7, especially for cmap inserts, we use this with GCC < 4.7 on
x86_64.

Example numbers with "tests/test-cmap benchmark 2000000 8 0.1" on
quad-core hyperthreaded laptop, built with GCC 4.6 -O2:

Using ovs-atomic-pthreads on x86_64:

Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:   4725 ms
cmap iterate:   329 ms
cmap search:   5945 ms
cmap destroy:   911 ms

Using ovs-atomic-gcc4+ on x86_64:

Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:    845 ms
cmap iterate:    58 ms
cmap search:    308 ms
cmap destroy:   295 ms

With the native support provided by this patch:

Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:    530 ms
cmap iterate:    59 ms
cmap search:    305 ms
cmap destroy:   232 ms

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agotests/test-atomic: Cover more of the atomic API.
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
tests/test-atomic: Cover more of the atomic API.

This adds tests using all of the defined memory orders, as well as
simple two-thread test cases for the acquire-release and
consume-release patterns.

These new tests helped uncover a bug in the ovs-atomic-gcc4+
implementation, which was fixed in a preceding patch.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic-gcc4+: Use 'volatile' to enforce memory access.
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic-gcc4+: Use 'volatile' to enforce memory access.

Use 'volatile' to enforce a new memory access on each lockless atomic
store and read.  Without this a loop consisting of an atomic_read with
memory_order_relaxed would be simply optimized away.  Also, using
volatile is cheaper than adding a full compiler barrier (also) in that
case.

This use of a volatile cast mirrors the Linux kernel ACCESS_ONCE macro.

Without this change the more rigorous atomic test cases introduced in
a following patch will hang due to the atomic accesses being optimized
away.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-atomic: Fix GCC4+ atomic_flag.
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
ovs-atomic: Fix GCC4+ atomic_flag.

The default memory order for atomic_flag is documented to be
memory_order_seq_cst (as in C11), but the GCC4+ implementation only
used the GCC builtins, which provide acquire and release semantics
only.  Additional barriers are needed for in other cases.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic: Require memory_order be constant.
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic: Require memory_order be constant.

Compiler implementations may provide sub-optimal support for a
memory_order passed in as a run-time value (ref.
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).

Document that OVS atomics require the memory order to be passed in as
a compile-time constant.

It should be noted, however, that when inlining is disabled (i.e.,
compiling without optimization) even compile-time constants may be
passed as run-time values to (non-inlined) functions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic: Elaborate memory_order documentation.
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic: Elaborate memory_order documentation.

The definition of memory_order_relaxed included a compiler barrier,
while it is not necessary, and indeed the following text on
atomic_thread_fence and atomic_signal_fence contradicted that.

memory_order_consume and memory_order_acq_rel are also more thoroughly
described.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoFix strict aliasing violations with GCC 4.1 and 4.4.
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
Fix strict aliasing violations with GCC 4.1 and 4.4.

The typical use of struct sockaddr_storage is flagged as a strict
aliasing violation by GCC 4.4.7.  Using an explicit union lets the
compiler know that accessing the same location via different types is
not an error.

GCC 4.1.2 had a similar complaint about a cast of ukey's key_buf to
nlattr.  After this patch there are no further warnings with the
XenServer build, so we could start treating warnings as errors in the
builds.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath-windows: Rename solution name.
Samuel Ghinet [Tue, 5 Aug 2014 20:25:37 +0000 (20:25 +0000)]
datapath-windows: Rename solution name.

extensions.sln is the name of the MS sample solution.

Signed-off-by: Samuel Ghinet <sghinet@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Nithin Raju <nithin@vmware.com>
9 years agoINSTALL.Windows: Update build instructions to do 'make'.
Nithin Raju [Tue, 5 Aug 2014 23:30:31 +0000 (23:30 +0000)]
INSTALL.Windows: Update build instructions to do 'make'.

We no-longer need to build individual targets on Windows. This information
was lost when we committed the hyper-v changes to the repo.

Resurrecting that information in this change.

Also, added the testing instructions on how to setup a VLAN based network.

Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agotests: Turn off appctl poll_loop logging for long output.
Ben Pfaff [Tue, 29 Jul 2014 16:54:17 +0000 (09:54 -0700)]
tests: Turn off appctl poll_loop logging for long output.

One of the VMware internal autobuilder builds failed due to extraneous
logging in these tests of the form:

   2014-07-28T21:11:07Z|00001|poll_loop|INFO|wakeup due to [POLLIN] on fd 3
   (...) at lib/stream-fd-unix.c:124 (93% CPU usage)

I think this must be because these tests have tons of output and so on a
loaded machine it can take some CPU to pull it down.  We don't want to fail
for that reason, so disable this logging.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
9 years agoovs-appctl: Add logging options.
Ben Pfaff [Mon, 4 Aug 2014 19:36:04 +0000 (12:36 -0700)]
ovs-appctl: Add logging options.

Normally I would also add documentation for the logging options to the
ovs-appctl manpage, but I am concerned that in this case it would actually
make the manpage confusing, because one of the main purposes of ovs-appctl
is to modify the log levels of *other* programs, and these options only
modify the log level of ovs-appctl itself, which is rarely useful.

The following commit will start using these logging options in a test.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
9 years agodpctl: add ovs-appctl dpctl/* commands to talk to dpif-netdev
Daniele Di Proietto [Fri, 18 Jul 2014 00:26:00 +0000 (17:26 -0700)]
dpctl: add ovs-appctl dpctl/* commands to talk to dpif-netdev

This commit introduces multiple appctl commands (dpctl/*)

They are needed to interact with userspace datapaths (dpif-netdev), because the
ovs-dpctl command runs in a separate process and cannot see the userspace
datapaths inside vswitchd.

This change moves most of the code of utilities/ovs-dpctl.c in lib/dpctl.c.

Both the ovs-dpctl command and the ovs-appctl dpctl/* commands make calls to
lib/dpctl.c functions, to interact with datapaths.

The code from utilities/ovs-dpctl.c has been moved to lib/dpctl.c and has been
changed for different reasons:
   - An exit() call in the old code made perfectly sense. Now (since the code
     can be run inside vswitchd) it would terminate the daemon. Same reasoning
     can be applied to ovs_fatal_*() calls.
   - The lib/dpctl.c code _should_ not leak memory.
   - All the print* have been replaced with a function pointer provided by the
     caller, since this code can be run in the ovs-dpctl process (in which
     case we need to print to stdout) or in response to a unixctl request (and
     in this case we need to send everything through a socket, using JSON
     encapsulation).

The syntax is
   ovs-appctl dpctl/(COMMAND) [OPTIONS] [PARAMETERS]
while the ovs-dpctl syntax (which _should_ remain the same after this change)
is
   ovs-dpctl [OPTIONS] (COMMAND) [PARAMETERS]

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
[blp@nicira.com made stylistic and documentation changes]
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif: Fix tests broken by previous commit.
Ben Pfaff [Mon, 4 Aug 2014 19:46:37 +0000 (12:46 -0700)]
ofproto-dpif: Fix tests broken by previous commit.

This fixes tests that were broken by commit a7d1bbdcfe (ofproto-dpif: Use
DPIF_FP_CREATE but not DPIF_FP_MODIFY.)

Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif: fix an ovs crash when dpif_recv_set returns error
Andy Zhou [Sun, 3 Aug 2014 23:16:40 +0000 (16:16 -0700)]
ofproto-dpif: fix an ovs crash when dpif_recv_set returns error

When dpif_recv_set returns an error, close_dpif_backer gets called,
which in term calls recirc_id_pool_destroy, which can lead to a crash
since recirc_id_pool_create() was not called before this patch.

Reported-by: Mukesh Hira <mhira@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif: Use DPIF_FP_CREATE but not DPIF_FP_MODIFY.
Ben Pfaff [Sat, 2 Aug 2014 00:22:20 +0000 (17:22 -0700)]
ofproto-dpif: Use DPIF_FP_CREATE but not DPIF_FP_MODIFY.

A dpif reports EEXIST if a flow put operation that should create a new flow
instead attempts to modify an existing flow, or ENOENT if a flow put would
create a flow that overlaps some existing flow.  The latter does not always
indicate a bug in OVS userspace, because it can also mean that two
userspace OVS packet handler threads received packets that generated
the same megaflow for different microflows.  Until now, userspace has
logged this, which confuses users by making them suspect a bug.  We could
simply not log ENOENT in userspace, but that would suppress logging for
genuine bugs too.  Instead, this commit drops DPIF_FP_MODIFY from flow
put operations in ofproto-dpif, which transforms this particular
problem into EEXIST, which userspace already logs at "debug" level (see
flow_message_log_level()), effectively suppressing the logging in normal
circumstances.

It appears that in practice ofproto-dpif doesn't actually ever need to
modify flows in the datapath, only create and delete them, so this
shouldn't cause problems.

Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agoodp-netlink.h: Use 32-bit aligned 64-bit types.
Ben Pfaff [Fri, 13 Jun 2014 22:28:29 +0000 (15:28 -0700)]
odp-netlink.h: Use 32-bit aligned 64-bit types.

Open vSwitch userspace uses special types to indicate that a particular
object may not be naturally aligned.  Netlink is one source of such
problems: in Netlink, 64-bit integers are often aligned only on 32-bit
boundaries.  This commit changes the odp-netlink.h that is transformed from
<linux/openvswitch.h> to use these types to make it harder to accidentally
access a misaligned 64-bit member.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agoDo not seemingly #include Linux-specific headers on other platforms.
Ben Pfaff [Mon, 4 Aug 2014 18:11:40 +0000 (11:11 -0700)]
Do not seemingly #include Linux-specific headers on other platforms.

Until now, the OVS source tree has had a whole maze of header files that
make "#include <linux/openvswitch.h>" work OK regardless of platform, but
this confuses everyone new to the tree, at first glance, and is difficult
to understand at second glance too.

This commit renames include/linux/openvswitch.h to
datapath/linux/compat/include/linux/openvswitch.h without other change,
then modifies the userspace build to generate a header that makes sense in
portable Open vSwitch userspace from that header.

It then removes all the remaining include/linux/* files since they are now
unused.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodatapath: do not use vport type to determine presence of Geneve attributes
Ansis Atteka [Tue, 29 Jul 2014 22:39:35 +0000 (15:39 -0700)]
datapath: do not use vport type to determine presence of Geneve attributes

This patch fixes following kernel crash that could happen, if geneve
vport was not added yet, but revalidator thread attempted to dump flows.
To reproduce:
1. switch tunnel type between geneve and gre in a loop; and
2. run ping.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: [<ffffffffa0385470>] ovs_nla_put_flow+0x3d0/0x7c0 [openvswitch]
PGD 3b32b067 PUD 3b2ef067 PMD 0
Oops: 0000 [#2] SMP
...
CPU: 0 PID: 6450 Comm: revalidator2 Tainted: GF     D    O
3.13.0-24-generic #46-Ubuntu
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
Platform, BIOS 6.00 07/02/2012
task: ffff88003b4aafe0 ti: ffff88003d314000 task.ti: ffff88003d314000
RIP: 0010:[<ffffffffa0385470>]  [<ffffffffa0385470>]
ovs_nla_put_flow+0x3d0/0x7c0 [openvswitch]
RSP: 0018:ffff88003d315a10  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88003a9a9960 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffffffffffc8 RDI: ffff88003babcb80
RBP: ffff88003d315a68 R08: 0000000000000000 R09: 0000000000000004
R10: ffff880039c23034 R11: 0000000000000008 R12: ffff88003a861600
R13: ffff88003a9a9960 R14: ffff88003babcb80 R15: qffff88003a861600
FS:  00007ff0f5d94700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000048 CR3: 000000003b55b000 CR4: 00000000000007f0
Stack:
ffffffff81385093 0000000000000000 0000000000000000 0000000000000000
ffff880000000000 ffff88003d315a58 ffff880039c23014 ffff88003a9a97a0
ffff88003babcb80 ffff880039c23018 ffff88003a861600 ffff88003d315ad0
Call Trace:
[<ffffffff81385093>] ? __nla_reserve+0x43/0x50
[<ffffffffa037e683>] ovs_flow_cmd_fill_info+0x93/0x2b0 [openvswitch]
[<ffffffffa0387159>] ? ovs_flow_tbl_dump_next+0x49/0xc0 [openvswitch]
[<ffffffffa037e920>] ovs_flow_cmd_dump+0x80/0xd0 [openvswitch]
[<ffffffff81645004>] netlink_dump+0x84/0x240
[<ffffffff816458eb>] __netlink_dump_start+0x1ab/0x220
[<ffffffff816498d7>] genl_family_rcv_msg+0x337/0x370
[<ffffffffa037e8a0>] ? ovs_flow_cmd_fill_info+0x2b0/0x2b0 [openvswitch]
[<ffffffff811a2778>] ? __kmalloc_node_track_caller+0x58/0x1e0
[<ffffffff81649910>] ? genl_family_rcv_msg+0x370/0x370
[<ffffffff816499a1>] genl_rcv_msg+0x91/0xd0
[<ffffffff81647a29>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff81647f28>] genl_rcv+0x28/0x40
[<ffffffff81647055>] netlink_unicast+0xd5/0x1b0
[<ffffffff8164742f>] netlink_sendmsg+0x2ff/0x740
[<ffffffff816024eb>] sock_sendmsg+0x8b/0xc0
[<ffffffff811bbaa1>] ? __sb_end_write+0x31/0x60
[<ffffffff811d42bf>] ? touch_atime+0x10f/0x140
[<ffffffff811c2471>] ? pipe_read+0x371/0x400
[<ffffffff81602691>] SYSC_sendto+0x121/0x1c0
[<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[<ffffffff81020d35>] ? syscall_trace_enter+0x145/0x250
[<ffffffff8160319e>] SyS_sendto+0xe/0x10
[<ffffffff8172663f>] tracesys+0xe1/0xe6

Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodatapath-windows: Fix crash when internal port is removed.
Eitan Eliahu [Sat, 2 Aug 2014 00:11:51 +0000 (17:11 -0700)]
datapath-windows: Fix crash when internal port is removed.

Avoids a BSOD when AllowManagementOS is set to false.

Reported-at: https://github.com/openvswitch/ovs/issues/13
Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Eitan Eliahu <eliahue@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Nithin Raju <nithin@vmware.com>
9 years agobitmap: Add test for bitmap_equal and bitmap_scan
Kmindg [Thu, 31 Jul 2014 05:16:04 +0000 (13:16 +0800)]
bitmap: Add test for bitmap_equal and bitmap_scan

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Kmindg <kmindg@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath-windows: Add temporary files to gitignore.
Saurabh Shah [Wed, 30 Jul 2014 20:29:54 +0000 (20:29 +0000)]
datapath-windows: Add temporary files to gitignore.

Github Issue: #12

Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
9 years agodatapath: Remove unlikely() for WARN_ON() conditions
Thomas Graf [Thu, 31 Jul 2014 01:05:01 +0000 (18:05 -0700)]
datapath: Remove unlikely() for WARN_ON() conditions

No need for the unlikely(), WARN_ON() and BUG_ON() internally use
unlikely() on the condition.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath-windows: Rename the extension & vendor in the INF file.
Saurabh Shah [Wed, 30 Jul 2014 16:03:20 +0000 (09:03 -0700)]
datapath-windows: Rename the extension & vendor in the INF file.

Github issue: #14

Signed-off-by: Saurabh Shah <ssaurabh@vmware.com>
Reported-by: Alessandro Pilotti <apilotti@cloudbasesolutions.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
9 years agodatapath: Remove redundant key ref from upcall_info.
Pravin B Shelar [Tue, 29 Jul 2014 17:04:52 +0000 (10:04 -0700)]
datapath: Remove redundant key ref from upcall_info.

struct dp_upcall_info has pointer to pkt_key which is already
available in OVS_CB.  This also simplifies upcall handling
for gso packet.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: Drop packets when interdev is not up
Chunhe Li [Wed, 30 Jul 2014 01:49:01 +0000 (09:49 +0800)]
datapath: Drop packets when interdev is not up

If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.

Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoconnmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
Alex Wang [Tue, 29 Jul 2014 17:50:07 +0000 (10:50 -0700)]
connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

connmgr_wants_packet_in_on_miss() is called by multiple threads
and thusly should be protected by the mutex.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
9 years agoconnmgr: Fix a typo.
Alex Wang [Tue, 29 Jul 2014 18:10:21 +0000 (11:10 -0700)]
connmgr: Fix a typo.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agotests: Add another test for extended registers.
Ben Pfaff [Tue, 29 Jul 2014 16:27:11 +0000 (09:27 -0700)]
tests: Add another test for extended registers.

ONF-JIRA: EXT-244
Suggested-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jean Tourrilhes <jt@hpl.hp.com>
9 years agoAdd more files to the openvswitch library on MSVC
Alin Serdean [Tue, 29 Jul 2014 15:24:08 +0000 (15:24 +0000)]
Add more files to the openvswitch library on MSVC

Add netlink related files to the windows build.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-socket: Adapt to Windows and MSVC.
Alin Serdean [Tue, 29 Jul 2014 15:23:28 +0000 (15:23 +0000)]
netlink-socket: Adapt to Windows and MSVC.

Add two functions set_sock_pid_in_kernel and portid_next. This will allow
the channel identification for the kernel extension to send back messages.

Replace send with WriteFile equivalent and ignore nl_sock_drain for the moment
under MSVC.

Replace sendmsg and recvmsg with ReadFile and WriteFile equivalents.

On MSVC put in handle instead of fd(sock->fd becomes sock->handle).

Creation of the netlink socket will be replaced by CreateFile equivalent.

Add MAX_STACK_LENGTH for MSVC.  This will be our maximum size for on-stack
copy buffer.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-socket: Allow compiling on MSVC even without HAVE_NETLINK.
Alin Serdean [Tue, 29 Jul 2014 15:22:37 +0000 (15:22 +0000)]
netlink-socket: Allow compiling on MSVC even without HAVE_NETLINK.

Bypass the error compilation when compiling under MSVC.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-protocol: Add more definitions.
Alin Serdean [Tue, 29 Jul 2014 15:22:03 +0000 (15:22 +0000)]
netlink-protocol: Add more definitions.

Add MAX_LINKS define needed for nl_pool.

Add NLM_F_CREATE and NLM_F_EXCL defines also.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoAdd defines, enums and headers for MSVC
Alin Serdean [Tue, 29 Jul 2014 15:21:16 +0000 (15:21 +0000)]
Add defines, enums and headers for MSVC

Add defines needed to compile netlink-socket.c and netlink.c.

Add a wrapper and the functionality behind it for syconf.

Add the newly created files to the noinst_HEADERS in windows/automake.mk

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agocmap: Merge CMAP_FOR_EACH_SAFE into CMAP_FOR_EACH.
Ben Pfaff [Tue, 29 Jul 2014 16:02:23 +0000 (09:02 -0700)]
cmap: Merge CMAP_FOR_EACH_SAFE into CMAP_FOR_EACH.

There isn't any significant downside to making cmap iteration "safe" all
the time, so this drops the _SAFE variant.

Similar changes to CMAP_CURSOR_FOR_EACH and CMAP_CURSOR_FOR_EACH_CONTINUE.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agonetlink-socket: Add conceptual documentation.
Ben Pfaff [Tue, 29 Jul 2014 15:59:40 +0000 (08:59 -0700)]
netlink-socket: Add conceptual documentation.

Based on a conversation with the VMware Hyper-V team earlier today.

This commit also changes a couple of functions that were only used with
netlink-socket.c into static functions.  I couldn't think of a reason for
code outside that file to use them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Use currect rcu API in exact match flow lookup function.
Pravin B Shelar [Sat, 26 Jul 2014 03:34:48 +0000 (20:34 -0700)]
datapath: Use currect rcu API in exact match flow lookup function.

exact match cache lookup is always done under ovs lock. So
use ovsl_dereference() API for rcu access.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoconnmgr: Only send role status messages to OpenFlow 1.4+ controllers.
Ben Pfaff [Tue, 22 Jul 2014 22:54:55 +0000 (15:54 -0700)]
connmgr: Only send role status messages to OpenFlow 1.4+ controllers.

Only OpenFlow 1.4 and later support role status messages, but this code
tried to send them to all controllers, which caused an assertion failure.

Also, add tests to check that role status messages work, and that they
don't cause trouble with OF1.2.

Reported-by: Anup Khadka <khadka.py@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoconnmgr: Demote service controllers too in ofconn_set_role().
Ben Pfaff [Mon, 28 Jul 2014 17:17:56 +0000 (10:17 -0700)]
connmgr: Demote service controllers too in ofconn_set_role().

Service controllers can set their roles, so it's necessary to demote them
to slaves if another controller becomes master.  Unfortunately the
'controllers' hmap only contains primary controllers, so this was omitted.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoovs-ofctl: Add --unixctl command line option.
Ben Pfaff [Mon, 28 Jul 2014 17:31:25 +0000 (10:31 -0700)]
ovs-ofctl: Add --unixctl command line option.

This matches the option offered by some other Open vSwitch daemons.  I
intend to use it in tests in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodatapath-windows: Kernel module for HyperV.
Saurabh Shah [Mon, 28 Jul 2014 00:26:58 +0000 (17:26 -0700)]
datapath-windows: Kernel module for HyperV.

The kernel switch extension has support for bridged back forwarding & tunneling
over VXLAN. There is no Netlink integration as it is still being worked out.

Co-Authored-By: Ankur Sharma <ankursharma@vmware.com>
Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Co-Authored-By: Eitan Eliahu <eliahue@vmware.com>
Signed-off-by: Eitan Eliahu <eliahue@vmware.com>
Co-Authored-By: Guolin Yang <gyang@vmware.com>
Signed-off-by: Guolin Yang <gyang@vmware.com>
Co-Authored-By: Linda Sun <lsun@vmware.com>
Signed-off-by: Linda Sun <lsun@vmware.com>
Co-Authored-By: Nithin Raju <nithin@vmware.com>
Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agometa-flow: Add 64-bit registers.
Ben Pfaff [Mon, 28 Jul 2014 16:50:37 +0000 (09:50 -0700)]
meta-flow: Add 64-bit registers.

These 64-bit registers are intended to conform with the OpenFlow 1.5
draft specification.

EXT-244.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoofproto-dpif-upcall: Fix sparse warnings.
Ben Pfaff [Sat, 26 Jul 2014 19:19:03 +0000 (12:19 -0700)]
ofproto-dpif-upcall: Fix sparse warnings.

Fixes these warnings from "sparse":

../ofproto/ofproto-dpif-upcall.c:761:1: warning: symbol 'free_upcall' was
    not declared. Should it be static?
../ofproto/ofproto-dpif-upcall.c:849:1: warning: symbol 'convert_upcall'
    was not declared. Should it be static?

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoFix two memory leaks.
yinpeijun [Mon, 28 Jul 2014 07:21:17 +0000 (15:21 +0800)]
Fix two memory leaks.

Found by coverity.

Signed-off-by: yinpeijun <yinpeijun@huawei.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoRemove assumption that there are 64 or fewer fields.
Ben Pfaff [Sat, 26 Jul 2014 19:15:26 +0000 (12:15 -0700)]
Remove assumption that there are 64 or fewer fields.

An upcoming commit will increase the number of fields beyond 64.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agobitmap: Add new functions.
Ben Pfaff [Sat, 26 Jul 2014 05:23:44 +0000 (22:23 -0700)]
bitmap: Add new functions.

These will be used in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agometa-flow: Simplify handling of a variable number of registers.
Ben Pfaff [Thu, 15 May 2014 15:19:11 +0000 (08:19 -0700)]
meta-flow: Simplify handling of a variable number of registers.

At the time that Open vSwitch implemented registers, there was a high cost
to adding additional fields, so I wrote the code so that the number of
registers could be reduced at compile time.  Now, fields are cheaper
(though not free) and in the meantime I have never heard of anyone reducing
the number of registers.  Since I intend to add more code that would
require awkward "#if"s like this, I think that this is a good time to
simplify it by requiring FLOW_N_REGS to be fixed.  This commit does that.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath: Fix buffer overrun in mask array realloc.
Pravin B Shelar [Fri, 25 Jul 2014 23:22:46 +0000 (16:22 -0700)]
datapath: Fix buffer overrun in mask array realloc.

mask realloc copies elements from old array to new array. When
shrinking array it can go beyond allocated memory.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodpif-netdev: Polling threads directly call ofproto upcall functions.
Ryan Wilson [Sat, 26 Jul 2014 06:51:55 +0000 (06:51 +0000)]
dpif-netdev: Polling threads directly call ofproto upcall functions.

Typically, kernel datapath threads send upcalls to userspace where
handler threads process the upcalls. For TAP and DPDK devices, the
datapath threads operate in userspace, so there is no need for
separate handler threads.

This patch allows userspace datapath threads to directly call the
ofproto upcall functions, eliminating the need for handler threads
for datapaths of type 'netdev'.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
9 years agodatapath: Add NULL check for mask pointer.
Pravin B Shelar [Thu, 24 Jul 2014 20:32:35 +0000 (13:32 -0700)]
datapath: Add NULL check for mask pointer.

There is race in datapath when last mask in mask array deleted can
result in NULL pointer dereference in datapath.
datapath lookup does not check mask pointer if its index is less than
mask-array count. That is safe because delete operation moves last valid
pointer to the deleted element. But this does not work if we are
deleting last element in array. Following patch adds NULL check for the
mask pointer.
This patch also avoids accessing ma->count without any locks.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agocfm: Reduce "long delay" message from WARN to INFO, to match BFD behavior.
Ben Pfaff [Fri, 25 Jul 2014 16:19:17 +0000 (09:19 -0700)]
cfm: Reduce "long delay" message from WARN to INFO, to match BFD behavior.

These messages can cause the testsuite to fail on a busy build machine
since the testsuite treats WARN or ERR log messages as failures.  BFD
uses an INFO message instead of WARN, so this just changes CFM to match.

Alternatively, the testsuite could ignore "long delay" messages (it ignores
some other categories of messages).  In that case I'd expect that we'd
want to change BFD to match CFM since I don't know of a reason why they
should log differently.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodatapath: reorder action netlink attribute definition for upstreaming
Andy Zhou [Thu, 24 Jul 2014 09:17:48 +0000 (02:17 -0700)]
datapath: reorder action netlink attribute definition for upstreaming

Keeping the order of netlink attribute definition in the order of
upstreaming is the best way to keep all released user space program
forward compatible with upstreamed kernel modules.

Adjust action netlink attribute order to match with the current
upstreaming plan.

Recirc and hash actions are introduced in branch 2.3, which will be
fixed by the patch. The MPLS actions have been released since
branch-2.1 but there is no kernel implementation of them prior to
branch 2.3. Thus the ordering change should not affect them.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-provider: Clarify comment where 'get_next_hop' function looks.
Ben Pfaff [Wed, 23 Jul 2014 19:48:42 +0000 (12:48 -0700)]
netdev-provider: Clarify comment where 'get_next_hop' function looks.

Some readers thought it was looking in an ofproto- or netdev-specific
table.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
9 years agodatapath/flow_netlink: Avoid wildcarding tunnel key with disabled megaflows
Daniele Di Proietto [Thu, 17 Jul 2014 06:36:05 +0000 (06:36 +0000)]
datapath/flow_netlink: Avoid wildcarding tunnel key with disabled megaflows

If the userspace wants to match on a flow with some tunnel attributesset to 0,
it simply omits them in the netlink attributes stream.
Since our wildcarding logic (when megaflows are disabled) is based on the
attributes in the netlink stream, we set our mask incorrectly.

This commit adds a check to detect if the userspace wants to match on a tunnel,
in which case we simply unwildcard the whole tun_key

Reported-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: flow_netlink: Fix a bug.
Alex Wang [Wed, 23 Jul 2014 06:14:50 +0000 (23:14 -0700)]
datapath: flow_netlink: Fix a bug.

Commit 62974663fe (datapath/flow_netlink: Create right mask with
disabled megaflows) introduced the bug which caused
ovs_nla_get_match() returns immediately after parsing the flow
mask for OVS_KEY_ATTR_ENCAP.  Consequently, when vlan encapsulated
packets are present, the corresponding datapath flows will have
incorrect mask like below.  And the incorrect flows could affect
other non-vlan packets.

~/ovs# ovs-dpctl dump-flows
in_port(3/0xffff0000),eth_type(0x8100),encap(), packets:0,
bytes:0, used:never, actions:2

This commit fixes the bug by checking and handling the return
value of the parsing function correctly.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodpif-netdev: Initialize upcall->packet when queuing to userspace.
Ben Pfaff [Wed, 23 Jul 2014 04:06:44 +0000 (21:06 -0700)]
dpif-netdev: Initialize upcall->packet when queuing to userspace.

Commit db73f7166a6 (netdev-dpdk: Fix race condition with DPDK mempools in
non pmd threads) switched to a new way of setting up 'upcall->packet', but
only initialized two of the fields in the packet.  This could cause
core dumps and other strange behavior.  In particular it caused failures in
several unit tests on XenServer.

This commit fixes the problem by initializing the entire ofpbuf.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodpif-netdev: Reduce netdev_flow_key size
Daniele Di Proietto [Wed, 23 Jul 2014 00:06:23 +0000 (17:06 -0700)]
dpif-netdev: Reduce netdev_flow_key size

struct 'miniflow' already contains MINI_N_INLINE values, therefore
we can save few bytes in netdev_flow_key.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Increase tx queue size and rx batch size
Daniele Di Proietto [Wed, 23 Jul 2014 00:09:10 +0000 (17:09 -0700)]
netdev-dpdk: Increase tx queue size and rx batch size

These values has been found to give the best throughput
in simple cases (1 flow 64 bytes UDP packets).

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agostp: Add more logging points for debug.
Alex Wang [Sat, 19 Jul 2014 04:49:52 +0000 (04:49 +0000)]
stp: Add more logging points for debug.

This commit adds more logging points in stp module for debugging.
Also, it makes the log print out the port name.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-notifier: Exit loop if nl_sock_recv() returns an error
Daniele Di Proietto [Tue, 22 Jul 2014 06:38:57 +0000 (06:38 +0000)]
netlink-notifier: Exit loop if nl_sock_recv() returns an error

An error from nl_sock_recv() could mean that there are some issues with the
netlink socket (EBADF, ENOTSOCK, ...). Calling nl_sock_recv() in this case is
harmful: nln_run() will never return and since we are calling it from the main
thread, vswitchd becomes unresponsive.
Also, with this commit we avoid calling the notifier callback in case of error
(except for ENOBUFS, which means that there could be too many notifications)

Suggested-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agoconfigure: Don't check for malloc hooks that we no longer use.
Ben Pfaff [Tue, 22 Jul 2014 23:18:35 +0000 (16:18 -0700)]
configure: Don't check for malloc hooks that we no longer use.

Commit 825da1c6d1c7 (leak-checker: Remove because it cannot be made
thread-safe.) removed the only uses of these hooks but neglected to remove
the test for them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoovsdb-server: Document RFC 7047 extensions to ovsdb <error>s.
Ben Pfaff [Tue, 22 Jul 2014 00:08:54 +0000 (17:08 -0700)]
ovsdb-server: Document RFC 7047 extensions to ovsdb <error>s.

Reported-by: Madhu Venugopal <vmadhu@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoovs-ctl: Add comment to explain why we only save ofports for pre-1.10.
Ben Pfaff [Tue, 22 Jul 2014 23:09:26 +0000 (16:09 -0700)]
ovs-ctl: Add comment to explain why we only save ofports for pre-1.10.

We've had a couple of questions about this lately, including one suggestion
that we should always save the OpenFlow port numbers.  This explains why
the behavior is as it is.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodebian: Avoid -Wformat-zero-length warnings.
Ben Pfaff [Tue, 22 Jul 2014 18:50:37 +0000 (11:50 -0700)]
debian: Avoid -Wformat-zero-length warnings.

Debian puts an extra "-Wformat" in the CFLAGS following OVS's own
"-Wformat -Wno-format-zero-length", which therefore overrides
-Wno-format-zero-length, so this commit adds an extra
-Wno-format-zero-length to avoid those false positives.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoUse xstrdup() instead of strdup(), xmalloc() instead of malloc().
Ben Pfaff [Tue, 22 Jul 2014 22:46:25 +0000 (15:46 -0700)]
Use xstrdup() instead of strdup(), xmalloc() instead of malloc().

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodatapath: Refactor get_dp() function into multiple access APIs.
Andy Zhou [Tue, 15 Jul 2014 21:38:29 +0000 (14:38 -0700)]
datapath: Refactor get_dp() function into multiple access APIs.

Avoid recursive read_rcu_lock() by using the lighter weight get_dp_rcu()
API. Add proper locking assertions to get_dp().

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agotests: Wait for ofctl_monitor.log in ofproto-dpif controller.
Ben Pfaff [Tue, 22 Jul 2014 04:42:49 +0000 (21:42 -0700)]
tests: Wait for ofctl_monitor.log in ofproto-dpif controller.

Fixes a race in the test case.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoclassifier: Refactor cls_cursor_advance() to make it easier to follow.
Ben Pfaff [Tue, 22 Jul 2014 04:00:34 +0000 (21:00 -0700)]
classifier: Refactor cls_cursor_advance() to make it easier to follow.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agocmap, classifier: Avoid unsafe aliasing in iterators.
Ben Pfaff [Tue, 22 Jul 2014 04:00:04 +0000 (21:00 -0700)]
cmap, classifier: Avoid unsafe aliasing in iterators.

CMAP_FOR_EACH and CLS_FOR_EACH and their variants tried to use void ** as
a "pointer to any kind of pointer".  That is a violation of the aliasing
rules in ISO C which technically yields undefined behavior.  With GCC 4.1,
it causes both warnings and actual misbehavior.  One option would to add
-fno-strict-aliasing to the compiler flags, but that would only help with
GCC; who knows whether this can be worked around with other compilers.

Instead, this commit rewrites the iterators to avoid disallowed pointer
aliasing.

VMware-BZ: #1287651
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agovswitchd/bridge: Fix setting default prefix fields.
Jarno Rajahalme [Mon, 21 Jul 2014 21:34:02 +0000 (14:34 -0700)]
vswitchd/bridge: Fix setting default prefix fields.

The check for the need of default values was in the wrong place,
causing no prefix tracking to be used when database had no
configuration for a flow table.  Missing configuration means that
defaults should be used.

To limit clutter on the log, we now log the prefix tracking
configuration when it is explicitly set in the database.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-atomic: Avoid evaluating arguments multiple times.
Jarno Rajahalme [Mon, 21 Jul 2014 21:19:06 +0000 (14:19 -0700)]
ovs-atomic: Avoid evaluating arguments multiple times.

ovs-atomic-gcc4+ did expand arguments again, if locked set/get was
called.

Also fix atomic_is_lock_free().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-rcu: evaluate argument of ovsrcu_get only once.
Jarno Rajahalme [Mon, 21 Jul 2014 21:19:06 +0000 (14:19 -0700)]
lib/ovs-rcu: evaluate argument of ovsrcu_get only once.

As ovsrcu_get() looks like a function call, it is reasonable for the
callers to expect that the arguments are evaluated only once.
CONST_CAST expands its 'POINTER' argument multiple times, and the
exact effect of this turned out to be compiler dependent.  Fix this by
expanding the macro argument before CONST_CAST, and removing
unnecessary CONST_CASTs.

VMware-BZ: #1287651

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agonetdev-dpdk: Fix race condition with DPDK mempools in non pmd threads
Daniele Di Proietto [Thu, 17 Jul 2014 21:29:36 +0000 (14:29 -0700)]
netdev-dpdk: Fix race condition with DPDK mempools in non pmd threads

DPDK mempools rely on rte_lcore_id() to implement a thread-local cache.
Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to
the "thread-local" cache, causing crashes.

This commit resolves the issue with the following changes:

- Every non pmd thread has the same lcore_id (0, for management reasons), which
  is not shared with any pmd thread (lcore_id for pmd threads now start from 1)
- DPDK mbufs must be allocated/freed in pmd threads. When there is the need to
  use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be
  held.
- The previous change does not allow us anymore to pass DPDK mbufs to handler
  threads: therefore this commit partially revert 143859ec63d45e. Now packets
  are copied for upcall processing. We can remove the extra memcpy by
  processing upcalls in the pmd thread itself.

With the introduction of the extra locking, the packet throughput will be lower
in the following cases:

- When using internal (tap) devices with DPDK devices on the same datapath.
  Anyway, to support internal devices efficiently, we needed DPDK KNI devices,
  which will be proper pmd devices and will not need this locking.
- When packets are processed in the slow path by non pmd threads. This overhead
  can be avoided by handling the upcalls directly in pmd threads (a change that
  has already been proposed by Ryan Wilson)

Also, the following two fixes have been introduced:
- In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put().
  This allows OVS to run properly with CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option
- Do not bulk free mbufs in a transmission queue. They may belong to different
  mempools

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-socket: Do not make flow_dump block on netlink socket.
Alex Wang [Fri, 18 Jul 2014 21:27:36 +0000 (14:27 -0700)]
netlink-socket: Do not make flow_dump block on netlink socket.

Commit 93295354 (netlink-socket: Simplify multithreaded dumping
to match Linux reality.) makes the call to recvmsg() block if no
messages are available.  This can cause revalidator threads hanging
for long time or even deadlock when main thread tries to stop the
revalidator threads.

This commit fixes the issue by enabling the MSG_DONTWAIT flag in
the call to recvmsg().

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb: Don't add ovsdb-server.c to libovsdb.
Gurucharan Shetty [Fri, 18 Jul 2014 01:15:17 +0000 (18:15 -0700)]
ovsdb: Don't add ovsdb-server.c to libovsdb.

Without this change, with shared libraries, VLOG
constructor for ovsdb-server would get called twice corrupting
the 'vlog_modules' list causing an infinite loop.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Reported-by: Gur Stavi <gstavi@mrv.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Clarify subtable skipping.
Jarno Rajahalme [Fri, 18 Jul 2014 09:24:27 +0000 (02:24 -0700)]
lib/classifier: Clarify subtable skipping.

Clarify comments for trie-based subtable skipping.

Perform the cheaper check first.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Return all matching prefix lengths from trie lookup.
Jarno Rajahalme [Fri, 18 Jul 2014 09:24:26 +0000 (02:24 -0700)]
lib/classifier: Return all matching prefix lengths from trie lookup.

Previously we only returned the last matching prefix length
encountered during a trie lookup, and skipped subtables that had
prefixes longer than that.  This patch changes the trie lookup
functions to return all matching prefix lengths seen, so that all
non-matching prefix lengths can be skipped.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Change local variable names.
Jarno Rajahalme [Fri, 18 Jul 2014 09:24:26 +0000 (02:24 -0700)]
lib/classifier: Change local variable names.

These stylistic changes makes the following patch a bit simpler.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Unify struct classifier and cls_classifier.
Jarno Rajahalme [Fri, 18 Jul 2014 09:24:26 +0000 (02:24 -0700)]
lib/classifier: Unify struct classifier and cls_classifier.

Now that it is clear that struct cls_classifier itself does not
need RCU indirection and pvector is defined in its own header, it
is possible get rid of the indirection from struct classifier to
struct cls_classifier.

Suggested-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agonetdev-dpdk: Refactor dpdk_class_init()
Daniele Di Proietto [Thu, 17 Jul 2014 00:10:59 +0000 (17:10 -0700)]
netdev-dpdk: Refactor dpdk_class_init()

The following changes were made:

- Since we have two dpdk classes, we should split the initial operations needed
  by both classes from the initialization needed by each class.
- The dpdk_ring class does not need an initialization function: it has been
  removed. This also prevents many testcase from failing, because
  dpdk_ring_class_init() was printing an unexpected log message
  (OVS_VSWITCHD_START at tests/ofproto-macros.at:54 check for a specific set of
  startup log messages)
- If the user doesn't pass the --dpdk option we do not register the dpdk*
  classes
- Do not call VLOG_ERR if there are 0 dpdk ethernet device. OVS can now be used
  with dpdk_ring devices.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoofproto: Report controller rate limiting statistics in database.
Ben Pfaff [Thu, 17 Jul 2014 17:31:03 +0000 (10:31 -0700)]
ofproto: Report controller rate limiting statistics in database.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agosmap: New function smap_add_nocopy().
Ben Pfaff [Wed, 16 Jul 2014 20:34:03 +0000 (13:34 -0700)]
smap: New function smap_add_nocopy().

An upcoming commit will add a caller that needs to format both the key and
the value.  That isn't cleanly possible with the current interface.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agoSimplify ofproto_controller_info by using a struct smap in place of array.
Ben Pfaff [Thu, 17 Jul 2014 17:27:00 +0000 (10:27 -0700)]
Simplify ofproto_controller_info by using a struct smap in place of array.

The only client for ofproto_controller_info was transforming the array of
pairs into an smap anyway.  It's easy for the code that fills in the array
to generate it as an smap directly, and it's also easier to extend later.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agopinsched: Report queued packet count correctly.
Ben Pfaff [Thu, 17 Jul 2014 17:23:52 +0000 (10:23 -0700)]
pinsched: Report queued packet count correctly.

'n_txq' was initialized to 0 and never modified, so pinsched_count_txqlen()
always returned 0.  Instead, return the correct number.

This only affected the results of "ovs-appctl memory/show", and only if
controller rate limiting was turned on, so it is not a serious bug.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agovswitch.xml: Fix typo in documentation.
Ben Pfaff [Wed, 16 Jul 2014 20:34:22 +0000 (13:34 -0700)]
vswitch.xml: Fix typo in documentation.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agostp: Make stp-disabled port forward stp bpdu packets.
Alex Wang [Wed, 16 Jul 2014 01:52:19 +0000 (18:52 -0700)]
stp: Make stp-disabled port forward stp bpdu packets.

Commit 0d1cee123a84 (stp: Fix bpdu tx problem in listening state)
makes ovs drop the stp bpdu packets if stp is not enabled on the
input port.  However, when pif bridge is used and stp is enabled
on the integration bridge.  The flow translation of stp bpdu
packets will go through a level of resubmission which changes
the input port to the corresponding peer port.  Since, the
patch port on the pif bridge does not have stp enabled, the
flow translation will drop the bpdu packets.

This commit fixes the issue by making ovs forward stp bpdu packets
on stp-disabled port.

VMware-BZ: #1284695

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agocmap: Fix cmap_next_position()
Daniele Di Proietto [Wed, 16 Jul 2014 17:46:20 +0000 (10:46 -0700)]
cmap: Fix cmap_next_position()

cmap_next_position() didn't update the node pointer while iterating through a
list of nodes with the same hash.
This commit fixes the bug and improve test-cmap to detect it.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>