cascardo/ovs.git
8 years agodp-packet: Add private data
Pravin B Shelar [Wed, 18 May 2016 00:32:17 +0000 (17:32 -0700)]
dp-packet: Add private data

This scratchpad can be used by any layer to keep private data.
STT will use it for TCP reassembly state.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agonetdev: Return number of packet from netdev_pop_header()
Pravin B Shelar [Wed, 18 May 2016 00:32:06 +0000 (17:32 -0700)]
netdev: Return number of packet from netdev_pop_header()

The current tunnel-pop API does not allow the netdev implementation
to retain a packet, but STT can keep a packet from a batch of packets
during TCP reassembly processing. To return the exact count of
valid packets, STT needs the packet-count parameter to be passed
by reference.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agonetdev-vport: Factor-out tunnel Push-pop code into separate module.
Pravin B Shelar [Wed, 18 May 2016 00:31:33 +0000 (17:31 -0700)]
netdev-vport: Factor-out tunnel Push-pop code into separate module.

It is better to move the functions specific to tunnel push-pop actions
into a separate module.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agonat: documentation and parsing fixes.
Jarno Rajahalme [Wed, 18 May 2016 23:28:36 +0000 (16:28 -0700)]
nat: documentation and parsing fixes.

Add the missing NAT documentation to ovs-ofctl man page and add
validation of the NAT flags to NAT action decoding and parsing.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
8 years agoovs-dev.py: Update for python3.
Joe Stringer [Sat, 14 May 2016 22:08:08 +0000 (15:08 -0700)]
ovs-dev.py: Update for python3.

Adapt to python-2.6+, including support for 3.

Signed-off-by: Joe Stringer <joe@ovn.org>
8 years agoovs-dev.py: PEP-8ify.
Joe Stringer [Sat, 14 May 2016 21:18:27 +0000 (14:18 -0700)]
ovs-dev.py: PEP-8ify.

Signed-off-by: Joe Stringer <joe@ovn.org>
8 years agotests: Enable color output for unit tests, if available.
Flavio Fernandes [Wed, 18 May 2016 15:00:49 +0000 (11:00 -0400)]
tests: Enable color output for unit tests, if available.

Reference thread in mailing list:
http://openvswitch.org/pipermail/discuss/2016-May/021339.html

Signed-off-by: Flavio Fernandes <flavio@flaviof.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoovs-vtep: Support running multiple ovs-vtep processes
nickcooper-zhangtonghao [Fri, 6 May 2016 03:07:57 +0000 (23:07 -0400)]
ovs-vtep: Support running multiple ovs-vtep processes

Include ovs-vtep physical switch name as part of logical switch name to
support running multiple ovs-vtep processes sharing the same ovsdb and vswitchd.

Signed-off-by: nickcooper-zhangtonghao <nickcooper-zhangtonghao@opencloud.tech>
Tested-by: Darrell Ball <dlu998@gmail.com>
Acked-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agotests: Add test for partial map updates.
Edward Aymerich [Mon, 2 May 2016 20:07:20 +0000 (14:07 -0600)]
tests: Add test for partial map updates.

Insert basic functionality for testing partial map updates
and add a new test table named "simple2".

Signed-off-by: Edward Aymerich <edward.aymerich@hpe.com>
Signed-off-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
Co-authored-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoovsdb-idlc.in: Autogenerate partial map updates functions.
Edward Aymerich [Mon, 2 May 2016 20:01:46 +0000 (14:01 -0600)]
ovsdb-idlc.in: Autogenerate partial map updates functions.

Insert code that autogenerates the corresponding map functions to set and
delete elements in map columns.
Add descriptions to the autogenerated functions.

Signed-off-by: Edward Aymerich <edward.aymerich@hpe.com>
Signed-off-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
Co-authored-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoovsdb-idl: Add partial map updates functionality.
Edward Aymerich [Mon, 2 May 2016 19:59:44 +0000 (13:59 -0600)]
ovsdb-idl: Add partial map updates functionality.

In the current implementation, every time an element of either a map or set
column has to be modified, the entire content of the column is sent to the
server to be updated. This is not a major problem if the information contained
in the column for the corresponding row is small, but there are cases where
these columns can have a large number of elements per row, or these
values are updated frequently, so the cost of the modifications becomes
high in terms of time and bandwidth.

In this solution, the ovsdb-idl code is modified to use the RFC 7047 'mutate'
operation, to allow sending partial modifications on map columns to the server.
The functionality is exposed to clients in the vswitch idl. This was
implemented through map operations.

A map operation is defined as an insertion, update or deletion of a key-value
pair inside a map. The idea is to minimize the number of map operations
that are sent to the OVSDB server when a transaction is committed.

In order to keep track of the requested map operations, structs map_op and
map_op_list were defined with accompanying functions to manipulate them. These
functions make sure that only one operation is sent to the server for each
key-value pair to be modified, so multiple operations on a key-value pair are
collapsed into a single operation.

As an example, if a client using the IDL updates the value for the same key
several times, the functions will ensure that only the last value is sent to
the server, instead of multiple updates. Or, if the client inserts a key-value
pair and later deletes the key before committing the transaction, then both
actions cancel out and no map operation is sent for that key.
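
The collapsing rules described above can be modelled in a few lines. This
is an illustrative Python sketch, not the C code in ovsdb-idl: the class
name MapOpList, its add() method and the exact rule for re-inserting a
deleted key are assumptions made for the example.

```python
INSERT, UPDATE, DELETE = "insert", "update", "delete"


class MapOpList:
    """Hypothetical model: at most one pending operation per map key."""

    def __init__(self):
        self.ops = {}  # key -> (operation, value)

    def add(self, op, key, value=None):
        prev = self.ops.get(key)
        if prev is None:
            # First operation on this key in the transaction.
            self.ops[key] = (op, value)
            return
        prev_op = prev[0]
        if op == DELETE:
            if prev_op == INSERT:
                # Insert followed by delete cancels out entirely.
                del self.ops[key]
            else:
                self.ops[key] = (DELETE, None)
        elif prev_op == INSERT:
            # The key is still new to the server; keep only the last value.
            self.ops[key] = (INSERT, value)
        else:
            # The key already exists on the server (it was updated or
            # deleted earlier), so the net effect is an update.
            self.ops[key] = (UPDATE, value)
```

Several updates to one key leave a single pending update carrying the last
value, and an insert followed by a delete leaves nothing to send.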

To keep track of the desired map operations in each transaction, a list of map
operations (struct map_op_list) is created for every column of the row on which
a map operation is performed. When a new map operation is requested on the same
column, the corresponding map_op_list is checked to see whether a previous
operation was performed on the same key in the same transaction. If there is
no previous operation, then the new operation is just added to the list. But
if there was a previous operation on the same key, then the previous operation
is collapsed with the new operation into a single operation that preserves the
final result of performing both operations sequentially. This design
keeps the memory footprint small during transactions.

When a transaction is committed, the map operations lists are checked and
all map operations that belong to the same map are grouped together into a
single JSON RPC "mutate" operation, in which each map_op is transformed into
the necessary "insert" or "delete" mutators. Then the "mutate" operation is
added to the operations that will be sent to the server.
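
As a rough illustration of the grouping step, the sketch below turns the
pending operations for one map column into a single RFC 7047 "mutate"
operation. The function name build_mutate, the table, column and UUID
values, and the choice to express an update as a delete followed by an
insert are assumptions for this example; the commit only states that each
map_op becomes the necessary "insert" or "delete" mutators.

```python
def build_mutate(table, row_uuid, column, ops):
    """Build one RFC 7047 "mutate" op from {key: (operation, value)}."""
    # Keys to remove: plain deletes, plus updated keys whose old value
    # must go before the new one is inserted.
    deletes = sorted(k for k, (op, _) in ops.items()
                     if op in ("delete", "update"))
    # Pairs to add: plain inserts, plus the new value of updated keys.
    inserts = sorted([k, v] for k, (op, v) in ops.items()
                     if op in ("insert", "update"))
    mutations = []
    if deletes:
        mutations.append([column, "delete", ["set", deletes]])
    if inserts:
        mutations.append([column, "insert", ["map", inserts]])
    return {
        "op": "mutate",
        "table": table,
        "where": [["_uuid", "==", ["uuid", row_uuid]]],
        "mutations": mutations,
    }
```

Mutations are applied in order, so listing the "delete" mutator first lets
the following "insert" set the new value for an updated key.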

Once the transaction is finished, all map operation lists are cleared and
deleted, so the next transaction starts with a clean board for map operations.

Using different structures and logic to handle map operations, instead of
trying to force the current structures (like 'old' and 'new' datums in the row)
to handle them, ensures that map operations won't interfere with the current
logic to generate JSON messages for other operations, avoids duplicating the
whole map for just a few changes, and is faster for insert and delete
operations, because there is no need to maintain the invariants in the 'new'
datum.

Signed-off-by: Edward Aymerich <edward.aymerich@hpe.com>
Signed-off-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
Co-authored-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
[blp@ovn.org made style changes and factored out error checking]
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agodpif-netlink: Only warn when OVS datapath Netlink family is unavailable.
Ciara Loftus [Tue, 17 May 2016 13:28:39 +0000 (14:28 +0100)]
dpif-netlink: Only warn when OVS datapath Netlink family is unavailable.

OVS using DPDK (or the userspace datapath without DPDK) can still function
correctly without the module loaded.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agonetdev: Initialise DPDK netdev classes only once
Ciara Loftus [Tue, 17 May 2016 13:28:38 +0000 (14:28 +0100)]
netdev: Initialise DPDK netdev classes only once

DPDK netdev classes were being initialised twice, resulting in warning
logs like so:

netdev|WARN|attempted to register duplicate netdev provider: dpdk

This commit removes one of the initialisation calls.

Fixes: 0692257923fe ("netdev: Fix potential deadlock.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoappveyor: Update OpenSSL version
Alin Serdean [Wed, 11 May 2016 20:49:09 +0000 (20:49 +0000)]
appveyor: Update OpenSSL version

The OpenSSL version changed from 1.0.2g to 1.0.2h; this patch bumps the
version.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoofproto-dpif-xlate: Fix compilation with GCC 4.6.
Ben Pfaff [Tue, 17 May 2016 23:29:39 +0000 (16:29 -0700)]
ofproto-dpif-xlate: Fix compilation with GCC 4.6.

Without this change, GCC 4.6 reports:

ofproto/ofproto-dpif-xlate.c: In function ‘xlate_actions’:
ofproto/ofproto-dpif-xlate.c:5117:27: error: missing initializer
ofproto/ofproto-dpif-xlate.c:5117:27: error: (near initialization for
    ‘(anonymous).masks.vlan_tci’)

Reported-by: Joe Stringer <joe@ovn.org>
Reported-at: https://travis-ci.org/openvswitch/ovs/builds/130256491
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
8 years agotests: Add support for helgrind thread error detector.
William Tu [Sat, 30 Apr 2016 05:13:46 +0000 (22:13 -0700)]
tests: Add support for helgrind thread error detector.

Helgrind is a Valgrind tool for detecting thread errors, reporting three
classes of errors: misuses of the POSIX pthreads API, potential deadlocks
arising from lock ordering problems, and data races -- accessing memory
without adequate locking.  As with valgrind, users run "make check-helgrind"
and the results are saved in tests/testsuite.dir/<N>/helgrind.*.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agotests: Remove redundant ofport_request.
William Tu [Fri, 29 Apr 2016 17:11:25 +0000 (10:11 -0700)]
tests: Remove redundant ofport_request.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agopinctrl: Fix "sparse" warning.
Ben Pfaff [Tue, 17 May 2016 14:44:06 +0000 (07:44 -0700)]
pinctrl: Fix "sparse" warning.

The ofport member should be an ofp_port_t, since it represents an OpenFlow
port number.

Fixes: 0ee8aaf658dd ("ovn: Send GARP on localnet.")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
8 years agovtep: Add other_config to Global table.
Dennis Sam [Wed, 11 May 2016 18:51:29 +0000 (11:51 -0700)]
vtep: Add other_config to Global table.

Extend the Global table to allow for additional configurations by re-using
the idea of an other_config column.

Signed-off-by: Dennis Sam <dsam@arista.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoofproto-dpif-upcall: Fix UFID usage with flow_modify.
Joe Stringer [Fri, 13 May 2016 21:17:12 +0000 (14:17 -0700)]
ofproto-dpif-upcall: Fix UFID usage with flow_modify.

As per the delete_op_init{,__}() functions, the UFID should only be
passed down if ukey->ufid_present is set. Otherwise it is possible to
request a flow modification using only a UFID in a datapath that doesn't
support UFID, which will fail.

Fixes: 43b2f131a229 ("ofproto: Allow in-place modifications of datapath flows.")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agodpctl: Sort port listing in "show" command.
Justin Pettit [Thu, 12 May 2016 00:28:54 +0000 (17:28 -0700)]
dpctl: Sort port listing in "show" command.

The port listing did not consistently print in the same order.  While it
is a better user experience to see the ports printed in order, more
importantly, this fixes a unit test ("dpctl - add-if set-if del-if")
that would occasionally fail due to expecting that the ports are printed
in order.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agodatapath-windows: Validate Netlink packets' integrity.
Paul Boca [Wed, 27 Apr 2016 08:05:47 +0000 (08:05 +0000)]
datapath-windows: Validate Netlink packets' integrity.

Fix an access violation when accessing a Netlink message, which could be
triggered with forged IOCTLs.

Signed-off-by: Paul-Daniel Boca <pboca@cloudbasesolutions.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoclassifier: Use ccmaps for staged lookup indices.
Jarno Rajahalme [Sat, 23 Apr 2016 02:40:09 +0000 (19:40 -0700)]
classifier: Use ccmaps for staged lookup indices.

Use the new ccmap type instead of cmap for staged lookup indices to
fix the problem with slow removal of rules with a large number of
duplicates.  This was problematic especially when many rules shared
the same match in packet metadata (e.g., a port number, but nothing
else), causing a large number of duplicates to be inserted into the
staged lookup index.  ccmap only keeps the count of inserted (hash)
values, so duplicates do not add any performance penalty.

Reported-by: Alok Kumar Maurya <alok-kumar.maurya@hpe.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agolib: Add new 'counting cmap' type.
Jarno Rajahalme [Sat, 23 Apr 2016 02:40:09 +0000 (19:40 -0700)]
lib: Add new 'counting cmap' type.

cmap implements duplicates as linked lists, which causes removal of
rules to become quadratic, O(n^2), with a large number of duplicates.  This patch
fixes the problem by introducing a new 'counting' variant of the cmap
(ccmap), which can be efficiently used to keep counts of inserted hash
values provided by the caller.  This does not require a node in the
user data structure, so this makes the user implementation a bit more
memory efficient, too.
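
The idea of keeping only a count per hash value can be sketched as follows.
This is a plain Python model for illustration, not the concurrent C
implementation; the class and method names are chosen for this example.

```python
from collections import Counter


class CountingMap:
    """Model of a counting map: stores how often each hash was inserted."""

    def __init__(self):
        self._counts = Counter()

    def inc(self, hash_value):
        # A duplicate costs one increment; no per-duplicate node is linked.
        self._counts[hash_value] += 1
        return self._counts[hash_value]

    def dec(self, hash_value):
        remaining = self._counts[hash_value] - 1
        if remaining < 0:
            raise KeyError(hash_value)
        if remaining == 0:
            del self._counts[hash_value]
        else:
            self._counts[hash_value] = remaining
        return remaining

    def find(self, hash_value):
        # Returns 0 when the hash was never inserted.
        return self._counts.get(hash_value, 0)
```

Removing one of many duplicates is a single decrement here, rather than a
walk along a list of duplicate nodes.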

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agoovn: Fix localnet ports deletion and recreation sometimes after restart.
Ramu Ramamurthy [Fri, 29 Apr 2016 00:23:59 +0000 (20:23 -0400)]
ovn: Fix localnet ports deletion and recreation sometimes after restart.

On graceful restart of ovn-controller, the chassis row is inserted in the
Chassis table. During this transaction, there is a window of time where an
idl row-read may not return the newly created row, even though the row
should exist, because the transaction is in an incomplete state.  As a result,
get_chassis() in binding_run() returns a null chassis record, binding_run
exits early and does not create local_datapaths, and patch_run deletes
localnet patch ports. In a later run, the localnet patch ports are
recreated.

This is reliably reproducible, though not on every restart.  The fix is to
handle the case that the chassis record may be null in binding_run, and still
create local_datapaths.

Restart logs follow with commentary:

2016-04-28T18:35:42.448Z|00001|vlog|INFO|opened log file /home/ovs/ovs/tests/testsuite.dir/2035/hv/ovn-controller.log
2016-04-28T18:35:42.449Z|00002|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connecting...
2016-04-28T18:35:42.449Z|00003|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connected
2016-04-28T18:35:42.452Z|00004|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connecting...
2016-04-28T18:35:42.452Z|00005|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connected
2016-04-28T18:35:42.454Z|00006|ovsdb_idl|INFO|ovsdb_idl_txn_insert:
                Chassis row inserted into transaction above
2016-04-28T18:35:42.454Z|00007|binding|INFO|Claiming lport localvif2 for this chassis.
2016-04-28T18:35:42.454Z|00008|binding|INFO|Claiming lport localvif3 for this chassis.
2016-04-28T18:35:42.454Z|00009|binding|INFO|Claiming lport localcif4 for this chassis.
2016-04-28T18:35:42.454Z|00010|binding|INFO|Claiming lport localcif5 for this chassis.
2016-04-28T18:35:42.454Z|00011|binding|INFO|Claiming lport localcif1 for this chassis.
2016-04-28T18:35:42.454Z|00012|binding|INFO|Claiming lport localvif1 for this chassis.
2016-04-28T18:35:42.454Z|00013|binding|INFO|Claiming lport localvif201 for this chassis.
2016-04-28T18:35:42.454Z|00014|binding|INFO|Claiming lport localcif3 for this chassis.
2016-04-28T18:35:42.454Z|00015|binding|INFO|Claiming lport localcif2 for this chassis.
               Binding run found the chassis record and has claimed the vifs
2016-04-28T18:35:42.455Z|00016|ofctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch
2016-04-28T18:35:42.455Z|00017|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting...
2016-04-28T18:35:42.455Z|00018|pinctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch
2016-04-28T18:35:42.456Z|00019|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting...
2016-04-28T18:35:42.457Z|00020|ovsdb_idl|INFO|ovsdb_idl_row_clear_new:
                At this point read of Chassis table returns no rows, and
                the transaction status is still incomplete.
2016-04-28T18:35:42.457Z|00021|binding|INFO|no chassis rec!
                Binding run exits early because chassis_rec was null
2016-04-28T18:35:42.459Z|00022|patch|INFO|removing port patch-br-int-to-localnet201
2016-04-28T18:35:42.459Z|00023|patch|INFO|removing port patch-br-int-to-localnet1
2016-04-28T18:35:42.459Z|00024|patch|INFO|removing port patch-localnet1-to-br-int
2016-04-28T18:35:42.459Z|00025|patch|INFO|removing port patch-localnet201-to-br-int
               Localnet ports are removed above, because local_datapaths don't exist
2016-04-28T18:35:42.459Z|00026|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected
2016-04-28T18:35:42.460Z|00027|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected
2016-04-28T18:35:42.460Z|00028|ovsdb_idl|INFO|ovsdb_idl_row_create:
               Now, the transaction is complete

Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoovn: Send GARP on localnet.
Ramu Ramamurthy [Tue, 26 Apr 2016 21:31:07 +0000 (17:31 -0400)]
ovn: Send GARP on localnet.

In some use cases such as VM migration or when VMs reuse IP addresses, VMs
become unreachable externally because external switches/routers on localnet
have stale port-mac or ARP caches. The problem resolves after some time
when the caches age out, which could take minutes for port-mac bindings or
hours for ARP caches.

To fix this, send some gratuitous ARPs when a logical port on a localnet
datapath gets added. Such gratuitous ARPs help on a best-effort basis to
update the mac-port bindings and ARP caches of external switches and
routers on the localnet.

Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1545897
Reported-by: Kyle Mestery <mestery@mestery.com>
Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoovn: Move extract_lport_addresses
Ramu Ramamurthy [Tue, 26 Apr 2016 21:31:06 +0000 (17:31 -0400)]
ovn: Move extract_lport_addresses

Move the function extract_lport_addresses to a file
in ovn/lib since that function can be used by ovn-controller also
to parse addresses stored in the mac column of the
port_binding table. Currently that function is used only
in ovn_northd.

Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agodaemon-unix: Properly handle missing users or groups.
Christian Ehrhardt [Mon, 25 Apr 2016 07:12:19 +0000 (09:12 +0200)]
daemon-unix: Properly handle missing users or groups.

From the manpages of getgrnam_r (getpwnam_r is similar):
"If no matching group record was found, these functions return 0 and
store NULL in *result."

The code checked only for errors, but a nonexistent user does not set
e != 0, so the code could try to set arbitrary uid/gid values.

Fixes: e91b927d lib/daemon: support --user option for all OVS daemon
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agomcast-snooping: Trigger revalidation when adding a new multicast group.
Ben Pfaff [Mon, 16 May 2016 20:13:36 +0000 (13:13 -0700)]
mcast-snooping: Trigger revalidation when adding a new multicast group.

Otherwise it takes a long time for flows to be updated when a new group
entry is added.

Reported-by: "O'Reilly, Darragh" <darragh.oreilly@hpe.com>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-May/021224.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: "O'Reilly, Darragh" <darragh.oreilly@hpe.com>
Tested-at: http://openvswitch.org/pipermail/discuss/2016-May/021244.html

8 years agoacinclude.m4: Fix skb_get_hash function detection
Markos Chandras [Tue, 10 May 2016 08:21:00 +0000 (09:21 +0100)]
acinclude.m4: Fix skb_get_hash function detection

Commit e2f3178f0582 ("datapath: Add support for kernel 3.14.") added
support for 3.14 kernels and a new OVS_GREP_IFELSE check for the
"skb_get_hash" function in the process. "skb_get_hash" was introduced
in the Linux kernel commit 3958afa1b272 ("net: Change skb_get_rxhash to
skb_get_hash"), which exists in >=3.14, but the OVS_GREP_IFELSE macro
also matches the "skb_get_hash_raw" function, which exists in older
kernels. As a result, the check makes the build system
behave as if the "skb_get_hash" function is available in these older
kernels, leading to build failures. We fix this by explicitly checking
for "skb_get_hash(" which matches the function definition.
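
The false positive is easy to reproduce with a plain substring search. In
this sketch the two header snippets are invented stand-ins for the kernel
header text, and header_contains() is a hypothetical helper that only
mimics the grep-style check.

```python
def header_contains(pattern, header_text):
    """Mimic a grep-style check: does the fixed string occur in the header?"""
    return pattern in header_text


# Hypothetical header contents, for illustration only.
OLD_KERNEL = "static inline __u32 skb_get_hash_raw(const struct sk_buff *skb);\n"
NEW_KERNEL = OLD_KERNEL + "static inline __u32 skb_get_hash(struct sk_buff *skb);\n"

# The bare name also matches skb_get_hash_raw in the older header.
false_positive = header_contains("skb_get_hash", OLD_KERNEL)

# Including the opening parenthesis matches only the real function.
old_has_it = header_contains("skb_get_hash(", OLD_KERNEL)
new_has_it = header_contains("skb_get_hash(", NEW_KERNEL)
```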

Signed-off-by: Markos Chandras <mchandras@suse.de>
Signed-off-by: Jesse Gross <jesse@kernel.org>
8 years agoovsdb-server: Fix memory leak reported by Valgrind.
William Tu [Fri, 13 May 2016 17:33:07 +0000 (10:33 -0700)]
ovsdb-server: Fix memory leak reported by Valgrind.

Reported by test 1657: ovsdb-server/add-db and remove-db.
  ds_put_format (dynamic-string.c:142)
  query_db_remotes (ovsdb-server.c:798)
  reconfigure_remotes (ovsdb-server.c:988)
  main_loop (ovsdb-server.c:156)

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoovn-controller: Fix errors reported by Valgrind.
William Tu [Fri, 13 May 2016 18:58:43 +0000 (11:58 -0700)]
ovn-controller: Fix errors reported by Valgrind.

Fix two errors reported by test 2026: ovn -- 3 HVs, 1 LS, 3 lports/HV.
1. Conditional jump or move depends on uninitialised value(s)
    physical_run (physical.c:366)
    main (ovn-controller.c:382)
2. Use of uninitialised value of size 8
    bitmap_set1 (bitmap.h:97)
    update_ct_zones (binding.c:115)
    binding_run (binding.c:228)
    main (ovn-controller.c:362)

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoofproto-dpif-xlate: Always generate wildcards.
Ben Pfaff [Sat, 23 Apr 2016 00:45:03 +0000 (17:45 -0700)]
ofproto-dpif-xlate: Always generate wildcards.

Until now, the flow translation code has tried to avoid constructing a
set of wildcards during translation in the cases where it can, because
wildcards are large and somewhat expensive.  However, this has problems
that we hadn't previously realized.  Specifically, the generated actions
can depend on the constructed wildcards, to decide which bits of a field
need to be set in a masked set_field action.  This means that in practice
translation needs to always construct the wildcards.

(It might be possible to avoid masked set_field when we're not constructing
wildcards, but this would mean that we'd generate different actions
depending on whether wildcards were being constructed, which seems rather
confusing at best.  Also, the cases in which we don't need wildcards anyway
are fairly obscure, meaning that the benefits of avoiding them in those
cases are minimal and that it's going to be hard to get test coverage.  The
latter is probably why we didn't notice this until now.)

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-April/069219.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Tested-by: William Tu <u9012063@gmail.com>
8 years agonetdev-dpdk: Fix locking during get_stats.
Joe Stringer [Tue, 10 May 2016 22:50:42 +0000 (15:50 -0700)]
netdev-dpdk: Fix locking during get_stats.

Clang complains:
lib/netdev-dpdk.c:1860:1: error: mutex 'dev->mutex' is not locked on every path
      through here [-Werror,-Wthread-safety-analysis]
}
^
lib/netdev-dpdk.c:1815:5: note: mutex acquired here
    ovs_mutex_lock(&dev->mutex);
    ^
./include/openvswitch/thread.h:60:9: note: expanded from macro 'ovs_mutex_lock'
        ovs_mutex_lock_at(mutex, OVS_SOURCE_LOCATOR)
        ^

Fixes: d6e3feb57c44 ("Add support for extended netdev statistics based on RFC 2819.")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
8 years agotests: Add valgrind targets for ovn utilities and daemons.
Gurucharan Shetty [Thu, 12 May 2016 15:22:04 +0000 (08:22 -0700)]
tests: Add valgrind targets for ovn utilities and daemons.

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
8 years agoofproto-dpif-upcall: Pass key to dpif_flow_get().
Joe Stringer [Tue, 10 May 2016 22:42:01 +0000 (15:42 -0700)]
ofproto-dpif-upcall: Pass key to dpif_flow_get().

Windows datapath folks have reported instances where OVS userspace will
pass down a flow_get request to the datapath using a UFID even though the
datapath has no support for UFIDs. Since commit e672ff9b4d22
("ofproto-dpif: Restore metadata and registers on recirculation."), if a
flow dump provides a flow that userspace isn't aware of, and the flow
dump doesn't provide actions for that flow, then userspace will attempt
a flow_get using just the UFID. This is because the ofproto-dpif layer
doesn't pass the key down to the dpif layer even if it's available.
Prior to the above commit, the codepath was only hit if the key was not
available, which would have implied UFID support. This assumption is now
broken: An empty set of actions could also trigger flow_get, and
datapaths without UFID support are free to pass up empty actions lists.

Pass down the flow key if available, and don't pass down the UFID if
unavailable to be more consistent with the usage of other dpif APIs
within this file.
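
The intended rule can be summarised in a small model. This is not the dpif
API: build_flow_get() and the dictionary it returns are hypothetical
stand-ins used only to show which identifiers are passed down.

```python
def build_flow_get(key, ufid, ufid_present):
    """Model of a flow_get request: include only identifiers that are valid."""
    request = {}
    if key is not None:
        # Always pass the flow key down when userspace has it.
        request["key"] = key
    if ufid_present:
        # Only datapaths that support UFIDs should ever see one.
        request["ufid"] = ufid
    if not request:
        raise ValueError("need a key or a UFID to identify the flow")
    return request
```

With a key available and no UFID support, the request carries the key
alone, which a datapath without UFID support can still resolve.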

Fixes: e672ff9b4d22 ("ofproto-dpif: Restore metadata and registers on recirculation.")
Reported-by: Sairam Venugopal <vsairam@vmware.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
8 years agovtep: Add source node replication support.
Darrell Ball [Sat, 7 May 2016 16:21:21 +0000 (09:21 -0700)]
vtep: Add source node replication support.

This patch updates the vtep schema, vtep-ctl commands and vtep simulator
to support source node replication in addition to service node
replication per logical switch.  The default replication mode is service
node as that was the only mode previously supported.  Source node
replication mode is optionally configurable and clearing the replication
mode implicitly sets the replication mode back to a default of service
node.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Acked-by: Bruce Davie <bdavie@vmware.com>
Acked-by: Anupam Chanda <achanda@vmware.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
8 years agoofproto-dpif-xlate: fix for group liveness propagation
László Sürü [Wed, 11 May 2016 08:46:33 +0000 (08:46 +0000)]
ofproto-dpif-xlate: fix for group liveness propagation

According to the OpenFlow v1.3.5 specification, a group is considered live
if it has at least one live bucket in it.  (6.5 Group Table
Modification Messages: "A group is considered live if a least one of
its buckets is live.")

However, the OVS implementation incorrectly reports a group as live when
no live bucket is found in the group_is_alive() function of
ofproto-dpif-xlate.c.

Instead it should return true only if a live bucket is found (that is
!= NULL).
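
The corrected rule amounts to the following, shown here as a small Python
model rather than the C code in ofproto-dpif-xlate.c. Buckets are
represented as dictionaries with a "live" flag, which is an assumption for
this example.

```python
def first_live_bucket(buckets):
    """Return the first live bucket, or None if there is none."""
    for bucket in buckets:
        if bucket["live"]:
            return bucket
    return None


def group_is_alive(buckets):
    # A group is live only if a live bucket was actually found.
    return first_live_bucket(buckets) is not None
```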

Signed-off-by: László Sűrű <laszlo.suru@ericsson.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
8 years agotests: Fix tunnel push pop test failure.
Pravin B Shelar [Wed, 11 May 2016 17:46:30 +0000 (10:46 -0700)]
tests: Fix tunnel push pop test failure.

Sort the list of arp entries to get predictable output.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
8 years agoofproto-dpif: Restore packet metadata when a continuation is resumed.
Numan Siddique [Tue, 10 May 2016 23:04:35 +0000 (16:04 -0700)]
ofproto-dpif: Restore packet metadata when a continuation is resumed.

Recirculations due to NXT_RESUME are failing if the packet metadata is not
restored prior to the packet execution.

Reported-at: http://openvswitch.org/pipermail/dev/2016-May/070723.html
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
8 years agoutil: Pass 128-bit arguments directly instead of using pointers.
Justin Pettit [Wed, 4 May 2016 01:20:51 +0000 (18:20 -0700)]
util: Pass 128-bit arguments directly instead of using pointers.

Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.)
introduced the ovs_u128_and() function.  It directly takes ovs_u128
values as arguments instead of pointers to them.  As this is a somewhat more
direct way to deal with 128-bit values, modify the other utility
functions to do the same.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
8 years agosystem-traffic: Wait for availability of ftpd.
Joe Stringer [Thu, 5 May 2016 01:01:06 +0000 (18:01 -0700)]
system-traffic: Wait for availability of ftpd.

Some FTP tests had intermittent failures because the FTP daemons
might not load before the testsuite script iterated to running the
client. Add checks after launching FTP daemons to make these tests more
resilient.
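
One common way to implement such a check is to poll the service port until
it accepts a connection. The sketch below shows that general technique in
Python; it is not the check added to the testsuite, and wait_for_port() and
its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1):
    """Poll until a TCP connection to (host, port) succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test would call this after launching the daemon and start the client only
once it returns True.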

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
8 years agosystem-traffic: Wait for IPv6 connectivity.
Joe Stringer [Thu, 5 May 2016 01:01:05 +0000 (18:01 -0700)]
system-traffic: Wait for IPv6 connectivity.

Several of the tests have race conditions where the next step in the
test may run before the kernel actually provides IPv6 connectivity.
This causes intermittent testsuite failures. Some existing tests
would even sleep in an attempt to mitigate this issue.

Improve the resilience of these tests by waiting until IPv6 or FTP
connectivity is ready. This speeds the testsuite up by a couple of
percent.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
8 years agosystem-traffic: Drop auto ct helpers in namespaces.
Joe Stringer [Thu, 5 May 2016 01:01:03 +0000 (18:01 -0700)]
system-traffic: Drop auto ct helpers in namespaces.

Automatic helper assignment in conntrack can trigger an upstream bug
where namespace deletion followed by immediate unload of conntrack
helper modules may cause kernel crashes. Disable automatic helper
assignment within created namespaces to avoid this issue.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
8 years agotnl-neigh-cache: check for arp expiration.
Pravin B Shelar [Mon, 25 Apr 2016 22:58:33 +0000 (15:58 -0700)]
tnl-neigh-cache: check for arp expiration.

Neighbor entry expiry is only checked in the dpif-poll event
handler, so in the absence of any event an ARP entry could be
used forever. This patch checks expiration on each lookup
instead.
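
A minimal sketch of the lookup-time expiry check described above. It is not
the tnl-neigh-cache code: the class name, the `max_age_s` lifetime, and the
injectable `clock` are assumptions made for illustration.

```python
import time


class NeighborCache:
    """Toy neighbor cache that checks entry expiry on every lookup."""

    def __init__(self, max_age_s, clock=time.monotonic):
        self.max_age_s = max_age_s   # hypothetical entry lifetime, in seconds
        self.clock = clock           # injectable so the example is testable
        self.entries = {}            # ip -> (mac, expiry timestamp)

    def set(self, ip, mac):
        self.entries[ip] = (mac, self.clock() + self.max_age_s)

    def lookup(self, ip):
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, expires = entry
        if self.clock() >= expires:
            # Expired: drop it now rather than waiting for a periodic sweep.
            del self.entries[ip]
            return None
        return mac
```

With this shape a stale entry is never returned, even if no periodic event
ever runs to sweep the cache.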

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agonetdev: Fix potential deadlock.
Ben Pfaff [Sat, 23 Apr 2016 00:03:22 +0000 (17:03 -0700)]
netdev: Fix potential deadlock.

Until now, netdev_class_mutex and route_table_mutex could be taken in
either order:

    * netdev_run() takes netdev_class_mutex, then netdev_vport_run() calls
      route_table_run(), which takes route_table_mutex.

    * route_table_init() takes route_table_mutex and then eventually calls
      netdev_open(), which takes netdev_class_mutex.

This commit fixes the problem by converting the netdev_classes hmap,
protected by netdev_class_mutex, into a cmap protected on the read
side by RCU.  Only a very small amount of code actually writes to the
cmap in question, so it's a lot easier to understand the locking rules
at that point.  In particular, there's no need to take netdev_class_mutex
from either netdev_run() or netdev_open(), so neither of the code paths
above determines a lock ordering any longer.
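
The ordering conflict can be modelled as a cycle in a "held before" graph.
The sketch below is illustrative only: `has_lock_order_cycle` is a
hypothetical helper, and the two lists simply restate the code paths from
this message.

```python
def has_lock_order_cycle(paths):
    """Return True if the lock acquisition orders in `paths` conflict.

    Each path is the list of locks one code path takes, in order.
    """
    # Build "held before" edges: a -> b means a is held when b is taken.
    edges = {}
    for locks in paths:
        for i, held in enumerate(locks):
            for later in locks[i + 1:]:
                edges.setdefault(held, set()).add(later)

    def reachable(src, dst, seen):
        if src == dst:
            return True
        seen.add(src)
        return any(reachable(n, dst, seen)
                   for n in edges.get(src, ()) if n not in seen)

    # A cycle exists if some edge a -> b can be followed back from b to a.
    return any(reachable(b, a, set())
               for a, targets in edges.items() for b in targets)


before = [
    ["netdev_class_mutex", "route_table_mutex"],   # netdev_run() path
    ["route_table_mutex", "netdev_class_mutex"],   # route_table_init() path
]
after = [
    ["route_table_mutex"],   # netdev_run() no longer takes netdev_class_mutex
    ["route_table_mutex"],   # netdev_open() no longer takes it either
]
```

Here `has_lock_order_cycle(before)` reports a conflict, while `after` has no
edges at all, which is why neither path determines a lock ordering any more.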

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-February/020216.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Tested-by: William Tu <u9012063@gmail.com>
8 years agocmap: New macro CMAP_INITIALIZER, for initializing an empty cmap.
Ben Pfaff [Fri, 22 Apr 2016 23:51:03 +0000 (16:51 -0700)]
cmap: New macro CMAP_INITIALIZER, for initializing an empty cmap.

Sometimes code is much simpler if we can statically initialize data
structures.  Until now, this has not been possible for cmap-based data
structures, so this commit introduces a CMAP_INITIALIZER macro.

This works by adding a singleton empty cmap_impl that simply forces the
first insertion into any cmap that points to it to allocate a real
cmap_impl.  There could be some risk that rogue code modifies the
singleton, so for safety it is also marked 'const' to allow the linker to
put it into a read-only page.

This adds a new OVS_ALIGNED_VAR macro with GCC and MSVC implementations.
The latter is based on Microsoft webpages, so developers who know Windows
might want to scrutinize it.

As examples of the kind of simplification this can make possible, this
commit removes an initialization function from ofproto-dpif-rid.c and a
call to cmap_init() from tnl-neigh-cache.c.  An upcoming commit will add
another user.

CC: Jarno Rajahalme <jarno@ovn.org>
CC: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
8 years agoofproto-dpif: Do not count resubmit to later tables against limit.
Ben Pfaff [Thu, 21 Apr 2016 17:50:17 +0000 (10:50 -0700)]
ofproto-dpif: Do not count resubmit to later tables against limit.

Open vSwitch must ensure that flow translation takes a finite amount of
time.  Until now it has implemented this by limiting the depth of
recursion.  The initial limit, in version 1.0.1, was no recursion at all,
and then over the years it has increased to 8 levels, then 16, then 32,
and 64 for the last few years.  Now reports are coming in that 64 levels
are inadequate for some OVN setups.  The natural inclination would be to
double the limit again to 128 levels.

This commit attempts another approach.  Instead of increasing the limit,
it reduces the class of resubmits that count against the limit.  Since the
goal for the depth limit is to prevent an infinite amount of work, it's
not necessary to count resubmits that can't lead to infinite work.  In
particular, a resubmit from a table numbered x to a table y > x cannot do
this, because any OpenFlow switch has a finite number of tables.  Because
in fact a resubmit (or goto_table) from one table to a later table is the
most common form of an OpenFlow pipeline, I suspect that this will greatly
alleviate the pressure to increase the depth limit.
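
A rough sketch of the counting rule, assuming a limit of 64 as quoted above.
The function name and return shape are hypothetical, and the real translation
code may track depth differently.

```python
MAX_DEPTH = 64  # the limit quoted in the commit message


def resubmit_allowed(depth, from_table, to_table):
    """Return (allowed, new_depth) for a resubmit between two tables."""
    if to_table > from_table:
        # Forward resubmits cannot loop forever: tables are finite.
        return True, depth
    if depth >= MAX_DEPTH:
        return False, depth
    return True, depth + 1
```

Only resubmits to the same or an earlier table consume depth, so a long
forward-only pipeline never approaches the limit.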

Reported-by: Guru Shetty <guru@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
8 years agoofproto-dpif: Rename "recurse" to "indentation".
Ben Pfaff [Thu, 21 Apr 2016 17:50:16 +0000 (10:50 -0700)]
ofproto-dpif: Rename "recurse" to "indentation".

The "recurse" member of struct xlate_in and struct xlate_ctx is used for
two purposes: to determine the amount of indentation in "ofproto/trace"
output and to limit the depth of recursion.  An upcoming commit will
separate these tasks, and so in preparation this commit renames "recurse"
to "indentation".

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
8 years agoovn-nbctl: Add sanity checking for lswitch-add.
Ben Pfaff [Sun, 8 May 2016 16:21:29 +0000 (09:21 -0700)]
ovn-nbctl: Add sanity checking for lswitch-add.

I don't think anyone really wants the painful behavior of creating multiple
logical switches with the same name to be the default.  This commit retains
the possibility of doing that in case someone really wants it, but refuses
by default for sanity.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
8 years agoovn-nbctl: Make error handling consistent with ovs-vsctl.
Ben Pfaff [Sun, 8 May 2016 16:21:41 +0000 (09:21 -0700)]
ovn-nbctl: Make error handling consistent with ovs-vsctl.

ovs-vsctl distinguishes between internal database inconsistencies, which
it logs, and errors in commands specified by the user, which cause fatal
exits.  ovn-nbctl wasn't as careful about this and tended to just log
everything.  This commit brings it up to the same standard as ovs-vsctl.

This commit also adds --if-exists and --may-exist options in the same kinds
of places as ovs-vsctl, to allow for scripting in cases where it's OK if
an operation has already occurred.
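
The intended semantics of the two options can be sketched like this. The
function names, the `CommandError` type, and the set-based "database" are
illustrative assumptions, not ovn-nbctl internals.

```python
class CommandError(Exception):
    """User-visible command error that should cause a fatal exit."""


def add_item(db, name, may_exist=False):
    if name in db:
        if may_exist:
            return            # already present: nothing to do
        raise CommandError(f"{name}: already exists")
    db.add(name)


def del_item(db, name, if_exists=False):
    if name not in db:
        if if_exists:
            return            # already absent: nothing to do
        raise CommandError(f"{name}: not found")
    db.remove(name)
```

A script can then repeat the same add or delete safely by passing the
corresponding flag, while a plain invocation still fails loudly on user error.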

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
8 years agoovn-nbctl: Mark lport-del commands as writing the database.
Ben Pfaff [Fri, 6 May 2016 17:54:04 +0000 (10:54 -0700)]
ovn-nbctl: Mark lport-del commands as writing the database.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
8 years agonetdev-dpdk: Print default vhost-sock-dir value & update documentation
Ciara Loftus [Fri, 6 May 2016 10:20:34 +0000 (11:20 +0100)]
netdev-dpdk: Print default vhost-sock-dir value & update documentation

When no vhost-sock-dir value is provided, print the default location.
Update the documentation to reflect the fact that vhost-sock-dir values
are now subdirectory locations rather than full paths.

Fixes: d8a8f353c23e ("netdev-dpdk: Restrict vhost_sock_dir")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoAdd support for extended netdev statistics based on RFC 2819.
mweglicx [Thu, 5 May 2016 08:46:01 +0000 (09:46 +0100)]
Add support for extended netdev statistics based on RFC 2819.

Implementation of new statistics extension for DPDK ports:
- Add new counters definition to netdev struct and OpenFlow,
  based on RFC2819.
- Initialize netdev statistics as "filtered out"
  before passing it to particular netdev implementation
  (because of that change, statistics which are not
  collected are reported as filtered out, and some
  unit tests were modified in this respect).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- Add new vendor id: INTEL_VENDOR_ID.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- Add new file header: include/openflow/intel-ext.h which
  contains new statistics definition.
- Extended statistics are implemented only for dpdk-physical
  and dpdk-vhost port types.
- Dpdk-physical implementation uses xstats to collect statistics.
- Dpdk-vhost implements only part of the statistics (RX packet
  size based counters).

Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
[blp@ovn.org made software devices more consistent]
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoAdd change tracking documentation
RYAN D. MOATS [Fri, 22 Apr 2016 21:35:37 +0000 (16:35 -0500)]
Add change tracking documentation

Change tracking is a bit different from what someone with
"classic" database experience might expect, so let's add
the knowledge gained from the experience of making change
tracking work for incremental processing.

Signed-off-by: RYAN D. MOATS <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoovn-sbctl: Display correct ovnsb sock location in help message.
Hui Kang [Tue, 19 Apr 2016 17:50:25 +0000 (13:50 -0400)]
ovn-sbctl: Display correct ovnsb sock location in help message.

Signed-off-by: Hui Kang <kangh@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agodpif-netdev: Fix dp_netdev_pmd_remove_flow().
Daniele Di Proietto [Tue, 3 May 2016 23:35:10 +0000 (16:35 -0700)]
dpif-netdev: Fix dp_netdev_pmd_remove_flow().

After removing a flow from the dpcls classifier there might still be
readers who have access to the flow, until the next grace period.

Setting flow->cr.mask to NULL can cause concurrent readers to crash,
so this commit avoids doing it.

The crash can be reproduced, for example, by invoking an operation
that causes datapath flows to be deleted (such as `ovs-appctl
upcall/enable-megaflows`) while traffic is running.

I think the assignment was intended just as a safety measure to catch
race conditions, and it should be safe to remove.

Here's a stack trace of a possible crash:

Program terminated with signal SIGSEGV, Segmentation fault.
rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156
4156            if (OVS_UNLIKELY((value & *maskp++) != *keyp++)) {
(gdb) bt
rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156
rules=0x7f3afa3f2e40, cnt=<optimized out>) at ../lib/dpif-netdev.c:4225
(pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420,
cnt=cnt@entry=32, keys=keys@entry=0x7f3afa3f6428,
batches=batches@entry=0x7f3afa3f4118,
n_batches=n_batches@entry=0x7f3afa3fa3b0)
    at ../lib/dpif-netdev.c:3483
(pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420,
cnt=<optimized out>, md_is_valid=md_is_valid@entry=false,
port_no=<optimized out>) at ../lib/dpif-netdev.c:3625
cnt=<optimized out>, packets=0x7f3afa3fa420, pmd=0x7f3afa3fc010) at
../lib/dpif-netdev.c:3642
rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at
../lib/dpif-netdev.c:2574
../lib/dpif-netdev.c:2693
../lib/ovs-thread.c:340
pthread_create.c:312
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Fixes: 361d808dd9e4 ("flow: Split miniflow's map.")
CC: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
8 years agoovn-northd: Add support for static_routes.
Steve Ruan [Tue, 3 May 2016 12:06:50 +0000 (07:06 -0500)]
ovn-northd: Add support for static_routes.

Logical patch ports are used to connect logical routers
together. Static routes are used to select between different logical router
ports when exiting a logical router.

Reported-by: Na Zhu <nazhu@cn.ibm.com>
Reported-by: Dustin Lundquist <dlundquist@linux.vnet.ibm.com>
Reported-at:
https://bugs.launchpad.net/networking-ovn/+bug/1545140
https://bugs.launchpad.net/networking-ovn/+bug/1539347

Signed-off-by: Steve Ruan <ruansx@cn.ibm.com>
[guru@ovn.org provided the unit test.]
Co-authored-by: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
8 years agocheck-kmod: Remove all OVS modules in this target.
Joe Stringer [Tue, 3 May 2016 22:44:15 +0000 (15:44 -0700)]
check-kmod: Remove all OVS modules in this target.

The make check-kmod target would previously attempt to only remove the
openvswitch module, which would fail if any vport modules were loaded.
Remove those modules too, to allow the target to proceed.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
8 years agoclassifier: Remove rare optimization case.
Jarno Rajahalme [Wed, 4 May 2016 20:00:06 +0000 (13:00 -0700)]
classifier: Remove rare optimization case.

This optimization applied when a staged lookup index would narrow down
to a single rule, which happens sometimes in simple test cases, but
presumably less often in more populated flow tables.  The result of
this optimization allowed a bit more general megaflows, but the bit
patterns produced were sometimes cryptic.  Finally, a later fix to a
more important performance problem does not allow for this
optimization any more, so remove it now.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agoclassifier: Remove logging.
Jarno Rajahalme [Wed, 4 May 2016 20:00:05 +0000 (13:00 -0700)]
classifier: Remove logging.

The only vlog line was a leftover from debugging.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agoclassifier: Remove redundant index.
Jarno Rajahalme [Wed, 4 May 2016 20:00:05 +0000 (13:00 -0700)]
classifier: Remove redundant index.

The test for figuring out if the last index had the same fields as the
actual rules map was broken, resulting in an unnecessary index being
kept around.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agocompat: Remove skbuff header helper backports.
Joe Stringer [Tue, 3 May 2016 00:47:33 +0000 (17:47 -0700)]
compat: Remove skbuff header helper backports.

These have existed largely since v2.6.22, so it's well overdue.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agocompat: Remove unused ipv[46] backports.
Joe Stringer [Tue, 3 May 2016 00:47:32 +0000 (17:47 -0700)]
compat: Remove unused ipv[46] backports.

These pieces #if on kernel versions which have not been supported since commit
f2ab1536ddbc ("compat: Backport conntrack strictly to v3.10+.")

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agocompat: Document nf_defrag_ipv[46] backport.
Joe Stringer [Mon, 2 May 2016 18:19:18 +0000 (11:19 -0700)]
compat: Document nf_defrag_ipv[46] backport.

Document how the IP(6) defrag backport works, and do minor style cleanups.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agodatapath: Fix template leak in error cases.
Joe Stringer [Mon, 2 May 2016 18:19:17 +0000 (11:19 -0700)]
datapath: Fix template leak in error cases.

Upstream commit:
    openvswitch: Fix template leak in error cases.

    Commit 2f3ab9f9fc23 ("openvswitch: Fix helper reference leak") fixed a
    reference leak on helper objects, but inadvertently introduced a leak on
    the ct template.

    Previously, ct_info.ct->general.use was initialized to 0 by
    nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
    returned successful. If an error occurred while adding the helper or
    adding the action to the actions buffer, the __ovs_ct_free_action()
    cleanup would use nf_ct_put() to free the entry; however, this relies on
    atomic_dec_and_test(ct_info.ct->general.use). This reference must be
    incremented first, or nf_ct_put() will never free it.

    Fix the issue by acquiring a reference to the template immediately after
    allocation.
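
The leak can be modelled with a toy reference count. This is a simplified
illustration of the dec-and-test behaviour described above, not kernel code;
`Template`, `get`, and `put` are hypothetical names.

```python
class Template:
    """Toy stand-in for a refcounted conntrack template."""

    def __init__(self):
        self.use = 0          # allocation leaves the count at zero
        self.freed = False


def get(tmpl):
    tmpl.use += 1


def put(tmpl):
    # Mirrors a dec-and-test release: free only when the count reaches zero.
    tmpl.use -= 1
    if tmpl.use == 0:
        tmpl.freed = True


leaked = Template()
put(leaked)                   # error path without a prior get(): 0 -> -1

fixed = Template()
get(fixed)                    # take a reference right after allocation
put(fixed)                    # error path: 1 -> 0, so the template is freed
```

In the first case the count skips past zero and the object is never freed;
taking the reference immediately after allocation makes the error path
release it.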

    Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
    Fixes: 2f3ab9f9fc23 ("openvswitch: Fix helper reference leak")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 90c7afc96cbb ("openvswitch: Fix template leak in error cases.")
Fixes: 11251c170d92 ("datapath: Allow attaching helpers to ct action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agodatapath: Orphan skbs before IPv6 defrag
Joe Stringer [Mon, 2 May 2016 18:19:16 +0000 (11:19 -0700)]
datapath: Orphan skbs before IPv6 defrag

Upstream commit:
    openvswitch: Orphan skbs before IPv6 defrag

    This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always
    orphan skbs inside ip_defrag()").

    Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
    clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
    cloned (implicitly orphaning) prior to queueing for reassembly. As such,
    when the IPv6 message is eventually reassembled, the skb->sk for all
    fragments would be NULL. After that commit was introduced, rather than
    cloning, the original skbs were queued directly without orphaning. The
    end result is that all frags except for the first and last may have a
    socket attached.

    This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
    prevent BUG_ON(skb->sk) during a later call to ip6_fragment().

    kernel BUG at net/ipv6/ip6_output.c:631!
    [...]
    Call Trace:
     <IRQ>
     [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
     [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch]
     [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70
     [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch]
     [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
     [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
     [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20
     [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80
     [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch]
     [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch]
     [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch]
     [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch]
     [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
     [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch]
     [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
     [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
     [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch]
     [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
     [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch]
     [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch]
     [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
     [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
     [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
     [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
     [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch]
     [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch]
     [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660
     [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0
     [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950
     [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950
     [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
     [<ffffffff81690990>] dev_queue_xmit+0x10/0x20
     [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220
     [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0
     [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0
     [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0
     [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
     [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
     [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
     [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
     [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
     [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
     [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
     [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
     [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
     [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
     [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0
     [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500
     [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580
     [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690
     [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690
     [<ffffffff817567a0>] ip6_input+0x30/0xa0
     [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0
     [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0
     [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0
     [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0
     [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0
     [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80
     [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
     [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230
     [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70
     [<ffffffff8168c088>] process_backlog+0x78/0x230
     [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230
     [<ffffffff8168db43>] net_rx_action+0x203/0x480
     [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
     [<ffffffff817c156e>] __do_softirq+0xde/0x49f
     [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0
     [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30
     <EOI>
     [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40
     [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0
     [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0
     [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50
     [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
     [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
     [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
     [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
     [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
     [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
     [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
     [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
     [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
     [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
     [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30
     [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0
     [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0
     [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0
     [<ffffffff8166d078>] sock_sendmsg+0x38/0x50
     [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170
     [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
     [<ffffffff8166e38e>] SyS_sendto+0xe/0x10
     [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8
    Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8#
    RIP  [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50
     RSP <ffff880072803120>

    Fixes: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone
    operations")
Reported-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 49e261a8a21e ("openvswitch: Orphan skbs before IPv6 defrag")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agocompat: nf_defrag_ipv6: fix NULL deref panic.
Joe Stringer [Mon, 2 May 2016 18:19:15 +0000 (11:19 -0700)]
compat: nf_defrag_ipv6: fix NULL deref panic.

Upstream commit:
    netfilter: ipv6: nf_defrag: fix NULL deref panic

    Valdis reports NULL deref in nf_ct_frag6_gather.
    Problem is bogus use of skb_queue_walk() -- we miss first skb in the list
    since we start with head->next instead of head.

    In case the element we're looking for was head->next we won't find
    a result and then trip over NULL iter.

    (defrag uses plain NULL-terminated list rather than one terminated by
     head-of-list-pointer, which is what skb_queue_walk expects).
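
The difference between the two walks can be sketched on a plain
NULL-terminated list. `Node`, `find_buggy`, and `find_fixed` are hypothetical
names used only for this illustration.

```python
class Node:
    def __init__(self, name, next=None):
        self.name = name
        self.next = next


def find_buggy(head, name):
    """Walk as if `head` were a sentinel: starts at head.next, skips head."""
    node = head.next
    while node is not None:
        if node.name == name:
            return node
        node = node.next
    return None


def find_fixed(head, name):
    """Walk a plain NULL-terminated list, starting at head itself."""
    node = head
    while node is not None:
        if node.name == name:
            return node
        node = node.next
    return None


c = Node("c")
b = Node("b", c)
a = Node("a", b)              # a is a real element, not a sentinel
```

Searching for the first element with the sentinel-style walk returns nothing,
and a caller that assumes a match then dereferences a null result.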

    Fixes: 029f7f3b8701cc7a ("netfilter: ipv6: nf_defrag: avoid/free clone operations")
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Upstream: e97ac12859db ("netfilter: ipv6: nf_defrag: fix NULL deref panic")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agocompat: nf_defrag_ipv6: avoid nf_iterate recursion.
Joe Stringer [Mon, 2 May 2016 18:19:14 +0000 (11:19 -0700)]
compat: nf_defrag_ipv6: avoid nf_iterate recursion.

Upstream commit:
    netfilter: ipv6: avoid nf_iterate recursion

    The previous patch changed nf_ct_frag6_gather() to morph reassembled skb
    with the previous one.

    This means that the return value is always NULL or the skb argument.
    So change it to an err value.

    Instead of invoking NF_HOOK recursively with threshold to skip already-called hooks
    we can now just return NF_ACCEPT to move on to the next hook except for
    -EINPROGRESS (which means skb has been queued for reassembly), in which case we
    return NF_STOLEN.
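
The verdict mapping described above can be sketched as follows.
`defrag_verdict` is a hypothetical helper; the verdict constants follow the
Linux netfilter UAPI values.

```python
import errno

# Netfilter verdict values as defined in the Linux UAPI headers.
NF_ACCEPT = 1
NF_STOLEN = 2


def defrag_verdict(err):
    """Map a gather-style error code to a hook verdict."""
    if err == -errno.EINPROGRESS:
        return NF_STOLEN      # skb was queued for reassembly
    return NF_ACCEPT          # move on to the next hook
```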

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Upstream: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agocompat: nf_defrag_ipv6: avoid/free clone operations.
Joe Stringer [Mon, 2 May 2016 18:19:13 +0000 (11:19 -0700)]
compat: nf_defrag_ipv6: avoid/free clone operations.

Upstream commit:
    netfilter: ipv6: nf_defrag: avoid/free clone operations

    commit 6aafeef03b9d9ecf
    ("netfilter: push reasm skb through instead of original frag skbs")
    changed ipv6 defrag to not use the original skbs anymore.

    So rather than keeping the original skbs around just to discard them
    afterwards just use the original skbs directly for the fraglist of
    the newly assembled skb and remove the extra clone/free operations.

    The skb that completes the fragment queue is morphed into the
    reassembled one instead, just like ipv4 defrag.

    openvswitch doesn't need any additional skb_morph magic anymore to deal
    with this situation so just remove that.

    A followup patch can then also remove the NF_HOOK (re)invocation in
    the ipv6 netfilter defrag hook.

Cc: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Upstream: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone operations")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agocompat: ipv6: Pass struct net into nf_ct_frag6_gather.
Joe Stringer [Mon, 2 May 2016 18:19:12 +0000 (11:19 -0700)]
compat: ipv6: Pass struct net into nf_ct_frag6_gather.

Upstream commit:
    ipv6: Pass struct net into nf_ct_frag6_gather

    The function nf_ct_frag6_gather is called on both the input and the
    output paths of the networking stack.  In particular ipv6_defrag which
    calls nf_ct_frag6_gather is called from both the PRE_ROUTING chain
    on input and the LOCAL_OUT chain on output.

    The addition of a net parameter makes it explicit which network
    namespace the packets are being reassembled in, and removes the need
    for nf_ct_frag6_gather to guess.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: b72775977c39 ("ipv6: Pass struct net into nf_ct_frag6_gather")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agocompat: ipv4: Pass struct net into ip_defrag.
Joe Stringer [Mon, 2 May 2016 18:19:11 +0000 (11:19 -0700)]
compat: ipv4: Pass struct net into ip_defrag.

Upstream commit:
    ipv4: Pass struct net into ip_defrag and ip_check_defrag

    The function ip_defrag is called on both the input and the output
    paths of the networking stack.  In particular conntrack when it is
    tracking outbound packets from the local machine calls ip_defrag.

    So add a struct net parameter and stop making ip_defrag guess which
    network namespace it needs to defragment packets in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 19bcf9f203c8 ("ipv4: Pass struct net into ip_defrag and ip_check_defrag")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agocompat: Add a struct net parameter to l4_pkt_to_tuple.
Joe Stringer [Mon, 2 May 2016 18:19:10 +0000 (11:19 -0700)]
compat: Add a struct net parameter to l4_pkt_to_tuple.

Upstream commit:
    netfilter: nf_conntrack: Add a struct net parameter to l4_pkt_to_tuple

    As gre does not have the srckey in the packet gre_pkt_to_tuple
    needs to perform a lookup in its per network namespace tables.

    Pass in the proper network namespace to all pkt_to_tuple
    implementations to ensure gre (and any similar protocols) can get this
    right.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Upstream: a31f1adc0948 ("netfilter: nf_conntrack: Add a struct net
parameter to l4_pkt_to_tuple")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agoflow: Fix flow_wc_map() for ICMPv6/IGMP type and code.
Daniele Di Proietto [Tue, 26 Apr 2016 02:01:47 +0000 (19:01 -0700)]
flow: Fix flow_wc_map() for ICMPv6/IGMP type and code.

flow_wc_map() should include 'tp_src' and 'tp_dst' for ICMPv6 and IGMP
packets, since they're used for type and code.

This caused installed flows in the userspace datapath to always have
ICMPv6 code and type wildcarded (there are no other users of this
function).

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
8 years agocompat: skbuff: Remove references to old kernels.
Joe Stringer [Fri, 29 Apr 2016 01:09:04 +0000 (18:09 -0700)]
compat: skbuff: Remove references to old kernels.

Since commit f2ab1536ddbc ("compat: Backport conntrack strictly to
v3.10+."), we haven't supported these kernel versions. Remove the old
code.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Simon Horman <simon.horman@netronome.com>
8 years agoFAQ: Update feature table.
Joe Stringer [Thu, 28 Apr 2016 21:39:09 +0000 (14:39 -0700)]
FAQ: Update feature table.

Linux kernel support for features in the out-of-tree module no longer depends
on particular versions, as we only support kernels 3.10-4.3; Connection
tracking status has changed recently; and NAT is a brand new feature
with only support in the latest unreleased Linux kernel version.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
8 years agoFAQ: Shift IPFIX into the feature support table.
Joe Stringer [Thu, 28 Apr 2016 21:39:08 +0000 (14:39 -0700)]
FAQ: Shift IPFIX into the feature support table.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
8 years agonetdev-dpdk: Check dpdk-extra when reading db
Aaron Conole [Fri, 29 Apr 2016 17:44:05 +0000 (13:44 -0400)]
netdev-dpdk: Check dpdk-extra when reading db

A previous patch introduced the ability to pass arbitrary EAL command
line options via the dpdk-extra database entry. This commit enhances
that by warning the user when such a configuration is detected and
preferring the value in the database.

Suggested-by: Sean K Mooney <sean.k.mooney@intel.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
8 years agonetdev-dpdk: Allow arbitrary eal arguments
Aaron Conole [Fri, 29 Apr 2016 17:44:04 +0000 (13:44 -0400)]
netdev-dpdk: Allow arbitrary eal arguments

A previous change moved some commonly used arguments from the command line
to the database, and in doing so removed the ability to pass arbitrary
arguments to EAL. This change allows arbitrary EAL arguments to be provided
via a new db entry 'other_config:dpdk-extra' which will tokenize the
string and add it to the argument list. The only argument which will not
be supported with this change is '--no-huge', which appears to break the
system in other ways.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
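
As an illustration of the tokenizing step described above, here is a minimal sketch in C. It assumes simple whitespace splitting with no quoting support; the helper name append_extra_args() and its signature are hypothetical and are not taken from the OVS source.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the OVS implementation): splits 'extra' on
 * whitespace and appends a heap-allocated copy of each token to 'argv',
 * which already holds '*argc' entries and has room for 'max' in total.
 * Returns 0 on success, -1 if 'argv' is full or memory runs out. */
static int
append_extra_args(const char *extra, char **argv, int *argc, int max)
{
    const char *p = extra;

    while (*p) {
        /* Skip leading whitespace before the next token. */
        while (*p && isspace((unsigned char) *p)) {
            p++;
        }
        if (!*p) {
            break;
        }

        /* Find the end of the token. */
        const char *start = p;
        while (*p && !isspace((unsigned char) *p)) {
            p++;
        }
        size_t len = (size_t) (p - start);

        if (*argc >= max) {
            return -1;
        }
        char *tok = malloc(len + 1);
        if (!tok) {
            return -1;
        }
        memcpy(tok, start, len);
        tok[len] = '\0';
        argv[(*argc)++] = tok;
    }
    return 0;
}
```

A caller would seed argv[0] with the program name, append the tokens, and pass the resulting argc/argv pair on to the EAL initialization.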
8 years agonetdev-dpdk: Autofill lcore coremask if absent
Aaron Conole [Fri, 29 Apr 2016 17:44:03 +0000 (13:44 -0400)]
netdev-dpdk: Autofill lcore coremask if absent

The user has control over the DPDK internal lcore coremask, but this
parameter can be autofilled with a bit more intelligence. If the user
does not fill this parameter in, we use the lowest set bit in the
current task CPU affinity. Otherwise, we will reassign the current
thread to the specified lcore mask, in addition to the dpdk lcore
threads.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
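
The "lowest set bit" default described above can be sketched as follows. This is only an illustration: it models the CPU affinity as a plain 64-bit mask rather than a cpu_set_t, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the index of the lowest set bit in
 * 'affinity' (bit i set means CPU i is allowed), or -1 if no bit is set. */
static int
lowest_set_cpu(uint64_t affinity)
{
    for (int i = 0; i < 64; i++) {
        if (affinity & (UINT64_C(1) << i)) {
            return i;
        }
    }
    return -1;
}

/* Hypothetical helper: builds a default lcore mask that contains only the
 * lowest CPU from the current affinity, or 0 if the affinity is empty. */
static uint64_t
default_lcore_mask(uint64_t affinity)
{
    int cpu = lowest_set_cpu(affinity);

    return cpu < 0 ? 0 : UINT64_C(1) << cpu;
}
```

For a task allowed on CPUs 2 and 3 (mask 0x0c), this picks CPU 2 and yields the mask 0x04.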
8 years agonetdev-dpdk: Restrict vhost_sock_dir
Aaron Conole [Fri, 29 Apr 2016 17:44:02 +0000 (13:44 -0400)]
netdev-dpdk: Restrict vhost_sock_dir

Since the vhost-user sockets directory now comes from the database, it is
possible for any user with database access to program an arbitrary filesystem
location for the sockets directory. This could result in unprivileged users
creating or deleting arbitrary filesystem files by using specially crafted
names. To prevent this, 'vhost-sock-dir' is now relative to ovs_rundir()
and must not contain "..".

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
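
A rough sketch of the restriction described above, assuming the check is a plain substring test for ".." followed by joining the value onto the run directory. The function name and parameters are hypothetical; the real code obtains the base directory from ovs_rundir().

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins 'subdir' onto 'rundir' and stores the result
 * in 'out'.  Refuses any 'subdir' containing "..", so that the resulting
 * path cannot climb out of 'rundir'.  Returns true on success, false if
 * the value is rejected or does not fit in 'out'. */
static bool
build_vhost_sock_dir(const char *rundir, const char *subdir,
                     char *out, size_t out_size)
{
    if (strstr(subdir, "..")) {
        return false;
    }

    int n = snprintf(out, out_size, "%s/%s", rundir, subdir);
    return n >= 0 && (size_t) n < out_size;
}
```

The substring test is deliberately conservative: it also rejects harmless names such as "a..b", which keeps the check simple to reason about.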
8 years agonetdev-dpdk: Convert initialization from cmdline to db
Aaron Conole [Fri, 29 Apr 2016 17:44:01 +0000 (13:44 -0400)]
netdev-dpdk: Convert initialization from cmdline to db

Existing DPDK integration is provided by use of command line options which
must be split out and passed to librte in a special manner. However, this
forces any configuration to be passed by way of a special DPDK flag, and
interferes with ovs+dpdk packaging solutions.

This commit delays dpdk initialization until after the OVS database
connection is established, at which point ovs initializes librte. It
pulls all of the config data from the OVS database, and assembles a
new argv/argc pair to be passed along.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
8 years agonetdev-dpdk: Restore thread affinity after DPDK init
Aaron Conole [Fri, 29 Apr 2016 17:44:00 +0000 (13:44 -0400)]
netdev-dpdk: Restore thread affinity after DPDK init

When the DPDK init function is called, it changes the executing thread's
CPU affinity to a single core specified in -c. This will result in the
userspace bridge configuration thread being rebound, even if that is not
the intent.

This change fixes that behavior by rebinding to the original thread
affinity after calling dpdk_init().

Co-authored-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
8 years agoofp-actions: Fix use-after-free in decode_NOTE.
Joe Stringer [Thu, 28 Apr 2016 21:13:38 +0000 (14:13 -0700)]
ofp-actions: Fix use-after-free in decode_NOTE.

When decoding the 'note' action, variable-length data could be pushed to
a buffer immediately prior to calling ofpact_finish_NOTE(). The
ofpbuf_put() could cause reallocation, in which case the finish call
could access freed memory. Fix the issue by updating the local pointer
before passing it to ofpact_finish_NOTE().

If the memory was reused, it may trigger an assert in ofpact_finish():

assertion ofpact == ofpacts->header failed in ofpact_finish()

With the included test, make check-valgrind reports:

Invalid read of size 1
   at 0x500A9F: ofpact_finish_NOTE (ofp-actions.h:988)
   by 0x4FE5C1: decode_NXAST_RAW_NOTE (ofp-actions.c:4557)
   by 0x4FBC05: ofpact_decode (ofp-actions.inc2:3831)
   by 0x4F7E87: ofpacts_decode (ofp-actions.c:5780)
   by 0x4F709F: ofpacts_pull_openflow_actions__ (ofp-actions.c:5817)
   by 0x4F7856: ofpacts_pull_openflow_instructions (ofp-actions.c:6397)
   by 0x52CFF5: ofputil_decode_flow_mod (ofp-util.c:1727)
   by 0x5227A9: ofp_print_flow_mod (ofp-print.c:789)
   by 0x520823: ofp_to_string__ (ofp-print.c:3235)
   by 0x5204F6: ofp_to_string (ofp-print.c:3468)
   by 0x5925C8: do_recv (vconn.c:644)
   by 0x592372: vconn_recv (vconn.c:598)
   by 0x565CEA: rconn_recv (rconn.c:703)
   by 0x46CB62: ofconn_run (connmgr.c:1367)
   by 0x46C7AD: connmgr_run (connmgr.c:320)
   by 0x4224A9: ofproto_run (ofproto.c:1763)
   by 0x407C0D: bridge_run__ (bridge.c:2888)
   by 0x40767A: bridge_run (bridge.c:2943)
   by 0x4161B7: main (ovs-vswitchd.c:120)

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ansis Atteka <ansisatteka@gmail.com>
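
The class of bug fixed above can be illustrated with a small, self-contained sketch. The 'struct buf' and 'struct note' types below are hypothetical stand-ins for ofpbuf and the note action, not the OVS definitions. The key point is that any pointer into the buffer must be recomputed after an append that may reallocate.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, standing in for ofpbuf. */
struct buf {
    char *data;
    size_t size;
    size_t allocated;
};

/* Appends 'n' bytes to 'b', growing it with realloc() if needed.  Any
 * pointer previously obtained into 'b->data' may be invalid afterwards. */
static void *
buf_put(struct buf *b, const void *p, size_t n)
{
    if (b->size + n > b->allocated) {
        size_t new_alloc = (b->size + n) * 2;
        char *new_data = realloc(b->data, new_alloc);
        if (!new_data) {
            abort();
        }
        b->data = new_data;
        b->allocated = new_alloc;
    }
    void *dst = b->data + b->size;
    memcpy(dst, p, n);
    b->size += n;
    return dst;
}

/* Hypothetical variable-length record: a header followed by 'len' bytes. */
struct note {
    unsigned char len;
};

static struct note *
put_note(struct buf *b, const void *payload, unsigned char n)
{
    size_t ofs = b->size;           /* Remember the offset, not a pointer. */
    struct note hdr = { 0 };

    buf_put(b, &hdr, sizeof hdr);
    buf_put(b, payload, n);         /* May move 'b->data'. */

    /* Recompute the header pointer from the offset before touching it. */
    struct note *note = (struct note *) (b->data + ofs);
    note->len = n;
    return note;
}
```

Keeping an offset instead of a pointer across the second append is one way to avoid the stale access; updating the local pointer after the append, as the fix above does, is equivalent.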
8 years agostt: linearize for CONFIG_SLUB case
Pravin B Shelar [Wed, 27 Apr 2016 21:57:33 +0000 (14:57 -0700)]
stt: linearize for CONFIG_SLUB case

With the STT implementation I saw performance improvements from
linearizing the skb in the SLUB case, so this patch skips the zero-copy
operation in that case.
The first change is to the reassembly code, where an in-order packet is
merged into the head; if there is no room to merge it, the combined packet
is linearized.
The second change covers reassembly of out-of-order packets. In this case
the list of packets is linearized before sending it up to the datapath.

Performance numbers for a large-packet TCP test using netperf:

OVS branch        TCP      Host0     Host1
version           Gbps     CPU%      CPU%
--------------------------------------------
2.5               9.4       272       315
master + patch    9.4       230       285

Tested-By: Vasmi Abidi <vabidi@vmware.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
8 years agoRemove "VLAN splinters" feature.
Pravin B Shelar [Mon, 25 Apr 2016 18:27:58 +0000 (11:27 -0700)]
Remove "VLAN splinters" feature.

The "VLAN splinters" feature works around buggy device drivers in
old Linux versions. Support for those old kernels has been dropped, so
all supported kernel VLAN drivers should now work fine with the
OVS kernel datapath.
This patch removes the deprecated feature.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agodatapath-windows: Fix recirculation when it is not the last attribute
Sairam Venugopal [Tue, 26 Apr 2016 23:53:30 +0000 (16:53 -0700)]
datapath-windows: Fix recirculation when it is not the last attribute

When the recirc action is in the middle, the current code creates a clone
of the NBL. However, it overwrites the pointer to point to the cloned NBL
without completing it. This causes a memory leak that crashes the kernel.

Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agodatapath-windows: Fix bug in OvsTcpGetWscale().
Daniele Di Proietto [Sat, 16 Apr 2016 00:04:53 +0000 (17:04 -0700)]
datapath-windows: Fix bug in OvsTcpGetWscale().

The userspace conntrack had a bug in tcp_wscale_get(), where the length
of an option would be read from the third octet of the option TLV
instead of the second.  This could cause an incorrect wscale value to
be returned, and it would at least impact performance.

Also use 'int' instead of 'unsigned' for 'len', since the value can be
negative.

CC: Sairam Venugopal <vsairam@vmware.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
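
For reference, the option walk described above can be sketched as follows. This is not the OVS or Windows datapath code; the function and constant names are hypothetical. It reads each option's length from the second octet of the TLV and uses a signed return value so that errors can be reported as -1.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL    0   /* End of option list (single octet). */
#define OPT_NOP    1   /* No-operation padding (single octet). */
#define OPT_WSCALE 3   /* Window scale: kind, length = 3, shift count. */

/* Hypothetical helper: walks the TCP options in 'opts' ('len' bytes) and
 * returns the window scale shift count, or -1 if the option is absent or
 * the option list is malformed. */
static int
tcp_options_wscale(const uint8_t *opts, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint8_t kind = opts[i];

        if (kind == OPT_EOL) {
            break;
        }
        if (kind == OPT_NOP) {
            i++;
            continue;
        }

        /* Every other option is a TLV: kind, length, value.  The length
         * lives in the SECOND octet and covers the whole option. */
        if (i + 1 >= len) {
            return -1;
        }
        size_t optlen = opts[i + 1];
        if (optlen < 2 || i + optlen > len) {
            return -1;
        }

        if (kind == OPT_WSCALE) {
            if (optlen != 3) {
                return -1;
            }
            /* RFC 7323 limits the shift count to 14. */
            return opts[i + 2] > 14 ? 14 : opts[i + 2];
        }
        i += optlen;
    }
    return -1;
}
```

Reading the length from the third octet instead would, for the window scale option, use the shift count as a length and skip the wrong number of bytes, which matches the failure mode described above.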
8 years agohmap: Add HMAP_FOR_EACH_POP.
Daniele Di Proietto [Thu, 7 Apr 2016 01:53:59 +0000 (18:53 -0700)]
hmap: Add HMAP_FOR_EACH_POP.

Makes popping each member of the hmap a bit easier.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agosystem-tests: Add tcp simple test.
Daniele Di Proietto [Mon, 11 Apr 2016 21:02:10 +0000 (14:02 -0700)]
system-tests: Add tcp simple test.

Useful to test the datapath ability to forward tcp packets without the
complexity of connection tracking.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
8 years agosystem-tests: Disable offloads in userspace tests.
Daniele Di Proietto [Fri, 15 Apr 2016 20:17:50 +0000 (13:17 -0700)]
system-tests: Disable offloads in userspace tests.

The system userspace testsuite uses the userspace datapath with
netdev-linux devices, connected to veth pairs with the AF_PACKET socket:

             (veth pair)     (AF_PACKET)
TCP stack -> p0 ---> ovs-p0  -------------> netdev-linux (userspace OVS)

Unfortunately this configuration has some problems with offloads: a
packet generated by the TCP stack may be sent to p0 without being
checksummed or segmented. The AF_PACKET socket, by default, ignores the
offloads and just transmits the data of the packets to userspace, but:

1. The packet may need GSO, so the data will be too big to be received
   by the userspace datapath
2. The packet might have incomplete checksums, so it will likely be
   discarded by the receiver.

Problem 1 causes TCP connections to see a congestion window smaller than
the MTU, which hurts performance but doesn't prevent communication.

Problem 2 was hidden in the testsuite by a Linux kernel bug, fixed by
commit ce8c839b74e3("veth: don’t modify ip_summed; doing so treats
packets with bad checksums as good").  In the kernels that include the
fix, the userspace datapath is able to process pings, but not tcp or udp
data.

Unfortunately I couldn't find a way to ask the AF_PACKET socket to perform
offloads in the kernel.  A possible fix would be to use the PACKET_VNET_HDR
sockopt and perform the offloads in userspace.

Until a proper fix is worked out for netdev-linux, this commit disables
offloads on the non-OVS side of the veth pair, as a workaround.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
8 years agodatapath-windows: Pause switch state on PnP event
Alin Serdean [Thu, 10 Mar 2016 13:33:42 +0000 (13:33 +0000)]
datapath-windows: Pause switch state on PnP event

A PnP(plug and play) event will be triggered before trying to disable
the extension. We could use this PnP event to prepare for detaching
the datapath.

This patch sets the switch into a paused state so no more net buffers
are queued.

Also clean up some comments.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
8 years agoovn-controller-vtep: Support BUM traffic for the VTEP Schema.
Darrell Ball [Tue, 5 Apr 2016 20:13:40 +0000 (13:13 -0700)]
ovn-controller-vtep: Support BUM traffic for the VTEP Schema.

This patch implements BUM support in the VTEP schema.  This relates to
BUM traffic flowing from a gateway towards HVs.  This code would be
relevant to HW gateways and the ovs-vtep simulator. In order to do this,
the mcast macs remote table in the VTEP schema is populated based on the
OVN SB port binding.  For each logical switch, the SB port bindings are
queried to find all the physical locators to send BUM traffic to and the
VTEP DB is updated.

Some test packets were enabled in the HW gateway test case to exercise
the new code.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
8 years agopackets: use flow protocol when recalculating ipv6 checksums
Simon Horman [Fri, 22 Apr 2016 12:22:56 +0000 (22:22 +1000)]
packets: use flow protocol when recalculating ipv6 checksums

When using masked actions, the ipv6_proto field of an action
to set IPv6 fields may be zero rather than the prevailing protocol,
which will result in skipping checksum recalculation.

This patch resolves the problem by relying on the protocol
in the packet rather than that in the set field action.

A similar fix for the kernel datapath has been accepted into David Miller's
'net' tree as b4f70527f052 ("openvswitch: use flow protocol when
recalculating ipv6 checksums").

Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Fixes: 6d670e7f0d45 ("lib/odp: Masked set action execution and printing.")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agoutil.h: Restore stdarg.h which is necessary for va_list
YAMAMOTO Takashi [Fri, 22 Apr 2016 05:19:23 +0000 (05:19 +0000)]
util.h: Restore stdarg.h which is necessary for va_list

Fixes a regression in commit b44aaaaff8d826535025f4f8d12808c4ef36a7a8
("Misc cleanup with "util.h" header files").

Signed-off-by: YAMAMOTO Takashi <yamamoto@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agoofproto-dpif-xlate: Tidy up ct_mark xlate code.
Joe Stringer [Fri, 15 Apr 2016 18:36:05 +0000 (11:36 -0700)]
ofproto-dpif-xlate: Tidy up ct_mark xlate code.

Make the ct_mark netlink serialization more consistent with the way that
ct_label is serialized.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agoofproto-dpif-xlate: xlate ct_{mark, label} correctly.
Joe Stringer [Fri, 15 Apr 2016 18:36:04 +0000 (11:36 -0700)]
ofproto-dpif-xlate: xlate ct_{mark, label} correctly.

When translating multiple ct actions in a row which include modification
of ct_mark or ct_labels, these fields could be incorrectly translated
into datapath actions, resulting in modification of these fields for
entries when the OpenFlow rules didn't actually specify the change.

For instance, the following OpenFlow actions:
ct(zone=1,commit,exec(set_field(1->ct_mark))),ct(zone=2,table=1),...

Would translate into the datapath actions:
ct(zone=1,commit,mark=1),ct(zone=2,mark=1),recirc(...),...

This commit fixes the issue by zeroing the wildcards for these fields
prior to performing nested actions translation (and restoring
afterwards). As such, these fields do not hold both the match and the
field modification values at the same time. As a result, the ct_mark and
ct_labels don't leak from one ct action to the next.

Fixes: 8e53fe8cf7a1 ("Add connection tracking mark support.")
Fixes: 9daf23484fb1 ("Add connection tracking label support.")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
8 years agosystem-traffic: Add basic geneve tunnel sanity test.
Joe Stringer [Wed, 20 Apr 2016 23:07:52 +0000 (16:07 -0700)]
system-traffic: Add basic geneve tunnel sanity test.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
8 years agosystem-traffic: Add basic gre tunnel sanity test.
Joe Stringer [Wed, 20 Apr 2016 23:07:51 +0000 (16:07 -0700)]
system-traffic: Add basic gre tunnel sanity test.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
8 years agosystem-traffic: Fix IPv6 frag vxlan check.
Joe Stringer [Thu, 21 Apr 2016 21:10:11 +0000 (14:10 -0700)]
system-traffic: Fix IPv6 frag vxlan check.

This was missed before somehow, which would cause the test to fail
(rather than being skipped) if iproute2 didn't support setting the
vxlan dstport on the kernel tunnel device.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>