cascardo/ovs.git
11 years agoSet release date for 1.10.0. v1.10.0
Justin Pettit [Wed, 1 May 2013 21:30:38 +0000 (14:30 -0700)]
Set release date for 1.10.0.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Gurucharan Shetty [Mon, 29 Apr 2013 02:25:55 +0000 (19:25 -0700)]
worker: Prevent worker from being responsible for pidfile deletion.

Currently, we create the worker process after creating the pidfile.
This means that the responsibility for deleting the pidfile after process
termination rests with the worker process.

When we restart openvswitch using the startup scripts, we send SIGTERM to
the main process and, once it has exited, start ovs-vswitchd again. This
results in a race condition. The new ovs-vswitchd creates a pidfile because
the old one is no longer locked. But if the old worker process exits after
the new ovs-vswitchd has started, it deletes the pidfile out from under the
new ovs-vswitchd. This can eventually result in multiple ovs-vswitchd daemons
running at once.

This patch gives the responsibility of deleting the pidfile to the main
process.

Bug #16669.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
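A minimal C sketch of the ownership rule this commit describes, assuming the creating process records its own PID and only that process may delete the pidfile. The names pidfile_creator, pidfile_note_creator(), and pidfile_cleanup() are hypothetical; the commit does not show how OVS implements the change.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping: the PID of the process that created the
 * pidfile.  A forked worker inherits this value unchanged. */
static pid_t pidfile_creator = -1;

/* Call right after creating the pidfile, in the main process. */
static void
pidfile_note_creator(void)
{
    pidfile_creator = getpid();
}

/* Exit-path cleanup.  Only the creating (main) process removes the
 * pidfile; a worker, whose getpid() differs, leaves it alone.
 * Returns 1 if the file was removed, 0 otherwise. */
static int
pidfile_cleanup(const char *path)
{
    if (getpid() != pidfile_creator) {
        return 0;               /* Not the owner, e.g. a worker process. */
    }
    return remove(path) == 0;
}
```

Because a forked worker inherits pidfile_creator but has its own getpid(), it skips the deletion, so an exiting worker cannot remove a pidfile that a newly started ovs-vswitchd has created.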
Gurucharan Shetty [Sun, 28 Apr 2013 02:58:12 +0000 (19:58 -0700)]
vswitchd: Disable system stats collection on a concurrently running daemon.

In very rare cases (e.g., if ovs-vswitchd.pid is inadvertently deleted),
multiple ovs-vswitchd daemons can end up running at the same time.
In that situation, one of the daemons can wait in poll() with a 0 ms
timeout, because it expects system stats to be collected.

But system stats collection never runs for the daemon that does not hold
the database lock, so that daemon consumes 100% of a CPU if its stats
collection state machine was previously in S_WAITING.

With this patch, we disable system stats collection for the daemon that
does not hold the database lock. When it eventually acquires the lock,
system stats collection is automatically enabled if
other_config:enable-statistics=true.

Bug #16669.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
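A sketch of the fix's effect on the poll() timeout, under the assumption that a daemon in S_WAITING asks poll() to return immediately. The enum values other than S_WAITING, and all function names, are hypothetical and not OVS's system-stats code.

```c
#include <stdbool.h>

/* Hypothetical states; only S_WAITING is named in the commit above. */
enum stats_state { S_DISABLED, S_WAITING, S_REQUEST_SENT };

/* Stats collection runs only while this daemon holds the database lock
 * and other_config:enable-statistics=true. */
static bool
system_stats_should_run(bool have_db_lock, bool enable_statistics)
{
    return have_db_lock && enable_statistics;
}

/* Timeout to pass to poll(): 0 means wake immediately, -1 means block.
 * Without the lock check, a lockless daemon stuck in S_WAITING would keep
 * requesting a 0 ms wake-up that never leads to progress, spinning. */
static int
system_stats_poll_timeout(enum stats_state state, bool have_db_lock,
                          bool enable_statistics)
{
    if (!system_stats_should_run(have_db_lock, enable_statistics)) {
        return -1;
    }
    return state == S_WAITING ? 0 : -1;
}
```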
Thomas Graf [Fri, 26 Apr 2013 10:03:11 +0000 (12:03 +0200)]
datapath: Account for RHEL6.4 backports in compat layer

Explicitly check the availability of several kernel API functions
instead of relying on the kernel version to account for Red Hat
Enterprise Linux backports.

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
(cherry picked from commit 42d5dd9595cce35a8825a20be7d71a3a8f6f5640)

Conflicts:
datapath/linux/compat/include/asm/percpu.h
datapath/linux/compat/include/linux/netdevice.h

Thomas Graf [Fri, 26 Apr 2013 10:03:10 +0000 (12:03 +0200)]
datapath: Use openvswitch_handle_frame hook in >=RHEL6.4 to live side by side with bridging

Because the kernel that RHEL6 is based on lacks the API for registering
an rx_handler, the datapath currently falls back to using the bridging
hook, with the consequence that bridging and OVS cannot be used in
parallel on any RHEL6 release.

To address this, RHEL 6.4 and later releases provide a special rx frame
hook for use by OVS. It captures frames at the same point in the
stack as the rx_handler does in more recent kernel releases. To
store the vport pointer, the net_device's ax25_ptr field is used,
under the assumption that an AX25 device will never be attached to
an OVS bridge.

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
(cherry picked from commit f285d3e715512571c4b2f92a4d1c65022bbcc9d5)

Conflicts:
datapath/vport-netdev.c

Kyle Mestery [Fri, 26 Apr 2013 18:30:25 +0000 (14:30 -0400)]
Add FAQ entries around the VXLAN support in Open vSwitch.

Add a section to the FAQ explaining VXLAN with a pointer to the IETF draft.
Add sections detailing how much of the VXLAN protocol is currently supported
in OVS, along with a section explaining the default UDP port and how to change
this when creating VXLAN ports.

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Conflicts:
FAQ

Kyle Mestery [Fri, 26 Apr 2013 18:30:24 +0000 (14:30 -0400)]
Update the default VXLAN destination UDP port to the IANA assigned port

VXLAN was recently assigned UDP port 4789 by IANA. This
commit updates the OVS VXLAN implementation to reflect the new UDP port
number.

Cc: Kenneth Duda <kduda@aristanetworks.com>
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Conflicts:
NEWS
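The new default and the per-port override described in the two VXLAN commits above can be modeled as follows. This is a hypothetical helper: it assumes the override arrives as a string option (the FAQ's actual option name is not given here) and rejects values outside 1..65535.

```c
#include <errno.h>
#include <stdlib.h>

/* IANA-assigned VXLAN UDP port, per the commit above. */
#define VXLAN_DEFAULT_DST_PORT 4789

/* Returns the destination UDP port for a VXLAN tunnel.  'dst_port_opt' is
 * a hypothetical per-port option string (NULL or "" if unset).  Returns -1
 * for values that are not a valid port number. */
static int
vxlan_dst_port(const char *dst_port_opt)
{
    char *end;
    long port;

    if (!dst_port_opt || !*dst_port_opt) {
        return VXLAN_DEFAULT_DST_PORT;
    }
    errno = 0;
    port = strtol(dst_port_opt, &end, 10);
    if (errno || *end || port < 1 || port > 65535) {
        return -1;
    }
    return (int) port;
}
```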

Alex Wang [Thu, 18 Apr 2013 00:35:04 +0000 (17:35 -0700)]
python: fix a typo error in python/ovs/socket_util.py.

Commit 89d7ffa9 (python: Workaround UNIX socket path
length limits) fixes most of the failing tests. But it has a
typo, and the typo causes the test <unixctl server errors -
Python> to fail when the path is very long (e.g., more
than 90 characters).

This patch fixes the above issue.

Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
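The underlying limit is the fixed-size sun_path member of struct sockaddr_un (108 bytes on Linux). A hypothetical C check for whether a path fits, which is what any such workaround has to detect; this is not the Python code from commit 89d7ffa9.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* True if 'path' plus its terminating NUL fits in sockaddr_un.sun_path.
 * Longer paths cannot be bound or connected to directly and need some
 * workaround (the choice of workaround is outside this sketch). */
static bool
unix_path_fits(const char *path)
{
    return strlen(path) < sizeof ((struct sockaddr_un *) 0)->sun_path;
}
```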
YAMAMOTO Takashi [Tue, 16 Apr 2013 06:56:31 +0000 (15:56 +0900)]
python/ovs/poller.py: workaround an eventlet bug

Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Gurucharan Shetty [Wed, 10 Apr 2013 18:55:06 +0000 (11:55 -0700)]
ovs-vsctl: Fix a segfault.

The following two commands result in an ovs-vsctl segfault:
ovs-vsctl -vfatal_signal:off --timeout=0 wait-until \
Open_vswitch . external_ids:blah="1"
/etc/init.d/openvswitch-switch restart

This patch fixes the segfault by properly setting the global
variable the_idl_txn to NULL when the underlying memory is
freed.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
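The bug pattern is a global pointer left dangling after the object it points to is freed. A minimal sketch of the fix, using a hypothetical struct txn and helper names in place of the real IDL transaction API: clear the global in the same function that frees the object, so later cleanup code sees NULL.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an IDL transaction object. */
struct txn {
    int status;
};

/* Global handle, playing the role of ovs-vsctl's the_idl_txn. */
static struct txn *the_txn;

static struct txn *
txn_create(void)
{
    the_txn = calloc(1, sizeof *the_txn);
    return the_txn;
}

/* Frees 'txn' and, if it is the global one, clears the global pointer so
 * that later code (e.g. an exit path that aborts "the current
 * transaction") sees NULL instead of freed memory. */
static void
txn_destroy(struct txn *txn)
{
    if (txn == the_txn) {
        the_txn = NULL;
    }
    free(txn);
}

/* Safe to call at any time, even after txn_destroy(). */
static void
txn_abort_current(void)
{
    if (the_txn) {
        txn_destroy(the_txn);
    }
}
```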
Ben Pfaff [Thu, 11 Apr 2013 22:47:08 +0000 (15:47 -0700)]
bridge: Complete initial configuration even with empty database.

If the database was empty, that is, it did not even contain an Open_vSwitch
top-level configuration record, at ovs-vswitchd startup time, then
OVS failed to detach and used 100% CPU.  This commit fixes the problem.

This problem was introduced by commit 63ff04e82623e765 (bridge: Only
complete daemonization after db commits initial config.).

This problem did not manifest if the initscripts supplied with Open vSwitch
were used, because those initscripts always initialize the database before
starting ovs-vswitchd, so this problem affects only users with hand-rolled
local OVS startup scripts.

Bug #16090.
Reported-by: Pravin Shelar <pshelar@nicira.com>
Tested-by: Pravin Shelar <pshelar@nicira.com>
Reported-by: Paul Ingram <paul@nicira.com>
Reported-by: Amre Shakimov <ashakimov@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
Ben Pfaff [Wed, 10 Apr 2013 17:33:39 +0000 (10:33 -0700)]
bridge: Only complete daemonization after db commits initial config.

An earlier commit changed the Open vSwitch startup scripts so that they
connect to remote managers only after ovs-vswitchd does its initial
configuration, as signaled by ovs-vswitchd detaching from its parent
process.  However, a race window remains, because ovs-vswitchd detaching
does not mean that the database server has received and committed the
transaction, only that ovs-vswitchd has sent it.  This commit fixes that
race window, by changing ovs-vswitchd to complete detaching only after
the database server acknowledges the transaction.

It is still possible for unusual events to cause ovs-vswitchd to detach
before ephemeral columns are filled in.  There is always a slim possibility
that the transaction will fail or that some other client has added new
bridges, ports, etc. while ovs-vswitchd was configuring using an old
configuration.  The latter race is inherent to the design of the system
and cannot be avoided without radical changes.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
Bug #15983.

Ben Pfaff [Wed, 10 Apr 2013 16:53:54 +0000 (09:53 -0700)]
ovs-ctl: Connect to remote OVSDB managers only after ovs-vswitchd starts.

Until now, ovs-ctl has started ovsdb-server with the full set of remote
managers configured.  This means that ovsdb-server immediately connects to
these managers, before ovs-vswitchd even starts.  Because the Open vSwitch
schema has several ephemeral columns, there will be considerable startup
churn in the database.   For example, ovs-vswitchd will initially fill in
the datapath-id and ofport columns as it starts and sets up the initial
configuration.  This churn wastes bandwidth to the remote managers and has
potential for confusing them.

This commit reduces the churn by changing ovs-ctl so that ovsdb-server
connects to the remote managers only after ovs-vswitchd has finished its
initial configuration.  This means that remote managers will initially
see a filled-in database, not one that has its ephemeral columns empty.

This commit does not mean that managers can ignore the possibility that
some columns have not yet been filled in.  For example, some columns will
still be briefly blank after a new bridge or a new port is added at
runtime, because adding a bridge or port occurs in one transaction (made by
the client adding the port, e.g. ovs-vsctl) and filling in those columns
happens in a different transaction (made by ovs-vswitchd).  But this commit
does reduce the quantity of empty columns that I would expect a database
client to observe in practice.

Reported-by: Jeff Merrick <jmerrick@vmware.com>
CC: Amar Padmanabhan <amar@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
Bug #15983.

Ben Pfaff [Wed, 10 Apr 2013 16:34:49 +0000 (09:34 -0700)]
ovsdb-server: Add commands for adding and removing remotes at runtime.

This will make it possible, in later commits, to make ovsdb-server connect
to OVS managers only after ovs-vswitchd has completed its initial
configuration.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
Ben Pfaff [Wed, 10 Apr 2013 23:22:00 +0000 (16:22 -0700)]
ovsdb-server: Refactor parsing of remote names to avoid ovs_fatal().

The current users of parse_db_column() are content to terminate with a
fatal error if parsing fails.  An upcoming commit requires more flexibility,
so this commit refactors parse_db_column() to make this possible.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
Ben Pfaff [Wed, 10 Apr 2013 16:27:49 +0000 (09:27 -0700)]
sset: New function sset_sort().

This will have its first caller in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
Ethan Jackson [Wed, 10 Apr 2013 20:05:04 +0000 (13:05 -0700)]
dpif-linux: Reset epoll() on channel deletion.

The list of epoll events contains references to channels, which may
become stale when one of those channels is deleted.  The safest thing
to do is simply to refresh epoll() whenever a channel is deleted.

Bug #16057.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
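A hypothetical model of why deleting a channel must reset the cached epoll results: the cache holds channel indices from an earlier epoll_wait(), any of which may now be stale. The struct and functions below are illustrative, not dpif-linux's.

```c
#include <stddef.h>

#define MAX_EVENTS 8

/* Events returned by a previous epoll_wait(), recorded as channel
 * indices; 'n_events' are valid and 'next' is the next to consume. */
struct event_cache {
    int channel_idx[MAX_EVENTS];
    size_t n_events;
    size_t next;
};

/* Any pending event may name the deleted channel (or a slot that gets
 * reused later), so rather than filtering, drop them all.  The next poll
 * refills the cache from scratch. */
static void
event_cache_on_channel_delete(struct event_cache *c)
{
    c->n_events = 0;
    c->next = 0;
}

/* Returns the next pending channel index, or -1 if none remain. */
static int
event_cache_next(struct event_cache *c)
{
    return c->next < c->n_events ? c->channel_idx[c->next++] : -1;
}
```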
Gurucharan Shetty [Sat, 6 Apr 2013 23:56:06 +0000 (16:56 -0700)]
ovs-lib: Do not tee the ovs-ctl o/p in case of strace.

Running the OVS daemons with the strace option enabled
blocks if we pipe their output, and we use tee
to log the output of ovs-ctl to ovs-ctl.log.

This patch disables the startup script logging when we run the
OVS daemons with the strace option.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Ethan Jackson [Sat, 6 Apr 2013 22:22:14 +0000 (15:22 -0700)]
ofproto-dpif: Disable miss handling in rule_get_stats().

rule_get_stats() is often called when iterating over every rule in
the flow table.  To ensure up-to-date statistics, rule_get_stats()
calls push_all_stats() which can cause flow misses to be handled.
When using the learn action, this can cause rules to be added to (and
potentially removed from) the OpenFlow table.  This could corrupt
the caller's data structures, leading to a segmentation fault.
This patch fixes the issue by disabling flow miss handling from
within rule_get_stats().

Bug #15999.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
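A sketch of the save/disable/restore pattern the fix implies, with simplified stand-ins for push_all_stats() and rule_get_stats() (the real functions take arguments and do far more). The flag name miss_handling_enabled and the counter are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical global switch consulted by the stats-pushing code. */
static bool miss_handling_enabled = true;
static int misses_handled;      /* Counts handled misses, for illustration. */

/* Stand-in for push_all_stats(): pushing stats may handle pending flow
 * misses, which (with the learn action) can modify the flow table. */
static void
push_all_stats(void)
{
    if (miss_handling_enabled) {
        misses_handled++;
    }
}

/* Stand-in for rule_get_stats(): the caller may be iterating over the
 * flow table, so misses are not handled while stats are pushed.  The
 * previous setting is restored so nested callers behave correctly. */
static void
rule_get_stats(void)
{
    bool saved = miss_handling_enabled;

    miss_handling_enabled = false;
    push_all_stats();
    miss_handling_enabled = saved;
}
```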
Andy Zhou [Thu, 4 Apr 2013 23:35:27 +0000 (16:35 -0700)]
ovs-appctl: dpif/show display bug fix

Fixes a bug where per-ofproto moving average stats did not update
when there were no active datapath flows.

Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Gurucharan Shetty [Sun, 31 Mar 2013 01:32:25 +0000 (18:32 -0700)]
rhel: Add depmod.d conf file for rhel6 kmod package.

It looks like CentOS 6.4 already has an upstream openvswitch
kernel module installed. When we try to install the kmod-openvswitch
package from this tree's pre-1.10 branches, we get the following warning:
"brcompat.ko needs unknown symbol ovs_dp_ioctl_hook".

Also, after installing the kmod-openvswitch package, running
"modprobe openvswitch" loads the upstream kernel module.
We should instead load the kernel module compiled from this tree.

This patch fixes both the above issues.

Bug #15829.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Ben Pfaff [Wed, 27 Mar 2013 21:38:11 +0000 (14:38 -0700)]
jsonrpc-server: Disconnect connections that queue too much data.

Consider this situation:

    * OVSDB client A executes transactions very quickly for a long time.

    * OVSDB client B monitors the tables that A modifies, but (either
      because B is connected over a slow network, or because B is slow to
      process updates) cannot keep up.

In this situation, the data that ovsdb-server has queued to send B grows
without bound and eventually ovsdb-server runs out of memory.  This commit
avoids the problem by noticing that more data is queued to B than necessary
to express the whole contents of the database and dropping the connection
to B.  When B reconnects later, it can then fetch the contents of the
database using less data than was previously queued to it.

(This is not entirely hypothetical.  We have seen this behavior in
intentional stress tests.)

Bug #15637.
Reported-by: Jeff Merrick <jmerrick@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
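The disconnect rule reduces to comparing two byte counts. A hypothetical sketch, assuming the size of a full database resend is estimated as a sum of per-table predictions; the function names are illustrative, not ovsdb-server's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Predicted bytes needed to send the whole database to a client,
 * modeled here as the sum of per-table estimates. */
static size_t
db_snapshot_len(const size_t *table_lens, size_t n_tables)
{
    size_t total = 0;

    for (size_t i = 0; i < n_tables; i++) {
        total += table_lens[i];
    }
    return total;
}

/* Once more bytes are queued for a client than a full resend would
 * take, dropping the connection and letting the client reconnect and
 * refetch is cheaper than continuing to queue. */
static bool
should_drop_session(size_t backlog_bytes, const size_t *table_lens,
                    size_t n_tables)
{
    return backlog_bytes > db_snapshot_len(table_lens, n_tables);
}
```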
Ben Pfaff [Wed, 27 Mar 2013 16:32:56 +0000 (09:32 -0700)]
ovsdb-data: New functions for predicting serialized length of data.

These will be used for the first time in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Mon, 1 Apr 2013 20:16:59 +0000 (13:16 -0700)]
json: New function json_serialized_length().

This will be used for the first time in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
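The commit gives no details of json_serialized_length(), but the idea of predicting output size without building the string can be shown for a single JSON string. This hypothetical helper assumes a serializer that uses the two-character escapes \" \\ \b \f \n \r \t, writes other control characters as \u00XX, and copies every other byte unchanged.

```c
#include <stddef.h>

/* Predicts how many bytes the assumed serializer emits for 's',
 * including the surrounding quotes. */
static size_t
json_string_serialized_length(const char *s)
{
    size_t n = 2;               /* Opening and closing quotes. */

    for (; *s; s++) {
        unsigned char c = *s;

        switch (c) {
        case '"': case '\\':
        case '\b': case '\f': case '\n': case '\r': case '\t':
            n += 2;             /* Two-character escape. */
            break;
        default:
            n += c < 0x20 ? 6 : 1;  /* \u00XX, or the byte itself. */
            break;
        }
    }
    return n;
}
```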
Ethan Jackson [Tue, 2 Apr 2013 19:32:22 +0000 (12:32 -0700)]
ofproto-dpif: Don't rate limit facet_learn() with fin_timeouts.

In the standard case, rate limiting facet_learn() to once every
500 ms makes sense.  The worst that can happen is that a learning entry
is expired half a second too early.  However, when using
fin_timeouts, we really need to react quickly to delete the newly
stale flow.

Bug #15915.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
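A sketch of the rate limit with the fin_timeout exception, assuming each facet keeps the time its next learn may run and knows whether its actions include fin_timeout. The struct, flag, and function names are hypothetical.

```c
#include <stdbool.h>

#define LEARN_INTERVAL_MS 500   /* From the commit message above. */

/* Hypothetical per-facet state: earliest time learning may run again. */
struct facet_learn_state {
    long long next_learn_ms;
};

/* Returns true if facet_learn() should run now.  Learning is normally
 * limited to once every 500 ms, but a facet whose actions include a
 * fin_timeout learns immediately so stale flows are removed promptly. */
static bool
facet_should_learn(struct facet_learn_state *st, long long now_ms,
                   bool has_fin_timeout)
{
    if (has_fin_timeout || now_ms >= st->next_learn_ms) {
        st->next_learn_ms = now_ms + LEARN_INTERVAL_MS;
        return true;
    }
    return false;
}
```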
Ethan Jackson [Fri, 29 Mar 2013 21:19:04 +0000 (14:19 -0700)]
ofproto: Increase default flow-eviction-threshold.

The flow-eviction-threshold presents a tradeoff between the
expense of maintaining large numbers of datapath flows and the
benefit of avoiding unnecessary flow misses.  In some large Open
vSwitch deployments, we've seen the previous default flow eviction
threshold hurt performance with reasonably typical
traffic patterns.  This patch increases the default to a level
that should represent a better tradeoff: still relatively safe,
but much more amenable to large numbers of long-lived flows.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Fri, 22 Mar 2013 02:04:52 +0000 (19:04 -0700)]
ofproto-dpif: Push statistics less frequently.

The most natural place to push facet statistics is in
update_stats(), where they're pulled from the datapath.  However,
under load, update_stats() can be called as many as 10 times per
second, causing us to push statistics so frequently that it hurts
performance.  By pushing statistics much less frequently, this
patch yields a roughly 8% improvement in TCP_CRR performance.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Wed, 27 Mar 2013 18:33:22 +0000 (11:33 -0700)]
ofproto-dpif: Run fast internally.

ofproto-dpif is responsible for quite a few bookkeeping tasks in
addition to handling flow misses.  Many of these tasks (flow
expiration, flow revalidation, etc.) can take many hundreds of
milliseconds, during which no misses can be handled.  The ideal
long-term solution to this problem is to isolate flow miss
handling in its own thread.  For now, however, this patch
provides a 5% increase in TCP_CRR performance and smooths out
results during revalidations.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Sat, 30 Mar 2013 22:13:00 +0000 (15:13 -0700)]
ofproto-dpif: Systematically push stats upon request.

Commit bf1e8ff (ofproto-dpif: Push statistics in rule_get_stats())
started down the road toward pushing stats on demand, but it
didn't go quite far enough.  First, it neglected to push stats in
port_get_stats() and mirror_get_stats().  Second, it only pushes
stats for a single ofproto, which is incomplete when patch ports
are used.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Jarno Rajahalme [Thu, 28 Mar 2013 13:01:18 +0000 (15:01 +0200)]
ofproto-dpif.at: Fix timing issue in show rates test.

Fix a test failure due to timing differences in different test runs.

Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Tue, 26 Mar 2013 02:49:13 +0000 (19:49 -0700)]
ofproto-dpif: Keep track of exact-match flow info

This patch adds more flow-related stats to the output of
"ovs-appctl dpif/show".  Specifically, the following information
is added per ofproto:

- Max flow table size
- Average flow table size
- Average flow table add rate
- Average flow table delete rate
- Average flow entry life in milliseconds

Feature #15366

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Tue, 12 Mar 2013 21:19:18 +0000 (14:19 -0700)]
ovs-appctl: dpif/show display per bridge stats

This fixes fallout from the single datapath change.
ovs-appctl dpif/show displays per-bridge miss, hit,
and flow counts, but the backend obtains
that information from the datapath.
With a single datapath, all bridges in the same
datapath would display the same (global)
counters maintained by the datapath, which is obviously
not correct.

This patch fixes the bug by maintaining per ofproto_dpif
miss and hit counts, which are used for display output.
The flow count is obtained by counting the
number of facets per ofproto.

ovs-dpctl show still displays the counters maintained by
the datapath, as before.

Bug #15369

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Fri, 22 Mar 2013 02:40:49 +0000 (19:40 -0700)]
ofproto-dpif: Rate limit calls to facet_learn().

In the TCP_CRR benchmark, ovs-vswitchd spends so much time in
update_stats() that it has a significant impact on flow setup
performance.  Further work is needed in this area, but for now,
simply rate limiting facet_learn() yields a roughly 10% improvement
with complex flow tables.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Thu, 21 Mar 2013 20:31:14 +0000 (13:31 -0700)]
ofproto-dpif: Rate limit facet_check_consistency()

With complex flow tables, facet_check_consistency() can be
expensive enough to show up in flow setup performance benchmarks.
In my testing this patch gives us a roughly 10% improvement in
TCP_CRR and ovs-benchmark.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Gurucharan Shetty [Wed, 27 Mar 2013 21:15:05 +0000 (14:15 -0700)]
ovs-lib: Wait for a longer time after SIGKILL.

Currently, when we stop a daemon, we first send it SIGTERM.
If SIGTERM did not work within ~5 seconds, we send a SIGKILL.
After sending SIGKILL, we wait only for 4 seconds, before giving
up.

If the system is extremely busy, there is a chance that the kernel
does not kill the process within 4 seconds. In such a case, when we
try to start the daemon immediately, we see that the pid inside the
pid-file is valid and assume that the daemon is still running. This
leaves us in a state where the daemon is not actually running.

This patch increases the time waiting for the kernel to kill the
process to 60 seconds.

Bug #15404.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Jarno Rajahalme [Mon, 25 Mar 2013 19:03:38 +0000 (21:03 +0200)]
datapath: Fix IP ID setting.

Eliminate the extra call to ip_select_ident(), and place the
__ip_select_ident() call where the ip_select_ident() call was.
This fixes two problems.  First, the call to ip_select_ident()
always zeroed out the value set earlier by __ip_select_ident().
Second, because __ip_select_ident() was called before iph->daddr was
set, the ident calculation was possibly based on uninitialized data
(but because the result was masked by the later call to
ip_select_ident(), this was not visible).

Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jarno Rajahalme [Mon, 25 Mar 2013 19:03:37 +0000 (21:03 +0200)]
datapath: Factor out common code from *_build_header() to ovs_tnl_send().

Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Conflicts:
datapath/vport-lisp.c

Gurucharan Shetty [Mon, 25 Mar 2013 15:41:18 +0000 (08:41 -0700)]
ovs-bugtool: Add iptables output for all tables.

Currently we list all the rules only from the 'filter' table.
Include the rules from all the other tables too.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Shih-Hao Li [Fri, 22 Feb 2013 16:54:04 +0000 (08:54 -0800)]
Add binary option for command outputs collected by ovs-bugtool

Currently, ovs-bugtool collects command output as text strings,
so it reads the output line by line. For commands that generate
huge amounts of binary data, reading the output this way is very
inefficient.

This change uses a 1MB buffer to read binary data
instead of reading it line by line.

Signed-off-by: Shih-Hao Li <shihli@vmware.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
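ovs-bugtool itself is Python; the same chunked-read technique is sketched here in C for consistency with the other examples. copy_binary() is a hypothetical helper that reads in 1 MB blocks rather than line by line.

```c
#include <stdio.h>
#include <stdlib.h>

#define BINARY_CHUNK (1024 * 1024)  /* 1 MB, as in the commit above. */

/* Copies 'in' to 'out' in fixed-size chunks instead of line by line,
 * avoiding per-line overhead on large binary output.  Returns the number
 * of bytes copied, or -1 on allocation or I/O error. */
static long long
copy_binary(FILE *in, FILE *out)
{
    char *buf = malloc(BINARY_CHUNK);
    long long total = 0;
    size_t n;

    if (!buf) {
        return -1;
    }
    while ((n = fread(buf, 1, BINARY_CHUNK, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            total = -1;
            break;
        }
        total += n;
    }
    if (total >= 0 && ferror(in)) {
        total = -1;
    }
    free(buf);
    return total;
}
```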
Gurucharan Shetty [Fri, 22 Mar 2013 23:25:36 +0000 (16:25 -0700)]
odp-utils: Fix memory corruption while flow parsing.

Currently, when a flow attribute type is greater than OVS_KEY_ATTR_MAX,
we can write to a random memory address, causing corruption. Fix it.

Bug #15702.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
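The corruption comes from using an attribute type as an array index without checking it. A minimal sketch of the guard, with a hypothetical KEY_ATTR_MAX standing in for OVS_KEY_ATTR_MAX and a hypothetical note_key_attr() in place of the real parser.

```c
#include <stdint.h>

/* Hypothetical bound; the real OVS_KEY_ATTR_MAX depends on the version. */
#define KEY_ATTR_MAX 20

/* Records that attribute 'type' was seen.  'seen' has KEY_ATTR_MAX + 1
 * slots, so a 'type' above KEY_ATTR_MAX would index past its end; such
 * attributes are rejected instead of written.  Returns 0 on success, -1
 * for an out-of-range type. */
static int
note_key_attr(uint8_t seen[KEY_ATTR_MAX + 1], unsigned int type)
{
    if (type > KEY_ATTR_MAX) {
        return -1;
    }
    seen[type] = 1;
    return 0;
}
```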
Ethan Jackson [Sat, 23 Mar 2013 22:11:21 +0000 (15:11 -0700)]
ofproto-dpif: Push statistics in rule_get_stats().

As time goes on, and flow tables become more complicated, the
tradeoff between keeping up to date statistics, and the CPU
resources needed to maintain them, will become more important.
Commit 5c0243a (ofproto-dpif: xlate actions once with subfacets.)
delayed the reporting of some statistics in an effort to achieve
higher flow setup performance.  Future commits will continue in the
same direction.

This patch helps to alleviate the issue by pushing statistics in
rule_get_stats(), when users actually want them.  Presumably, this
happens rarely, and thus will not have a negative impact on
ovs-vswitchd performance.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Thu, 21 Mar 2013 18:17:00 +0000 (11:17 -0700)]
ofproto-dpif: xlate actions once with subfacets.

Before this patch, when ofproto-dpif decided that a particular flow
miss needed a facet, it would do action translation multiple times:
once in subfacet_make_actions(), and once per packet in
subfacet_update_stats().  In the common case (once per miss), this
would double the amount of work required in xlate_actions().

The call to facet_push_stats() in subfacet_update_stats() is
unnecessary.  If the packets are simply accounted to the facet,
they will eventually be pushed to the relevant rules in
update_stats() or when the facet is removed.   Removing the
unnecessary step gives us a 20% improvement of the netperf TCP_CRR
benchmark with the complex flow tables installed by our controller.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Gurucharan Shetty [Thu, 21 Mar 2013 20:46:15 +0000 (13:46 -0700)]
ovs-bugtool: Add ovs-ofctl commands to bugtool plugin scripts.

This patch adds two new scripts that run "ovs-ofctl show" and
"ovs-ofctl dump-flows" on each bridge.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Gurucharan Shetty [Thu, 21 Mar 2013 20:22:56 +0000 (13:22 -0700)]
ovs-bugtool: Remove calls of ovs-ofctl on ovs-system.

With a single datapath, making ovs-ofctl calls on ovs-system
does not give the necessary output. This patch removes those calls.

The next patch adds the correct commands to bugtool plugin scripts.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Ben Pfaff [Tue, 19 Mar 2013 21:02:48 +0000 (14:02 -0700)]
bridge: Rate-limit updates to "instant stats".

Some information in the database must be kept as up-to-date as
possible to allow controllers to respond rapidly to network outages.
We call these statistics "instant" stats.

Until now, the instant stats have been updated on every trip through
the main loop.  This work scales with the number of interfaces that
ovs-vswitchd manages.  With CFM enabled on 5000 interfaces, even with
a low transmission rate, we see ovs-vswitchd using 100% CPU just to
maintain statistics, even with no actual changes.

This commit rate-limits updates to instant stats to at most 10 times
per second.  Earlier tests I did with similar patches showed a major
reduction in CPU usage.  I have not rerun those tests with this patch,
but I expect that the CPU usage should similarly decline.

CC: Ram Jothikumar <rjothikumar@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Gurucharan Shetty [Mon, 18 Mar 2013 19:33:17 +0000 (12:33 -0700)]
debian: Re-add --timeout option for ifupdown script.

Commit fba6bd1d3f (ovs-vsctl: Try connecting only once for active connections..)
removed the timeout option from ifupdown.sh. Removing the "--timeout=" option
can cause the ifupdown script to hang if ovs-vswitchd is not running and the
ifupdown script changes the OVSDB. So, re-add it.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Ben Pfaff [Fri, 15 Mar 2013 23:14:28 +0000 (16:14 -0700)]
ovs-vsctl: Try connecting only once for active connections by default.

Until now, ovs-vsctl has kept trying to connect to the database server
until it succeeded or the timeout expired (if one was specified with
--timeout).  This meant that if ovsdb-server wasn't running, then
ovs-vsctl would hang.
The result was that almost every ovs-vsctl invocation in scripts specified
a timeout on the off-chance that the database server might not be running.
But it's difficult to choose a good timeout.  A timeout that is too short
can cause spurious failures.  A timeout that is too long causes long delays
if the server really isn't running.

This commit should alleviate this problem.  It changes ovs-vsctl's behavior
so that, if it fails to connect to the server, it exits unsuccessfully.
This makes --timeout obsolete for the purpose of avoiding a hang if the
database server isn't running.  (--timeout is still useful to avoid a hang
if ovsdb-server is running but ovs-vswitchd is not, for ovs-vsctl commands
that modify the database.  --no-wait also avoids that issue.)

Bug #2393.
Bug #15594.
Reported-by: Jeff Merrick <jmerrick@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Ansis Atteka [Thu, 14 Mar 2013 18:53:00 +0000 (11:53 -0700)]
ipsec: unset IPSEC_MARK flag from skb_mark after tunnel packet is decapsulated

After a tunnel packet is decapsulated, we should clear the IPsec flag in
skb_mark.

Otherwise, IPsec policies would be applied a second time on internal
interfaces, if there are any. This will be especially necessary once we
introduce a global, low-priority IPsec drop policy that will make
sure that we never let marked but unencrypted packets through.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Issue: 15074
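Clearing one flag while keeping the rest of skb_mark is a single mask operation. The bit value for IPSEC_MARK below is an assumption, since the commit does not give it.

```c
#include <stdint.h>

/* Hypothetical bit; the commit does not say which bit IPSEC_MARK uses. */
#define IPSEC_MARK 0x1u

/* After decapsulation, clear only the IPsec bit so that policies matching
 * on it are not applied again on internal interfaces.  All other bits of
 * the mark are preserved. */
static uint32_t
clear_ipsec_mark(uint32_t skb_mark)
{
    return skb_mark & ~(uint32_t) IPSEC_MARK;
}
```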

Gurucharan Shetty [Fri, 15 Mar 2013 16:21:25 +0000 (09:21 -0700)]
ovs-bugtool: Add ovs-ctl.log to debug bundle.

ovs-ctl.log will include the output of ovs-ctl when
run from the rhel, debian, and xenserver startup scripts.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Gurucharan Shetty [Wed, 13 Mar 2013 22:07:06 +0000 (15:07 -0700)]
debian, rhel, xenserver: Ability to collect ovs-ctl logs.

We use ovs-ctl from the startup scripts to start, stop, restart, and
force-reload-kmod the OVS daemons. ovs-ctl produces quite descriptive
output while running these commands, but the output goes to stdout.
This output is sometimes quite useful for debugging issues.

With this patch, we store the output of ovs-ctl, when called from the
startup scripts, in /var/log/openvswitch/ovs-ctl.log.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
11 years agotunnel: Remove references to multicast tunnels in schema documentation.
Jesse Gross [Wed, 13 Mar 2013 15:35:15 +0000 (08:35 -0700)]
tunnel: Remove references to multicast tunnels in schema documentation.

The vestigial multicast support in tunnels has been removed at this
point, so this deletes the remaining references in the documentation.

Reported-by: Guangvy <1965837689@qq.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agodatapath: Check for Centos 6.4 backports.
Jesse Gross [Tue, 12 Mar 2013 18:34:29 +0000 (11:34 -0700)]
datapath: Check for Centos 6.4 backports.

CentOS 6.4 backported a number of additional kernel functions, so our
existing compatibility versions of those functions started causing conflicts.

Reported-by: Denis Iskandarov <d.iskandarov@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agobridge: Store the 'mac_in_use' for interfaces in OVSDB.
Justin Pettit [Tue, 12 Mar 2013 21:47:22 +0000 (14:47 -0700)]
bridge: Store the 'mac_in_use' for interfaces in OVSDB.

It can be useful to remotely determine the MAC addresses of attached
interfaces without going through OpenFlow.  This adds the MAC address to
a new 'mac_in_use' column on the Interface table.

Feature #15551

Requested-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
11 years agodatapath: Reduce loop limit by one to 4.
Jesse Gross [Tue, 12 Mar 2013 19:36:03 +0000 (12:36 -0700)]
datapath: Reduce loop limit by one to 4.

We currently allow five trips through the kernel datapath
before dropping the packet to protect the stack.  However, there
have been a few reports recently involving tunneling that this is
still too much.  Although it's not a complete solution, this reduces
the limit by one to balance safety in common situations with
flexibility.
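The limit can be pictured as a per-packet depth counter. This is a simplified, hypothetical sketch (the `MAX_LOOPS`, `struct loop_counter`, `loop_enter()` and `loop_exit()` names are assumptions), not the datapath's actual code:

```c
#include <stdbool.h>

/* Hypothetical limit: after this commit, at most 4 trips through the
 * datapath are allowed before the packet is dropped (previously 5). */
#define MAX_LOOPS 4

struct loop_counter {
    int count;      /* Current nesting depth for this packet. */
    bool looping;   /* Set once the limit has been exceeded. */
};

/* Called on each entry into the datapath.  Returns true if the packet may
 * be processed, false if it must be dropped to protect the kernel stack. */
static bool
loop_enter(struct loop_counter *lc)
{
    if (++lc->count > MAX_LOOPS) {
        lc->looping = true;
    }
    return !lc->looping;
}

/* Called when processing at this depth finishes, whether or not the
 * packet was dropped.  The flag is reset only at the outermost level. */
static void
loop_exit(struct loop_counter *lc)
{
    if (--lc->count == 0) {
        lc->looping = false;
    }
}
```

Once the limit is exceeded, every deeper entry fails until the outermost call unwinds, so the remainder of the packet's processing is abandoned rather than just one level.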

Bug #15477

Reported-by: Paul Ingram <paul@nicira.com>
Reported-by: 謝秉融 <faithfulman@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
11 years agoconnmgr: Fix memory leak in ofconn monitor table.
Ben Pfaff [Fri, 18 Jan 2013 23:17:15 +0000 (15:17 -0800)]
connmgr: Fix memory leak in ofconn monitor table.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
11 years agoovsdb: Fix memory leak.
Ben Pfaff [Thu, 24 Jan 2013 19:33:35 +0000 (11:33 -0800)]
ovsdb: Fix memory leak.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
11 years agoSet dates for 1.9.0 release.
Justin Pettit [Tue, 26 Feb 2013 19:24:20 +0000 (11:24 -0800)]
Set dates for 1.9.0 release.

This also sets the dates for 1.8.0, even though it was an internal-only
release.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
11 years agoNEWS: Note tunneling feature removals in the correct release.
Jesse Gross [Mon, 11 Mar 2013 23:00:17 +0000 (16:00 -0700)]
NEWS: Note tunneling feature removals in the correct release.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Conflicts:
NEWS

11 years agoAdd table_id to NXM flow_removed messages.
Ben Pfaff [Wed, 6 Mar 2013 17:13:37 +0000 (09:13 -0800)]
Add table_id to NXM flow_removed messages.

Feature #15466.
Requested-by: Ronghua Zhang <rzhang@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agoofproto-dpif: Fix up user specifying wrong bridge on "ofproto/trace".
Ben Pfaff [Wed, 6 Mar 2013 00:48:21 +0000 (16:48 -0800)]
ofproto-dpif: Fix up user specifying wrong bridge on "ofproto/trace".

If there is more than one bridge, then it's easy to specify the wrong one
on an ofproto/trace command.  Previously, this would produce surprising
results.  With this commit, "ofproto/trace" should silently fix up the
problem.

It would be nice to not require the user to specify a bridge at all, but
it's theoretically possible to have more than one backer, in which case we
need some way to distinguish, and a bridge name is as good an identifier
as we have.  We could ask the user to specify the datapath_type, I guess,
but that's a less familiar name to most users and it would be a somewhat
gratuitous change in syntax for ofproto/trace.

Bug #15419.
Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agoofproto-dpif: Print slow-path actions instead of "drop" in dump-flows.
Justin Pettit [Thu, 7 Mar 2013 01:11:35 +0000 (17:11 -0800)]
ofproto-dpif: Print slow-path actions instead of "drop" in dump-flows.

The command "ovs-appctl dpif/dump-flows" would print slow-path actions
as "drop", which could be confusing to users.  This is different from
"ovs-dpctl dump-flows", which prints a descriptive reason.  This commit
replaces "drop" with the reason.

Bug #14840

Signed-off-by: Justin Pettit <jpettit@nicira.com>
11 years agotimeval: Avoid backtrace() from signal handler on x86-64.
Ben Pfaff [Fri, 8 Mar 2013 01:13:49 +0000 (17:13 -0800)]
timeval: Avoid backtrace() from signal handler on x86-64.

backtrace() is really useful, but it is not signal safe everywhere.  We
need to reassess whether it is reasonable to use it anywhere, but
immediately we need to disable it on x86-64 (with glibc) because it is
causing segfaults in testing.

Bug #15497.
Reported-by: Ram Jothikumar <rjothikumar@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
11 years agotunnel: Mark ECN status on decapsulated tunnel packets.
Justin Pettit [Wed, 13 Feb 2013 22:50:24 +0000 (14:50 -0800)]
tunnel: Mark ECN status on decapsulated tunnel packets.

In the kernel tunnel implementation, if a packet was marked as ECN CE on
the outer packet then we would carry this over to the inner packet on
decapsulation.  With the switch to flow based tunneling, this stopped
happening.  This commit reintroduces that behavior by using the set IP
header action.
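The following sketches the decapsulation decision described here, together with the drop rule for non-ECT inner packets. It uses the standard ECN codepoints in the low two bits of the TOS byte (00 Not-ECT, 11 CE). `decap_ecn()` is a hypothetical helper, not the actual action-translation code:

```c
#include <stdbool.h>
#include <stdint.h>

#define ECN_MASK    0x03  /* ECN field: low two bits of the IP TOS byte. */
#define ECN_NOT_ECT 0x00  /* Not ECN-capable transport. */
#define ECN_CE      0x03  /* Congestion encountered. */

/* Compute the inner packet's TOS after decapsulation.  Returns false if the
 * packet should be dropped: the outer header reports congestion (CE) but the
 * inner packet is not ECN-capable, so the mark cannot be carried over. */
static bool
decap_ecn(uint8_t outer_tos, uint8_t inner_tos, uint8_t *new_inner_tos)
{
    if ((outer_tos & ECN_MASK) == ECN_CE) {
        if ((inner_tos & ECN_MASK) == ECN_NOT_ECT) {
            return false;               /* CE on a non-ECT packet: drop. */
        }
        inner_tos |= ECN_CE;            /* Carry the CE mark inward. */
    }
    *new_inner_tos = inner_tos;         /* DSCP bits are left untouched. */
    return true;
}
```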

Bug #15072

Signed-off-by: Justin Pettit <jpettit@nicira.com>
11 years agotunnel: Generate datapath flows for tunneled packets dropped due to ECN.
Justin Pettit [Wed, 13 Feb 2013 22:08:15 +0000 (14:08 -0800)]
tunnel: Generate datapath flows for tunneled packets dropped due to ECN.

Move the check for whether a tunneled packet should be dropped because
the outer header is marked congestion encountered (CE) while the
encapsulated packet is not ECN-capable (non-ECT), so that datapath flows
are generated for such packets.  This also adds some additional tests
for ECN handling on tunnel decapsulation.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
11 years agoofproto-dpif: Store the initial tunnel IP TOS values for later use.
Justin Pettit [Wed, 13 Feb 2013 02:08:01 +0000 (18:08 -0800)]
ofproto-dpif: Store the initial tunnel IP TOS values for later use.

When a packet arrives on an IP tunnel, store the TOS value for later
use.  This value will be used in a couple of future commits.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
11 years agoofproto-dpif: Make initial packet value handling generic.
Justin Pettit [Tue, 12 Feb 2013 02:56:24 +0000 (18:56 -0800)]
ofproto-dpif: Make initial packet value handling generic.

For VLAN splinters, an "initial_tci" value was introduced that is passed
around during flow processing to be used later for action translation.
This commit switches to passing around a struct so that additional
values beyond TCI can be used.  A future commit will use this.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
11 years agoofproto-dpif: Pass around "facet" in flow_push_stats().
Justin Pettit [Tue, 19 Feb 2013 19:42:54 +0000 (11:42 -0800)]
ofproto-dpif: Pass around "facet" in flow_push_stats().

The flow_push_stats() function will need other members of the "facet"
structure in a future commit.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
11 years agotunneling: Simplify ovs_tnl_send() error handling code.
Pravin B Shelar [Wed, 6 Mar 2013 18:34:59 +0000 (10:34 -0800)]
tunneling: Simplify ovs_tnl_send() error handling code.

This commit slightly improves code readability.  It is also a
correctness fix, since the error code returned by ip_local_out() was
being stored in 'err', which was not an int.
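To illustrate why the type matters, here is a hypothetical sketch that assumes 'err' had a narrow unsigned type (the commit does not say which type it actually was). `fake_local_out()` stands in for ip_local_out(), which returns 0 or a negative errno:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for ip_local_out(): 0 on success, or a negative
 * errno value on failure. */
static int
fake_local_out(int simulated_result)
{
    return simulated_result;
}

/* Buggy pattern: the result lands in a narrower unsigned type, so a
 * negative errno such as -EINVAL is mangled into a positive number. */
static int
send_buggy(int simulated_result)
{
    uint8_t err = fake_local_out(simulated_result);
    return err;
}

/* Fixed pattern: keep the error code in an int until it is returned. */
static int
send_fixed(int simulated_result)
{
    int err = fake_local_out(simulated_result);
    return err;
}
```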

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
11 years agoTunnel: Cleanup old tunnel infrastructure.
Pravin B Shelar [Wed, 6 Mar 2013 18:34:24 +0000 (10:34 -0800)]
Tunnel: Cleanup old tunnel infrastructure.

Now that the userspace flow-based tunneling code is checked in, the
kernel port-based tunneling code can be removed.

This patch removes the following components:
 - The tunnel ports hash table (the tunnel ports list is moved to the
   individual vports).
 - Per-tunnel-port configuration, which is cleaned up.
 - The OVS_KEY_ATTR_TUN_ID action.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #15078

11 years agodatapath: Remove CAPWAP tunneling support.
Pravin B Shelar [Wed, 6 Mar 2013 18:33:03 +0000 (10:33 -0800)]
datapath: Remove CAPWAP tunneling support.

The CAPWAP implementation covers only the encapsulation format and
is therefore not really the full protocol.  There were some uses of
it (primarily hardware support and UDP transport), but these are most
likely better served by VXLAN.

This patch removes CAPWAP tunneling support.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
11 years agotimeval: Increase accuracy of cached time 4X, from 100 ms to 25 ms.
Ben Pfaff [Tue, 5 Mar 2013 21:12:08 +0000 (13:12 -0800)]
timeval: Increase accuracy of cached time 4X, from 100 ms to 25 ms.

With CFM and other tunnel monitoring protocols, having a fairly precise
time is good.  My measurements don't show this change increasing CPU use.
(In fact it appears to repeatably reduce CPU use slightly, from about
22% to about 20% with 1000 CFM instances, although it's not obvious why.)
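Here is a simplified sketch of this kind of time caching, assuming a periodic timer marks the cached value stale. The 25 ms interval then bounds how far the cached time can lag the real clock. All names are hypothetical, and the fake clock exists only to make the sketch deterministic:

```c
#include <signal.h>
#include <stdint.h>

/* Hypothetical cache state.  A periodic timer (every 25 ms after this
 * change, 100 ms before) marks the cached value stale; because it may be
 * set from a signal handler, the flag is a volatile sig_atomic_t. */
struct time_cache {
    int64_t cached_ms;
    volatile sig_atomic_t stale;
};

/* Called from the periodic timer. */
static void
time_cache_tick(struct time_cache *tc)
{
    tc->stale = 1;
}

/* Fake clock for determinism; a real implementation would read the system
 * clock, e.g. with clock_gettime(). */
static int64_t fake_clock_ms;

static int64_t
read_fake_clock(void)
{
    return fake_clock_ms;
}

/* Return the current time in milliseconds, reading the real clock only
 * when the timer has marked the cached value stale. */
static int64_t
time_cache_msec(struct time_cache *tc, int64_t (*read_clock)(void))
{
    if (tc->stale) {
        tc->stale = 0;
        tc->cached_ms = read_clock();
    }
    return tc->cached_ms;
}
```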

Bug #15171.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
11 years agotimeval: Enable caching the current time even on x86-64.
Ben Pfaff [Wed, 6 Mar 2013 00:12:21 +0000 (16:12 -0800)]
timeval: Enable caching the current time even on x86-64.

With CFM enabled on 1000 tunnels, this reduced CPU use from about 30% to
about 22%.

Bug #15171.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
11 years agoovsdb-idlc: Make no-op writes to write-only columns cheaper.
Ben Pfaff [Tue, 5 Mar 2013 23:30:33 +0000 (15:30 -0800)]
ovsdb-idlc: Make no-op writes to write-only columns cheaper.

For 1000 tunnels with CFM enabled, this reduces CPU use from
about 36% to about 30%.

Bug #15171.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
11 years agodatapath: Fix circular dependency between bug.h and kernel.h.
Jesse Gross [Wed, 6 Mar 2013 08:10:01 +0000 (00:10 -0800)]
datapath: Fix circular dependency between bug.h and kernel.h.

In Linux 3.4 the definition for BUILD_BUG_ON_NOT_POWER_OF_2 was
moved from kernel.h to bug.h.  On various kernels these header
files include each other in various orders (often through a
long chain of other header files), which can create circular
dependency issues.  Since we no longer need this definition,
this simply removes the backport.

Reported-by: Palo Andi <andi@dis.uniroma1.it>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoovs-ctl.in: Restore interfaces and ofports for userspace restarts.
Gurucharan Shetty [Thu, 28 Feb 2013 22:46:43 +0000 (14:46 -0800)]
ovs-ctl.in: Restore interfaces and ofports for userspace restarts.

When upgrading from a pre-1.9 branch to 1.10 or later, if only the
userspace daemons are restarted and the older kernel module is left
intact, the datapaths are recreated.

This results in losing internal interface state such as IP addresses
and routing table entries.  Also, the 'ofport' values of the older
interfaces change.

With this patch, we restore the interface state, ofport values, etc.
when "ovs-ctl restart" or "/etc/init.d/openvswitch[-switch] restart
--save-flows" is called.  The latter command is called automatically
when Debian packages are installed.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
11 years agoovs-ctl.in: Clean up code for the next commit.
Gurucharan Shetty [Thu, 28 Feb 2013 22:21:40 +0000 (14:21 -0800)]
ovs-ctl.in: Clean up code for the next commit.

Previously, we would null the variables holding the names of the restore
scripts if there were any errors in creating a restore script or if
we did not need to run a particular restore script.  That is not necessary,
as we can just check the execute permission set on those scripts.

Also, carve out a couple of functions which will be used in the next commit.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
11 years agodatapath: Increase maximum allocation size of action list.
Pravin B Shelar [Fri, 1 Mar 2013 00:15:00 +0000 (16:15 -0800)]
datapath: Increase maximum allocation size of action list.

The switch to flow-based tunneling increased the size of each output
action in the flow action list.  In extreme cases, this can result
in the action list exceeding the maximum buffer size.

This doubles the maximum buffer size to compensate for the increase
in action size.  The action list is received from a Netlink callback
that already allocates a linear skb, so allocating another multi-page
buffer should not significantly increase the probability of an
allocation failure.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #15203

11 years agoRevert "datapath: Increase maximum allocation size of action list."
Pravin B Shelar [Fri, 1 Mar 2013 03:40:02 +0000 (19:40 -0800)]
Revert "datapath: Increase maximum allocation size of action list."

This reverts commit 649b1c68fdd39316e3bcea21ce5464da614a6691.
That patch introduced a bug by calling vfree() from interrupt context.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
11 years agonetdev-linux: Fix netdev_linux_send() return value in corner case.
Ben Pfaff [Tue, 26 Feb 2013 20:35:40 +0000 (12:35 -0800)]
netdev-linux: Fix netdev_linux_send() return value in corner case.

A negative 'sock' means there was an error, but netdev_linux_send() is
supposed to return a positive errno value on error.
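A minimal sketch of the sign flip this implies. The `get_socket()` and `send_packet()` helpers are hypothetical and only mirror the conventions the commit message describes:

```c
#include <errno.h>

/* Hypothetical stand-in for a socket helper that returns a file
 * descriptor on success or a negative errno value on failure. */
static int
get_socket(int simulate_error)
{
    return simulate_error ? -EAFNOSUPPORT : 3;
}

/* Returns 0 on success or a positive errno value on failure. */
static int
send_packet(int simulate_error)
{
    int sock = get_socket(simulate_error);
    if (sock < 0) {
        return -sock;      /* Flip the sign: negative errno -> positive. */
    }
    /* ... send the packet on 'sock' ... */
    return 0;
}
```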

Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agonx-match: Correct writing of value and length in set_field_to_ofast()
Simon Horman [Wed, 27 Feb 2013 07:12:16 +0000 (16:12 +0900)]
nx-match: Correct writing of value and length in set_field_to_ofast()

ofpbuf_put_*() may reallocate the ofpbuf's underlying buffer, so any data
written after an ofpbuf_put_*() call must be written to memory relative
to the pointer returned by that call.

Prior to this change, the length and trailing value would not be written
to the set_field action if ofpbuf_put_*() reallocated the underlying buffer.

Also make use of ofpbuf_put_zero() to avoid calling memset() directly.
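The hazard can be shown with a small growable buffer. `struct buf`, `buf_put_zeros()` and `append_tlv()` are hypothetical stand-ins for struct ofpbuf and the ofpbuf_put_*() functions, not OVS code:

```c
#include <stdlib.h>
#include <string.h>

/* Minimal growable buffer, a hypothetical stand-in for struct ofpbuf. */
struct buf {
    unsigned char *data;
    size_t size;
    size_t allocated;
};

/* Append 'n' zeroed bytes and return a pointer to them.  This may
 * reallocate 'b->data', so any pointer into the buffer obtained before
 * the call may be dangling afterward. */
static void *
buf_put_zeros(struct buf *b, size_t n)
{
    if (b->size + n > b->allocated) {
        size_t new_alloc = (b->size + n) * 2;
        unsigned char *p = realloc(b->data, new_alloc);
        if (!p) {
            abort();
        }
        b->data = p;
        b->allocated = new_alloc;
    }
    void *dst = b->data + b->size;
    memset(dst, 0, n);
    b->size += n;
    return dst;
}

/* Append a 2-byte header and a value.  Every write goes through the pointer
 * returned by the most recent put call; writing to 'hdr' after the second
 * put could hit freed memory, which is the class of bug fixed above. */
static void
append_tlv(struct buf *b, unsigned char type, const void *value, size_t len)
{
    unsigned char *hdr = buf_put_zeros(b, 2);
    hdr[0] = type;
    hdr[1] = (unsigned char) len;
    unsigned char *val = buf_put_zeros(b, len);   /* May move 'hdr'. */
    memcpy(val, value, len);
}
```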

Tested-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agoofproto-dpif: Handle tunnel config changes in facet_revalidate().
Ethan Jackson [Wed, 27 Feb 2013 03:12:22 +0000 (19:12 -0800)]
ofproto-dpif: Handle tunnel config changes in facet_revalidate().

For most of the history of Open vSwitch, one could assume that a
given datapath flow key would consistently translate into the same
userspace struct flow representation.  However, with the switch to
flow based tunneling, we now have a situation where a database
configuration change can cause a datapath flow key's in_port to
correspond to a completely different OpenFlow in_port possibly on a
completely different bridge.  This can cause all sorts of problems,
including traffic black holes due to confused facet revalidations.

To solve the problem, this patch verifies that each facet's
subfacets still result in the appropriate struct flow.  If a facet
fails this test, it is simply removed.

Bug #15213.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
11 years agoofproto-dpif: Ignore subfacet install errors.
Ethan Jackson [Wed, 27 Feb 2013 23:44:06 +0000 (15:44 -0800)]
ofproto-dpif: Ignore subfacet install errors.

When we fail to install a subfacet, there's not much we can do
other than note that it happened.  However, doing this requires us
to maintain a pointer to a subfacet which theoretically could be
destroyed by facet_revalidate() later.  This patch solves the
problem by simply assuming dpif_flow_put() always succeeds.  This
should have no effect on behavior.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
11 years agoofproto-dpif: Always maintain subfacet key.
Ethan Jackson [Wed, 27 Feb 2013 04:10:46 +0000 (20:10 -0800)]
ofproto-dpif: Always maintain subfacet key.

Due to flow based tunneling, we can no longer assume that it's
possible to reconstruct a subfacet's key from its facet's flow.
The flow's in_port may be stale due to tunnel configuration
changes.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
11 years agotests: Remove LISP unit test.
Jesse Gross [Thu, 28 Feb 2013 00:32:14 +0000 (16:32 -0800)]
tests: Remove LISP unit test.

LISP doesn't exist yet in Open vSwitch 1.10, so the test fails
(correctly).  This removes the test from this release.

Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agotests: Add VXLAN and LISP tunnel tests to the unit test infrastructure.
Kyle Mestery [Wed, 27 Feb 2013 18:43:21 +0000 (13:43 -0500)]
tests: Add VXLAN and LISP tunnel tests to the unit test infrastructure.

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoofproto: Create and delete tnl_backers in type_run()
Kyle Mestery [Fri, 15 Feb 2013 22:12:13 +0000 (17:12 -0500)]
ofproto: Create and delete tnl_backers in type_run()

Garbage collect tnl_backers during type_run(). Add new
tnl_backers if a VXLAN port's UDP port changes.

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
11 years agovxlan: Change dpif_backer->tnl backer to a "struct simap"
Kyle Mestery [Thu, 14 Feb 2013 14:37:28 +0000 (09:37 -0500)]
vxlan: Change dpif_backer->tnl backer to a "struct simap"

Move dpif_backer->tnl_backers from a "struct sset" to a
"struct simap". Store odp_port in the new map.  This will make it easier to
access the odp_port for future patches.

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agodpif-linux: Fix byte-swapping direction in nl_msg_put_u16() call.
Ben Pfaff [Fri, 15 Feb 2013 19:24:27 +0000 (11:24 -0800)]
dpif-linux: Fix byte-swapping direction in nl_msg_put_u16() call.

OVS_TUNNEL_ATTR_DST_PORT expects a u16 and tnl_cfg->dst_port is a be16,
so we want ntohs() instead of htons().

In practice htons() and ntohs() perform the same operation, so this does
not fix a real bug.
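A small check of the point above, with `port_to_host()` as a hypothetical helper. Both functions perform the same byte swap on little-endian hosts and do nothing on big-endian ones, so the fix is about stating the conversion direction correctly, which is what sparse checks:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Convert a port stored in network byte order (a be16 field) to a host
 * u16.  ntohs() states the intended direction; htons() would compute the
 * same value but with the wrong declared direction. */
static uint16_t
port_to_host(uint16_t be_port)
{
    return ntohs(be_port);
}
```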

Found by sparse.

Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agoModify dpif_linux_port_add() to set the destination port for VXLAN ports.
Kyle Mestery [Thu, 14 Feb 2013 14:37:26 +0000 (09:37 -0500)]
Modify dpif_linux_port_add() to set the destination port for VXLAN ports.

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agovxlan: Update netdev_vport_get_dpif_port() to support VXLAN port names
Kyle Mestery [Thu, 14 Feb 2013 14:37:25 +0000 (09:37 -0500)]
vxlan: Update netdev_vport_get_dpif_port() to support VXLAN port names

Modify netdev_vport_get_dpif_port() to return a name for
VXLAN ports which includes the destination UDP port number as a part of the
name.
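A sketch of the naming idea, assuming a "<type>_sys_<udp port>" format. Both that format and `vxlan_dpif_port_name()` are illustrative assumptions rather than the actual netdev_vport_get_dpif_port() code:

```c
#include <stdio.h>

/* Build a datapath port name that embeds the VXLAN destination UDP port,
 * so VXLAN ports using different UDP ports map to distinct datapath ports.
 * Returns the length of the generated name, as snprintf() does. */
static int
vxlan_dpif_port_name(char *buf, size_t size, const char *type,
                     unsigned int dst_port)
{
    return snprintf(buf, size, "%s_sys_%u", type, dst_port);
}
```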

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agovxlan: Add utility functions to the simap data structure.
Kyle Mestery [Thu, 14 Feb 2013 14:37:27 +0000 (09:37 -0500)]
vxlan: Add utility functions to the simap data structure.

Add utility functions to the simap structure.  These are
used by future patches in this series.  The functions added are.

Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
11 years agoin-band: Use "internal" netdev type for local ports.
Ethan Jackson [Fri, 22 Feb 2013 03:13:16 +0000 (19:13 -0800)]
in-band: Use "internal" netdev type for local ports.

A bridge's local port always has type "internal", so opening it
with type "system" can't be correct.  This was causing upgrade
problems.  Specifically, in certain bridge topologies, if a manager
was set, force-reload-kmod would fail.  This is because the in-band
code would open the local port netdev with type "system", confusing
the more important netdev_open() in iface_create().

Bug #15067.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
11 years agovxlan: new draft revision
Lorand Jakab [Mon, 25 Feb 2013 02:58:03 +0000 (18:58 -0800)]
vxlan: new draft revision

The VXLAN draft just got updated from -02 to -03, with no major changes.
Update documentation to reflect the change.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agodatapath: fix the calculation of checksum for vlan header
Cong Wang [Sat, 23 Feb 2013 03:22:41 +0000 (19:22 -0800)]
datapath: fix the calculation of checksum for vlan header

In vlan_insert_tag(), we insert a 4-byte VLAN header _after_ the
MAC addresses:

        memmove(skb->data, skb->data + VLAN_HLEN, 2 * ETH_ALEN);
        ...
        veth->h_vlan_proto = htons(ETH_P_8021Q);
        ...
        veth->h_vlan_TCI = htons(vlan_tci);

so afterward, we should recompute the checksum to include these 4 bytes.
skb->data still points to the MAC header, so the VLAN header is at
(2 * ETH_ALEN = 12) bytes after it, not (ETH_HLEN = 14) bytes.

This can also be observed via tcpdump:

         0x0000:  ffff ffff ffff 5254 005d 6f6e 8100 000a
         0x0010:  0806 0001 0800 0604 0001 5254 005d 6f6e
         0x0020:  c0a8 026e 0000 0000 0000 c0a8 0282

Similarly, for __pop_vlan_tci(), the VLAN header we remove is the one
overwritten in:

        memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);

Therefore, the VLAN_HLEN = 4 bytes after 2 * ETH_ALEN are the part
we want to subtract from the checksum.
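The offset argument can be checked against the tcpdump output above. This sketch uses a simplified 16-bit ones'-complement sum over the first 18 bytes of that frame. `csum16()` is hypothetical and stands in for the kernel's csum_partial(). The 4 bytes at 2 * ETH_ALEN are the 802.1Q tag itself (0x8100, 0x000a), while the 4 bytes at ETH_HLEN wrongly cover the TCI and the ARP ethertype:

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN  6     /* Bytes in a MAC address. */
#define ETH_HLEN  14    /* Bytes in an Ethernet header. */
#define VLAN_HLEN 4     /* Bytes in an 802.1Q tag. */

/* Sum 'len' bytes (even length) as big-endian 16-bit words with end-around
 * carry, i.e. a simplified ones'-complement partial checksum. */
static uint16_t
csum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) p[i] << 8) | p[i + 1];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* First 18 bytes of the frame shown in the tcpdump output. */
static const uint8_t frame[] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff,   /* Destination MAC. */
    0x52, 0x54, 0x00, 0x5d, 0x6f, 0x6e,   /* Source MAC. */
    0x81, 0x00, 0x00, 0x0a,               /* VLAN tag: TPID 0x8100, TCI 10. */
    0x08, 0x06,                           /* Inner ethertype: ARP. */
};
```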

Cc: David S. Miller <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agodatapath: Increase maximum allocation size of action list.
Pravin B Shelar [Sat, 23 Feb 2013 01:16:11 +0000 (17:16 -0800)]
datapath: Increase maximum allocation size of action list.

The switch to flow based tunneling increased the size of each output
action in the flow action list.  In extreme cases, this can result
in the action list exceeding the maximum buffer size.

This doubles the maximum buffer size to compensate for the increase
in action size.  In the common case, most allocations will be
less than a page, and those use kmalloc().  Therefore, for the majority
of situations, this will have no impact.
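A sketch of the size-based allocator choice described above. The page size, the 16 KiB starting limit and all names are assumptions for illustration; the datapath's real constants may differ:

```c
#include <stddef.h>

/* Hypothetical constants; PAGE_SIZE_BYTES assumes 4 KiB pages. */
#define PAGE_SIZE_BYTES      4096
#define OLD_MAX_ACTIONS_SIZE (16 * 1024)
#define NEW_MAX_ACTIONS_SIZE (2 * OLD_MAX_ACTIONS_SIZE)   /* Doubled. */

enum allocator { ALLOC_KMALLOC, ALLOC_VMALLOC, ALLOC_REJECT };

/* Pick an allocator for an action list of 'size' bytes: lists up to a page
 * use kmalloc(), larger ones a multi-page allocator such as vmalloc(), and
 * anything over the (doubled) limit is refused. */
static enum allocator
choose_allocator(size_t size)
{
    if (size > NEW_MAX_ACTIONS_SIZE) {
        return ALLOC_REJECT;
    }
    return size <= PAGE_SIZE_BYTES ? ALLOC_KMALLOC : ALLOC_VMALLOC;
}
```

Whichever allocator is chosen, the matching free routine has to be used, and, as the revert elsewhere in this log notes, vfree() must not be called from interrupt context.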

Bug #15203
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
11 years agoofproto-dpif: Look at the flow's ofproto when handling flow misses.
Justin Pettit [Fri, 22 Feb 2013 22:07:47 +0000 (14:07 -0800)]
ofproto-dpif: Look at the flow's ofproto when handling flow misses.

When handling flow misses, an attempt is made to group identical packets
together.  Before the single datapath, each OpenFlow port number was
unique, so the flow_equal() function was sufficient to check whether
packets are identical.  With the single datapath, the OpenFlow port
numbers are shared across bridges, so packets that arrive at the same
time and are identical other than their ingress port were being serviced
by the same ofproto instance.  This commit changes the duplicate flow
finding function to take the ofproto into account.
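A simplified sketch of the batching check. `struct flow`, `struct flow_miss` and both helpers are heavily reduced, hypothetical stand-ins for the ofproto-dpif structures:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced flow: the port number is shared across bridges. */
struct flow {
    unsigned int in_port;
    unsigned char dl_dst[6];
};

struct ofproto {
    const char *name;
};

struct flow_miss {
    const struct ofproto *ofproto;
    struct flow flow;
};

/* Field-by-field comparison (avoids comparing struct padding). */
static bool
flow_equal_simple(const struct flow *a, const struct flow *b)
{
    return a->in_port == b->in_port
           && !memcmp(a->dl_dst, b->dl_dst, sizeof a->dl_dst);
}

/* Two misses may be batched together only if both the flow and the ofproto
 * (bridge) match; comparing the flow alone is no longer sufficient. */
static bool
same_miss(const struct flow_miss *a, const struct flow_miss *b)
{
    return a->ofproto == b->ofproto && flow_equal_simple(&a->flow, &b->flow);
}
```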

Bug #14934

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
11 years agodatapath: Fix parsing invalid LLC/SNAP ethertypes
Rich Lane [Fri, 8 Feb 2013 23:29:57 +0000 (15:29 -0800)]
datapath: Fix parsing invalid LLC/SNAP ethertypes

Before this patch, if an LLC/SNAP packet with OUI 00:00:00 had an ethertype
less than 1536 the flow key given to userspace in the upcall would contain the
invalid ethertype (for example, 3). If userspace attempted to insert a kernel
flow for this key it would be rejected by ovs_flow_from_nlattrs.

This patch allows OVS to pass the OFTest pktact.DirectBadLlcPackets.
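The following sketches one way to classify such frames. For simplicity it works in host byte order, whereas the kernel handles these fields as __be16. The helper is hypothetical. ETH_P_802_3_MIN (0x0600) and ETH_P_802_2 (0x0004) match the Linux constants of the same names:

```c
#include <stdint.h>

#define ETH_P_802_3_MIN 0x0600  /* Smallest valid Ethernet II ethertype. */
#define ETH_P_802_2     0x0004  /* Linux pseudo-ethertype for 802.2 LLC. */

/* Given the ethertype found in an LLC/SNAP header with OUI 00:00:00, return
 * the ethertype to put in the flow key.  Values below 1536 are not real
 * ethertypes, so the frame is classified as plain 802.2 instead of passing
 * an invalid value to userspace. */
static uint16_t
snap_flow_ethertype(uint16_t snap_ethertype)
{
    return snap_ethertype >= ETH_P_802_3_MIN ? snap_ethertype : ETH_P_802_2;
}
```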

Signed-off-by: Rich Lane <rlane@bigswitch.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoofproto-dpif: Receive special packets on patch ports.
Ethan Jackson [Sat, 16 Feb 2013 20:07:18 +0000 (12:07 -0800)]
ofproto-dpif: Receive special packets on patch ports.

Commit 0a740f48293 (ofproto-dpif: Implement patch ports in
userspace.) allowed special packets (i.e. LACP, CFM, etc) to be
sent on patch ports, but not received.  This patch implements the
logic required to receive special packets on patch ports.

Bug #15154.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
11 years agoofproto-dpif: Reduce number of get_ofp_port() calls during flow xlate.
Ben Pfaff [Tue, 12 Feb 2013 23:56:10 +0000 (15:56 -0800)]
ofproto-dpif: Reduce number of get_ofp_port() calls during flow xlate.

Until now the flow translation code has done one get_ofp_port() call
initially to check for special processing, then one for each level of
action processing.  Only one call is actually necessary, though, because
the in_port of a flow doesn't change in ordinary circumstances, and so this
commit eliminates the unnecessary calls.

The one case where the in_port can change is when a packet passes through
a patch port.  The implementation here was buggy anyway: when the patch
port's peer had forwarding disabled by STP, then the code would drop all
ODP actions, even those that were executed before the packet crossed the
patch port.  This commit fixes that case.

With a complicated flow table involving multiple levels of resubmit, this
increases flow setup performance by 2-3%.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
11 years agotunnel: set skb mark for IPsec tunnel packets
Ansis Atteka [Thu, 14 Feb 2013 00:48:46 +0000 (16:48 -0800)]
tunnel: set skb mark for IPsec tunnel packets

The new ovs-monitor-ipsec implementation will use skb marks in
IPsec policies.  This patch configures the datapath to set these
skb marks on IPsec tunnel packets.

Issue: 14870
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>