cascardo/ovs.git
9 years agoFactor the ovsdb-server main loop into a new function
Eric Sesterhenn [Fri, 11 Jul 2014 11:24:06 +0000 (13:24 +0200)]
Factor the ovsdb-server main loop into a new function

This refactors the ovsdb-server main loop into a new function, which allows
to call it from multiple places.

Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoCodingStyle: Add suggested GNU indent options.
Ben Pfaff [Tue, 24 Jun 2014 18:44:16 +0000 (11:44 -0700)]
CodingStyle: Add suggested GNU indent options.

Suggested-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodpif: Update documentation for RCU-protected actions.
Joe Stringer [Tue, 15 Jul 2014 12:04:52 +0000 (12:04 +0000)]
dpif: Update documentation for RCU-protected actions.

The userspace datapath returns RCU-protected actions from flow_get() and
flow_dump_next(). This doesn't cause any trouble for current users of
these functions, but it imposes additional constraints on their use.
This patch makes the dpif documentation more explicit about how the
results of these functions can be used.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agotests: Use the 'LARGE_MSECS' variation of time/warp at more places.
Gurucharan Shetty [Tue, 15 Jul 2014 15:16:52 +0000 (08:16 -0700)]
tests: Use the 'LARGE_MSECS' variation of time/warp at more places.

commit 8661af798(timeval: Provide a variation for time/warp command)
provided a variation of time/warp command to prevent multiple
invocations of time/warp. That commit changed only the users in bfd.at
and cfm.at as they used it the most. Since we haven't had any negative
confequences because of the change, add the remaining users.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodebian: Automatically start openvswitch before first invocation of ovs-vsctl.
Gurucharan Shetty [Fri, 11 Jul 2014 21:50:50 +0000 (14:50 -0700)]
debian: Automatically start openvswitch before first invocation of ovs-vsctl.

In the 'interfaces' file, if an admin adds the openvswitch interface
in 'auto', ifupdown will try to create OVS interfaces even before
openvswitch has started. In a case like that, assume that the admin
knows what he is doing and try to start openvswitch.

The negatives I see are
1. /usr is NFS mounted, in which case expect the admin
to mount it through initramfs.
2. syslog through network may not be accessible.

This is similar to what rhel does with commit 602453000e28ec10
(rhel: Automatically start openvswitch service before bringing an ovs
interface up or down)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodebian: Option to create patch ports through 'interfaces'
Gurucharan Shetty [Fri, 11 Jul 2014 21:32:48 +0000 (14:32 -0700)]
debian: Option to create patch ports through 'interfaces'

This is a port of commit d7aab661 ( rhel: Add Patch Port support to
initscripts.) from rhel to debian's ifupdown script.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/coverage: Removed set but not used variables
Daniele Di Proietto [Tue, 15 Jul 2014 18:32:36 +0000 (11:32 -0700)]
lib/coverage: Removed set but not used variables

This removes a GCC 4.9 warning (unused-but-set-variable)

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Check for NULL upcall_portids.
Pravin B Shelar [Tue, 15 Jul 2014 18:12:11 +0000 (11:12 -0700)]
datapath: Check for NULL upcall_portids.

Following patch adds NULL check for memory allocated
by kmalloc.

Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-socket: Fix handling socket allocation failure in nl_dump_start().
Ben Pfaff [Mon, 14 Jul 2014 21:06:03 +0000 (14:06 -0700)]
netlink-socket: Fix handling socket allocation failure in nl_dump_start().

If nl_pool_alloc() failed, then 'dump' was not initialized at all and
further use of the dump would access uninitialized data, probably causing
a crash.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agonetlink-socket: Refill comment to fit within 79 columns.
Ben Pfaff [Mon, 14 Jul 2014 20:40:18 +0000 (13:40 -0700)]
netlink-socket: Refill comment to fit within 79 columns.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agodpif-linux: Avoid null dereference if all ports disappear.
Ben Pfaff [Mon, 14 Jul 2014 20:17:05 +0000 (13:17 -0700)]
dpif-linux: Avoid null dereference if all ports disappear.

When dpif_linux_refresh_channels() refreshes the set of channels when
the number of handlers changes, it destroys all the dpif's channels and
sets dpif->uc_array_size to 0.  If the port dump later in the function
turns up no ports (which generally indicates a bug), then no channels will
be allocated and thus dpif->uc_array_size will remain 0 and 'channels' will
be null in each handler.  This is self-consistent, at least, but
dpif_linux_port_get_pid__() was still willing in this situation to
try to access element 0 of the set of channels, dereferencing a null
pointer.

This fixes the problem.

I encountered this while looking at a bug that I had introduced during
development that caused the port dump to always be empty.  It would be
difficult to encounter in normal use.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agoofp-msgs: Correct code for queue configuration messages in OpenFlow 1.0.
Ben Pfaff [Mon, 14 Jul 2014 17:37:38 +0000 (10:37 -0700)]
ofp-msgs: Correct code for queue configuration messages in OpenFlow 1.0.

Reported-by: Simon Jouet <simon.jouet@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoFix documentation error that led user to wrong file to install dependency packages.
Kirkland Spector [Tue, 15 Jul 2014 16:48:20 +0000 (09:48 -0700)]
Fix documentation error that led user to wrong file to install dependency packages.

Signed-off-by: Kirkland Spector <kspector@salesforce.com>
Acked-by: Andrey Falko <afalko@salesforce.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoDrop assignments whose values are never used.
Ben Pfaff [Mon, 14 Jul 2014 21:29:20 +0000 (14:29 -0700)]
Drop assignments whose values are never used.

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agocoverage: Move m_idx, h_idx to an inner scope in coverage_run().
Ben Pfaff [Mon, 14 Jul 2014 21:31:36 +0000 (14:31 -0700)]
coverage: Move m_idx, h_idx to an inner scope in coverage_run().

These variables were initialized in an outer scope and then immediately
changed in an inner one, so they might as well be farther in.

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoofproto: Avoid theoretical double free of large rule collections.
Ben Pfaff [Mon, 14 Jul 2014 21:47:12 +0000 (14:47 -0700)]
ofproto: Avoid theoretical double free of large rule collections.

collect_rules_strict() and collect_rules_loose() destroy the rule
collections that they create if they return an error, and some of their
callers then go on to destroy them again.  This could cause a double-free
in the case where rule_collection_destroy() actually calls free().  That
never happens in the current tree, because free() is only necessary if
malloc() was called and there's a 64-entry stub that none of the current
code in collect_rules_*() can fill up in their error cases.  Still, it
seems better to fix the problem.

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoofp-util: Fix null pointer dereference in ofputil_pull_buckets().
Ben Pfaff [Mon, 14 Jul 2014 21:33:01 +0000 (14:33 -0700)]
ofp-util: Fix null pointer dereference in ofputil_pull_buckets().

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agorevalidator: Revalidate missed flows.
Joe Stringer [Tue, 8 Jul 2014 07:04:05 +0000 (07:04 +0000)]
revalidator: Revalidate missed flows.

If the datapath doesn't dump a flow for some reason, and the current
dump is expected to revalidate all flows in the datapath, then perform
revalidation for those flows by fetching them during the sweep phase.
If revalidation is not required, then leave the flow in the datapath and
don't revalidate it.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif: Support fetching flow mask via dpif_flow_get().
Joe Stringer [Tue, 8 Jul 2014 07:04:04 +0000 (07:04 +0000)]
dpif: Support fetching flow mask via dpif_flow_get().

Change the interface to allow implementations to pass back a buffer, and
allow callers to specify which of actions, mask, and stats they wish to
receive. This will be used in the next commit.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-server.at: Skip tests that use ovsdb-server's "--run" on Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 20:49:18 +0000 (13:49 -0700)]
ovsdb-server.at: Skip tests that use ovsdb-server's "--run" on Windows.

ovsdb-server's port on Windows does not support the "--run" option.
The two tests skipped in this commit make use of "--run" option to
test ovsdb-server's truncating of corrupt log or bad transaction.
It looks a little tricky to get this test running on Windows without
the "--run" option implemented.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Refactor ovs_flow_cmd_fill_info().
Joe Stringer [Thu, 26 Jun 2014 00:10:37 +0000 (12:10 +1200)]
datapath: Refactor ovs_flow_cmd_fill_info().

Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agotests: Disable glibc memory checking under glibc <= 2.11.
Ben Pfaff [Fri, 11 Jul 2014 18:03:08 +0000 (11:03 -0700)]
tests: Disable glibc memory checking under glibc <= 2.11.

We noticed that the unit tests sometimes fail on XenServer inside glibc's
memory checker, in the free_check() function.  It turns out that the
glibc memory checker in glibc 2.11 and earlier had an internal race that
caused false positives in multithreaded programs.

This commit avoids the problem by disabling the glibc memory checker in
glibc 2.11 and earlier.

VMware-BZ: #1267127
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agodatapath/flow_netlink: Create right mask with disabled megaflows
Daniele Di Proietto [Fri, 11 Jul 2014 17:01:17 +0000 (10:01 -0700)]
datapath/flow_netlink: Create right mask with disabled megaflows

If megaflows are disabled, the userspace does not send the netlink attribute
OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.

sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
bytes that represent padding for struct sw_flow, or the bytes that represent
fields that may not be set during ovs_flow_extract().
This is a problem, because when we extract a flow from a packet,
we do not memset() anymore the struct sw_flow to 0 (since commit 9cef26ac6a71).

This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
which operates on the netlink attributes rather than on the mask key. Using
this approach we are sure that only the bytes that the user provided in the
flow are matched.

Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
now return with an error.

Reported-by: Alex Wang <alexw@nicira.com>
Suggested-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Enable tunnel GSO features.
Pravin B Shelar [Thu, 3 Jul 2014 18:38:29 +0000 (11:38 -0700)]
datapath: Enable tunnel GSO features.

Following patch enables all available tunnel GSO features for OVS
bridge device so that ovs can use hardware offloads available to
underling device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agolib/hash: Use CRC32 for hashing.
Jarno Rajahalme [Fri, 11 Jul 2014 12:57:11 +0000 (05:57 -0700)]
lib/hash: Use CRC32 for hashing.

Use CRC32 intrinsics for hash computations when building for
X86_64 with SSE4_2.

Add a new hash_words64() and change hash_words() to be inlined when
'n_words' is a compile-time constant.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Lockless lookups.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:08 +0000 (02:29 -0700)]
lib/classifier: Lockless lookups.

Now that all the relevant classifier structures use RCU and internal
mutual exclusion for modifications, we can remove the fat-rwlock and
thus make the classifier lookups lockless.

As the readers are operating concurrently with the writers, a
concurrent reader may or may not see a new rule being added by a
writer, depending on how the concurrent events overlap with each
other.  Overall, this is no different from the former locked behavior,
but there the visibility of the new rule only depended on the timing
of the locking functions.

A new rule is first added to the segment indices, so the readers may
find the rule in the indices before the rule is visible in the
subtables 'rules' map.  This may result in us losing the opportunity
to quit lookups earlier, resulting in sub-optimal wildcarding.  This
will be fixed by forthcoming revalidation always scheduled after flow
table changes.

Similar behavior may happen due to us removing the overlapping rule
(if any) from the indices only after the corresponding new rule has
been added.

The subtable's max priority is updated only after a rule is inserted
to the maps, so the concurrent readers may not see the rule, as the
updated priority ordered subtable list will only be visible after the
subtable's max priority is updated.

Similarly, the classifier's partitions are updated by the caller after
the rule is inserted to the maps, so the readers may keep skipping the
subtable until they see the updated partitions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: RCUify prefix trie code.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:08 +0000 (02:29 -0700)]
lib/classifier: RCUify prefix trie code.

cls_set_prefix_fields() now synchronizes explicitly with the readers,
waiting them to finish using the old configuration before changing to
the new configuration.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Use internal mutex.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:08 +0000 (02:29 -0700)]
lib/classifier: Use internal mutex.

Add an internal mutex to struct cls_classifier, and reorganize
classifier internal structures according to the user of each field,
marking the fields that need to be protected by the mutex.  This makes
locking requirements easier to track, and may make lookup more memory
efficient.

After this patch there is some double locking, as callers are taking
the fat-rwlock, and we take the mutex internally.  A following patch
will remove the classifier fat-rwlock, removing the (double) locking
overhead.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Stylistic change.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:08 +0000 (02:29 -0700)]
lib/classifier: Stylistic change.

Rename 'nbits' as 'n_bits' to be more consistent with other count-like
fields.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/ovs-rcu: Export ovsrcu_synchronize().
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:07 +0000 (02:29 -0700)]
lib/ovs-rcu: Export ovsrcu_synchronize().

A following patch will add the first external user.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Simplify iteration with C99 declaration.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:07 +0000 (02:29 -0700)]
lib/classifier: Simplify iteration with C99 declaration.

Hide the cursor from the classifier iteration users and move locking to
the iterators.  This will make following RCU changes simpler, as the call
sites of the iterators need not be changed at that point.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Use cmap.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:07 +0000 (02:29 -0700)]
lib/classifier: Use cmap.

Use cmap instead of hmap & hindex in classifier.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agonetlink-socket: Work around kernel Netlink dump thread races.
Ben Pfaff [Thu, 10 Jul 2014 23:48:16 +0000 (16:48 -0700)]
netlink-socket: Work around kernel Netlink dump thread races.

The Linux kernel Netlink implementation has two races that cause problems
for processes that attempt to dump a table in a multithreaded manner.

The first race is in the structure of the kernel netlink_recv() function.
This function pulls a message from the socket queue and, if there is none,
reports EAGAIN:
skb = skb_recv_datagram(sk, flags, noblock, &err);
if (skb == NULL)
goto out;
Only if a message is successfully read from the socket queue does the
function, toward the end, try to queue up a new message to be dumped:
if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
ret = netlink_dump(sk);
if (ret) {
sk->sk_err = ret;
sk->sk_error_report(sk);
}
}
This means that if thread A reads a message from a dump, then thread B
attempts to read one before A queues up the next, B will get EAGAIN.  This
means that, following EAGAIN, B needs to wait until A returns to userspace
before it tries to read the socket again.  nl_dump_next() already does
this, using 'dump->status_seq' (although the need for it has never been
explained clearly, to my knowledge).

The second race is more serious.  Suppose thread X and thread Y both
simultaneously attempt to queue up a new message to be dumped, using the
call to netlink_dump() quoted above.  netlink_dump() begins with:
mutex_lock(nlk->cb_mutex);

cb = nlk->cb;
if (cb == NULL) {
err = -EINVAL;
goto errout_skb;
}
Suppose that X gets cb_mutex first and finds that the dump is complete.  It
will therefore, toward the end of netlink_dump(), clear nlk->cb to NULL to
indicate that no dump is in progress and release the mutex:
nlk->cb = NULL;
mutex_unlock(nlk->cb_mutex);
When Y grabs cb_mutex afterward, it will see that nlk->cb is NULL and
return -EINVAL as quoted above.  netlink_recv() stuffs -EINVAL in sk_err,
but that error is not reported immediately; instead, it is saved for the
next read from the socket.  Since Open vSwitch maintains a pool of Netlink
sockets, that next failure can crop up pretty much anywhere.  One of the
worst places for it to crop up is in the execution of a later transaction
(e.g. in nl_sock_transact_multiple__()), because userspace treats Netlink
transactions as idempotent and will re-execute them when socket errors
occur.  For a transaction that sends a packet, this causes packet
duplication, which we actually observed in practice.  (ENOBUFS should
actually cause transactions to be re-executed in many cases, but EINVAL
should not; this is a separate bug in the userspace netlink code.)

VMware-BZ: #1283188
Reported-and-tested-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agonetlink-socket: Fix sign of error code.
Ben Pfaff [Thu, 10 Jul 2014 21:32:10 +0000 (14:32 -0700)]
netlink-socket: Fix sign of error code.

Commit 8f20fd98db (netlink-socket: Work around upstream kernel Netlink
bug.) got the sign of the error code wrong, so that it reported e.g. -22
for EINVAL to nl_sock_recv__()'s caller, instead of 22.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agodatapath: refactor do_output() to move skb_clone NULL check out of fast path
Andy Zhou [Thu, 10 Jul 2014 07:50:26 +0000 (00:50 -0700)]
datapath: refactor do_output() to move skb_clone NULL check out of fast path

skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.

Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: add skb_clone NULL check for the sampling action.
Andy Zhou [Thu, 10 Jul 2014 07:49:06 +0000 (00:49 -0700)]
datapath: add skb_clone NULL check for the sampling action.

Fix a bug where skb_clone() NULL check is missing in sample action
implementation.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: add skb_clone NULL check in the recirc action.
Andy Zhou [Thu, 10 Jul 2014 07:30:27 +0000 (00:30 -0700)]
datapath: add skb_clone NULL check in the recirc action.

Refactoring recirc action implementation.

The main change is to fix a bug where the NULL check after skb clone()
call is missing.  The fix is to return -ENOMEM whenever skb_clone()
failed to create a clone.

Also rearranged adjacent code to improve readability.

Reported-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-protocol: Remove definition of struct sockaddr_nl.
Ben Pfaff [Thu, 10 Jul 2014 20:21:35 +0000 (13:21 -0700)]
netlink-protocol: Remove definition of struct sockaddr_nl.

This struct is used only in netlink-socket.c, which is only used on Linux,
which in turn gets the definition from <linux/netlink.h>.  On Windows the
definition actually causes a small amount of trouble because Windows does
not define sa_family_t (despite POSIX), so it's better to remove it.

Even if other platforms adopt Netlink, I have no reason to believe that
they will use the same sockaddr format as Linux.

CC: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agodatapath/flow_netlink: Fix NDP flow mask validation
Daniele Di Proietto [Wed, 9 Jul 2014 22:22:14 +0000 (15:22 -0700)]
datapath/flow_netlink: Fix NDP flow mask validation

match_validate() enforce that a mask matching on NDP attributes has also an
exact match on ICMPv6 type.
The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of
'struct sw_flow_key', which is 16-bit wide.
Therefore, an exact match on ICMPv6 type should only check the first 8 bits.

This commit fixes a bug that prevented flows with an exact match on NDP field
from being installed

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodaemon: restart child process if it died before signaling its readiness
Ansis Atteka [Tue, 8 Jul 2014 00:11:39 +0000 (17:11 -0700)]
daemon: restart child process if it died before signaling its readiness

The child process (the one being monitored) could die before it was able
to call fork_notify_startup() function.  If such situation arises, then
parent process (the one monitoring child process) would also terminate
with a fatal log message:

...|EMER|fork child died before signaling startup (killed (...))

This patch changes that behavior by always restarting child process
if it was able to start up at least once in the past.  However, if
child was not able to start up even once, then the monitor process
would still terminate, because that would most likely indicate a
persistent programming or system error.

To reproduce use following script:

while : ; do kill -SIGSEGV `cat /var/run/openvswitch/ovs-vswitchd.pid`; done

Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
VMware-BZ: 1273550

9 years agotimeval: Initialize 'unix_epoch' for Windows correctly.
Gurucharan Shetty [Thu, 26 Jun 2014 04:27:36 +0000 (21:27 -0700)]
timeval: Initialize 'unix_epoch' for Windows correctly.

Till now, we were initializing 'unix_epoch' through time_init().
But if there was a call directly to xgettimeofday(), we would
miss the initialization causing overflows. This commit fixes it
by pre-calculating the value and assigning it globally.

Also add-in a missing return statement.

Reported-by: Alessandro Pilotti <apilotti@cloudbasesolutions.com>
Reported-by: Linda Sun <lsun@vmware.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif-linux: Recheck the socket pointer existence before getting its pid.
Alex Wang [Tue, 8 Jul 2014 04:58:33 +0000 (21:58 -0700)]
dpif-linux: Recheck the socket pointer existence before getting its pid.

This commit fixes a race between port deletion and flow miss handling.
More specifically, a port could be removed by main thread while
the handler thread is handling the flow miss from it.  If the flow
requires slow path action, the handler thread will try querying a pid
from port's socket.  Since the port has been deleted, the query will
cause a dereference of NULL socket pointer.

This commit makes the handler thread recheck the socket pointer before
dereferencing it.

VMware-BZ: 1251981

Reported-by: Pratap Reddy <preddy@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agotests: Fix "megaflow disabled" test.
Joe Stringer [Tue, 8 Jul 2014 09:01:20 +0000 (09:01 +0000)]
tests: Fix "megaflow disabled" test.

The previous 'grep' logic would occasionally catch unintended flows.
There's no reason to check for these flows separately, so combine the
two checks.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
9 years agotests: Fix race in 'balance-tcp bonding' test.
Joe Stringer [Fri, 4 Jul 2014 06:58:28 +0000 (06:58 +0000)]
tests: Fix race in 'balance-tcp bonding' test.

Running the test in a tight loop could cause this test to fail after
about 5 runs, with some of the ports reporting "may_enable: false" in
the "ovs-appctl bond/show" output. This commit fixes the race condition
by waiting for may_enable to be true for all bond ports.

I suspect that LACP negotiation finishes, but the main thread doesn't
have a chance to enable the ports before we send the test packets.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoutil: fix compile warnings
Ansis Atteka [Tue, 8 Jul 2014 04:11:53 +0000 (04:11 +0000)]
util: fix compile warnings

This patch fixes two compile warnings introduced by commit
64b73291 ("util: create a copy of program_name"):
1. ../lib/util.c:457:5: error: passing argument 1 of 'free'
   discards 'const' qualifier from pointer target type; And
2. ../lib/util.c:463:5: error: ISO C90 forbids mixed declarations
   and code [-Werror=declaration-after-statement] (affected only
   branch-2.3 that is C90 compliant and not the master)

Reported-By: Joe Stringer <jstringer@nicira.com>
Reported-By: Lorand Jakab <lojakab@cisco.com>
Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agoovsdb: Frees database memory on ovsdb process cleanup.
Ryan Wilson [Wed, 2 Jul 2014 22:00:16 +0000 (15:00 -0700)]
ovsdb: Frees database memory on ovsdb process cleanup.

This fixes valgrind errors.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Remove unused typedef cls_cb_func.
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:47 +0000 (13:18 -0700)]
lib/classifier: Remove unused typedef cls_cb_func.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif: Use ovs_refcount_try_ref_rcu().
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
ofproto-dpif: Use ovs_refcount_try_ref_rcu().

This is a prerequisite step in making the classifier lookups lockless.
If taking a reference fails, we do the lookup again, as a new (lower
priority) rule may now match instead.

Also remove unwildcarding dl_type and nw_frag, as these are already
taken care of by xlate_actions().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoUse ovs_refcount_unref_relaxed.
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
Use ovs_refcount_unref_relaxed.

After a quick analysis, in most cases the access to refcounted objects
is clearly protected either with an explicit lock/mutex, or RCU. there
are only a few places where I left a call to ovs_refcount_unref().
Upon closer analysis it may well be that those could also use the
relaxed form.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic: Add ovs_refcount_unref_relaxed(), ovs_refcount_try_ref_rcu().
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
lib/ovs-atomic: Add ovs_refcount_unref_relaxed(), ovs_refcount_try_ref_rcu().

When a reference counted object is also RCU protected the deletion of
the object's memory is always postponed.  This allows
memory_order_relaxed to be used also for unreferencing, as RCU
quiescing provides a full memory barrier (it has to, or otherwise
there could be lingering accesses to objects after they are recycled).

Also, when access to the reference counted object is protected via a
mutex or a lock, the locking primitives provide the required memory
barrier functionality.

Also, add ovs_refcount_try_ref_rcu(), which takes a reference only if
the refcount is non-zero and returns true if a reference was taken,
false otherwise.  This can be used in combined RCU/refcount scenarios
where we have an RCU protected reference to an refcounted object, but
which may be unref'ed at any time.  If ovs_refcount_try_ref_rcu()
fails, the object may still be safely used until the current thread
quiesces.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic: Add atomic compare_exchange.
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
lib/ovs-atomic: Add atomic compare_exchange.

Add support for atomic compare_exchange operations.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-atomic: Use explicit memory order for ovs_refcount.
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
ovs-atomic: Use explicit memory order for ovs_refcount.

Use explicit variants of atomic operations for the ovs_refcount to
avoid the overhead of the default memory_order_seq_cst.

Adding a reference requires no memory ordering, as the calling thread
is already assumed to have protected access to the object being
reference counted.  Hence, memory_order_relaxed is used for
ovs_refcount_ref().  ovs_refcount_read() does not change the reference
count, so it can also use memory_order_relaxed.

Unreferencing an object needs a release barrier, so that none of the
accesses to the protected object are reordered after the atomic
decrement operation.  Additionally, an explicit acquire barrier is
needed before the object is recycled, to keep the subsequent accesses
to the object's memory from being reordered before the atomic
decrement operation.

This patch follows the memory ordering and argumentation discussed
here:

http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/hash: Abstract hash interface.
Jarno Rajahalme [Fri, 4 Jul 2014 14:57:18 +0000 (07:57 -0700)]
lib/hash: Abstract hash interface.

Use generic names hash_add() and hash_finish() instead of mhash_*
equivalents.  This makes future changes to hash implentations more
localized.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif-netdev: Use cmap instead of hmap.
Jarno Rajahalme [Fri, 4 Jul 2014 13:38:47 +0000 (06:38 -0700)]
dpif-netdev: Use cmap instead of hmap.

This requires less locking and makes introducing lockless classifier
lookups possible.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoutil: Suppress a warning by adding CONST_CAST
YAMAMOTO Takashi [Fri, 4 Jul 2014 14:14:57 +0000 (14:14 +0000)]
util: Suppress a warning by adding CONST_CAST

Suppress a gcc warning which was introduced by
commit e0b48482c16b6eaa7f14d8c7e7c6275528881b9e.
("util: create a copy of program_name")
I guess MSVC doesn't have a corresponding warning.

Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Lorand Jakab <lojakab@cisco.com>
9 years agoutil: create a copy of program_name
Ansis Atteka [Thu, 3 Jul 2014 20:41:02 +0000 (13:41 -0700)]
util: create a copy of program_name

Commit 8a9562 ("dpif-netdev: Add DPDK netdev.") reversed sequence
in which set_program_name() and proctitle_init() functions are
called.  This introduced a regression where program_name and argv_start
would point to exactly the same memory (previously both of these
pointers were pointing to different memory locations because
proctitle_init() would have beforehand created a copy of argv[0]
for the succeeding set_program_name() call).

This regression on my system caused ovs-vswitchd monitoring process to
show up without process name:

...  00:00:00 : monitoring pid 26308 (healthy)

Ps output was lacking process name because following code was
using overlapping memory for source and target buffer:.

proctitle_set(const char *format, ...)
{
    ...
    n = snprintf(argv_start, argv_size, "%s: ", program_name);

Overall C99 and POSIX standards state that behavior is undefined
if source and target buffers overlap.

Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-By: Ben Pfaff <blp@nicira.com>
9 years agorevalidator: Simplify push_dump_ops__().
Joe Stringer [Tue, 1 Jul 2014 09:54:18 +0000 (09:54 +0000)]
revalidator: Simplify push_dump_ops__().

Commit acaa8dac49 (revalidator: Eliminate duplicate flow handling.)
ensured that a ukey will always exist for a given flow, even if it is
about to be deleted. This means that push_dump_ops__() no longer needs
to handle the case where there is no ukey. This commit removes the
redundant code.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Additional logging for -EINVAL on flow setups.
Jesse Gross [Wed, 2 Jul 2014 00:41:51 +0000 (17:41 -0700)]
datapath: Additional logging for -EINVAL on flow setups.

There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.

CC: Federico Iezzi <fiezzi@enter.it>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
9 years agonetlink-socket: Work around upstream kernel Netlink bug.
Ben Pfaff [Mon, 30 Jun 2014 21:57:42 +0000 (14:57 -0700)]
netlink-socket: Work around upstream kernel Netlink bug.

The upstream kernel net/netlink/af_netlink.c netlink_recvmsg() contains the
following code to refill the Netlink socket buffer with more dump skbs
while a dump is in progress:

if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
ret = netlink_dump(sk);
if (ret) {
sk->sk_err = ret;
sk->sk_error_report(sk);
}
}

The netlink_dump() function that this calls returns a negative number on
error, the convention used throughout the kernel, and thus sk->sk_err
receives a negative value on error.

However, sk->sk_err is supposed to contain either 0 or a positive errno
value, as one can see from a quick "grep" through net for 'sk_err =', e.g.:

    ipv4/tcp.c:2067: sk->sk_err = ECONNRESET;
    ipv4/tcp.c:2069: sk->sk_err = ECONNRESET;
    ipv4/tcp_input.c:4106: sk->sk_err = ECONNREFUSED;
    ipv4/tcp_input.c:4109: sk->sk_err = EPIPE;
    ipv4/tcp_input.c:4114: sk->sk_err = ECONNRESET;
    netlink/af_netlink.c:741: sk->sk_err = ENOBUFS;
    netlink/af_netlink.c:1796: sk->sk_err = ENOBUFS;
    packet/af_packet.c:2476: sk->sk_err = ENETDOWN;
    unix/af_unix.c:341: other->sk_err = ECONNRESET;
    unix/af_unix.c:407: skpair->sk_err = ECONNRESET;

The result is that the next attempt to receive from the socket will return
the error to userspace with the wrong sign.

(The root of the error in this case is that multiple threads are attempting
to read a single flow dump from a shared fd.  That should work, but the
kernel has an internal race that can result in one or more of those threads
hitting the EINVAL case at the start of netlink_dump().  The EINVAL is
harmless in this case and userspace should be able to ignore it, but
reporting the EINVAL as if it were a 22-byte message received in userspace
throws a real wrench in the works.)

This bug makes me think that there are probably not many programs doing
multithreaded Netlink dumps.  Maybe it is good that we are considering
other approaches.

VMware-BZ: #1255704
Reported-by: Mihir Gangar <gangarm@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agoINSTALL: Mention conflict with NET_IPGRE setting before Linux 3.11.
Ben Pfaff [Wed, 2 Jul 2014 18:52:41 +0000 (11:52 -0700)]
INSTALL: Mention conflict with NET_IPGRE setting before Linux 3.11.

I found when reconfiguring my kernel that if I turned on NET_IPGRE, GRE
tunnels no longer worked.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agoINSTALL.DPDK: fix a typo in remote line
Flavio Leitner [Wed, 2 Jul 2014 18:47:41 +0000 (15:47 -0300)]
INSTALL.DPDK: fix a typo in remote line

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
9 years agoINSTALL.Fedora, INSTALL.RHEL: Make instructions more explicit.
Ben Pfaff [Wed, 2 Jul 2014 19:50:08 +0000 (12:50 -0700)]
INSTALL.Fedora, INSTALL.RHEL: Make instructions more explicit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
9 years agorevalidator: Improve optimization to skip revalidation.
Joe Stringer [Wed, 2 Jul 2014 07:41:33 +0000 (07:41 +0000)]
revalidator: Improve optimization to skip revalidation.

The should_revalidate() optimisation introduced with commit 698ffe3623
(revalidator: Only revalidate high-throughput flows.) was a little
aggressive, occasionally deleting flows even when OVS is quite capable
of performing full revalidation.

This commit modifies the logic to:
* Firstly, check if we are likely to handle full revalidation, and
  attempt that instead.
* Secondly, fall back to the existing flow throughput estimations to
  determine whether to revalidate the flow or just delete it.

VMware-BZ: #1271926

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agodatapath: Allow pop and push MPLS actions after pop VLAN
Simon Horman [Mon, 30 Jun 2014 04:20:14 +0000 (13:20 +0900)]
datapath: Allow pop and push MPLS actions after pop VLAN

This patch loosens the restrictions surrounding push and pop MPLS actions
such that they will be allowed after a pop VLAN action if the inner
ethernet type is acceptable for pop and push MPLS actions. This implies
that there is only one VLAN tag present.

Some analysis of logic of this change is as follows:

    The purpose of tracking vlan_tci is to allow prohibition of push
    and pop MPLS actions in the presence of a VLAN. In this scenario
    the VLAN_TAG_PRESENT bit of vlan_tci is set and eth_type is that of
    the packet with the outermost VLAN tag removed.

    A pop VLAN action may clear vlan_tci as it removes the outermost
    VLAN tag and the push and pop MPLS logic may rely on eth_type for
    their prohibition logic.

    This will not allow push and pop MPLS on packets with multiple VLAN
    tags, regardless of if they are all remove using POP VLAN, as there
    is no mechanism to expose the inner ethernet type beyond that of
    the outermost VLAN tag.

Suggested-by: Jesse Gross <jgross@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
9 years agodatapath: Use exact lookup for flow_get and flow_del.
Alex Wang [Mon, 30 Jun 2014 21:51:02 +0000 (14:51 -0700)]
datapath: Use exact lookup for flow_get and flow_del.

Due to the race condition in userspace, there is chance that two
overlapping megaflows could be installed in datapath.  And this
causes userspace unable to delete the less inclusive megaflow flow
even after it timeout, since the flow_del logic will stop at the
first match of masked flow.

This commit fixes the bug by making the kernel flow_del and flow_get
logic check all masks in that case.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Change u64_stats_* to use _irq instead of _bh().
Jesse Gross [Mon, 30 Jun 2014 20:43:25 +0000 (13:43 -0700)]
datapath: Change u64_stats_* to use _irq instead of _bh().

The upstream u64_stats API has been changed to remove the _bh()
versions and switch all consumers to use IRQ safe variants instead.
This was done to be safe for netpoll generated packets, which can
occur in hard IRQ context. From a safety perspective, this doesn't
directly affect OVS since it doesn't support netpoll. However, this
change has been backported to older kernels so OVS needs to use the
new API to compile.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agotest-vconn: Change the expected error for Windows.
Gurucharan Shetty [Wed, 25 Jun 2014 02:52:54 +0000 (02:52 +0000)]
test-vconn: Change the expected error for Windows.

On Windows ECONNRESET is WSAECONNRESET.

Also, "unix" connections are done through TCP sockets.
For the 'refuse-connection' test, the error message for Windows
is WSAECONNRESET instead of EPIPE.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agorhel, xenserver: Run unit tests while creating rpms.
Gurucharan Shetty [Mon, 23 Jun 2014 21:05:00 +0000 (14:05 -0700)]
rhel, xenserver: Run unit tests while creating rpms.

For RHEL, Fedora and Xenserver, run unit tests while
building rpms.  This may catch some cross-platform bugs.

The commit also allows the users to optionally skip unit tests.
(On debian, the default is to run unit tests. For consistency,
do the same for rpms.)

VMware-BZ: 1267127

CC: Flavio Leitner <fbl@redhat.com>
CC: Ben Pfaff <blp@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Tested-by: Flavio Leitner <fbl@redhat.com>
9 years agodpif-netdev: Batch megaflow lookup.
Ethan Jackson [Tue, 24 Jun 2014 01:28:43 +0000 (18:28 -0700)]
dpif-netdev: Batch megaflow lookup.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoclassifier: Add a batched miniflow lookup function.
Ethan Jackson [Tue, 24 Jun 2014 01:40:47 +0000 (18:40 -0700)]
classifier: Add a batched miniflow lookup function.

Used in a future patch.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodpif-netdev: Take the classifier rdlock once per batch.
Ethan Jackson [Thu, 5 Jun 2014 18:23:06 +0000 (11:23 -0700)]
dpif-netdev: Take the classifier rdlock once per batch.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodpif-netdev: Rename batch_pkt_execute.
Ethan Jackson [Tue, 24 Jun 2014 01:22:03 +0000 (18:22 -0700)]
dpif-netdev: Rename batch_pkt_execute.

The new name "packet_batch" is a bit more straight forward.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath: Fix error handling for Geneve options in ipv4_tun_to_nlattr().
Ben Pfaff [Mon, 30 Jun 2014 20:48:30 +0000 (13:48 -0700)]
datapath: Fix error handling for Geneve options in ipv4_tun_to_nlattr().

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agonetdev-dpdk: Add OVS_UNLIKELY annotations in dpdk_do_tx_copy().
Ryan Wilson [Fri, 27 Jun 2014 01:16:41 +0000 (18:16 -0700)]
netdev-dpdk: Add OVS_UNLIKELY annotations in dpdk_do_tx_copy().

Since dropped packets due to large packet size or lack of memory
are unlikely, it is best to add OVS_UNLIKELY annotations to these
conditions.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-benchmark: Compile for windows.
Gurucharan Shetty [Wed, 2 Apr 2014 15:31:50 +0000 (08:31 -0700)]
ovs-benchmark: Compile for windows.

This just makes ovs-benchmark compile on windows.
This lets us go ahead with just a 'make' instead of
picking and choosing executables that are tested to work on
windows as arguments for make.

This commit does not make ovs-benchmark a supported utility
on windows.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agotests: Port test-sflow and test-netflow to Windows.
Gurucharan Shetty [Fri, 23 May 2014 20:40:07 +0000 (13:40 -0700)]
tests: Port test-sflow and test-netflow to Windows.

After the change, both of them compile. test-netflow related
unit tests pass.

test-sflow related tests do not pass because
of LOOPBACK_INTERFACE constraints for 'agent'.
(It should be revisited later.)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agonetdev-dpdk: Fix memory leak in dpdk_do_tx_copy().
Ryan Wilson [Fri, 27 Jun 2014 01:16:39 +0000 (18:16 -0700)]
netdev-dpdk: Fix memory leak in dpdk_do_tx_copy().

This patch fixes a bug where rte_pktmbuf_alloc() would fail and
packets which succeeded to allocate memory with rte_pktmbuf_alloc()
would not be sent and leak memory.

Also, as a byproduct of using a local variable to record dropped
packets, this reduces the locking of the netdev's mutex when
multiple packets are dropped in dpdk_do_tx_copy().

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Set current timestamp when flushing TX queue.
Ryan Wilson [Fri, 27 Jun 2014 00:41:46 +0000 (17:41 -0700)]
netdev-dpdk: Set current timestamp when flushing TX queue.

The current timestamp should be set every time the queue is flushed.
Thus, if DRAIN_TSC timer cycles have passed since the last timestamp,
the send queue should be flushed again.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Refactor dpdk_queue_flush().
Ryan Wilson [Fri, 27 Jun 2014 00:41:45 +0000 (17:41 -0700)]
netdev-dpdk: Refactor dpdk_queue_flush().

This patch refactors dpdk_queue_flush() to reuse code in
dpdk_queue_pkts().

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoINSTALL.RHEL: Fix a typo.
Gurucharan Shetty [Mon, 30 Jun 2014 15:01:51 +0000 (08:01 -0700)]
INSTALL.RHEL: Fix a typo.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agopoll-loop: Create Windows event handles for sockets automatically.
Gurucharan Shetty [Fri, 27 Jun 2014 20:30:49 +0000 (13:30 -0700)]
poll-loop: Create Windows event handles for sockets automatically.

We currently have a poll_fd_wait_event(fd, wevent, events) function that
is used at places common to Windows and Linux where we have to wait on
sockets.  On Linux, 'wevent' is always set as zero. On Windows, for sockets,
when we send both 'fd' and 'wevent', we associate them with each other for
'events' and then wait on 'wevent'. Also on Windows, when we only send 'wevent'
to this function, we would simply wait for all events for that 'wevent'.

There is a disadvantage with this approach.
* Windows clients need to create a 'wevent' and then pass it along. This
means that at a lot of places where we create sockets, we also are forced
to create a 'wevent'.

With this commit, we pass the responsibility of creating a 'wevent' to
poll_fd_wait() in case of sockets. That way, a client using poll_fd_wait()
is only concerned about sockets and not about 'wevents'. There is a potential
disadvantage with this change in that we create events more often and that
may have a performance penalty. If that turns out to be the case, we will
eventually need to create a pool of wevents that can be re-used.

In Windows, there are cases where we want to wait on a event (not
associated with any sockets) and then control it using functions
like SetEvent() etc. For that purpose, introduce a new function
poll_wevent_wait(). For this function, the client needs to create a event
and then pass it along as an argument.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-By: Ben Pfaff <blp@nicira.com>
9 years agorpm: improved RPM sources dir explanation
Flavio Leitner [Mon, 30 Jun 2014 00:21:05 +0000 (21:21 -0300)]
rpm: improved RPM sources dir explanation

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoudpif: Add command to wait for revalidation.
Joe Stringer [Wed, 25 Jun 2014 14:02:45 +0000 (14:02 +0000)]
udpif: Add command to wait for revalidation.

This allows us to remove some of the sleeps from the testsuite.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Initialize OVS_CB in ovs_vport_receive()
Pravin B Shelar [Thu, 26 Jun 2014 22:15:00 +0000 (15:15 -0700)]
datapath: Initialize OVS_CB in ovs_vport_receive()

On packet recv OVS CB is initialized in multiple function.
Following patch moves all these initialization to
ovs_vport_receive().
This patch also save a check in execute actions.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodpdk: High speed PMD physical NIC queue size
Polehn, Mike A [Thu, 19 Jun 2014 22:58:26 +0000 (22:58 +0000)]
dpdk: High speed PMD physical NIC queue size

Large TX and RX queues are needed for high speed 10 GbE physical NICS.
Observed a 250% zero loss improvement over small NIC queue test for
port to port flow test.

Signed-off-by: Mike A. Polehn <mike.a.polehn@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoFAQ: Add an entry for multicast snooping with VLAN
Flavio Leitner [Thu, 26 Jun 2014 12:00:37 +0000 (09:00 -0300)]
FAQ: Add an entry for multicast snooping with VLAN

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agobuild: Allow building with autoconf 2.63
Thomas Graf [Fri, 27 Jun 2014 07:31:57 +0000 (09:31 +0200)]
build: Allow building with autoconf 2.63

Reduces the dependency on autoconf from 2.64 to 2.63 to ease building
on older platforms. There is only a few macros missing and they can
be provided easily.

A handful of tests needed modification. The difference in quoting
behaviour between 2.63 and later require the m4_define() to be
manually unfolded.

The Debian control file is left untouched on purpose. The decision
whether to adjust the dependency is left to the respective maintainers.

Tested with autoconf 2.63 and 2.69.

Cc: Scott Mann <smann@noironetworks.com>
Cc: Don Kehn <dkehn@noironetworks.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agotest-ovsdb: Workaround unicode bug in Python 2.4.x.
Gurucharan Shetty [Fri, 20 Jun 2014 20:13:51 +0000 (13:13 -0700)]
test-ovsdb: Workaround unicode bug in Python 2.4.x.

Run the following command on Xenserver:
PYTHONPATH=`pwd`/python/compat::`pwd`/python python ./tests/test-ovsdb.py \
parse-atoms '{"type": "string", "minLength": 2}'    \
'[""]'     '["a"]'     '["ab"]'     '["abc"]'     '["\ud834\udd1e"]'

And we get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001d11e'
in position 23: ordinal not in range(128).

It looks like we are hitting the following bug:
http://bugs.python.org/issue2517

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-By: Ben Pfaff <blp@nicira.com>
9 years agovlan-splinters.at: Skip the test for Windows.
Gurucharan Shetty [Tue, 17 Jun 2014 20:24:37 +0000 (13:24 -0700)]
vlan-splinters.at: Skip the test for Windows.

vlan splinters is to workaround buggy network drivers of Linux.
Skip the test for Windows.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-ofctl: Ability to read a hex string from file.
Gurucharan Shetty [Wed, 18 Jun 2014 18:12:36 +0000 (11:12 -0700)]
ovs-ofctl: Ability to read a hex string from file.

The unit test, "OFPST_TABLE reply - OF1.2" in ofp-print.at
sends a very large hex string as an argument to 'ovs-ofctl ofp-print'.
The length of the hex string exceeds the maximum command line length
in Windows. With this commit, we can pass the same hex string by
placing it inside a file.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-vsctl.at: Workaround lack of 'kill -l' on Windows.
Gurucharan Shetty [Tue, 17 Jun 2014 19:35:32 +0000 (12:35 -0700)]
ovs-vsctl.at: Workaround lack of 'kill -l' on Windows.

Also, fflush(stderr) when we raise a signal. The test
this commit is changing would fail otherwise.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoutil: Set two digit exponents for scientific notation.
Gurucharan Shetty [Tue, 17 Jun 2014 18:16:06 +0000 (11:16 -0700)]
util: Set two digit exponents for scientific notation.

By default, three digit exponents are printed on Windows.
Many unit tests in Open vSwitch expect two digit exponents.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-tool: Workaround inability to replace existing file on Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 22:49:06 +0000 (15:49 -0700)]
ovsdb-tool: Workaround inability to replace existing file on Windows.

rename() on an existing destination file fails on Windows. This commit
worksaround that problem.

There are two tests that test it. But both of them use the ovsdb-server's
--run option for the test and it does not exist in Windows. So change
the test to workaround the lack of that feature.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-server.at: Handle different error message for already opened database.
Gurucharan Shetty [Mon, 2 Jun 2014 17:49:18 +0000 (10:49 -0700)]
ovsdb-server.at: Handle different error message for already opened database.

Commit ebed9f78(ovsdb-server: Improve message for "add-db" of
database already open.) improved the error message seen when
opening an already opened database on Linux. For Windows,
we still need to look for the lockfile error message.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agointerface-reconfigure.at: Skip POSIX specfic tests for Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 20:01:36 +0000 (13:01 -0700)]
interface-reconfigure.at: Skip POSIX specfic tests for Windows.

interface-reconfigure.at mostly tests functionality for Xenserver
and the tests use some POSIX only features. Skip them for Windows.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif.at: Skip sflow tests in Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 18:18:33 +0000 (11:18 -0700)]
ofproto-dpif.at: Skip sflow tests in Windows.

sflow feature needs to be investigated for Windows. Right now
test-sflow related tests do not pass because of LOOPBACK_INTERFACE
constraints for 'agent'. Add a TODO item and skip the tests.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agofile_name.at: Skip a symlink related test for Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 18:08:14 +0000 (11:08 -0700)]
file_name.at: Skip a symlink related test for Windows.

There is no one-one mapping of symlinks between Linux and
Windows. This test currently fails on Windows and we do not
really need this functionality on Windows. So skip it.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-ofctl.at: Prevent msys from getting confused with ipv6 address.
Gurucharan Shetty [Mon, 9 Jun 2014 17:04:58 +0000 (10:04 -0700)]
ovs-ofctl.at: Prevent msys from getting confused with ipv6 address.

msys has a set of rules which triggers an automatic conversion of
arguments into something else to suit Windows requirements. Sometimes
this also causes unwanted conversions. Details of the rules is here:
http://www.mingw.org/wiki/Posix_path_conversion

msys converts ::1/::1 into ;1\;1. To prevent this, use fullform
ipv6 address of the form 0:0:0:0:0:0:0:1 instead.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agorconn: Don't warn when peer abruptly closes connection.
Gurucharan Shetty [Mon, 9 Jun 2014 15:47:12 +0000 (08:47 -0700)]
rconn: Don't warn when peer abruptly closes connection.

On Windows, when a peer terminates without calling a close
on socket fd, the server ends up printing "connection dropped"
warning messages. We probably don't want those warning messages
when the error is WSAECONNRESET.

(In OVS unit tests on Windows, anytime a client like ovs-ofctl
calls a ovs_fatal without clean close(fd) on the socket, the
server like ovs-vswitchd prints warnings that cause unit tests
to fail.)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-vsctl.at: Adjust test output for Windows.
Gurucharan Shetty [Mon, 2 Jun 2014 21:23:44 +0000 (14:23 -0700)]
ovs-vsctl.at: Adjust test output for Windows.

"xargs echo" introduces "^M" character inbetween the
records on Windows. Workaround it.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>