cascardo/ovs.git
10 years agoRelease Open vSwitch 2.0.1. v2.0.1
Justin Pettit [Sat, 14 Dec 2013 00:16:19 +0000 (16:16 -0800)]
Release Open vSwitch 2.0.1.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto-dpif: Ignore non-packet field masks during flow revalidation
Andy Zhou [Wed, 11 Dec 2013 07:32:51 +0000 (23:32 -0800)]
ofproto-dpif: Ignore non-packet field masks during flow revalidation

Commit bcd2633a5be(ofproto-dpif: Store relevant fields for wildcarding
in facet) implements flow revalidation by comparing the newly looked
up flow mask with that of the existing facet.

The non-packet fields, such as register masks, are always cleared
by xlate_actions in the masks stored within facets, but they are not
cleared in the newly looked up flow masks, causing otherwise valid
flows to be declared as invalid flows and be removed from the datapath.

This patch provides a fix. I was able to verify the fix on a system set
up by Ying Chen where the bug can be reproduced.

Bug #21680
Reported by: Ying Chen <yingchen@vmware.com>

Acked-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
10 years agodatapath: Fix compat skb_get_rxhash()
Pravin B Shelar [Thu, 5 Dec 2013 05:24:49 +0000 (21:24 -0800)]
datapath: Fix compat skb_get_rxhash()

Add missing return statement.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: fix vport-netdev unregister
Alexei Starovoitov [Wed, 16 Oct 2013 22:28:23 +0000 (15:28 -0700)]
datapath: fix vport-netdev unregister

The combination of two commits:
commit 8e4e1713e4
("openvswitch: Simplify datapath locking.")
commit 2537b4dd0a
("openvswitch:: link upper device for port devices")

introduced a bug where upper_dev wasn't unlinked upon
netdev_unregister notification

The following steps:

  modprobe openvswitch
  ovs-dpctl add-dp test
  ip tuntap add dev tap1 mode tap
  ovs-dpctl add-if test tap1
  ip tuntap del dev tap1 mode tap

are causing multiple warnings:

[   62.747557] gre: GRE over IPv4 demultiplexor driver
[   62.749579] openvswitch: Open vSwitch switching datapath
[   62.755087] device test entered promiscuous mode
[   62.765911] device tap1 entered promiscuous mode
[   62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
[   62.769017] ------------[ cut here ]------------
[   62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240()
[   62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60
[   62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769053]  0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006
[   62.769055]  0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58
[   62.769057]  ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8
[   62.769059] Call Trace:
[   62.769062]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769065]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769067]  [<ffffffff8105319a>] warn_slowpath_null+0x1a/0x20
[   62.769069]  [<ffffffff8162a04f>] rollback_registered_many+0x20f/0x240
[   62.769071]  [<ffffffff8162a101>] rollback_registered+0x31/0x40
[   62.769073]  [<ffffffff8162a488>] unregister_netdevice_queue+0x58/0x90
[   62.769075]  [<ffffffff8154f900>] __tun_detach+0x140/0x340
[   62.769077]  [<ffffffff8154fb36>] tun_chr_close+0x36/0x60
[   62.769080]  [<ffffffff811bddaf>] __fput+0xff/0x260
[   62.769082]  [<ffffffff811bdf5e>] ____fput+0xe/0x10
[   62.769084]  [<ffffffff8107b515>] task_work_run+0xb5/0xe0
[   62.769087]  [<ffffffff810029b9>] do_notify_resume+0x59/0x80
[   62.769089]  [<ffffffff813a41fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   62.769091]  [<ffffffff81770f5a>] int_signal+0x12/0x17
[   62.769093] ---[ end trace 838756c62e156ffb ]---
[   62.769481] ------------[ cut here ]------------
[   62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[   62.769486] sysfs: can not remove 'master', no directory
[   62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
[   62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch]
[   62.769519]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[   62.769521]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28
[   62.769523]  0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500
[   62.769525] Call Trace:
[   62.769528]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769529]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769531]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
[   62.769533]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
[   62.769535]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
[   62.769538]  [<ffffffff81631ef7>] __netdev_adjacent_dev_remove+0xf7/0x150
[   62.769540]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[   62.769542]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[   62.769544]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
[   62.769548]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
[   62.769550]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
[   62.769552]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[   62.769555]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[   62.769557]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
[   62.769559]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
[   62.769562]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
[   62.769564]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
[   62.769566]  [<ffffffff8107f44a>] kthread+0xea/0xf0
[   62.769568]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769570]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
[   62.769572]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769573] ---[ end trace 838756c62e156ffc ]---
[   62.769574] ------------[ cut here ]------------
[   62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[   62.769577] sysfs: can not remove 'upper_test', no directory
[   62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
[   62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch]
[   62.769607]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[   62.769609]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58
[   62.769611]  0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500
[   62.769613] Call Trace:
[   62.769615]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769617]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769619]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
[   62.769621]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
[   62.769622]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
[   62.769624]  [<ffffffff81631f22>] __netdev_adjacent_dev_remove+0x122/0x150
[   62.769627]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[   62.769629]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[   62.769631]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
[   62.769633]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
[   62.769636]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
[   62.769638]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[   62.769640]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[   62.769642]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
[   62.769644]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
[   62.769646]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
[   62.769648]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
[   62.769650]  [<ffffffff8107f44a>] kthread+0xea/0xf0
[   62.769652]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769654]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
[   62.769656]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769657] ---[ end trace 838756c62e156ffd ]---
[   62.769724] device tap1 left promiscuous mode

This patch also affects moving devices between net namespaces.

OVS used to ignore netns move notifications which caused problems.
Like:
  ovs-dpctl add-if test tap1
  ip link set tap1 netns 3512
and then removing tap1 inside the namespace will cause hang on missing dev_put.

With this patch OVS will detach dev upon receiving netns move event.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Conflicts:
AUTHORS

10 years agovxlan: Optimize vxlan rcv
Pravin B Shelar [Fri, 11 Oct 2013 19:40:13 +0000 (12:40 -0700)]
vxlan: Optimize vxlan rcv

vxlan-udp-recv function lookup vxlan_sock struct on every packet
recv by using udp-port number. we can use sk->sk_user_data to
store vxlan_sock and avoid lookup.

This commit also allows us to get rid of socket hash table.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agoipfix: fix segfault for Flow_Sample_Collector_Set without ipfix
Romain Lenglet [Wed, 20 Nov 2013 18:57:52 +0000 (10:57 -0800)]
ipfix: fix segfault for Flow_Sample_Collector_Set without ipfix

Guard any access to an IPFIX row referenced from
Flow_Sample_Collector_Set by a test that the reference is not NULL.

Signed-off-by: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agodebian: Avoid logrotate error if /var/run/openvswitch does not exist.
Alfredo Finelli [Tue, 12 Nov 2013 16:03:41 +0000 (08:03 -0800)]
debian: Avoid logrotate error if /var/run/openvswitch does not exist.

Signed-off-by: Alfredo Finelli <alf@computationes.de>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoodp-util: Fix formatting of ipfix action cookies, to fix a testsuite failure
Ben Pfaff [Mon, 11 Nov 2013 21:32:23 +0000 (13:32 -0800)]
odp-util: Fix formatting of ipfix action cookies, to fix a testsuite failure

Problem introduced by previous commit 96ed775f19 (odp-util: Fix IPFIX
breakage with old kernel modules.)

Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoodp-util: Fix IPFIX breakage with old kernel modules.
Ben Pfaff [Mon, 7 Oct 2013 21:26:28 +0000 (14:26 -0700)]
odp-util: Fix IPFIX breakage with old kernel modules.

Before commit e995e3df57ea (Allow OVS_USERSPACE_ATTR_USERDATA to be
variable length.) userdata attributes in userspace actions were expected
to be exactly 64 bits long.  The kernel only actually enforced that they
were at least 64 bits long (the previously referenced commit's log message
contains misinformation on this account).

Initially this was no problem, because all of the userdata that userspace
actually used was exactly 8 bytes long.  Commit 29089a540c (Implement IPFIX
export), however, exposed a problem by reducing the length of userdata for
IPFIX support to just 4 bytes.  This meant that IPFIX no longer worked on
older datapaths, because the userdata was no longer at least 8 bytes long.

This commit fixes the problem by padding out userdata attributes less than
8 bytes long to 8 bytes.

CC: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Romain Lenglet <rlenglet at vmware.com>
10 years agonetlink: New function nl_msg_put_unspec_zero().
Ben Pfaff [Mon, 7 Oct 2013 21:11:40 +0000 (14:11 -0700)]
netlink: New function nl_msg_put_unspec_zero().

An upcoming commit adds a user.

Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoovsdb: Do not leak error in ovsdb_server_compact()
Thomas Graf [Fri, 1 Nov 2013 16:44:52 +0000 (17:44 +0100)]
ovsdb: Do not leak error in ovsdb_server_compact()

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoodp-util: Parse SCTP correctly.
Gurucharan Shetty [Wed, 30 Oct 2013 07:44:32 +0000 (00:44 -0700)]
odp-util: Parse SCTP correctly.

We should be looking at 'src_flow' instead of 'flow'. Otherwise,
parsing SCTP through odp_flow_key_to_mask will fail.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agofedora: Add ovs-dpctl-top to the spec file.
Gurucharan Shetty [Tue, 29 Oct 2013 15:53:58 +0000 (08:53 -0700)]
fedora: Add ovs-dpctl-top to the spec file.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
10 years agoofproto: Send only one OFPT_FLOW_REMOVED message when a flow expires.
Ben Pfaff [Fri, 18 Oct 2013 23:32:40 +0000 (16:32 -0700)]
ofproto: Send only one OFPT_FLOW_REMOVED message when a flow expires.

Commit 15aaf59932a3 (ofproto: Add global locking around flow table
changes.) introduced doubled messages for expirations.

Reported-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
10 years agoovs-lib: Return the correct exit status of the command 'status'
Gurucharan Shetty [Mon, 7 Oct 2013 21:28:48 +0000 (14:28 -0700)]
ovs-lib: Return the correct exit status of the command 'status'

commit 46528f78e5c(debian, rhel, xenserver: Ability to collect ovs-ctl logs)
made changes in the startup scripts such that the o/p of ovs-ctl is logged
into ovs-ctl.log. But it had an unintended consequence that the exit status
of ovs-ctl was no longer returned. We would always return success(the exit
status of tee).

With this commit, we return the exit status of ovs-ctl instead of tee.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoAnnounce that ovs-vswitchd is now multi-threaded. v2.0
Justin Pettit [Tue, 15 Oct 2013 22:36:40 +0000 (15:36 -0700)]
Announce that ovs-vswitchd is now multi-threaded.

Might be worth mentioning the biggest change in 2.0.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoSet release date for 2.0.0.
Justin Pettit [Tue, 15 Oct 2013 22:04:20 +0000 (15:04 -0700)]
Set release date for 2.0.0.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agolib/meta-flow: Include util.h for ovs_assert().
Jarno Rajahalme [Tue, 15 Oct 2013 17:30:44 +0000 (10:30 -0700)]
lib/meta-flow: Include util.h for ovs_assert().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agotests: fix failure when $PKIDIR contains uuid-like string
YAMAMOTO Takashi [Tue, 15 Oct 2013 14:38:06 +0000 (23:38 +0900)]
tests: fix failure when $PKIDIR contains uuid-like string

this fixes a test failure with my working directory:
    /disks/ea6a5743-ad5f-11e2-9410-08606e7f74e7/git/openvswitch

stop filtering uuid as it's unnecessary for this specific test case.

Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agolib/meta-flow: Enforce mf_fields array order.
Jarno Rajahalme [Tue, 15 Oct 2013 15:35:39 +0000 (08:35 -0700)]
lib/meta-flow: Enforce mf_fields array order.

The elements of the array must be in the enum order.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agotests: Make ovsdb-server add/remove remote test faster and more reliable.
Alex Wang [Tue, 15 Oct 2013 05:12:59 +0000 (22:12 -0700)]
tests: Make ovsdb-server add/remove remote test faster and more reliable.

Until now, the "ovsdb-server/add-remote and remove-remote with --monitor"
test killed ovsdb-server with SIGSEGV twice.  Each time, the "--monitor"
option caused the supervisor process to restart the child, but the second
time it incurred a 10-second delay intended to prevent the daemon from
wasting CPU time by restarting itself and dying again very quickly in a
loop.  This made the test take over 10 seconds to execute.  It also made
it occasionally fail because the OVS_WAIT_UNTIL check waits at most
approximately 10 seconds before it decides that the condition that it is
testing for will never occur.

This commit fixes the problem by breaking the test into two tests, each of
which kills ovsdb-server with SIGSEGV only once.

Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agorconn: Make thread-safe.
Ben Pfaff [Fri, 11 Oct 2013 17:08:32 +0000 (10:08 -0700)]
rconn: Make thread-safe.

This should make sending OFPT_FLOW_REMOVED and NXST_FLOW_MONITOR safe from
miss handler threads.

Bug #20271.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoconnmgr: Formalize 'ofproto_mutex' as protecting ofconn monitor data.
Ben Pfaff [Fri, 11 Oct 2013 06:11:09 +0000 (23:11 -0700)]
connmgr: Formalize 'ofproto_mutex' as protecting ofconn monitor data.

'ofproto_mutex' has effectively protected the monitor-related members of
struct ofconn since its introduction, but this was not written down or
systematically annotated.  This commit makes it more systematic and fixes
a few issues found using the annotations.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoconnmgr: Use 'ofproto_mutex' to protect ofconns from being destroyed.
Ben Pfaff [Fri, 11 Oct 2013 17:04:32 +0000 (10:04 -0700)]
connmgr: Use 'ofproto_mutex' to protect ofconns from being destroyed.

Code in the ofproto-dpif miss handler threads can currently access
ofconns, sending flow_removed and flow monitor messages due to NXAST_LEARN
actions.  Nothing currently protects those threads from accessing ofconns
that are in the process of being destroyed.  This commit adds protection
'ofproto_mutex', which NXAST_LEARN already takes.

Later patches will address other races that remain.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agorconn: Make rconn_packet_counter thread-safe.
Ben Pfaff [Fri, 13 Sep 2013 22:07:35 +0000 (15:07 -0700)]
rconn: Make rconn_packet_counter thread-safe.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto-dpif-xlate: Do initial rule lookup for callers.
Ethan Jackson [Wed, 9 Oct 2013 20:23:31 +0000 (13:23 -0700)]
ofproto-dpif-xlate: Do initial rule lookup for callers.

None of the functions available in ofproto-dpif.h are thread safe
unless holding the xlate_rwlock because one can't know that an ofproto
or ofport used as argument will survive during the function call.  For
this reason, ofproto-dpif-upcall's invocation of rule_dpif_lookup()
is unsafe because the ofproto could be destroyed during the call.

This patch fixes the problem by optionally doing the initial rule
lookup in xlate_actions() so that it can be done while holding the
xlate_rwlock.  This has the nice side benefit of removing a bunch of
boilerplate.

Note that this only partially solves the problem, there's still
vsp_realdev_to_vlandev() and ofproto_dpif_send_packet_in() which
aren't thread safe for the same reason.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
10 years agoovs-controller: Avoid dereferencing NULL pointer when the switch acts as a hub
ZhengLingyun [Tue, 8 Oct 2013 15:52:40 +0000 (23:52 +0800)]
ovs-controller: Avoid dereferencing NULL pointer when the switch acts as a hub

Starting ovs-controller with '-H' option will lead to a segment fault problem.
Add a check, and adjust the indentation of the following code.

Signed-off-by: ZhengLingyun <konghuarukhr@163.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoovs-lib: Revert "Return the exit status of ovs-ctl in ovs_ctl()."
Gurucharan Shetty [Fri, 4 Oct 2013 21:52:34 +0000 (14:52 -0700)]
ovs-lib: Revert "Return the exit status of ovs-ctl in ovs_ctl()."

This reverts commit 0e37488562d1 which had a side-effect that
ssh executing start/restart command on a remote machine would
hang as one of the file descriptors created in that commit
was getting passed along to the daemons. The daemons weren't closing
it and hence ssh would just wait for them to close and hang.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoovs-lib: Return the exit status of ovs-ctl in ovs_ctl().
Duffie Cooley [Wed, 2 Oct 2013 14:07:43 +0000 (07:07 -0700)]
ovs-lib: Return the exit status of ovs-ctl in ovs_ctl().

commit 46528f78e5c(debian, rhel, xenserver: Ability to collect ovs-ctl logs)
made changes in the startup scripts such that the o/p of ovs-ctl is logged
into ovs-ctl.log. But it had an unintended consequence that the exit status
of ovs-ctl was no longer returned. We would always return success(the exit
status of tee).

With this commit, we return the exit status of ovs-ctl instead of tee.
Code referenced from: (line wrapped).
http://unix.stackexchange.com/questions/14270/\
get-exit-status-of-process-thats-piped-to-another/70675#70675)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Signed-off-by: Duffie Cooley <dcooley@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agodatapath: Fix vxlan gso with vlan.
Pravin B Shelar [Tue, 1 Oct 2013 15:24:55 +0000 (08:24 -0700)]
datapath: Fix vxlan gso with vlan.

To use ovs-gso-compatibility we need to record inner skb offset.
In case of vxlan it is done before vlan header is pushed which
gives wrong inner packet to ovs-gso.
Following patch reset skb offsets after inner skb is completely built.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agorhel: fix the exit status of the openvswitch init script.
Duffie Cooley [Tue, 1 Oct 2013 02:27:07 +0000 (19:27 -0700)]
rhel: fix the exit status of the openvswitch init script.

This is a fix for a request to make sure that the openvswitch status command
in rhel based distros gives a useful exit status. That was fixed in

commit 5e0c05bc058c78a11be6747f62e6ad88e5d06b70
debian: Fix exit status of openvswitch-switch init script "status" command

Signed-off-by: Duffie Cooley <dcooley@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agohmap: Make bad hash functions easier to find.
Ben Pfaff [Tue, 24 Sep 2013 21:42:35 +0000 (14:42 -0700)]
hmap: Make bad hash functions easier to find.

The hmap code has for a long time incremented a counter when a hash bucket
grew to have many entries.  This can let a developer know that some hash
function is performing poorly, but doesn't give any hint as to which one.
This commit improves the situation by adding rate-limited debug logging
that points out a particular line of code as the source of the poor hash
behavior.  It should make issues easier to track down.

Bug #19926.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Lauded-by: Keith Amidon <keith@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
10 years agoofproto: Fix memory leak in rule_actions_unref().
Ben Pfaff [Tue, 24 Sep 2013 21:22:14 +0000 (14:22 -0700)]
ofproto: Fix memory leak in rule_actions_unref().

Found by valgrind.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
10 years agoofproto: Allow ofproto_delete_flow() to delete hidden rules.
Ben Pfaff [Tue, 24 Sep 2013 04:34:37 +0000 (21:34 -0700)]
ofproto: Allow ofproto_delete_flow() to delete hidden rules.

Commit 97f63e57a8 (ofproto: Remove soon-to-be-invalid optimizations.)
changed ofproto_delete_flow() to use the general-purpose flow_mod
implementation.  However, the general-purpose flow_mod never matches hidden
flows, which are exactly the flows that ofproto_delete_flow()'s callers
want to delete.

This commit fixes the problem by allowing flow_mods that specify a priority
that can only be for a hidden flow to delete a hidden flow.  (This doesn't
allow OpenFlow clients to meddle with hidden flows because OpenFlow uses
only a 16-bit field to specify priority.)

I verified that, without this commit, if I change from one controller to
another with "ovs-vsctl set-controller", then the in-band rules for the
old controller remain.  With this commit, the old rules are removed.

Reported-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Bug #19821.
Reported-by: Luca Giraudo <lgiraudo@vmware.com>
Bug #19888.
Reported-by: Ying Chen <yingchen@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
10 years agocfm: Don't enforce CFM_FAULT_INTERVAL.
Ethan Jackson [Fri, 20 Sep 2013 22:32:08 +0000 (15:32 -0700)]
cfm: Don't enforce CFM_FAULT_INTERVAL.

While upgrading a deployment, it's possible that transient
configuration changes could cause the cfm interval on two ends of a
link to be different.  If these two configured values are close to
each other, this condition could have no impact on traffic.  Therefore
it's better to let this slide than force a tunnel down guaranteeing an
impact

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agocfm: Prevent interval fault when demand mode is enabled on one end.
alex wang [Fri, 20 Sep 2013 06:13:33 +0000 (06:13 +0000)]
cfm: Prevent interval fault when demand mode is enabled on one end.

This commit prevents cfm from raising 'interval' fault when demand
mode is only enabled on one end of link.

Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agodatapath: Remove net_device_ops compatibility code.
Jesse Gross [Fri, 13 Sep 2013 21:16:35 +0000 (14:16 -0700)]
datapath: Remove net_device_ops compatibility code.

The symbol HAVE_NET_DEVICE_OPS was changed in 3.1 but code the that
it was protecting dates back to before 2.6.32. Therefore, we can
just assume that net_device_ops exists in all cases and drop the
compat code.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
10 years agoutilities: a top like tool for ovs-dpctl dump-flows.
Mark Hamilton [Fri, 13 Sep 2013 22:50:48 +0000 (15:50 -0700)]
utilities: a top like tool for ovs-dpctl dump-flows.

This python script summarizes ovs-dpctl dump-flows content by aggregating
the number of packets, total bytes and occurrence of the following fields:
  - Datapath in_port
  - Ethernet type
  - Source and destination MAC addresses
  - IP protocol
  - Source and destination IPv4 addresses
  - Source and destination IPv6 addresses
  - UDP and TCP destination port
  - Tunnel source and destination addresses

Testing included confirming both mega-flows and non-megaflows are
properly parsed. Bit masks are applied in the case of mega-flows
prior to aggregation.  Test --script parameter which runs in
non-interactive mode. Tested syntax against python 2.4.3, 2.6 and 2.7.
Confirmed script passes pep8 and pylint run as:

pylint --disable=I0011 --include-id=y --reports=n

This tool has been added to these distribution:
  - add ovs-dpctl-top to debian distribution
  - add ovs-dpctl-top to rpm distribution.
  - add ovs-dpctl-top to XenServer RPM.

Signed-off-by: Mark Hamilton <mhamilton@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
10 years agovlog: Fix formatting of milliseconds in Python log messages.
Ben Pfaff [Mon, 16 Sep 2013 22:15:01 +0000 (15:15 -0700)]
vlog: Fix formatting of milliseconds in Python log messages.

Commit 2b31d8e713de7 (vlog: Report timestamps in millisecond resolution in
log messages.) introduced milliseconds to log messages by default, but the
Python version did not ensure that milliseconds were always formatted with
3 digits, so 3.001 was formatted as "3.1" and 3.012 as "3.12", and so on.
This commit fixes the problem.

CC: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoovsdb: write commit timestamps to millisecond resolution.
Paul Ingram [Sat, 14 Sep 2013 01:52:54 +0000 (18:52 -0700)]
ovsdb: write commit timestamps to millisecond resolution.

This is expected to make system debugging easier.

This raises two compatibility issues:
1. When a new ovsdb-tool reads an old database, it will multiply by 1000 any
  timestamp it reads which is less than 1<<31. Since this date corresponds to
  Jan 16 1970 this is unlikely to cause a problem.
2. When an old ovsdb-tool reads a new database, it will interpret the
  millisecond timestamps as seconds and report dates in the far future; the
  time of this commit is reported as the year 45672 (each second since the
  epoch is interpreted as 16 minutes).

Signed-off-by: Paul Ingram <pingram@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoovsdb: Use DB load time, not on-disk commit times, for compaction.
Paul Ingram [Sat, 14 Sep 2013 01:52:53 +0000 (18:52 -0700)]
ovsdb: Use DB load time, not on-disk commit times, for compaction.

The ovsdb-server compaction timing logic is written assuming monotonic
time at milliscond resolution but it calculated the next compaction time
based on the oldest commit in the database. This was a problem because
commit timestamps are written in wall-clock time to second resolution.

This commit calculates the next compaction time based on the time when
the database was first loaded or the last compaction was done, both in
monotonic time at millisecond resolution.

Signed-off-by: Paul Ingram <pingram@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agovlog: Report timestamps in millisecond resolution in log messages.
Paul Ingram [Fri, 13 Sep 2013 01:19:04 +0000 (18:19 -0700)]
vlog: Report timestamps in millisecond resolution in log messages.

To make debugging easier.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Paul Ingram <pingram@nicira.com>
10 years agoofproto-dpif: Fix use-after-free error deleting last bridge.
Ben Pfaff [Mon, 16 Sep 2013 21:53:27 +0000 (14:53 -0700)]
ofproto-dpif: Fix use-after-free error deleting last bridge.

valgrind reported:

    Invalid read of size 4
       at 0x806ADC1: odp_port_to_ofport (hmap.h:267)
       by 0x8077C05: xlate_receive (ofproto-dpif-xlate.c:523)
       by 0x8073994: handle_miss_upcalls (ofproto-dpif-upcall.c:642)
       by 0x80741AA: udpif_miss_handler (ofproto-dpif-upcall.c:412)
       by 0x56FCC38: start_thread (pthread_create.c:304)
       by 0x735378D: clone (clone.S:130)
     Address 0x786c084 is 4 bytes inside a block of size 16 free'd
       at 0x4D8350C: free (vg_replace_malloc.c:427)
       by 0x8065EDA: close_dpif_backer (ofproto-dpif.c:1094)

The problem is that close_dpif_backer() destroys odp_to_ofport_map and the
associated mutex before it calls udpif_destroy() to stop the forwarding
threads.  This gives the forwarding threads a window in which to try to
use odp_to_ofport_map.

This commit moves the udpif_destroy() call much earlier, solving the
problem.  (The call to udpif_destroy() must follow the call to
drop_key_clear() because drop_key_clear() uses the udpif.)

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
10 years agotests: Make ovsdb-server add-db/remove-db test faster and more reliable.
Ben Pfaff [Mon, 16 Sep 2013 22:03:55 +0000 (15:03 -0700)]
tests: Make ovsdb-server add-db/remove-db test faster and more reliable.

Until now, the "ovsdb-server/add-db and remove-db with --monitor" test
killed ovsdb-server with SIGSEGV twice.  Each time, the "--monitor" option
caused the supervisor process to restart the child, but the second time it
incurred a 10-second delay intended to prevent the daemon from wasting CPU
time by restarting itself and dying again very quickly in a loop.  This
made the test take over 10 seconds to execute.  It also made it
occasionally fail because the OVS_WAIT_UNTIL check waits at most
approximately 10 seconds before it decides that the condition that it is
testing for will never occur.

This commit fixes the problem by breaking the test into two tests, each of
which kills ovsdb-server with SIGSEGV only once.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoFAQ: Fix version number for 2.0.
Jesse Gross [Fri, 13 Sep 2013 21:40:39 +0000 (14:40 -0700)]
FAQ: Fix version number for 2.0.

Presumably we will have minor version releases in the 2.x series
as well.

Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoofproto-dpif: Move "learn" actions into individual threads.
Ben Pfaff [Fri, 13 Sep 2013 03:45:31 +0000 (20:45 -0700)]
ofproto-dpif: Move "learn" actions into individual threads.

Before OVS introduced threading, any given ``learn'' action took effect in
the flow table before the next incoming flow was set up.  This meant that
if a packet came in, used ``learn'' to set up a flow to handle replies, and
then a reply came in, the reply would always hit the flow set up by the
``learn'' action.  Until now, with the threading implementation, though,
the effects of ``learn'' actions happen asynchronously via a queue, which
means that the reply can arrive before the flow to handle it is set up.
This introduced an unacceptable regression in important use cases.

This commit fixes the problem by switching back to executing learn actions
before forwarding the packet that triggered it.

I imagine that there is considerable opportunity for optimization here.

Bug #19147.
Bug #19244.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Remove redundant cls parameter from a few functions.
Ben Pfaff [Tue, 10 Sep 2013 05:36:19 +0000 (22:36 -0700)]
ofproto: Remove redundant cls parameter from a few functions.

Previously this parameter was useful for Clang locking annotations
but it isn't actually a locking requirement anymore, so remove
the parameter.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Add global locking around flow table changes.
Ben Pfaff [Fri, 13 Sep 2013 03:45:15 +0000 (20:45 -0700)]
ofproto: Add global locking around flow table changes.

This makes 'ofproto_mutex' protect the flow table well enough that threads
other than the main one can realistically modify flows.

I need to look at the interface between ofproto and connmgr: I think that
there might need to be some locking there too.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Refactor eviction cases to use common code.
Ben Pfaff [Fri, 13 Sep 2013 03:30:56 +0000 (20:30 -0700)]
ofproto: Refactor eviction cases to use common code.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: New helper any_pending_ops().
Ben Pfaff [Wed, 11 Sep 2013 04:44:53 +0000 (21:44 -0700)]
ofproto: New helper any_pending_ops().

This function is trivial now but in an upcoming commit it will need to
become slightly more complicated, which makes writing it as a function
worthwhile.

Until then, this commit simplifies the logic, which was redundant since
the 'deletions' hmap always points into the 'pending' list anyway.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Mark immutable members of struct rule 'const'.
Ben Pfaff [Wed, 11 Sep 2013 04:31:59 +0000 (21:31 -0700)]
ofproto: Mark immutable members of struct rule 'const'.

One difficult part of make flow_mods thread-safe is sorting out which
members of each structure can be modified under which locks and,
especially, documenting this in a way that makes it hard for programmers to
get it wrong later.  The compiler provides some tools for us, however, and
'const' is one of the nicer ones since it is standardized rather than part
of a compiler extension.

This commit makes use of 'const' to mark the immutable members of struct
rule, which is definitely the most confusing structure regarding thread
safety simply because it has so many members that use different forms of
synchronization.  It also adds a bunch of CONST_CASTs to allow these
members to be initialized and destroyed where necessary.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Make some functions for rules private to ofproto.c.
Ben Pfaff [Wed, 11 Sep 2013 04:01:34 +0000 (21:01 -0700)]
ofproto: Make some functions for rules private to ofproto.c.

These aren't used outside this file.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Remove ->timeout_mutex from struct rule (just use ->mutex).
Ben Pfaff [Wed, 11 Sep 2013 04:18:09 +0000 (21:18 -0700)]
ofproto: Remove ->timeout_mutex from struct rule (just use ->mutex).

I think that ->timeout_mutex and ->mutex were separate because the latter
(which was actually a rwlock named ->rwlock until recently) was held for
long periods of time, which meant that having a separate ->timeout_mutex
reduced lock contention.  But ->mutex is now only held briefly, so it seems
reasonable to just use it.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Replace rwlock in struct rule by a mutex.
Ben Pfaff [Thu, 12 Sep 2013 07:28:49 +0000 (00:28 -0700)]
ofproto: Replace rwlock in struct rule by a mutex.

A rwlock is suitable when one expects long hold times so there is a need
for some parallelism for readers.  But when the lock is held briefly, it
makes more sense to use a mutex for two reasons.  First, a rwlock is more
complex than a mutex so one would expect it to be more expensive to
acquire.  Second, a rwlock is less fair than a mutex: as long as there are
any readers this blocks out writers.

At least, that's the behavior I intuitively expect, and a few looks around
the web suggest that I'm not the only one.

Previously, struct rule's rwlock was held for long periods, so using a
rwlock made sense.  Now it is held only briefly, so this commit replaces it
by a mutex.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Protect index by cookie with ofproto_mutex.
Ben Pfaff [Mon, 9 Sep 2013 21:26:23 +0000 (14:26 -0700)]
ofproto: Protect index by cookie with ofproto_mutex.

The cookie index is only used for flow_mods, so it makes sense to use this
global mutex.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Drop 'expirable_mutex' in favor of new global 'ofproto_mutex'.
Ben Pfaff [Thu, 12 Sep 2013 07:31:33 +0000 (00:31 -0700)]
ofproto: Drop 'expirable_mutex' in favor of new global 'ofproto_mutex'.

This is the first step toward making a global lock that protects everything
needed for updating the flow table.  This commit starts out by merging one
lock into the new one, and later commits will continue that process.

The mutex is initially a recursive one, because I wasn't sure that there
were no nested acquisitions at this stage, but a later commit will change
it to a regular error-checking mutex once it's settled down a bit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Move rule_execute() out of ofopgroup_complete().
Ben Pfaff [Thu, 12 Sep 2013 07:25:28 +0000 (00:25 -0700)]
ofproto: Move rule_execute() out of ofopgroup_complete().

One goal we're moving toward is to be able to execute "learn" actions
in each thread that handles packets that arrive on an interface just before
we forward those packets.  The planned strategy there is to have a global
lock that protects everything required to modify the flow table.  Generally
this works out well, but rule_execute() is a trouble spot.  That's because
it's called during a flow table modification (when that global lock is
held) which means that a "learn" action triggered by the executing the
packet would try to recursively modify the flow table and reacquire the
global lock.

I can see two ways out of this issue.  One would be to make the global lock
a recursive one, or otherwise deal with handling recursive flow_mods.  The
other is to just queue up flow_mods due to rule_execute(), which itself is
a corner case (it only happens when a flow_mod sent via OpenFlow includes
a buffer_id).  (I guess there could be other more radical solutions, like
just dropping packets that contain "learn" actions if they occur in this
situation.)

This commit implements the second solution because it seems less likely to
be wrong in a way that crashes the switch.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoclassifier: Allow CLS_CURSOR_FOR_EACH to use a const-qualified iterator.
Ben Pfaff [Tue, 10 Sep 2013 05:13:09 +0000 (22:13 -0700)]
classifier: Allow CLS_CURSOR_FOR_EACH to use a const-qualified iterator.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoguarded-list: New data structure for thread-safe queue.
Ben Pfaff [Fri, 13 Sep 2013 00:42:23 +0000 (17:42 -0700)]
guarded-list: New data structure for thread-safe queue.

We already had queues that were suitable for replacement by this data
structure, and I intend to add another one later on.

flow_miss_batch_ofproto_destroyed() did not work well with the guarded-list
structure (it required either adding a lot more functions or breaking the
abstraction) so I changed the caller to just use udpif_revalidate().

Checking reval_seq at the end of handle_miss_upcalls() also didn't work
well with the abstraction, so I decided that since this was a corner case
anyway it would be acceptable to just drop those in flow_miss_batch_next().

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Add a ref_count to "struct rule" to protect it from being freed.
Ben Pfaff [Thu, 12 Sep 2013 06:23:00 +0000 (23:23 -0700)]
ofproto: Add a ref_count to "struct rule" to protect it from being freed.

Taking a read-lock on the 'rwlock' member of struct rule prevents members
of the rule from changing.  This is a short-term use of the 'rwlock': one
takes the lock, reads some members, and releases the lock.

Taking a read-lock on the 'rwlock' also prevents the rule from being freed.
This is often a relatively long-term need.  For example, until now flow
translation has held the rwlock in xlate_table_action() across the entire
recursive translation, which can call into a great deal of different code
across multiple files.

This commit switches to using a reference count for this second purpose
of struct rule's rwlock.  This means that all the code that previously
held a read-lock on the rwlock across deep stacks of functions can now
simply keep a reference.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Break actions out of rule into new rule_actions structure.
Ben Pfaff [Mon, 9 Sep 2013 20:05:52 +0000 (13:05 -0700)]
ofproto: Break actions out of rule into new rule_actions structure.

This permits code to ensure long-term access to a rule's actions
without holding a long-term lock on the rule's rwlock.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Remove soon-to-be-invalid optimizations.
Ben Pfaff [Fri, 13 Sep 2013 04:06:46 +0000 (21:06 -0700)]
ofproto: Remove soon-to-be-invalid optimizations.

Until now, ofproto_add_flow() and ofproto_delete_flow() assumed that the
flow table could not change between its flow table check and its later
modification to the flow table.  This assumption will soon be untrue, so
remove it.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Move function find_meter() into ofpacts as ofpacts_get_meter().
Ben Pfaff [Sat, 7 Sep 2013 02:52:14 +0000 (19:52 -0700)]
ofproto: Move function find_meter() into ofpacts as ofpacts_get_meter().

ofproto is too big anyway so we might as well move out code that can
reasonably live elsewhere.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Merge ofproto_rule_delete() and ofproto_delete_rule().
Ben Pfaff [Fri, 30 Aug 2013 22:27:33 +0000 (15:27 -0700)]
ofproto: Merge ofproto_rule_delete() and ofproto_delete_rule().

These functions were identical but had different names (one just called
the other).  Make them a single function.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto-dpif: Remove vestigial "clogged" feature.
Ben Pfaff [Sun, 1 Sep 2013 21:46:04 +0000 (14:46 -0700)]
ofproto-dpif: Remove vestigial "clogged" feature.

Before this commit, ofproto-dpif did not ordinarily ever defer
completion of a flow_mod, but it was possible to use "ovs-appctl
ofproto/clog" to force deferring completion, which allowed some basic
debugging of deferred completion.  I only ever used that feature once,
while doing initial debugging of the deferred completion feature.  I
intend in future commits to make changes that will make deferred
completion harder to implement in ofproto-dpif, so this commit removes
that feature in advance.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Avoid gratuitous memory allocation and free.
Ben Pfaff [Sat, 7 Sep 2013 02:23:07 +0000 (19:23 -0700)]
ofproto: Avoid gratuitous memory allocation and free.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Correct comments.
Ben Pfaff [Fri, 30 Aug 2013 22:20:08 +0000 (15:20 -0700)]
ofproto: Correct comments.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Eliminate 'ofproto_node' member from struct rule.
Ben Pfaff [Wed, 11 Sep 2013 23:41:32 +0000 (16:41 -0700)]
ofproto: Eliminate 'ofproto_node' member from struct rule.

The ofproto_node member is convenient for collecting lists of rules, but
it is also challenging for concurrency because only a single thread at a
time can put a given rule on a list.  This commit eliminates the
ofproto_node member and introduces a new 'struct rule_collection' that
can be use in a thread-safe manner.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Reduce number of "collect" functions taking lots of parameters.
Ben Pfaff [Wed, 11 Sep 2013 22:35:44 +0000 (15:35 -0700)]
ofproto: Reduce number of "collect" functions taking lots of parameters.

The long lists of parameters for all these "collect" functions was starting
to get to me.  This reduces the number of such functions to one.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Factor code out of collect_rules_{loose,strict} into new helper.
Ben Pfaff [Fri, 6 Sep 2013 05:37:41 +0000 (22:37 -0700)]
ofproto: Factor code out of collect_rules_{loose,strict} into new helper.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agoofproto: Use proper error code when meter_id is out of range.
Jarno Rajahalme [Sun, 8 Sep 2013 01:27:08 +0000 (18:27 -0700)]
ofproto: Use proper error code when meter_id is out of range.

Use OUT_OF_METERS when given meter_id is greater than what is supported
by the datapath.  Retain the INVALID_METER error code for the meter_ids
outside of the range supported by the specification.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoDelete meters in the right place.
Ben Pfaff [Fri, 13 Sep 2013 04:18:01 +0000 (21:18 -0700)]
Delete meters in the right place.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agolist: New function list_move().
Ben Pfaff [Thu, 29 Aug 2013 20:06:21 +0000 (13:06 -0700)]
list: New function list_move().

An upcoming commit will add more users.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
10 years agoofproto: Avoid removing rules from meter lists twice.
Ben Pfaff [Thu, 29 Aug 2013 23:49:51 +0000 (16:49 -0700)]
ofproto: Avoid removing rules from meter lists twice.

Since the meter code was introduced, the oftable_remove_rule() function
(recently renamed oftable_remove_rule__()) has had two blocks of code to
remove a rule from its meter's list of rules that reference that meter.
This commit reduces that to one.

Also modifies oftable_remove_rule__() to maintain the invariant that
'rule' is in a meter's list if and only if
!list_is_empty(&rule->meter_list_node).  I don't believe that that fixes
a real bug, but it seems like a good idea.

(This seemed like a serious bug at first, but in fact list_remove() is
idempotent: calling it two times in a row has the same effect as calling
it once.)

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
10 years agovswitchd: Clear bfd_status column when bfd is disabled or not supported
Alex Wang [Thu, 12 Sep 2013 19:51:53 +0000 (12:51 -0700)]
vswitchd: Clear bfd_status column when bfd is disabled or not supported

This commit makes vswitchd clear the 'bfd_status' column
in ovsdb when bfd is disabled or not supported.

Reported-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agobridge: Always call smap_destroy() after smap_init()
Alex Wang [Thu, 12 Sep 2013 18:54:16 +0000 (11:54 -0700)]
bridge: Always call smap_destroy() after smap_init()

This commit fixes a place in bridge.c where smap_destroy() is not
always called after smap_init().  Though there is no memory leak
now, it is necessary to fix it and prevent memory leak in the
future when smap_init() may be modified to allocate dynamic memory.

Reported-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto: Reduce default number of miss handlers.
Ethan Jackson [Tue, 10 Sep 2013 22:36:21 +0000 (15:36 -0700)]
ofproto: Reduce default number of miss handlers.

The default number of miss handlers should be the number of processors
minus two to account for the dispatcher and main threads.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto-dpif: Don't hold mac rwlock while calling send_packet().
Ethan Jackson [Fri, 6 Sep 2013 19:51:02 +0000 (12:51 -0700)]
ofproto-dpif: Don't hold mac rwlock while calling send_packet().

Holding the mac learning rwlock while calling send_packet() which
calls xlate_actions() is dangerous because somewhere deep in the chain
the same lock might be acquired causing a deadlock.  Though we haven't
run into this problem in testing, it seems prudent to refactor it.

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto: Fix core dump due to uninitialized cls_rule.
Ethan Jackson [Tue, 10 Sep 2013 20:29:30 +0000 (13:29 -0700)]
ofproto: Fix core dump due to uninitialized cls_rule.

Bug #19568.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto: Do not call ->rule_destruct() if ->rule_construct() failed.
Ben Pfaff [Fri, 30 Aug 2013 20:59:37 +0000 (13:59 -0700)]
ofproto: Do not call ->rule_destruct() if ->rule_construct() failed.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
10 years agodatapath: flow: fix potential illegal memory access in __parse_flow_nlattrs
Daniel Borkmann [Mon, 9 Sep 2013 20:27:27 +0000 (13:27 -0700)]
datapath: flow: fix potential illegal memory access in __parse_flow_nlattrs

In function __parse_flow_nlattrs(), we check for condition
(type > OVS_KEY_ATTR_MAX) and if true, print an error, but we do
not return from this function as in other checks. It seems this
has been forgotten, as otherwise, we could access beyond the
memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1].
Hence, a maliciously prepared nla_type from user space could access
beyond this upper limit.

Introduced by 03f0d916a ("openvswitch: Mega flow implementation").

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoutil: Don't set thread name to empty
Guolin Yang [Mon, 9 Sep 2013 16:38:01 +0000 (09:38 -0700)]
util: Don't set thread name to empty

In monitor_daemon(), it set subprogram_name to "" which causes system crash
in some platform when trying to set the thread name to "".  This change set
thread name to program name in this case.

Signed-off-by: Guolin Yang <gyang@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agodatapath: Remove compat files.
Pravin B Shelar [Mon, 26 Aug 2013 18:19:10 +0000 (11:19 -0700)]
datapath: Remove compat files.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Remove reciprocal_div compat code.
Pravin B Shelar [Fri, 30 Aug 2013 17:42:53 +0000 (10:42 -0700)]
datapath: Remove reciprocal_div compat code.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Remove compat support for NLA_NUL_STRING
Pravin B Shelar [Wed, 28 Aug 2013 18:24:00 +0000 (11:24 -0700)]
datapath: Remove compat support for NLA_NUL_STRING

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Cleanup compat support.
Pravin B Shelar [Tue, 27 Aug 2013 06:43:29 +0000 (23:43 -0700)]
datapath: Cleanup compat support.

cleanup various header file.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Cleanup netlink compat code.
Pravin B Shelar [Tue, 27 Aug 2013 06:53:17 +0000 (23:53 -0700)]
datapath: Cleanup netlink compat code.

Patch removes genl, netlink, rtnl compat code and dpif-linux
fallback-id compat code.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Remove vlan compat support
Pravin B Shelar [Wed, 4 Sep 2013 18:35:13 +0000 (11:35 -0700)]
datapath: Remove vlan compat support

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Remove checksum compat support
Pravin B Shelar [Mon, 26 Aug 2013 18:18:07 +0000 (11:18 -0700)]
datapath: Remove checksum compat support

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Remove skb->mark compat code.
Pravin B Shelar [Mon, 26 Aug 2013 20:27:02 +0000 (13:27 -0700)]
datapath: Remove skb->mark compat code.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Remove namespace compat support.
Pravin B Shelar [Mon, 26 Aug 2013 18:30:41 +0000 (11:30 -0700)]
datapath: Remove namespace compat support.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: vport: Remove compat support
Pravin B Shelar [Tue, 27 Aug 2013 20:24:50 +0000 (13:24 -0700)]
datapath: vport: Remove compat support

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Drop support for RHEL5 build
Pravin B Shelar [Tue, 3 Sep 2013 17:55:18 +0000 (10:55 -0700)]
datapath: Drop support for RHEL5 build

RHEL5 is based on kernel 2.6.18.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Move kernel version check to configure.
Pravin B Shelar [Tue, 3 Sep 2013 18:02:40 +0000 (11:02 -0700)]
datapath: Move kernel version check to configure.

Rather than having compile time check in datapath.c, its better
to check kernel version at configuration step.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Drop support for linux pre-2.6.32 kernel.
Pravin B Shelar [Fri, 6 Sep 2013 16:14:40 +0000 (09:14 -0700)]
datapath: Drop support for linux pre-2.6.32 kernel.

This makes datapath module much close to upstream datapath and
make code easy to understand.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
10 years agodatapath: Fix alignment of struct sw_flow_key.
Jesse Gross [Thu, 5 Sep 2013 20:01:58 +0000 (13:01 -0700)]
datapath: Fix alignment of struct sw_flow_key.

sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
However, this breaks on the m68k architecture where long is 32 bit in
size but 16 bit aligned by default. This aligns to the size of a long to
ensure that we can always do comparsions in full long-sized chunks. It
also adds an additional build check to catch any reduction in alignment.

CC: Andy Zhou <azhou@nicira.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoovs-ofctl: Correct formatting of instructions in manpage.
Simon Horman [Mon, 2 Sep 2013 02:23:16 +0000 (11:23 +0900)]
ovs-ofctl: Correct formatting of instructions in manpage.

Adjust formatting in ovs-ofctl manpage so that apply_actions, clear_actions
write_metadata and goto_table appear at the same level of indentation as
actions rather being indented as if they are arguments to the learn action.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto: Rename struct rule's evict lock and use it more widely.
Ethan Jackson [Wed, 4 Sep 2013 00:23:44 +0000 (17:23 -0700)]
ofproto: Rename struct rule's evict lock and use it more widely.

There are a few fields in struct rule which are accessible by
functions in ofproto-dpif and therefore need to accessed in a thread
safe manner.  This patch achieves this by generalizing the rules evict
rwlock and requiring a writelock to be held to edit them.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto-dpif: Don't manage eviction groups from threads.
Ethan Jackson [Tue, 3 Sep 2013 22:37:14 +0000 (15:37 -0700)]
ofproto-dpif: Don't manage eviction groups from threads.

Managing eviction groups from threads was going to be difficult to do
in a performant thread-safe manner, so this patch punts the problem to
the main thread.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
10 years agoofproto-dpif: Hide struct rule_dpif internally.
Ethan Jackson [Tue, 27 Aug 2013 20:17:11 +0000 (13:17 -0700)]
ofproto-dpif: Hide struct rule_dpif internally.

By hiding struct rule_dpif inside ofproto-dpif, it becomes very clear
which attributes are accessed by multiple threads and need to be
protected by locks.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>