ovn: Fix localnet ports deletion and recreation sometimes after restart.
authorRamu Ramamurthy <ramu.ramamurthy@gmail.com>
Fri, 29 Apr 2016 00:23:59 +0000 (20:23 -0400)
committerBen Pfaff <blp@ovn.org>
Tue, 17 May 2016 00:05:41 +0000 (17:05 -0700)
On graceful restart of ovn-controller, the chassis row is inserted in the
Chassis table. During this transaction, there is a window of time where an
idl row-read may not return the newly created row - even though the row
should exist, but the transaction is in an incomplete state.  As a result,
get_chassis() in binding_run() returns a null chassis record binding_run
exits early, and does not create local_datapaths, and patch_run deletes
localnet patch ports. In a later run, the localnet patch ports are
recreated.

This is reproducable consistently but not on every restart.  The fix is to
handle the case that the chassis record may be null in binding_run, and yet
create local_datapaths.

Restart logs follow with commentary:

2016-04-28T18:35:42.448Z|00001|vlog|INFO|opened log file /home/ovs/ovs/tests/testsuite.dir/2035/hv/ovn-controller.log
2016-04-28T18:35:42.449Z|00002|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connecting...
2016-04-28T18:35:42.449Z|00003|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connected
2016-04-28T18:35:42.452Z|00004|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connecting...
2016-04-28T18:35:42.452Z|00005|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connected
2016-04-28T18:35:42.454Z|00006|ovsdb_idl|INFO|ovsdb_idl_txn_insert:
                Chassis row inserted into transaction above
2016-04-28T18:35:42.454Z|00007|binding|INFO|Claiming lport localvif2 for this chassis.
2016-04-28T18:35:42.454Z|00008|binding|INFO|Claiming lport localvif3 for this chassis.
2016-04-28T18:35:42.454Z|00009|binding|INFO|Claiming lport localcif4 for this chassis.
2016-04-28T18:35:42.454Z|00010|binding|INFO|Claiming lport localcif5 for this chassis.
2016-04-28T18:35:42.454Z|00011|binding|INFO|Claiming lport localcif1 for this chassis.
2016-04-28T18:35:42.454Z|00012|binding|INFO|Claiming lport localvif1 for this chassis.
2016-04-28T18:35:42.454Z|00013|binding|INFO|Claiming lport localvif201 for this chassis.
2016-04-28T18:35:42.454Z|00014|binding|INFO|Claiming lport localcif3 for this chassis.
2016-04-28T18:35:42.454Z|00015|binding|INFO|Claiming lport localcif2 for this chassis.
               Binding run found the chassis record and has claimed the vifs
2016-04-28T18:35:42.455Z|00016|ofctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch
2016-04-28T18:35:42.455Z|00017|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting...
2016-04-28T18:35:42.455Z|00018|pinctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch
2016-04-28T18:35:42.456Z|00019|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting...
2016-04-28T18:35:42.457Z|00020|ovsdb_idl|INFO|ovsdb_idl_row_clear_new:
                At this point read of Chassis table returns no rows, and
                the transaction status is still incomplete.
2016-04-28T18:35:42.457Z|00021|binding|INFO|no chassis rec!
                Binding run exits early because chassis_rec was null
2016-04-28T18:35:42.459Z|00022|patch|INFO|removing port patch-br-int-to-localnet201
2016-04-28T18:35:42.459Z|00023|patch|INFO|removing port patch-br-int-to-localnet1
2016-04-28T18:35:42.459Z|00024|patch|INFO|removing port patch-localnet1-to-br-int
2016-04-28T18:35:42.459Z|00025|patch|INFO|removing port patch-localnet201-to-br-int
               Localnet ports are removed above, because local_datapaths dont exist
2016-04-28T18:35:42.459Z|00026|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected
2016-04-28T18:35:42.460Z|00027|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected
2016-04-28T18:35:42.460Z|00028|ovsdb_idl|INFO|ovsdb_idl_row_create:
               Now, the transaction is complete

Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
ovn/controller/binding.c

index 32fcb85..a0d8b96 100644 (file)
@@ -155,9 +155,6 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
     const struct sbrec_port_binding *binding_rec;
 
     chassis_rec = get_chassis(ctx->ovnsb_idl, chassis_id);
-    if (!chassis_rec) {
-        return;
-    }
 
     struct shash lports = SHASH_INITIALIZER(&lports);
     if (br_int) {
@@ -190,10 +187,8 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
             if (iface_rec && ctx->ovs_idl_txn) {
                 update_qos(iface_rec, binding_rec);
             }
-            if (binding_rec->chassis == chassis_rec) {
-                continue;
-            }
-            if (ctx->ovnsb_idl_txn) {
+            if (ctx->ovnsb_idl_txn && chassis_rec
+                && binding_rec->chassis != chassis_rec) {
                 if (binding_rec->chassis) {
                     VLOG_INFO("Changing chassis for lport %s from %s to %s.",
                               binding_rec->logical_port,
@@ -205,7 +200,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
                 }
                 sbrec_port_binding_set_chassis(binding_rec, chassis_rec);
             }
-        } else if (binding_rec->chassis == chassis_rec) {
+        } else if (chassis_rec && binding_rec->chassis == chassis_rec) {
             if (ctx->ovnsb_idl_txn) {
                 VLOG_INFO("Releasing lport %s from this chassis.",
                           binding_rec->logical_port);