ovn/TODO

-*- outline -*-

* L3 support

** New OVN logical actions

*** arp

Generates an ARP packet based on the current IPv4 packet and allows it
to be processed as part of the current pipeline (and then pops back to
processing the original IPv4 packet).

TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to
one per second for a given target.  We might need to do this too.

We probably need to buffer the packet that generated the ARP.  I don't
know where to do that.
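
As a rough illustration of the per-target ARP rate limit mentioned
above, here is a minimal Python sketch.  The class name, the
one-second default, and the injectable clock are all assumptions made
for the example; nothing here reflects an existing OVN interface.

```python
import time
from typing import Callable, Dict


class ArpRateLimiter:
    """Allow at most one ARP per target IP per `interval` seconds (hypothetical helper)."""

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock
        self.last_sent: Dict[str, float] = {}  # target IP -> time of last ARP

    def allow(self, target_ip: str) -> bool:
        now = self.clock()
        last = self.last_sent.get(target_ip)
        if last is not None and now - last < self.interval:
            return False  # an ARP for this target went out too recently
        self.last_sent[target_ip] = now
        return True
```

A caller would check allow(target) before emitting the ARP; what to do
with the packet that triggered a suppressed ARP is the buffering
question raised above and is not addressed by this sketch.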

*** icmp4 { action... }

Generates an ICMPv4 packet based on the current IPv4 packet and
processes it according to each nested action (and then pops back to
processing the original IPv4 packet).  The intended use case is for
generating "time exceeded" and "destination unreachable" errors.

ovn-sb.xml includes a tentative specification for this action.

Tentatively, the icmp4 action sets a default icmp_type and icmp_code
and lets the nested actions override it.  This means that we'd have to
make icmp_type and icmp_code writable.  Because changing icmp_type and
icmp_code can change the interpretation of the rest of the data in the
ICMP packet, we would want to think this through carefully.  If it
seems like a bad idea then we could instead make the type and code a
parameter to the action: icmp4(type, code) { action... }

It is worth considering what should be considered the ingress port for
the ICMPv4 packet.  It's quite likely that the ICMPv4 packet is going
to go back out the ingress port.  Maybe the icmp4 action, therefore,
should clear the inport, so that output to the original inport won't
be discarded.

*** tcp_reset

Transforms the current TCP packet into a RST reply.

ovn-sb.xml includes a tentative specification for this action.
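
To make the transformation concrete, the following Python sketch
builds the RST that answers a given segment.  It follows the reset
generation rules of RFC 793 rather than the tentative ovn-sb.xml
specification, which it has not been checked against; the TcpSegment
type and its field names are hypothetical.

```python
from dataclasses import dataclass

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10  # TCP flag bits


@dataclass
class TcpSegment:
    """Just the header fields this sketch needs (hypothetical layout)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    seq: int
    ack: int
    flags: int
    payload_len: int = 0


def tcp_reset(pkt: TcpSegment) -> TcpSegment:
    """Return the RST reply to `pkt`, with addresses and ports swapped."""
    if pkt.flags & ACK:
        # The incoming segment carries an ACK: the RST takes its sequence
        # number from that ACK field and does not itself set ACK.
        seq, ack, flags = pkt.ack, 0, RST
    else:
        # No ACK: sequence number zero, and acknowledge everything the
        # segment occupied (SYN and FIN each count as one).
        seg_len = pkt.payload_len + bool(pkt.flags & SYN) + bool(pkt.flags & FIN)
        seq, ack, flags = 0, (pkt.seq + seg_len) & 0xFFFFFFFF, RST | ACK
    return TcpSegment(pkt.dst_ip, pkt.src_ip, pkt.dst_port, pkt.src_port,
                      seq, ack, flags)
```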

*** Other actions for IPv6.

IPv6 will probably need an action or actions for ND that is similar to
the "arp" action, and an action for generating

** IPv6

*** ND versus ARP

*** IPv6 routing

*** ICMPv6

** Dynamic IP to MAC bindings

Some bindings from IP address to MAC will undoubtedly need to be
discovered dynamically through ARP requests.  It's straightforward
enough for a logical L3 router to generate ARP requests and forward
them to the appropriate switch.

It's more difficult to figure out where the reply should be processed
and stored.  It might seem at first that a first-cut implementation
could just keep track of the binding on the hypervisor that needs to
know, but that can't happen easily because the VM that sends the reply
might not be on the same HV as the VM that needs the answer (that is,
the VM that sent the packet that needs the binding to be resolved) and
there isn't an easy way for it to know which HV needs the answer.

Thus, the HV that processes the ARP reply (which is unknown when the
ARP is sent) has to tell all the HVs the binding.  The most obvious
place for this is the OVN_Southbound database.

Details need to be worked out, including:

*** OVN_Southbound schema changes.

Possibly bindings could be added to the Port_Binding table by adding
or modifying columns.  Another possibility is that another table
should be added.

*** Logical_Flow representation

It would be really nice to maintain the general-purpose nature of
logical flows, but these bindings might have to include some
hard-coded special cases, especially when it comes to the relationship
with populating the bindings into the OVN_Southbound database.

*** Tracking queries

It's probably best to only record in the database responses to queries
actually issued by an L3 logical router, so somehow they have to be
tracked, probably by putting a tentative binding without a MAC address
into the database.
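
One way to picture the tentative-binding idea is the Python sketch
below.  The BindingTable class is an in-memory stand-in for whatever
table ends up holding bindings; its name, methods, and the
(port, IP) key are assumptions made for the example, not a schema
proposal.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (logical port, IP address); hypothetical key layout


class BindingTable:
    """In-memory stand-in for a table of IP-to-MAC bindings."""

    def __init__(self) -> None:
        # A value of None marks a tentative binding: queried, not yet answered.
        self.rows: Dict[Key, Optional[str]] = {}

    def note_query(self, port: str, ip: str) -> None:
        # Recorded when the logical router issues the ARP request.
        self.rows.setdefault((port, ip), None)

    def note_reply(self, port: str, ip: str, mac: str) -> bool:
        # Recorded by whichever hypervisor happens to see the ARP reply.
        key = (port, ip)
        if key not in self.rows:
            return False  # no router asked for this, so it is not stored
        self.rows[key] = mac
        return True
```

Because the tentative row lives in shared state, the hypervisor that
sees the reply does not need to know which hypervisor asked.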

*** Renewal and expiration.

Something needs to make sure that bindings remain valid and expire
those that become stale.
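
A periodic sweep is one possible shape for this.  The Python sketch
below assumes each binding records when it was last confirmed; the
two thresholds and the function name are invented for illustration,
and which component would run the sweep is left open.

```python
from typing import Dict, List, Tuple

# IP address -> (MAC address, time the binding was last confirmed)
Bindings = Dict[str, Tuple[str, float]]


def sweep(bindings: Bindings, now: float,
          renew_after: float, expire_after: float) -> Tuple[List[str], List[str]]:
    """Return (IPs to re-query, IPs removed as stale); mutates `bindings`."""
    to_renew: List[str] = []
    expired: List[str] = []
    for ip, (_mac, confirmed) in list(bindings.items()):
        age = now - confirmed
        if age >= expire_after:
            del bindings[ip]       # too old to trust: drop it
            expired.append(ip)
        elif age >= renew_after:
            to_renew.append(ip)    # still usable, but due for a fresh ARP
    return to_renew, expired
```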

** MTU handling (fragmentation on output)

** Ratelimiting.

*** ARP.

*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ...

As a point of comparison, Linux doesn't ratelimit TCP resets but I
think it ratelimits everything else.

* ovn-controller

** ovn-controller parameters and configuration.

*** SSL configuration.

    Can probably get this from the Open_vSwitch database.

** Security

*** Limiting the impact of a compromised chassis.

    Every instance of ovn-controller has the same full access to the central
    OVN_Southbound database.  This means that a compromised chassis can
    interfere with the normal operation of the rest of the deployment.  Some
    specific examples include writing to the logical flow table to alter
    traffic handling or updating the port binding table to claim ports that are
    actually present on a different chassis.  In practice, the compromised host
    would be fighting against ovn-northd and other instances of ovn-controller
    that would be trying to restore the correct state.  The impact could include
    at least temporarily redirecting traffic (so the compromised host could
    receive traffic that it shouldn't) and potentially a more general denial of
    service.

    There are different potential improvements to this area.  The first would be
    to add some sort of ACL scheme to ovsdb-server.  A proposal for this should
    first include an ACL scheme for ovn-controller.  An example policy would
    be to make Logical_Flow read-only.  Table-level control is needed, but is
    not enough.  For example, ovn-controller must be able to update the Chassis
    and Encap tables, but should only be able to modify the rows associated with
    that chassis and no others.
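
The combination of table-level and row-level control described above
could look roughly like the Python sketch below.  It is only an
illustration of the policy shape: ovsdb-server has no such hook in
this document's account, and the "chassis_name" column used to tie an
Encap row to its chassis is an assumption made for the example.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, str]
# A rule answers: may this client (identified by chassis name) write this row?
WriteRule = Callable[[str, Row], bool]

# Hypothetical policy.  None means the table is read-only for ovn-controller.
POLICY: Dict[str, Optional[WriteRule]] = {
    "Logical_Flow": None,
    "Chassis": lambda client, row: row.get("name") == client,
    "Encap": lambda client, row: row.get("chassis_name") == client,
}


def may_write(client: str, table: str, row: Row) -> bool:
    """Table-level check first, then the per-row rule; unknown tables are denied."""
    rule = POLICY.get(table)
    return rule is not None and rule(client, row)
```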

    A more complex example is the Port_Binding table.  Currently, ovn-controller
    is the source of truth of where a port is located.  There seems to be no
    policy that can prevent malicious behavior of a compromised host with this
    table.

    An alternative scheme for port bindings would be to provide an optional mode
    where an external entity controls port bindings and makes them read-only to
    ovn-controller.  This is actually how OpenStack works today, for example.
    The part of OpenStack that manages VMs (Nova) tells the networking component
    (Neutron) where a port will be located, as opposed to the networking
    component discovering it.

** Gratuitous ARP generation

   ovn-controller should generate a GARP when a port is bound to a chassis.
   This is needed when ports are migrated from one chassis to another, such
   as live migrating a VM.
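
For reference, a gratuitous ARP is an ordinary ARP packet whose sender
and target IP addresses are the same.  The Python sketch below builds
one in its request form as a raw Ethernet frame; the function name is
hypothetical, and how ovn-controller would actually inject the frame
is not covered here.

```python
import socket
import struct


def build_garp(mac: str, ip: str) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP request for (mac, ip)."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    eth = broadcast + hw + struct.pack("!H", 0x0806)  # dst, src, EtherType ARP
    # Hardware type Ethernet (1), protocol IPv4 (0x0800), address lengths 6
    # and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr              # sender MAC and IP
    arp += b"\x00" * 6 + addr     # target MAC unset, target IP equals sender IP
    return eth + arp
```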

* ovsdb-server

  ovsdb-server should have adequate features for OVN but it probably
  needs work for scale and possibly for availability as deployments
  grow.  Here are some thoughts.

  Andy Zhou is looking at these issues.

*** Reducing amount of data sent to clients.

    Currently, whenever a row monitored by a client changes,
    ovsdb-server sends the client every monitored column in the row,
    even if only one column changes.  It might be valuable to reduce
    this to only the columns that change.

    Also, whenever a column changes, ovsdb-server sends the entire
    contents of the column.  It might be valuable, for columns that
    are sets or maps, to send only added or removed values or
    key-value pairs.
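
The two reductions described so far can be sketched together in
Python.  The row_delta function and the {"added", "removed"} shape are
invented for illustration and do not correspond to any OVSDB wire
format; only set-valued columns are handled, with maps left as an
analogous exercise.

```python
from typing import Any, Dict


def row_delta(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Describe only what changed between two versions of a monitored row."""
    delta: Dict[str, Any] = {}
    for col, value in new.items():
        before = old.get(col)
        if value == before:
            continue  # unchanged column: send nothing for it
        if isinstance(value, (set, frozenset)) and isinstance(before, (set, frozenset)):
            # Set-valued column: send only the members added and removed.
            delta[col] = {"added": value - before, "removed": before - value}
        else:
            delta[col] = value  # scalar column: send the new value whole
    return delta
```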

    Currently, clients monitor the entire contents of a table.  It
    might make sense to allow clients to monitor only rows that
    satisfy specific criteria, e.g. to allow an ovn-controller to
    receive only Logical_Flow rows for logical networks on its hypervisor.

*** Reducing redundant data and code within ovsdb-server.

    Currently, ovsdb-server separately composes database update
    information to send to each of its clients.  This is fine for a
    small number of clients, but it wastes time and memory when
    hundreds of clients all want the same updates (as will be the
    case in OVN).

    (This is somewhat opposed to the idea of letting a client monitor
    only some rows in a table, since that would increase the diversity
    among clients.)
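
One way to avoid composing the same update repeatedly is to serialize
it once per distinct monitor request and hand the same result to every
client that shares that request.  The Python sketch below assumes, for
simplicity, that a monitor request is just a set of column names; the
function and type names are hypothetical.

```python
import json
from typing import Dict, FrozenSet

MonitorSpec = FrozenSet[str]  # assumption: a monitor request is a set of columns


def compose_updates(change: Dict[str, object],
                    clients: Dict[str, MonitorSpec]) -> Dict[str, str]:
    """Serialize `change` once per distinct monitor spec and share the result."""
    cache: Dict[MonitorSpec, str] = {}
    out: Dict[str, str] = {}
    for client, spec in clients.items():
        if spec not in cache:
            visible = {col: val for col, val in change.items() if col in spec}
            cache[spec] = json.dumps(visible, sort_keys=True)
        out[client] = cache[spec]  # same string object for every sharer
    return out
```

The more the monitor requests differ between clients, the less the
cache helps, which is the tension noted in the parenthetical above.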

*** Multithreading.

    If it turns out that other changes don't let ovsdb-server scale
    adequately, we can multithread ovsdb-server.  Initially one might
    only break protocol handling into separate threads, leaving the
    actual database work serialized through a lock.
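
The split described above, with protocol handling in worker threads
and database work behind one lock, has roughly the following shape.
This Python sketch only shows the structure; the Server class, the
JSON request format, and the dict standing in for the database are all
assumptions made for the example.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class Server:
    """Toy server: parsing and encoding run concurrently, DB access does not."""

    def __init__(self) -> None:
        self.db: Dict[str, object] = {}
        self.db_lock = threading.Lock()

    def handle(self, raw: str) -> str:
        request = json.loads(raw)            # protocol work, outside the lock
        with self.db_lock:                   # database work, one thread at a time
            self.db[request["key"]] = request["value"]
            rows = len(self.db)
        return json.dumps({"ok": True, "rows": rows})  # protocol work again

    def serve_all(self, requests: List[str], workers: int = 8) -> List[str]:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.handle, requests))
```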

** Increasing availability.

   Database availability might become an issue.  The OVN system
   shouldn't grind to a halt if the database becomes unavailable, but
   it would become impossible to bring VIFs up or down, etc.

   My current thought on how to increase availability is to add
   clustering to ovsdb-server, probably via the Raft consensus
   algorithm.  As an experiment, I wrote an implementation of Raft
   for Open vSwitch that you can clone from:

       https://github.com/blp/ovs-reviews.git raft

** Reducing startup time.

   As-is, if ovsdb-server restarts, every client will fetch a fresh
   copy of the part of the database that it cares about.  With
   hundreds of clients, this could cause heavy CPU load on
   ovsdb-server and use excessive network bandwidth.  It would be
   better to allow incremental updates even across connection loss.
   One way might be to use "Difference Digests" as described in
   Eppstein et al., "What's the Difference? Efficient Set
   Reconciliation Without Prior Context".  (I'm not yet aware of
   previous non-academic use of this technique.)
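
The core data structure behind difference digests is an invertible
Bloom filter.  The Python sketch below is a simplified version for
integer keys, meant only to show how two sides can recover the keys
they do not share from fixed-size summaries.  It omits the paper's
size estimator, its parameters (three hashes, 20 cells per hash) are
arbitrary, and decoding can fail when the difference is large relative
to the filter, which the returned flag reports.

```python
import hashlib
from typing import List, Set, Tuple

K = 3  # number of cells each key is hashed into


def _h(key: int, salt: int) -> int:
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class IBF:
    def __init__(self, cells_per_hash: int = 20) -> None:
        self.m = cells_per_hash
        n = K * cells_per_hash
        self.count = [0] * n
        self.key_sum = [0] * n
        self.hash_sum = [0] * n

    def _cells(self, key: int) -> List[int]:
        # One cell in each of K disjoint partitions, so the cells are distinct.
        return [i * self.m + _h(key, i) % self.m for i in range(K)]

    def _apply(self, key: int, sign: int) -> None:
        check = _h(key, K)  # checksum uses a salt not used for cell choice
        for c in self._cells(key):
            self.count[c] += sign
            self.key_sum[c] ^= key
            self.hash_sum[c] ^= check

    def insert(self, key: int) -> None:
        self._apply(key, 1)

    def subtract(self, other: "IBF") -> "IBF":
        diff = IBF(self.m)
        for c in range(len(self.count)):
            diff.count[c] = self.count[c] - other.count[c]
            diff.key_sum[c] = self.key_sum[c] ^ other.key_sum[c]
            diff.hash_sum[c] = self.hash_sum[c] ^ other.hash_sum[c]
        return diff

    def decode(self) -> Tuple[Set[int], Set[int], bool]:
        """Peel pure cells; returns (only in self, only in other, success)."""
        only_self: Set[int] = set()
        only_other: Set[int] = set()
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                pure = (self.count[c] in (1, -1)
                        and _h(self.key_sum[c], K) == self.hash_sum[c])
                if pure:
                    key, sign = self.key_sum[c], self.count[c]
                    (only_self if sign == 1 else only_other).add(key)
                    self._apply(key, -sign)  # remove the key everywhere
                    progress = True
        ok = not (any(self.count) or any(self.key_sum) or any(self.hash_sum))
        return only_self, only_other, ok
```

In the reconnect scenario above, a client and the server would each
build a filter over identifiers of the rows they hold, exchange one of
them, and call subtract() and then decode() to learn which rows need
to be sent, instead of transferring the whole table.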

** Support multiple tunnel encapsulations in Chassis.

   So far, both ovn-controller and ovn-controller-vtep only allow
   chassis to have one tunnel encapsulation entry.  We should extend
   the implementation to support multiple tunnel encapsulations.

** Update learned MAC addresses from VTEP to OVN

   The VTEP gateway stores all MAC addresses learned from its
   physical interfaces in the 'Ucast_Macs_Local' and the
   'Mcast_Macs_Local' tables.  ovn-controller-vtep should be
   able to update that information back to the ovn-sb database,
   so that other chassis know where to send packets destined
   to the extended external network instead of broadcasting.

** Translate ovn-sb Multicast_Group table into VTEP config

   The ovn-controller-vtep daemon should be able to translate
   the Multicast_Group table entry in the ovn-sb database into
   Mcast_Macs_Remote table configuration in the VTEP database.

* Consider the use of BFD as a tunnel monitor.

  The use of BFD for hypervisor-to-hypervisor tunnels is probably not worth it,
  since there's no alternative to switch to if a tunnel goes down.  It could
  make sense at a slow rate if someone does OVN monitoring system integration,
  but not otherwise.

  When OVN gets to supporting HA for gateways (see ovn/OVN-GW-HA.md), BFD is
  likely needed as a part of that solution.

  There's more commentary in this ML post:
  http://openvswitch.org/pipermail/dev/2015-November/062385.html

* ACL

** Support FTP ALGs.

** Support reject action.

** Support log option.