-* ovn-controller
+-*- outline -*-
+
+* L3 support
+
+** New OVN logical actions
+
+*** arp
+
+Generates an ARP packet based on the current IPv4 packet and allows it
+to be processed as part of the current pipeline (and then pops back to
+processing the original IPv4 packet).
+
+TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to
+one per second for a given target. We might need to do this too.
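+
+As a rough illustration (not existing OVN code), here is a minimal,
+self-contained sketch of such per-target limiting, allowing at most one
+ARP per second for a given IPv4 target; a real implementation would use
+OVS's hash map and monotonic-clock helpers instead:
+
+    #include <stdbool.h>
+    #include <stdint.h>
+    #include <time.h>
+
+    /* Power-of-2 table; hash collisions just share a slot. */
+    #define ARP_LIMITER_SLOTS 1024
+
+    struct arp_limiter {
+        uint32_t target[ARP_LIMITER_SLOTS];   /* IPv4 target, network order. */
+        time_t last_sent[ARP_LIMITER_SLOTS];  /* When we last ARPed for it. */
+    };
+
+    /* Returns true if it is OK to send an ARP for 'ip_dst' now, and records
+     * the attempt so that further ARPs for the same target are suppressed
+     * until the next second. */
+    static bool
+    arp_limiter_allow(struct arp_limiter *lim, uint32_t ip_dst)
+    {
+        uint32_t slot = (ip_dst * UINT32_C(0x9e3779b1))
+                        & (ARP_LIMITER_SLOTS - 1);
+        time_t now = time(NULL);
+
+        if (lim->target[slot] == ip_dst && lim->last_sent[slot] == now) {
+            return false;
+        }
+        lim->target[slot] = ip_dst;
+        lim->last_sent[slot] = now;
+        return true;
+    }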
+
+We probably need to buffer the packet that generated the ARP. I don't
+know where to do that.
+
+*** icmp4 { action... }
+
+Generates an ICMPv4 packet based on the current IPv4 packet and
+processes it according to each nested action (and then pops back to
+processing the original IPv4 packet). The intended use case is for
+generating "time exceeded" and "destination unreachable" errors.
+
+ovn-sb.xml includes a tentative specification for this action.
+
+Tentatively, the icmp4 action sets a default icmp_type and icmp_code
+and lets the nested actions override them. This means that we'd have
+to make icmp_type and icmp_code writable. Because changing icmp_type
+and icmp_code can change the interpretation of the rest of the data in
+the ICMP packet, we would want to think this through carefully. If
+that seems like a bad idea then we could instead make the type and
+code parameters to the action: icmp4(type, code) { action... }
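+
+For reference when weighing that choice, the two error messages
+mentioned above share the ICMPv4 layout sketched below (per RFC 792);
+the struct is purely illustrative and not OVN code. Reinterpreting
+icmp_type and icmp_code as some other ICMP type would change what the
+bytes after the header mean, which is why rewriting them needs care.
+
+    #include <stdint.h>
+
+    /* Common layout of ICMPv4 "destination unreachable" (type 3) and
+     * "time exceeded" (type 11) messages. */
+    struct icmp4_error {
+        uint8_t  type;       /* 3 = dest unreachable, 11 = time exceeded. */
+        uint8_t  code;       /* Subcode, e.g. 0 = net unreachable. */
+        uint16_t checksum;   /* Covers the entire ICMP message. */
+        uint32_t unused;     /* Zero for these two message types. */
+        /* Followed by the original IPv4 header plus the first 8 octets
+         * of the original datagram's payload. */
+    };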
+
+It is worth thinking about what should be considered the ingress port
+for the ICMPv4 packet. It's quite likely that the ICMPv4 packet is
+going to go back out the ingress port. Maybe the icmp4 action,
+therefore, should clear the inport, so that output to the original
+inport won't be discarded.
+
+*** tcp_reset
+
+Transforms the current TCP packet into a RST reply.
+
+ovn-sb.xml includes a tentative specification for this action.
+
+*** Other actions for IPv6.
+
+IPv6 will probably need an action or actions for ND similar to the
+"arp" action, and an action for generating ICMPv6 errors analogous to
+icmp4.
+
+*** ovn-controller translation to OpenFlow
+
+The following two translation strategies come to mind. Depending on
+the details, we might want to implement some of the new actions one
+way and some the other.
+
+*** Implementation strategies
+
+One way to do this is to define new actions as Open vSwitch extensions
+to OpenFlow, emit those actions in ovn-controller, and implement them
+in ovs-vswitchd (possibly pushing the implementations into the Linux
+and DPDK datapaths as well). This is the only acceptable way for
+actions that need high performance. None of these actions obviously
+need high performance, but it might be necessary to have fairness in
+handling e.g. a flood of incoming packets that require these actions.
+The main disadvantage of this approach is that it ties ovs-vswitchd
+(and the Linux kernel module) to supporting these actions essentially
+forever, which means that we'd want to make sure that they are
+general-purpose, well designed, maintainable, and supportable.
+
+The other way to do this is to send the packets across an OpenFlow
+channel to ovn-controller and have ovn-controller process them. This
+is acceptable for actions that don't need high performance, and it
+means that we don't add anything permanently to ovs-vswitchd or the
+kernel (so we can be more casual about the design). The big
+disadvantage is that it becomes necessary to add a way to resume the
+OpenFlow pipeline when it is interrupted in the middle by sending a
+packet to the controller. This is not as simple as doing a new flow
+table lookup and resuming from that point. Instead, it is equivalent
+to the (very complicated) recirculation logic in ofproto-dpif-xlate.c.
+Much of this logic can be translated into OpenFlow actions (e.g. the
+call stack and data stack), but some of it is entirely outside
+OpenFlow (e.g. the state of mirrors). To implement it properly, it
+seems that we'll have to introduce a new Open vSwitch extension to
+OpenFlow, a "send-to-controller" action that causes extra data to be
+sent to the controller, where the extra data packages up the state
+necessary to resume the pipeline. Maybe the bits of the state that
+can be represented in OpenFlow can be embedded in this extra data in a
+controller-readable form, but other bits we might want to be opaque.
+It's also likely that we'll want to change and extend the form of this
+opaque data over time, so this should be allowed for, e.g. by
+including a nonce in the extra data that is newly generated every time
+ovs-vswitchd starts.
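+
+To make the shape of that extra data concrete, here is one purely
+hypothetical layout; nothing like this exists yet and the struct and
+field names are invented for this sketch (the ovs_be* types come from
+include/openvswitch/types.h):
+
+    #include <stdint.h>
+    #include <openvswitch/types.h>
+
+    /* Hypothetical "extra data" attached to a packet-in by a
+     * send-to-controller action and handed back verbatim by
+     * ovn-controller to resume the pipeline. */
+    struct pipeline_resume_state {
+        ovs_be32 nonce;       /* Regenerated every time ovs-vswitchd starts,
+                               * so stale state from a previous run is
+                               * rejected when the packet comes back. */
+        uint8_t table_id;     /* Controller-readable: table to resume in. */
+        uint8_t pad;
+        ovs_be16 opaque_len;  /* Number of opaque bytes that follow. */
+        /* Followed by 'opaque_len' bytes of ovs-vswitchd-private state
+         * (call stack, data stack, mirror state, ...) that the controller
+         * must not interpret or modify. */
+    };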
+
+*** OpenFlow action definitions
+
+Define OpenFlow wire structures for each new OpenFlow action and
+implement them in lib/ofp-actions.[ch].
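+
+As a sketch of what one of these definitions might look like, here is
+a hypothetical wire structure for an argument-less action such as
+tcp_reset, following the conventions of the existing NXAST_* wire
+structures; the struct name and subtype do not exist yet:
+
+    /* Hypothetical Nicira extension action carrying no arguments. */
+    struct nx_action_tcp_reset {
+        ovs_be16 type;       /* OFPAT_VENDOR. */
+        ovs_be16 len;        /* 16. */
+        ovs_be32 vendor;     /* NX_VENDOR_ID. */
+        ovs_be16 subtype;    /* A newly assigned NXAST_* value. */
+        uint8_t pad[6];      /* Pad to a multiple of 8 bytes; must be zero. */
+    };
+    OFP_ASSERT(sizeof(struct nx_action_tcp_reset) == 16);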
+
+*** OVS implementation
+
+Add code for action translation. Possibly add datapath code for
+action implementation. However, none of these new actions should
+require high-bandwidth processing, so we could at least start with them
+implemented in userspace only. (ARP field modification is already
+userspace-only and no one has complained yet.)
+
+** IPv6
+
+*** ND versus ARP
+
+*** IPv6 routing
+
+*** ICMPv6
+
+** Dynamic IP to MAC bindings
-** Flow table handling in ovn-controller.
-
- ovn-controller has to transform logical datapath flows from the
- database into OpenFlow flows.
-
-*** Definition (or choice) of data structure for flows and flow table.
-
- It would be natural enough to use "struct flow" and "struct
- classifier" for this. Maybe that is what we should do. However,
- "struct classifier" is optimized for searches based on packet
- headers, whereas all we care about here can be implemented with a
- hash table. Also, we may want to make it easy to add and remove
- support for fields without recompiling, which is not possible with
- "struct flow" or "struct classifier".
-
- On the other hand, we may find that it is difficult to decide that
- two OXM flow matches are identical (to normalize them) without a
- lot of domain-specific knowledge that is already embedded in struct
- flow. It's also going to be a pain to come up with a way to make
- anything other than "struct flow" work with the ofputil_*()
- functions for encoding and decoding OpenFlow.
-
- It's also possible we could use struct flow without struct
- classifier.
-
-*** Translating logical datapath actions into OpenFlow actions.
-
- Some of the logical datapath actions do not have natural
- representations as OpenFlow actions: they require
- packet-in/packet-out round trips through ovn-controller. The
- trickiest part of that is going to be making sure that the
- packet-out resumes the control flow that was broken off by the
- packet-in. That's tricky; we'll probably have to restrict control
- flow or add OVS features to make resuming in general possible. Not
- sure which is better at this point.
-
-*** OpenFlow flow table synchronization.
-
- The internal representation of the OpenFlow flow table has to be
- synced across the controller connection to OVS. This probably
- boils down to the "flow monitoring" feature of OF1.4 which was then
- made available as a "standard extension" to OF1.3. (OVS hasn't
- implemented this for OF1.4 yet, but the feature is based on a OVS
- extension to OF1.0, so it should be straightforward to add it.)
-
- We probably need some way to catch cases where OVS and OVN don't
- see eye-to-eye on what exactly constitutes a flow, so that OVN
- doesn't waste a lot of CPU time hammering at OVS trying to install
- something that it's not going to do.
-
-*** Logical/physical translation.
-
- When a packet comes into the integration bridge, the first stage of
- processing needs to translate it from a physical to a logical
- context. When a packet leaves the integration bridge, the final
- stage of processing needs to translate it back into a physical
- context. ovn-controller needs to populate the OpenFlow flows
- tables to do these translations.
-
-*** Determine how to split logical pipeline across physical nodes.
-
- From the original OVN architecture document:
-
- The pipeline processing is split between the ingress and egress
- transport nodes. In particular, the logical egress processing may
- occur at either hypervisor. Processing the logical egress on the
- ingress hypervisor requires more state about the egress vif's
- policies, but reduces traffic on the wire that would eventually be
- dropped. Whereas, processing on the egress hypervisor can reduce
- broadcast traffic on the wire by doing local replication. We
- initially plan to process logical egress on the egress hypervisor
- so that less state needs to be replicated. However, we may change
- this behavior once we gain some experience writing the logical
- flows.
-
- The split pipeline processing split will influence how tunnel keys
- are encoded.
-
-*** Monitor Pipeline table in OVN, trigger flow table recomputation on change.
+Some bindings from IP address to MAC will undoubtedly need to be
+discovered dynamically through ARP requests. It's straightforward
+enough for a logical L3 router to generate ARP requests and forward
+them to the appropriate switch.
+
+It's more difficult to figure out where the reply should be processed
+and stored. It might seem at first that a first-cut implementation
+could just keep track of the binding on the hypervisor that needs to
+know, but that doesn't work easily: the VM that sends the reply might
+not be on the same HV as the VM that needs the answer (that is, the
+VM that sent the packet that needs the binding to be resolved), and
+there isn't an easy way for the replying HV to know which HV needs
+the answer.
+
+Thus, the HV that processes the ARP reply (which is unknown when the
+ARP is sent) has to tell all the HVs about the binding. The most
+obvious place for this is the OVN_Southbound database.
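+
+To illustrate the intended flow, here is a rough sketch of how the HV
+that sees the reply could publish it through the OVSDB IDL, assuming a
+hypothetical MAC_Binding table in OVN_Southbound; the
+sbrec_mac_binding_*() functions do not exist and merely follow the
+naming pattern that ovsdb-idlc would generate for such a table:
+
+    /* Hypothetical: publish an IP-to-MAC binding learned from an ARP reply
+     * so that every chassis monitoring the (not yet existing) MAC_Binding
+     * table learns it too.  Requires lib/ovsdb-idl.h plus the generated
+     * OVN_Southbound IDL header. */
+    static void
+    publish_arp_binding(struct ovsdb_idl *idl, const char *logical_port,
+                        const char *ip, const char *mac)
+    {
+        struct ovsdb_idl_txn *txn = ovsdb_idl_txn_create(idl);
+        const struct sbrec_mac_binding *b = sbrec_mac_binding_insert(txn);
+
+        sbrec_mac_binding_set_logical_port(b, logical_port);
+        sbrec_mac_binding_set_ip(b, ip);
+        sbrec_mac_binding_set_mac(b, mac);
+
+        /* Blocks until the commit completes; a daemon would instead drive
+         * the transaction from its main loop and check its status there. */
+        ovsdb_idl_txn_commit_block(txn);
+        ovsdb_idl_txn_destroy(txn);
+    }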
+
+Details need to be worked out, including:
+
+*** OVN_Southbound schema changes.
+
+Bindings could be added to the Port_Binding table by adding or
+modifying columns, or a new table could be added instead.
+
+*** Logical_Flow representation
+
+It would be really nice to maintain the general-purpose nature of
+logical flows, but these bindings might have to include some
+hard-coded special cases, especially when it comes to the relationship
+with populating the bindings into the OVN_Southbound table.
+
+*** Tracking queries
+
+It's probably best to record in the database only those responses to
+queries actually issued by an L3 logical router, so the queries
+themselves have to be tracked somehow, probably by putting a tentative
+binding without a MAC address into the database.
+
+*** Renewal and expiration.
+
+Something needs to make sure that bindings remain valid and expire
+those that become stale.
+
+** MTU handling (fragmentation on output)
+
+** Ratelimiting.
+
+*** ARP.
+
+*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ...
+
+As a point of comparison, Linux doesn't ratelimit TCP resets but I
+think it does ratelimit everything else.
+
+* ovn-controller
** ovn-controller parameters and configuration.
Can probably get this from Open_vSwitch database.
+** Security
+
+*** Limiting the impact of a compromised chassis.
+
+ Every instance of ovn-controller has the same full access to the central
+ OVN_Southbound database. This means that a compromised chassis can
+ interfere with the normal operation of the rest of the deployment. Some
+ specific examples include writing to the logical flow table to alter
+ traffic handling or updating the port binding table to claim ports that are
+ actually present on a different chassis. In practice, the compromised host
+ would be fighting against ovn-northd and other instances of ovn-controller
+ that would be trying to restore the correct state. The impact could include
+ at least temporarily redirecting traffic (so the compromised host could
+ receive traffic that it shouldn't) and potentially a more general denial of
+ service.
+
+ There are different potential improvements to this area. The first would be
+ to add some sort of ACL scheme to ovsdb-server. A proposal for this should
+ first include an ACL scheme for ovn-controller. An example policy would
+ be to make Logical_Flow read-only. Table-level control is needed, but is
+ not enough. For example, ovn-controller must be able to update the Chassis
+ and Encap tables, but should only be able to modify the rows associated with
+ that chassis and no others.
+
+ A more complex example is the Port_Binding table. Currently, ovn-controller
+ is the source of truth of where a port is located. There seems to be no
+ policy that can prevent malicious behavior of a compromised host with this
+ table.
+
+ An alternative scheme for port bindings would be to provide an optional mode
+ where an external entity controls port bindings and make them read-only to
+ ovn-controller. This is actually how OpenStack works today, for example.
+ The part of OpenStack that manages VMs (Nova) tells the networking component
+ (Neutron) where a port will be located, as opposed to the networking
+ component discovering it.
+
* ovsdb-server
ovsdb-server should have adequate features for OVN but it probably
Andy Zhou is looking at these issues.
-** Scaling number of connections.
-
- In typical use today a given ovsdb-server has only a single-digit
- number of simultaneous connections. The OVN Southbound database will
- have a connection from every hypervisor. This use case needs testing
- and probably coding work. Here are some possible improvements.
-
*** Reducing amount of data sent to clients.
Currently, whenever a row monitored by a client changes,
Currently, clients monitor the entire contents of a table. It
might make sense to allow clients to monitor only rows that
satisfy specific criteria, e.g. to allow an ovn-controller to
- receive only Pipeline rows for logical networks on its hypervisor.
+ receive only Logical_Flow rows for logical networks on its hypervisor.
*** Reducing redundant data and code within ovsdb-server.
Reconciliation Without Prior Context". (I'm not yet aware of
previous non-academic use of this technique.)
-* Miscellaneous:
+** Support multiple tunnel encapsulations in Chassis.
+
+   So far, both ovn-controller and ovn-controller-vtep allow a
+   chassis to have only one tunnel encapsulation entry. We should
+   extend the implementation to support multiple tunnel encapsulations.
+
+** Update learned MAC addresses from VTEP to OVN
+
+   The VTEP gateway stores all MAC addresses learned from its
+   physical interfaces in the 'Ucast_Macs_Local' and the
+   'Mcast_Macs_Local' tables. ovn-controller-vtep should be able
+   to propagate that information back to the ovn-sb database, so
+   that other chassis know where to send packets destined for the
+   extended external network instead of broadcasting.
+
+** Translate ovn-sb Multicast_Group table into VTEP config
+
+   The ovn-controller-vtep daemon should be able to translate
+   Multicast_Group table entries in the ovn-sb database into
+   Mcast_Macs_Remote table configuration in the VTEP database.
-** Init scripts for ovn-controller (on HVs), ovn-northd, OVN DB server.
+* Consider the use of BFD as tunnel monitor.
-** Distribution packaging.
+   The use of BFD for hypervisor-to-hypervisor tunnels is probably not
+   worth it, since there's no alternative to switch to if a tunnel goes
+   down. It could make sense at a slow rate if someone integrates OVN
+   with a monitoring system, but not otherwise.
-* Not yet scoped:
+ When OVN gets to supporting HA for gateways (see ovn/OVN-GW-HA.md), BFD is
+ likely needed as a part of that solution.
-** Neutron plugin.
+ There's more commentary in this ML post:
+ http://openvswitch.org/pipermail/dev/2015-November/062385.html
- This is being developed on OpenStack's development infrastructure
- to be along side most of the other Neutron plugins.
+* ACL
- http://git.openstack.org/cgit/stackforge/networking-ovn
+** Support FTP ALGs.
- http://git.openstack.org/cgit/stackforge/networking-ovn/tree/doc/source/todo.rst
+** Support reject action.
-** Gateways.
+** Support log option.