include vtep/automake.mk
include datapath-windows/automake.mk
include datapath-windows/include/automake.mk
+include ovn/automake.mk
-# Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014 Nicira, Inc.
+# Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
AC_CONFIG_COMMANDS([include/openflow/openflow.h.stamp])
AC_CONFIG_COMMANDS([utilities/bugtool/dummy], [:])
+AC_CONFIG_COMMANDS([ovn/dummy], [:])
m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES])
--- /dev/null
+* Flow match expression handling library.
+
+ ovn-controller is the primary user of flow match expressions, but
+  the same syntax, and presumably the same code, ought to be useful
+  in ovn-nbd for ACL match expressions.
+
+** Definition of data structures to represent a match expression as a
+ syntax tree.
+
+** Definition of data structures to represent variables (fields).
+
+ Fields need names and prerequisites. Most fields are numeric and
+   thus need widths. We also need a way to represent nominal
+ fields (currently just logical port names). It might be
+ appropriate to associate fields directly with OXM/NXM code points;
+ we have to decide whether we want OVN to use the OVS flow structure
+ or work with OXM more directly.
+
+ Probably should be defined so that the data structure is also
+ useful for references to fields in action parsing.
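
   As a sketch of what such a field-definition structure might look
   like (the names, members, and example entries here are purely
   illustrative assumptions, not actual OVN code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Field:
    """One match field known to the expression library (illustrative)."""
    name: str                    # e.g. "ip.src"
    n_bits: int                  # width in bits; 0 for nominal fields
    prerequisite: Optional[str]  # e.g. "ip" as a prerequisite of "ip.src"
    oxm_header: Optional[int]    # OXM/NXM code point, if tied to one
    nominal: bool = False        # e.g. logical port names

# A registry keyed by name supports both match parsing and, later,
# references to fields in action parsing.
FIELDS = {
    "inport": Field("inport", 0, None, None, nominal=True),
    "ip.src": Field("ip.src", 32, "ip", None),
}
```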
+
+** Lexical analysis.
+
+ Probably should be defined so that the lexer can be reused for
+ parsing actions.
+
+** Parsing into syntax tree.
+
+** Semantic checking against variable definitions.
+
+** Applying prerequisites.
+
+** Simplification into conjunction-of-disjunctions (CoD) form.
+
+** Transformation from CoD form into OXM matches.
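
   The simplification step can be sketched as follows. This is a
   minimal illustration over abstract atoms; the real library will
   also need negation, relational operators, and width checking, none
   of which appear here:

```python
# Expressions are tuples: ("atom", x), ("and", [subexprs]), ("or", [subexprs]).
# to_cod() returns conjunction-of-disjunctions form as a list of clauses,
# where each clause is a frozenset of atoms: the whole expression is the
# AND of the clauses and each clause is the OR of its atoms.
def to_cod(expr):
    op = expr[0]
    if op == "atom":
        return [frozenset([expr[1]])]
    subresults = [to_cod(sub) for sub in expr[1]]
    if op == "and":
        # AND just concatenates the sub-expressions' clause lists.
        return [clause for sub in subresults for clause in sub]
    if op == "or":
        # OR distributes over AND: merge one clause from each
        # sub-expression, for every combination of clauses.
        clauses = [frozenset()]
        for sub in subresults:
            clauses = [c1 | c2 for c1 in clauses for c2 in sub]
        return clauses
    raise ValueError("unknown operator %r" % op)
```

   For example, (a && b) || c simplifies to (a || c) && (b || c).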
+
+* ovn-controller
+
+** Flow table handling in ovn-controller.
+
+ ovn-controller has to transform logical datapath flows from the
+ database into OpenFlow flows.
+
+*** Definition (or choice) of data structure for flows and flow table.
+
+ It would be natural enough to use "struct flow" and "struct
+ classifier" for this. Maybe that is what we should do. However,
+ "struct classifier" is optimized for searches based on packet
+ headers, whereas all we care about here can be implemented with a
+ hash table. Also, we may want to make it easy to add and remove
+ support for fields without recompiling, which is not possible with
+ "struct flow" or "struct classifier".
+
+    On the other hand, we may find that it is difficult to decide
+    whether two OXM flow matches are identical (to normalize them)
+    without a lot of domain-specific knowledge that is already
+    embedded in struct flow. It's also going to be a pain to come up
+    with a way to make anything other than "struct flow" work with
+    the ofputil_*() functions for encoding and decoding OpenFlow.
+
+ It's also possible we could use struct flow without struct
+ classifier.
+
+*** Assembling conjunctive flows from flow match expressions.
+
+ This transformation explodes logical datapath flows into multiple
+ OpenFlow flow table entries, since a flow match expression in CoD
+ form requires several OpenFlow flow table entries. It also
+    requires merging together OpenFlow flow table entries that contain
+ "conjunction" actions (really just concatenating their actions).
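
    A sketch of the explosion, using the OVS "conjunction" action as
    described in ovs-ofctl(8). Matches and actions are plain strings
    here purely for illustration:

```python
def explode(clauses, conj_id, actions):
    """Explode a match in CoD form (a list of clauses, each a list of
    simple matches) into OpenFlow-style (match, actions) flow entries
    using the "conjunction" action."""
    if len(clauses) == 1:
        # A single clause needs no conjunction: one flow per disjunct.
        return [(match, actions) for match in clauses[0]]
    n = len(clauses)
    # The real actions go on a flow matching the conjunction ID.
    flows = [("conj_id=%d" % conj_id, actions)]
    for k, clause in enumerate(clauses, 1):
        for match in clause:
            # This match satisfies dimension k of n. Identical matches
            # arising from different logical flows would later be merged
            # by concatenating their conjunction actions.
            flows.append((match, "conjunction(%d,%d/%d)" % (conj_id, k, n)))
    return flows
```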
+
+*** Translating logical datapath port names into port numbers.
+
+ Logical ports are specified by name in logical datapath flows, but
+ OpenFlow only works in terms of numbers.
+
+*** Translating logical datapath actions into OpenFlow actions.
+
+ Some of the logical datapath actions do not have natural
+ representations as OpenFlow actions: they require
+ packet-in/packet-out round trips through ovn-controller. The
+ trickiest part of that is going to be making sure that the
+ packet-out resumes the control flow that was broken off by the
+    packet-in. We'll probably have to either restrict control flow
+    or add OVS features to make resuming possible in general; it is
+    not yet clear which is better.
+
+*** OpenFlow flow table synchronization.
+
+ The internal representation of the OpenFlow flow table has to be
+ synced across the controller connection to OVS. This probably
+    boils down to the "flow monitoring" feature of OF1.4, which was
+    then made available as a "standard extension" to OF1.3. (OVS
+    hasn't implemented this for OF1.4 yet, but the feature is based
+    on an OVS extension to OF1.0, so it should be straightforward to
+    add it.)
+
+ We probably need some way to catch cases where OVS and OVN don't
+ see eye-to-eye on what exactly constitutes a flow, so that OVN
+ doesn't waste a lot of CPU time hammering at OVS trying to install
+ something that it's not going to do.
+
+*** Logical/physical translation.
+
+ When a packet comes into the integration bridge, the first stage of
+ processing needs to translate it from a physical to a logical
+ context. When a packet leaves the integration bridge, the final
+ stage of processing needs to translate it back into a physical
+    context. ovn-controller needs to populate the OpenFlow flow
+    tables to do these translations.
+
+*** Determine how to split logical pipeline across physical nodes.
+
+ From the original OVN architecture document:
+
+ The pipeline processing is split between the ingress and egress
+ transport nodes. In particular, the logical egress processing may
+ occur at either hypervisor. Processing the logical egress on the
+ ingress hypervisor requires more state about the egress vif's
+ policies, but reduces traffic on the wire that would eventually be
+ dropped. Whereas, processing on the egress hypervisor can reduce
+ broadcast traffic on the wire by doing local replication. We
+ initially plan to process logical egress on the egress hypervisor
+ so that less state needs to be replicated. However, we may change
+ this behavior once we gain some experience writing the logical
+ flows.
+
+    The pipeline processing split will influence how tunnel keys
+    are encoded.
+
+** Interaction with Open_vSwitch and OVN databases:
+
+*** Monitor VIFs attached to the integration bridge in Open_vSwitch.
+
+    In response to changes, add or remove corresponding rows in the
+    Bindings table in OVN.
+
+*** Populate Chassis row in OVN at startup. Maintain Chassis row over time.
+
+ (Warn if any other Chassis claims the same IP address.)
+
+*** Remove Chassis and Bindings rows from OVN on exit.
+
+*** Monitor Chassis table in OVN.
+
+ Populate Port records for tunnels to other chassis into
+ Open_vSwitch database. As a scale optimization later on, one can
+ populate only records for tunnels to other chassis that have
+ logical networks in common with this one.
+
+*** Monitor Pipeline table in OVN, trigger flow table recomputation on change.
+
+** ovn-controller parameters and configuration.
+
+*** Tunnel encapsulation to publish.
+
+ Default: VXLAN? Geneve?
+
+*** Location of Open_vSwitch database.
+
+ We can probably use the same default as ovs-vsctl.
+
+*** Location of OVN database.
+
+ Probably no useful default.
+
+*** SSL configuration.
+
+ Can probably get this from Open_vSwitch database.
+
+* ovn-nbd
+
+** Monitor OVN_Northbound database, trigger Pipeline recomputation on change.
+
+** Translate each OVN_Northbound entity into Pipeline logical datapath flows.
+
+ We have to first sit down and figure out what the general
+ translation of each entity is. The original OVN architecture
+ description at
+ http://openvswitch.org/pipermail/dev/2015-January/050380.html had
+ some sketches of these, but they need to be completed and
+ elaborated.
+
+ Initially, the simplest way to do this is probably to write
+ straight C code to do a full translation of the entire
+ OVN_Northbound database into the format for the Pipeline table in
+ the OVN database. As scale increases, this will probably be too
+ inefficient since a small change in OVN_Northbound requires a full
+ recomputation. At that point, we probably want to adopt a more
+ systematic approach, such as something akin to the "nlog" system
+ used in NVP (see Koponen et al. "Network Virtualization in
+ Multi-tenant Datacenters", NSDI 2014).
+
+** Push logical datapath flows to Pipeline table.
+
+** Monitor OVN database Bindings table.
+
+ Sync rows in the OVN Bindings table to the "up" column in the
+ OVN_Northbound database.
+
+* ovsdb-server
+
+ ovsdb-server should have adequate features for OVN but it probably
+ needs work for scale and possibly for availability as deployments
+ grow. Here are some thoughts.
+
+ Andy Zhou is looking at these issues.
+
+** Scaling number of connections.
+
+ In typical use today a given ovsdb-server has only a single-digit
+ number of simultaneous connections. The OVN database will have a
+ connection from every hypervisor. This use case needs testing and
+ probably coding work. Here are some possible improvements.
+
+*** Reducing amount of data sent to clients.
+
+ Currently, whenever a row monitored by a client changes,
+ ovsdb-server sends the client every monitored column in the row,
+    even if only one column changes. It might be valuable to reduce
+    this to only the columns that change.
+
+ Also, whenever a column changes, ovsdb-server sends the entire
+ contents of the column. It might be valuable, for columns that
+ are sets or maps, to send only added or removed values or
+    key-value pairs.
+
+ Currently, clients monitor the entire contents of a table. It
+ might make sense to allow clients to monitor only rows that
+ satisfy specific criteria, e.g. to allow an ovn-controller to
+ receive only Pipeline rows for logical networks on its hypervisor.
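
    For concreteness, here is a standard RFC 7047 "monitor" request
    restricted by a hypothetical "where" member. The "where" filter
    does not exist in the protocol, and the Pipeline column names are
    also assumptions; everything else follows the published protocol:

```python
import json

request = {
    "id": 1,
    "method": "monitor",
    "params": [
        "OVN",   # database name
        None,    # client-chosen value, echoed back in update notifications
        {
            "Pipeline": {
                "columns": ["table_id", "priority", "match", "actions"],
                "select": {"initial": True, "insert": True,
                           "modify": True, "delete": True},
                # Hypothetical per-row filter, not part of RFC 7047:
                "where": [["logical_datapath", "==",
                           ["uuid", "00000000-0000-0000-0000-000000000000"]]],
            }
        },
    ],
}

wire = json.dumps(request)  # what would go over the JSON-RPC connection
```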
+
+*** Reducing redundant data and code within ovsdb-server.
+
+ Currently, ovsdb-server separately composes database update
+ information to send to each of its clients. This is fine for a
+ small number of clients, but it wastes time and memory when
+    hundreds of clients all want the same updates (as will be the
+    case in OVN).
+
+ (This is somewhat opposed to the idea of letting a client monitor
+ only some rows in a table, since that would increase the diversity
+ among clients.)
+
+*** Multithreading.
+
+ If it turns out that other changes don't let ovsdb-server scale
+ adequately, we can multithread ovsdb-server. Initially one might
+ only break protocol handling into separate threads, leaving the
+ actual database work serialized through a lock.
+
+** Increasing availability.
+
+ Database availability might become an issue. The OVN system
+ shouldn't grind to a halt if the database becomes unavailable, but
+ it would become impossible to bring VIFs up or down, etc.
+
+ My current thought on how to increase availability is to add
+ clustering to ovsdb-server, probably via the Raft consensus
+ algorithm. As an experiment, I wrote an implementation of Raft
+ for Open vSwitch that you can clone from:
+
+ https://github.com/blp/ovs-reviews.git raft
+
+** Reducing startup time.
+
+ As-is, if ovsdb-server restarts, every client will fetch a fresh
+ copy of the part of the database that it cares about. With
+ hundreds of clients, this could cause heavy CPU load on
+ ovsdb-server and use excessive network bandwidth. It would be
+ better to allow incremental updates even across connection loss.
+ One way might be to use "Difference Digests" as described in
+ Epstein et al., "What's the Difference? Efficient Set
+ Reconciliation Without Prior Context". (I'm not yet aware of
+ previous non-academic use of this technique.)
+
+* Miscellaneous:
+
+** Write ovn-nbctl utility.
+
+ The idea here is that we need a utility to act on the OVN_Northbound
+ database in a way similar to a CMS, so that we can do some testing
+ without an actual CMS in the picture.
+
+ No details yet.
+
+** Init scripts for ovn-controller (on HVs), ovn-nbd, OVN DB server.
+
+** Distribution packaging.
+
+* Not yet scoped:
+
+** Neutron plugin.
+
+*** Create stackforge/networking-ovn repository based on OpenStack's
+cookiecutter git repo generator
+
+*** Document mappings between Neutron data model and the OVN northbound DB
+
+*** Create a Neutron ML2 mechanism driver that implements the mappings
+on Neutron resource requests
+
+*** Add synchronization for when we need to sanity check that the OVN
+northbound DB reflects the current state of the world as intended by
+Neutron (needed for various failure scenarios)
+
+** Gateways.
--- /dev/null
+# OVN schema and IDL
+EXTRA_DIST += ovn/ovn.ovsschema
+pkgdata_DATA += ovn/ovn.ovsschema
+
+# OVN E-R diagram
+#
+# If "python" or "dot" is not available, then we do not add a graphical
+# diagram to the documentation.
+if HAVE_PYTHON
+if HAVE_DOT
+ovn/ovn.gv: ovsdb/ovsdb-dot.in ovn/ovn.ovsschema
+ $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn.ovsschema > $@
+ovn/ovn.pic: ovn/ovn.gv ovsdb/dot2pic
+ $(AM_V_GEN)(dot -T plain < ovn/ovn.gv | $(PERL) $(srcdir)/ovsdb/dot2pic -f 3) > $@.tmp && \
+ mv $@.tmp $@
+OVN_PIC = ovn/ovn.pic
+OVN_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_PIC)
+DISTCLEANFILES += ovn/ovn.gv ovn/ovn.pic
+endif
+endif
+
+# OVN schema documentation
+EXTRA_DIST += ovn/ovn.xml
+DISTCLEANFILES += ovn/ovn.5
+man_MANS += ovn/ovn.5
+ovn/ovn.5: \
+ ovsdb/ovsdb-doc ovn/ovn.xml ovn/ovn.ovsschema $(OVN_PIC)
+ $(AM_V_GEN)$(OVSDB_DOC) \
+ $(OVN_DOT_DIAGRAM_ARG) \
+ --version=$(VERSION) \
+ $(srcdir)/ovn/ovn.ovsschema \
+ $(srcdir)/ovn/ovn.xml > $@.tmp && \
+ mv $@.tmp $@
+
+# OVN northbound schema and IDL
+EXTRA_DIST += ovn/ovn-nb.ovsschema
+pkgdata_DATA += ovn/ovn-nb.ovsschema
+
+# OVN northbound E-R diagram
+#
+# If "python" or "dot" is not available, then we do not add a graphical
+# diagram to the documentation.
+if HAVE_PYTHON
+if HAVE_DOT
+ovn/ovn-nb.gv: ovsdb/ovsdb-dot.in ovn/ovn-nb.ovsschema
+ $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn-nb.ovsschema > $@
+ovn/ovn-nb.pic: ovn/ovn-nb.gv ovsdb/dot2pic
+ $(AM_V_GEN)(dot -T plain < ovn/ovn-nb.gv | $(PERL) $(srcdir)/ovsdb/dot2pic -f 3) > $@.tmp && \
+ mv $@.tmp $@
+OVN_NB_PIC = ovn/ovn-nb.pic
+OVN_NB_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_NB_PIC)
+DISTCLEANFILES += ovn/ovn-nb.gv ovn/ovn-nb.pic
+endif
+endif
+
+# OVN northbound schema documentation
+EXTRA_DIST += ovn/ovn-nb.xml
+DISTCLEANFILES += ovn/ovn-nb.5
+man_MANS += ovn/ovn-nb.5
+ovn/ovn-nb.5: \
+ ovsdb/ovsdb-doc ovn/ovn-nb.xml ovn/ovn-nb.ovsschema $(OVN_NB_PIC)
+ $(AM_V_GEN)$(OVSDB_DOC) \
+ $(OVN_NB_DOT_DIAGRAM_ARG) \
+ --version=$(VERSION) \
+ $(srcdir)/ovn/ovn-nb.ovsschema \
+ $(srcdir)/ovn/ovn-nb.xml > $@.tmp && \
+ mv $@.tmp $@
+
+man_MANS += ovn/ovn-controller.8 ovn/ovn-architecture.7
+EXTRA_DIST += ovn/ovn-controller.8.in ovn/ovn-architecture.7.xml
+
+SUFFIXES += .xml
+%: %.xml
+ $(AM_V_GEN)$(run_python) $(srcdir)/build-aux/xml2nroff \
+ --version=$(VERSION) $< > $@.tmp && mv $@.tmp $@
+
+EXTRA_DIST += ovn/TODO
--- /dev/null
+<?xml version="1.0" encoding="utf-8"?>
+<manpage program="ovn-architecture" section="7" title="OVN Architecture">
+ <h1>Name</h1>
+ <p>ovn-architecture -- Open Virtual Network architecture</p>
+
+ <h1>Description</h1>
+
+ <p>
+ OVN, the Open Virtual Network, is a system to support virtual network
+ abstraction. OVN complements the existing capabilities of OVS to add
+ native support for virtual network abstractions, such as virtual L2 and L3
+ overlays and security groups. Services such as DHCP are also desirable
+ features. Just like OVS, OVN's design goal is to have a production-quality
+ implementation that can operate at significant scale.
+ </p>
+
+ <p>
+ An OVN deployment consists of several components:
+ </p>
+
+ <ul>
+ <li>
+ <p>
+ A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
+ OVN's ultimate client (via its users and administrators). OVN
+ integration requires installing a CMS-specific plugin and
+        related software (see below). OVN initially targets OpenStack
+        as its CMS.
+ </p>
+
+ <p>
+ We generally speak of ``the'' CMS, but one can imagine scenarios in
+ which multiple CMSes manage different parts of an OVN deployment.
+ </p>
+ </li>
+
+ <li>
+      A physical or virtual node (or, eventually, a cluster) installed in
+      a central location, running the OVN Database.
+ </li>
+
+ <li>
+ One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run
+ Open vSwitch and implement the interface described in
+ <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor
+ platform supported by Open vSwitch is acceptable.
+ </li>
+
+ <li>
+ <p>
+ Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based
+ logical network into a physical network by bidirectionally forwarding
+ packets between tunnels and a physical Ethernet port. This allows
+ non-virtualized machines to participate in logical networks. A gateway
+ may be a physical host, a virtual machine, or an ASIC-based hardware
+        switch that supports the <code>vtep</code>(5) schema. (Support for
+        the latter will come later in the OVN implementation.)
+ </p>
+
+ <p>
+        Hypervisors and gateways are together called <dfn>transport
+        nodes</dfn> or <dfn>chassis</dfn>.
+ </p>
+ </li>
+ </ul>
+
+ <p>
+ The diagram below shows how the major components of OVN and related
+ software interact. Starting at the top of the diagram, we have:
+ </p>
+
+ <ul>
+ <li>
+ The Cloud Management System, as defined above.
+ </li>
+
+ <li>
+ <p>
+ The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
+ interfaces to OVN. In OpenStack, this is a Neutron plugin.
+ The plugin's main purpose is to translate the CMS's notion of logical
+ network configuration, stored in the CMS's configuration database in a
+ CMS-specific format, into an intermediate representation understood by
+ OVN.
+ </p>
+
+ <p>
+ This component is necessarily CMS-specific, so a new plugin needs to be
+ developed for each CMS that is integrated with OVN. All of the
+ components below this one in the diagram are CMS-independent.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ The <dfn>OVN Northbound Database</dfn> receives the intermediate
+ representation of logical network configuration passed down by the
+ OVN/CMS Plugin. The database schema is meant to be ``impedance
+ matched'' with the concepts used in a CMS, so that it directly supports
+        notions of logical switches, routers, ACLs, and so on. See
+        <code>ovn-nb</code>(5) for details.
+ </p>
+
+ <p>
+ The OVN Northbound Database has only two clients: the OVN/CMS Plugin
+ above it and <code>ovn-nbd</code> below it.
+ </p>
+ </li>
+
+ <li>
+ <code>ovn-nbd</code>(8) connects to the OVN Northbound Database above it
+      and the OVN Database below it. It translates the logical network
+      configuration, expressed in terms of conventional network concepts
+      and taken from the OVN Northbound Database, into logical datapath
+      flows in the OVN Database.
+ </li>
+
+ <li>
+ <p>
+ The <dfn>OVN Database</dfn> is the center of the system. Its clients
+ are <code>ovn-nbd</code>(8) above it and <code>ovn-controller</code>(8)
+ on every transport node below it.
+ </p>
+
+ <p>
+ The OVN Database contains three kinds of data: <dfn>Physical
+ Network</dfn> (PN) tables that specify how to reach hypervisor and
+ other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the
+ logical network in terms of ``logical datapath flows,'' and
+ <dfn>Binding</dfn> tables that link logical network components'
+ locations to the physical network. The hypervisors populate the PN and
+ Binding tables, whereas <code>ovn-nbd</code>(8) populates the LN
+ tables.
+ </p>
+
+ <p>
+ OVN Database performance must scale with the number of transport nodes.
+ This will likely require some work on <code>ovsdb-server</code>(1) as
+ we encounter bottlenecks. Clustering for availability may be needed.
+ </p>
+ </li>
+ </ul>
+
+ <p>
+ The remaining components are replicated onto each hypervisor:
+ </p>
+
+ <ul>
+ <li>
+ <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
+ software gateway. Northbound, it connects to the OVN Database to learn
+ about OVN configuration and status and to populate the PN and <code>Bindings</code>
+ tables with the hypervisor's status. Southbound, it connects to
+ <code>ovs-vswitchd</code>(8) as an OpenFlow controller, for control over
+ network traffic, and to the local <code>ovsdb-server</code>(1) to allow
+ it to monitor and control Open vSwitch configuration.
+ </li>
+
+ <li>
+ <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
+ conventional components of Open vSwitch.
+ </li>
+ </ul>
+
+ <pre fixed="yes">
+ CMS
+ |
+ |
+ +-----------|-----------+
+ | | |
+ | OVN/CMS Plugin |
+ | | |
+ | | |
+ | OVN Northbound DB |
+ | | |
+ | | |
+ | ovn-nbd |
+ | | |
+ +-----------|-----------+
+ |
+ |
+ +------+
+ |OVN DB|
+ +------+
+ |
+ |
+ +------------------+------------------+
+ | | |
+ HV 1 | | HV n |
++---------------|---------------+ . +---------------|---------------+
+| | | . | | |
+| ovn-controller | . | ovn-controller |
+| | | | . | | | |
+| | | | | | | |
+| ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
+| | | |
++-------------------------------+ +-------------------------------+
+ </pre>
+
+ <h3>Life Cycle of a VIF</h3>
+
+ <p>
+ Tables and their schemas presented in isolation are difficult to
+ understand. Here's an example.
+ </p>
+
+ <p>
+ The steps in this example refer often to details of the OVN and OVN
+ Northbound database schemas. Please see <code>ovn</code>(5) and
+ <code>ovn-nb</code>(5), respectively, for the full story on these
+ databases.
+ </p>
+
+ <ol>
+ <li>
+ A VIF's life cycle begins when a CMS administrator creates a new VIF
+ using the CMS user interface or API and adds it to a switch (one
+ implemented by OVN as a logical switch). The CMS updates its own
+      configuration. This includes associating a unique, persistent
+      identifier <var>vif-id</var> and an Ethernet address <var>mac</var>
+      with the VIF.
+ </li>
+
+ <li>
+ The CMS plugin updates the OVN Northbound database to include the new
+ VIF, by adding a row to the <code>Logical_Port</code> table. In the new
+ row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is
+ <var>mac</var>, <code>switch</code> points to the OVN logical switch's
+ Logical_Switch record, and other columns are initialized appropriately.
+ </li>
+
+ <li>
+      <code>ovn-nbd</code> receives the OVN Northbound database update. In
+      turn, it makes the corresponding updates to the OVN database, by adding
+      rows to the OVN database <code>Pipeline</code> table to reflect the new
+      port, e.g. adding a flow to recognize that packets destined to the new
+      port's MAC address should be delivered to it, and updating the flow that
+      delivers broadcast and multicast packets to include the new port.
+ </li>
+
+ <li>
+ On every hypervisor, <code>ovn-controller</code> receives the
+      <code>Pipeline</code> table updates that <code>ovn-nbd</code> made in the
+ previous step. As long as the VM that owns the VIF is powered off,
+ <code>ovn-controller</code> cannot do much; it cannot, for example,
+ arrange to send packets to or receive packets from the VIF, because the
+ VIF does not actually exist anywhere.
+ </li>
+
+ <li>
+ Eventually, a user powers on the VM that owns the VIF. On the hypervisor
+ where the VM is powered on, the integration between the hypervisor and
+ Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF
+ to the OVN integration bridge and stores <var>vif-id</var> in
+ <code>external-ids</code>:<code>iface-id</code> to indicate that the
+ interface is an instantiation of the new VIF. (None of this code is new
+ in OVN; this is pre-existing integration work that has already been done
+ on hypervisors that support OVS.)
+ </li>
+
+ <li>
+ On the hypervisor where the VM is powered on, <code>ovn-controller</code>
+ notices <code>external-ids</code>:<code>iface-id</code> in the new
+ Interface. In response, it updates the local hypervisor's OpenFlow
+ tables so that packets to and from the VIF are properly handled.
+ Afterward, it updates the <code>Bindings</code> table in the OVN DB,
+ adding a row that links the logical port from
+ <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
+ </li>
+
+ <li>
+ Some CMS systems, including OpenStack, fully start a VM only when its
+ networking is ready. To support this, <code>ovn-nbd</code> notices the
+ new row in the <code>Bindings</code> table, and pushes this upward by
+ updating the <ref column="up" table="Logical_Port" db="OVN_NB"/> column
+ in the OVN Northbound database's <ref table="Logical_Port" db="OVN_NB"/>
+ table to indicate that the VIF is now up. The CMS, if it uses this
+ feature, can then react by allowing the VM's execution to proceed.
+ </li>
+
+ <li>
+ On every hypervisor but the one where the VIF resides,
+ <code>ovn-controller</code> notices the new row in the
+ <code>Bindings</code> table. This provides <code>ovn-controller</code>
+ the physical location of the logical port, so each instance updates the
+ OpenFlow tables of its switch (based on logical datapath flows in the OVN
+ DB <code>Pipeline</code> table) so that packets to and from the VIF can
+ be properly handled via tunnels.
+ </li>
+
+ <li>
+ Eventually, a user powers off the VM that owns the VIF. On the
+ hypervisor where the VM was powered on, the VIF is deleted from the OVN
+ integration bridge.
+ </li>
+
+ <li>
+ On the hypervisor where the VM was powered on,
+ <code>ovn-controller</code> notices that the VIF was deleted. In
+ response, it removes the logical port's row from the
+ <code>Bindings</code> table.
+ </li>
+
+ <li>
+ On every hypervisor, <code>ovn-controller</code> notices the row removed
+ from the <code>Bindings</code> table. This means that
+ <code>ovn-controller</code> no longer knows the physical location of the
+ logical port, so each instance updates its OpenFlow table to reflect
+ that.
+ </li>
+
+ <li>
+ Eventually, when the VIF (or its entire VM) is no longer needed by
+ anyone, an administrator deletes the VIF using the CMS user interface or
+ API. The CMS updates its own configuration.
+ </li>
+
+ <li>
+ The CMS plugin removes the VIF from the OVN Northbound database,
+ by deleting its row in the <code>Logical_Port</code> table.
+ </li>
+
+ <li>
+      <code>ovn-nbd</code> receives the OVN Northbound update and in turn
+ updates the OVN database accordingly, by removing or updating the
+ rows from the OVN database <code>Pipeline</code> table that were related
+ to the now-destroyed VIF.
+ </li>
+
+ <li>
+ On every hypervisor, <code>ovn-controller</code> receives the
+      <code>Pipeline</code> table updates that <code>ovn-nbd</code> made in the
+ previous step. <code>ovn-controller</code> updates OpenFlow tables to
+ reflect the update, although there may not be much to do, since the VIF
+ had already become unreachable when it was removed from the
+ <code>Bindings</code> table in a previous step.
+ </li>
+ </ol>
+
+</manpage>
--- /dev/null
+.\" -*- nroff -*-
+.de IQ
+. br
+. ns
+. IP "\\$1"
+..
+.TH ovn\-controller 8 "@VERSION@" "Open vSwitch" "Open vSwitch Manual"
+.ds PN ovn\-controller
+.
+.SH NAME
+ovn\-controller \- OVN local controller
+.
+.SH SYNOPSIS
+\fBovn\-controller\fR [\fIoptions\fR]
+.
+.SH DESCRIPTION
+\fBovn\-controller\fR is the local controller daemon for OVN, the Open
+Virtual Network. It connects northbound to the OVN database (see
+\fBovn\fR(5)) over the OVSDB protocol, and southbound to the Open
+vSwitch database (see \fBovs\-vswitchd.conf.db\fR(5)) over the OVSDB
+protocol and to \fBovs\-vswitchd\fR(8) via OpenFlow. Each hypervisor
+and software gateway in an OVN deployment runs its own independent
+copy of \fBovn\-controller\fR; thus, \fBovn\-controller\fR's
+southbound connections are machine-local and do not run over a
+physical network.
+.PP
+XXX this is completely skeletal.
+.
+.SH OPTIONS
+.SS "Public Key Infrastructure Options"
+.so lib/ssl.man
+.so lib/ssl-peer-ca-cert.man
+.ds DD
+.so lib/daemon.man
+.so lib/vlog.man
+.so lib/unixctl.man
+.so lib/common.man
+.
+.SH "SEE ALSO"
+.
+\fBovn\-architecture\fR(7)
--- /dev/null
+{
+ "name": "OVN_Northbound",
+ "tables": {
+ "Logical_Switch": {
+ "columns": {
+ "router_port": {"type": {"key": {"type": "uuid",
+ "refTable": "Logical_Router_Port",
+ "refType": "strong"},
+ "min": 0, "max": 1}},
+ "external_ids": {
+ "type": {"key": "string", "value": "string",
+ "min": 0, "max": "unlimited"}}}},
+ "Logical_Port": {
+ "columns": {
+ "switch": {"type": {"key": {"type": "uuid",
+ "refTable": "Logical_Switch",
+ "refType": "strong"}}},
+ "name": {"type": "string"},
+ "macs": {"type": {"key": "string",
+ "min": 0,
+ "max": "unlimited"}},
+ "port_security": {"type": {"key": "string",
+ "min": 0,
+ "max": "unlimited"}},
+ "up": {"type": {"key": "boolean", "min": 0, "max": 1}},
+ "external_ids": {
+ "type": {"key": "string", "value": "string",
+ "min": 0, "max": "unlimited"}}},
+ "indexes": [["name"]]},
+ "ACL": {
+ "columns": {
+ "switch": {"type": {"key": {"type": "uuid",
+ "refTable": "Logical_Switch",
+ "refType": "strong"}}},
+ "priority": {"type": {"key": {"type": "integer",
+ "minInteger": 0,
+ "maxInteger": 65535}}},
+ "match": {"type": "string"},
+ "action": {"type": {"key": {"type": "string",
+ "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}},
+ "log": {"type": "boolean"},
+ "external_ids": {
+ "type": {"key": "string", "value": "string",
+ "min": 0, "max": "unlimited"}}}},
+ "Logical_Router": {
+ "columns": {
+ "ip": {"type": "string"},
+ "default_gw": {"type": {"key": "string", "min": 0, "max": 1}},
+ "external_ids": {
+ "type": {"key": "string", "value": "string",
+ "min": 0, "max": "unlimited"}}}},
+ "Logical_Router_Port": {
+ "columns": {
+ "router": {"type": {"key": {"type": "uuid",
+ "refTable": "Logical_Router",
+ "refType": "strong"}}},
+ "network": {"type": "string"},
+ "mac": {"type": "string"},
+ "external_ids": {
+ "type": {"key": "string", "value": "string",
+ "min": 0, "max": "unlimited"}}}}},
+ "version": "1.0.0"}
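
To make the schema concrete, here is roughly what a CMS plugin's RFC 7047
"transact" request adding a Logical_Port row could look like. The switch
UUID, the vif-id, and the MAC address are placeholders; only the request
shape follows the published protocol:

```python
import json

# The switch's UUID would come from an earlier select or insert; this
# value is a placeholder.
switch_uuid = "11111111-2222-3333-4444-555555555555"

request = {
    "id": 2,
    "method": "transact",
    "params": [
        "OVN_Northbound",
        {"op": "insert",
         "table": "Logical_Port",
         "row": {"name": "vif-id-1234",  # the CMS's vif-id
                 "macs": ["set", ["00:11:22:33:44:55"]],
                 "switch": ["uuid", switch_uuid]}},
    ],
}

wire = json.dumps(request)  # what would go over the JSON-RPC connection
```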
--- /dev/null
+<?xml version="1.0" encoding="utf-8"?>
+<database name="ovn-nb" title="OVN Northbound Database">
+ <p>
+ This database is the interface between OVN and the cloud management system
+ (CMS), such as OpenStack, running above it. The CMS produces almost all of
+    the contents of the database. The <code>ovn-nbd</code> program monitors
+    the database contents, transforms them, and stores them into the <ref
+    db="OVN"/> database.
+ </p>
+
+ <p>
+ We generally speak of ``the'' CMS, but one can imagine scenarios in
+ which multiple CMSes manage different parts of an OVN deployment.
+ </p>
+
+ <h2>External IDs</h2>
+
+ <p>
+ Each of the tables in this database contains a special column, named
+ <code>external_ids</code>. This column has the same form and purpose each
+ place it appears.
+ </p>
+
+ <dl>
+ <dt><code>external_ids</code>: map of string-string pairs</dt>
+ <dd>
+ Key-value pairs for use by the CMS. The CMS might use certain pairs, for
+ example, to identify entities in its own configuration that correspond to
+ those in this database.
+ </dd>
+ </dl>
+
+ <table name="Logical_Switch" title="L2 logical switch">
+ <p>
+ Each row represents one L2 logical switch. A given switch's ports are
+ the <ref table="Logical_Port"/> rows whose <ref table="Logical_Port"
+ column="switch"/> column points to its row.
+ </p>
+
+ <column name="router_port">
+ <p>
+ The router port to which this logical switch is connected, or empty if
+ this logical switch is not connected to any router. A switch may be
+ connected to at most one logical router, but this is not a significant
+ restriction because logical routers may be connected into arbitrary
+ topologies.
+ </p>
+ </column>
+
+ <group title="Common Columns">
+ <column name="external_ids">
+ See <em>External IDs</em> at the beginning of this document.
+ </column>
+ </group>
+ </table>
+
+ <table name="Logical_Port" title="L2 logical switch port">
+ <p>
+ A port within an L2 logical switch.
+ </p>
+
+ <column name="switch">
+ The logical switch to which the logical port is connected.
+ </column>
+
+ <column name="name">
+ The logical port name. The name used here must match those used in the
+ <ref key="iface-id" table="Interface" column="external_ids"
+ db="Open_vSwitch"/> key in the <ref db="Open_vSwitch"/> database's <ref
+ table="Interface" db="Open_vSwitch"/> table, because hypervisors use <ref
+ key="iface-id" table="Interface" column="external_ids"
+ db="Open_vSwitch"/> as a lookup key for logical ports.
+ </column>
+
+ <column name="up">
+ This column is populated by <code>ovn-nbd</code>, rather than by the CMS
+ plugin as is most of this database. When a logical port is bound to a
+ physical location in the OVN database <ref db="OVN" table="Bindings"/>
+ table, <code>ovn-nbd</code> sets this column to <code>true</code>;
+ otherwise, or if the port becomes unbound later, it sets it to
+ <code>false</code>. This allows the CMS to wait for a VM's networking to
+ become active before it allows the VM to start.
+ </column>
+
+ <column name="macs">
+ The logical port's own Ethernet address or addresses, each in the form
+ <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
+ Like a physical Ethernet NIC, a logical port ordinarily has a single
+ fixed Ethernet address. The string <code>unknown</code> is also allowed
+ to indicate that the logical port has an unknown set of (additional)
+ source addresses.
+ </column>
+
+ <column name="port_security">
+ <p>
+ A set of L2 (Ethernet) or L3 (IPv4 or IPv6) addresses or L2+L3 pairs
+ from which the logical port is allowed to send packets and to which it
+ is allowed to receive packets. If this column is empty, all addresses
+ are permitted.
+ </p>
+
+ <p>
+ Exact syntax is TBD. One could simply use comma- or space-separated L2
+ and L3 addresses in each set member, or replace this by a subset of the
+ general-purpose expression language used for the <ref column="match"
+ table="Pipeline" db="OVN"/> column in the OVN database's <ref
+ table="Pipeline" db="OVN"/> table.
+ </p>
+ </column>
+
+ <group title="Common Columns">
+ <column name="external_ids">
+ See <em>External IDs</em> at the beginning of this document.
+ </column>
+ </group>
+ </table>
+
+ <table name="ACL" title="Access Control List (ACL) rule">
+ <p>
+ Each row in this table represents one ACL rule for the logical switch in
+ its <ref column="switch"/> column. The <ref column="action"/> column for
+ the highest-<ref column="priority"/> matching row in this table
+ determines a packet's treatment. If no row matches, packets are allowed
+ by default. (Default-deny treatment is possible: add a rule with <ref
+ column="priority"/> 0, <code>true</code> as <ref column="match"/>, and
+ <code>drop</code> as <ref column="action"/>.)
+ </p>
+
+ <column name="switch">
+ The switch to which the ACL rule applies. The expression in the
+ <ref column="match"/> column may match against logical ports
+ within this switch.
+ </column>
+
+ <column name="priority">
+ The ACL rule's priority. Rules with numerically higher priority take
+ precedence over those with lower. If two ACL rules with the same
+ priority both match, then the one actually applied to a packet is
+ undefined.
+ </column>
+
+ <column name="match">
+ The packets that the ACL should match, in the same expression language
+ used for the <ref column="match" table="Pipeline" db="OVN"/> column in
+ the OVN database's <ref table="Pipeline" db="OVN"/> table. Match
+ <code>inport</code> and <code>outport</code> against names of logical
+ ports within <ref column="switch"/> to implement ingress and egress ACLs,
+ respectively. In logical switches connected to logical routers, the
+ special port name <code>ROUTER</code> refers to the logical router port.
+ </column>
+
+ <column name="action">
+ <p>The action to take when the ACL rule matches:</p>
+
+ <ul>
+ <li>
+ <code>allow</code>: Forward the packet.
+ </li>
+
+ <li>
+ <code>allow-related</code>: Forward the packet and related traffic
+ (e.g. inbound replies to an outbound connection).
+ </li>
+
+ <li>
+ <code>drop</code>: Silently drop the packet.
+ </li>
+
+ <li>
+ <code>reject</code>: Drop the packet, replying with a TCP RST segment
+ for TCP or an ICMP unreachable message for other IP-based protocols.
+ </li>
+ </ul>
+ </column>
+
+ <column name="log">
+ If set to <code>true</code>, packets that match the ACL will trigger a
+ log message on the transport node or nodes that perform ACL processing.
+ Logging may be combined with any <ref column="action"/>.
+ </column>
+
+ <group title="Common Columns">
+ <column name="external_ids">
+ See <em>External IDs</em> at the beginning of this document.
+ </column>
+ </group>
+ </table>
+
+ <table name="Logical_Router" title="L3 logical router">
+ <p>
+ Each row represents one L3 logical router. A given router's ports are
+ the <ref table="Logical_Router_Port"/> rows whose <ref
+ table="Logical_Router_Port" column="router"/> column points to its row.
+ </p>
+
+ <column name="ip">
+ The logical router's own IP address. The logical router uses this
+ address for ICMP replies (e.g. network unreachable messages) and
+ other traffic that it originates, and it responds to traffic destined
+ to this address (e.g. ICMP echo requests).
+ </column>
+
+ <column name="default_gw">
+ IP address to use as default gateway, if any.
+ </column>
+
+ <group title="Common Columns">
+ <column name="external_ids">
+ See <em>External IDs</em> at the beginning of this document.
+ </column>
+ </group>
+ </table>
+
+ <table name="Logical_Router_Port" title="L3 logical router port">
+ <p>
+ A port within an L3 logical router.
+ </p>
+
+ <p>
+ A router port is always attached to a switch port. The connection can be
+ identified by following the <ref column="router_port"
+ table="Logical_Port"/> column from an appropriate <ref
+ table="Logical_Port"/> row.
+ </p>
+
+ <column name="router">
+ The router to which the port belongs.
+ </column>
+
+ <column name="network">
+ The IP network and netmask of the network on the router port. Used for
+ routing.
+ </column>
+
+ <column name="mac">
+ The Ethernet address that belongs to this router port.
+ </column>
+
+ <group title="Common Columns">
+ <column name="external_ids">
+ See <em>External IDs</em> at the beginning of this document.
+ </column>
+ </group>
+ </table>
+</database>
--- /dev/null
+{
+ "name": "OVN",
+ "tables": {
+ "Chassis": {
+ "columns": {
+ "name": {"type": "string"},
+ "encap": {"type": {"key": {"type": "string",
+ "enum": ["set", ["stt", "vxlan", "gre"]]}}},
+ "encap_options": {"type": {"key": "string",
+ "value": "string",
+ "min": 0,
+ "max": "unlimited"}},
+ "ip": {"type": "string"},
+ "gateway_ports": {"type": {"key": "string",
+ "value": {"type": "uuid",
+ "refTable": "Gateway",
+ "refType": "strong"},
+ "min": 0,
+ "max": "unlimited"}}},
+ "isRoot": true,
+ "indexes": [["name"]]},
+ "Gateway": {
+ "columns": {"attached_port": {"type": "string"},
+ "vlan_map": {"type": {"key": {"type": "integer",
+ "minInteger": 0,
+ "maxInteger": 4095},
+ "value": {"type": "string"},
+ "min": 0,
+ "max": "unlimited"}}}},
+ "Pipeline": {
+ "columns": {
+ "table_id": {"type": {"key": {"type": "integer",
+ "minInteger": 0,
+ "maxInteger": 127}}},
+ "priority": {"type": {"key": {"type": "integer",
+ "minInteger": 0,
+ "maxInteger": 65535}}},
+ "match": {"type": "string"},
+ "actions": {"type": "string"}},
+ "isRoot": true},
+ "Bindings": {
+ "columns": {
+ "logical_port": {"type": "string"},
+ "chassis": {"type": "string"},
+ "mac": {"type": {"key": "string",
+ "min": 0,
+ "max": "unlimited"}}},
+ "indexes": [["logical_port"]],
+ "isRoot": true}},
+ "version": "1.0.0"}
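For readers experimenting with the schema above, here is a minimal Python sketch (standard library only) that sanity-checks an OVSDB schema fragment. The `check_schema` helper is illustrative, not part of OVN or ovsdb-server.

```python
import json
import re

# Hypothetical helper (not part of OVN): sanity-check an OVSDB schema
# fragment, verifying the x.y.z version string and that every table
# defines a "columns" member.
def check_schema(text):
    schema = json.loads(text)
    assert re.fullmatch(r"\d+\.\d+\.\d+", schema["version"])
    for name, table in schema["tables"].items():
        assert "columns" in table, "table %s lacks columns" % name
    return sorted(schema["tables"])

example = """
{"name": "OVN",
 "version": "1.0.0",
 "tables": {"Chassis": {"columns": {"name": {"type": "string"}}},
            "Pipeline": {"columns": {"match": {"type": "string"}}}}}
"""
print(check_schema(example))  # ['Chassis', 'Pipeline']
```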
--- /dev/null
+<?xml version="1.0" encoding="utf-8"?>
+<database name="ovn" title="OVN Database">
+ <p>
+ This database holds logical and physical configuration and state for the
+ Open Virtual Network (OVN) system to support virtual network abstraction.
+ For an introduction to OVN, please see <code>ovn-architecture</code>(7).
+ </p>
+
+ <p>
+ The OVN database sits at the center of the OVN architecture. It is the one
+ component that speaks both southbound directly to all the hypervisors and
+ gateways, via <code>ovn-controller</code>, and northbound to the Cloud
+ Management System, via <code>ovn-nbd</code>.
+ </p>
+
+ <h2>Database Structure</h2>
+
+ <p>
+ The OVN database contains three classes of data with different properties,
+ as described in the sections below.
+ </p>
+
+ <h3>Physical Network (PN) data</h3>
+
+ <p>
+ PN tables contain information about the chassis nodes in the system.
+ They hold all the information necessary to wire the overlay, such as
+ IP addresses, supported tunnel types, and security keys.
+ </p>
+
+ <p>
+ The amount of PN data is small (O(n) in the number of chassis) and it
+ changes infrequently, so it can be replicated to every chassis.
+ </p>
+
+ <p>
+ The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the
+ PN tables.
+ </p>
+
+ <h3>Logical Network (LN) data</h3>
+
+ <p>
+ LN tables contain the topology of logical switches and routers, ACLs,
+ firewall rules, and everything needed to describe how packets traverse a
+ logical network, represented as logical datapath flows (see Logical
+ Datapath Flows, below).
+ </p>
+
+ <p>
+ LN data may be large (O(n) in the number of logical ports, ACL rules,
+ etc.). Thus, to improve scaling, each chassis should receive only data
+ related to logical networks in which that chassis participates. Past
+ experience shows that, with large logical networks, even finer-grained
+ partitioning of data, e.g. designing logical flows so that only the
+ chassis hosting a logical port needs the related flows, pays off at
+ scale. (This is not necessary initially, but it is worth bearing in
+ mind in the design.)
+ </p>
+
+ <p>
+ The LN is a slave of the cloud management system running northbound of OVN.
+ That CMS determines the entire OVN logical configuration and therefore the
+ LN's content at any given time is a deterministic function of the CMS's
+ configuration, although that happens indirectly via the OVN Northbound DB
+ and <code>ovn-nbd</code>.
+ </p>
+
+ <p>
+ LN data is likely to change more quickly than PN data. This is especially
+ true in a container environment where VMs are created and destroyed (and
+ therefore added to and deleted from logical switches) quickly.
+ </p>
+
+ <p>
+ The <ref table="Pipeline"/> table is currently the only LN table.
+ </p>
+
+ <h3>Bindings data</h3>
+
+ <p>
+ The Bindings tables contain the current placement of logical components
+ (such as VMs and VIFs) onto chassis and the bindings between logical ports
+ and MACs.
+ </p>
+
+ <p>
+ Bindings change frequently, at least every time a VM powers up or down
+ or migrates, and especially quickly in a container environment. The
+ amount of data per VM (or VIF) is small.
+ </p>
+
+ <p>
+ Each chassis is authoritative about the VMs and VIFs that it hosts at any
+ given time and can efficiently flood that state to a central location, so
+ the consistency needs are minimal.
+ </p>
+
+ <p>
+ The <ref table="Bindings"/> table is currently the only Bindings table.
+ </p>
+
+ <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
+ <p>
+ Each row in this table represents a hypervisor or gateway (a chassis) in
+ the physical network (PN). Each chassis, via
+ <code>ovn-controller</code>, adds and updates its own row, and keeps a
+ copy of the remaining rows to determine how to reach other hypervisors.
+ </p>
+
+ <p>
+ When a chassis shuts down gracefully, it should remove its own row.
+ (This is not critical because resources hosted on the chassis are equally
+ unreachable regardless of whether the row is present.) If a chassis
+ shuts down permanently without removing its row, some kind of manual or
+ automatic cleanup is eventually needed; we can devise a process for that
+ as necessary.
+ </p>
+
+ <column name="name">
+ A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
+ column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
+ database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does
+ not prescribe a particular format for chassis names.
+ </column>
+
+ <group title="Encapsulation">
+ <p>
+ These columns together identify how OVN may transmit logical dataplane
+ packets to this chassis.
+ </p>
+
+ <column name="encap">
+ The encapsulation to use to transmit packets to this chassis.
+ </column>
+
+ <column name="encap_options">
+ Options for configuring the encapsulation, e.g. IPsec parameters when
+ IPsec support is introduced. No options are currently defined.
+ </column>
+
+ <column name="ip">
+ The IPv4 address of the encapsulation tunnel endpoint.
+ </column>
+ </group>
+
+ <group title="Gateway Configuration">
+ <p>
+ A <dfn>gateway</dfn> is a chassis that forwards traffic between a
+ logical network and a physical VLAN. Gateways are typically dedicated
+ nodes that do not host VMs.
+ </p>
+
+ <column name="gateway_ports">
+ Maps from the name of a gateway port, which is typically a physical
+ port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref
+ table="Gateway"/> record that describes the details of the gatewaying
+ function.
+ </column>
+ </group>
+ </table>
+
+ <table name="Gateway" title="Physical Network Gateway Ports">
+ <p>
+ The <ref column="gateway_ports" table="Chassis"/> column in the <ref
+ table="Chassis"/> table refers to rows in this table to connect a chassis
+ port to a gateway function. Each row in this table describes the logical
+ networks to which a gateway port is attached. Each chassis, via
+ <code>ovn-controller</code>(8), adds and updates its own rows, if any
+ (since most chassis are not gateways), and keeps a copy of the remaining
+ rows to determine how to reach other chassis.
+ </p>
+
+ <column name="vlan_map">
+ Maps from a VLAN ID to a logical port name. Thus, each named logical
+ port corresponds to one VLAN on the gateway port.
+ </column>
+
+ <column name="attached_port">
+ The name of the gateway port in the chassis's Open vSwitch integration
+ bridge.
+ </column>
+ </table>
+
+ <table name="Pipeline" title="Logical Network Pipeline">
+ <p>
+ Each row in this table represents one logical flow. The cloud management
+ system, via its OVN integration, populates this table with logical flows
+ that implement the L2 and L3 topology specified in the CMS configuration.
+ Each hypervisor, via <code>ovn-controller</code>, translates the logical
+ flows into OpenFlow flows specific to its hypervisor and installs them
+ into Open vSwitch.
+ </p>
+
+ <p>
+ Logical flows are expressed in an OVN-specific format, described here. A
+ logical datapath flow is much like an OpenFlow flow, except that the
+ flows are written in terms of logical ports and logical datapaths instead
+ of physical ports and physical datapaths. Translation between logical
+ and physical flows helps to ensure isolation between logical datapaths.
+ (The logical flow abstraction also allows the CMS to do less work, since
+ it does not have to separately compute and push out physical
+ flows to each chassis.)
+ </p>
+
+ <p>
+ The default action when no flow matches is to drop packets.
+ </p>
+
+ <column name="table_id">
+ The stage in the logical pipeline, analogous to an OpenFlow table number.
+ </column>
+
+ <column name="priority">
+ The flow's priority. Flows with numerically higher priority take
+ precedence over those with lower. If two logical datapath flows with the
+ same priority both match, then the one actually applied to the packet is
+ undefined.
+ </column>
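The lookup rule above can be sketched in a few lines of Python: the highest-priority matching flow wins, and unmatched packets fall through to the default drop. The `(priority, predicate, actions)` triple encoding is illustrative, not OVN's actual data structure.

```python
# Sketch of highest-priority-wins flow selection; ties are left to
# whatever flow happens to be examined first, mirroring the "undefined"
# behavior described above.
def lookup(flows, packet):
    best = None
    for priority, match, actions in flows:
        if match(packet) and (best is None or priority > best[0]):
            best = (priority, match, actions)
    return best[2] if best else "drop"  # unmatched packets are dropped

flows = [
    (100, lambda p: p["eth.type"] == 0x0806, "arp-stage"),
    (0, lambda p: True, "default"),
]
print(lookup(flows, {"eth.type": 0x0806}))  # arp-stage
print(lookup(flows, {"eth.type": 0x0800}))  # default
```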
+
+ <column name="match">
+ <p>
+ A matching expression. OVN provides a superset of OpenFlow matching
+ capabilities, using a syntax similar to Boolean expressions in a
+ programming language.
+ </p>
+
+ <p>
+ Matching expressions have two important kinds of primary expression:
+ <dfn>fields</dfn> and <dfn>constants</dfn>. A field names a piece of
+ data or metadata. The supported fields are:
+ </p>
+
+ <ul>
+ <li>
+ <code>metadata</code> <code>reg0</code> ... <code>reg7</code>
+ <code>xreg0</code> ... <code>xreg3</code>
+ </li>
+ <li><code>inport</code> <code>outport</code> <code>queue</code></li>
+ <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
+ <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
+ <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
+ <li><code>ip4.src</code> <code>ip4.dst</code></li>
+ <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
+ <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
+ <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
+ <li><code>udp.src</code> <code>udp.dst</code></li>
+ <li><code>sctp.src</code> <code>sctp.dst</code></li>
+ <li><code>icmp4.type</code> <code>icmp4.code</code></li>
+ <li><code>icmp6.type</code> <code>icmp6.code</code></li>
+ <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
+ </ul>
+
+ <p>
+ Subfields may be addressed using a <code>[]</code> suffix,
+ e.g. <code>tcp.src[0..7]</code> refers to the low 8 bits of the TCP
+ source port. A subfield may be used in any context where a field is allowed.
+ </p>
+
+ <p>
+ Some fields have prerequisites. OVN implicitly adds clauses to satisfy
+ these. For example, <code>arp.op == 1</code> is equivalent to
+ <code>eth.type == 0x0806 && arp.op == 1</code>, and
+ <code>tcp.src == 80</code> is equivalent to <code>(eth.type == 0x0800
+ || eth.type == 0x86dd) && ip.proto == 6 && tcp.src ==
+ 80</code>.
+ </p>
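The implicit expansion can be pictured as a table-driven rewrite. The sketch below uses an illustrative subset of prerequisites, not OVN's actual field definitions:

```python
# Illustrative prerequisite table; a real implementation would attach
# prerequisites to its field definitions.
PREREQS = {
    "arp": "eth.type == 0x0806",
    "ip4": "eth.type == 0x0800",
    "ip6": "eth.type == 0x86dd",
    "ip": "eth.type == 0x0800 || eth.type == 0x86dd",
    "tcp": "ip.proto == 6",
}

def add_prereqs(expr):
    """Prepend prerequisite clauses for the field prefix used in expr."""
    clauses = []
    for prefix, prereq in PREREQS.items():
        if expr.startswith(prefix + "."):
            if prefix == "tcp":  # tcp in turn requires ip's Ethertypes
                clauses.append("(%s)" % PREREQS["ip"])
            clauses.append(prereq)
    clauses.append(expr)
    return " && ".join(clauses)

print(add_prereqs("arp.op == 1"))
# eth.type == 0x0806 && arp.op == 1
```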
+
+ <p>
+ Most fields have integer values. Integer constants may be expressed in
+ several forms: decimal integers, hexadecimal integers prefixed by
+ <code>0x</code>, dotted-quad IPv4 addresses, IPv6 addresses in their
+ standard forms, and Ethernet addresses as colon-separated hex
+ digits. A constant in any of these forms may be followed by a slash
+ and a second constant (the mask) in the same form, to form a masked
+ constant. IPv4 and IPv6 masks may be given as integers, to express
+ CIDR prefixes.
+ </p>
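As a sketch of how a parser might handle the IPv4 case, the hypothetical helper below turns a possibly-masked constant into `(value, mask)` integers, treating a bare integer after the slash as a CIDR prefix length:

```python
import ipaddress

# Hypothetical helper: parse "addr", "addr/mask", or "addr/prefixlen"
# into (value, mask) integers.
def parse_ip4_constant(text):
    if "/" in text:
        addr, mask = text.split("/")
    else:
        addr, mask = text, "255.255.255.255"  # exact match by default
    value = int(ipaddress.IPv4Address(addr))
    if mask.isdigit():  # CIDR prefix length, e.g. /24
        m = (0xffffffff << (32 - int(mask))) & 0xffffffff
    else:
        m = int(ipaddress.IPv4Address(mask))
    return value, m

print(parse_ip4_constant("192.168.0.0/24"))  # (3232235520, 4294967040)
```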
+
+ <p>
+ The <code>inport</code> and <code>outport</code> fields have string
+ values. The useful values are <ref table="Bindings"
+ column="logical_port"/> names from the <ref table="Bindings"/> and
+ <ref table="Gateway"/> tables.
+ </p>
+
+ <p>
+ The available operators, from highest to lowest precedence, are:
+ </p>
+
+ <ul>
+ <li><code>()</code></li>
+ <li><code>== != < <= > >= in not in</code></li>
+ <li><code>!</code></li>
+ <li><code>&&</code></li>
+ <li><code>||</code></li>
+ </ul>
+
+ <p>
+ The <code>()</code> operator is used for grouping.
+ </p>
+
+ <p>
+ The equality operator <code>==</code> is the most important operator.
+ Its operands must be a field and an optionally masked constant, in
+ either order. The <code>==</code> operator yields true when the
+ field's value equals the constant's value for all the bits included in
+ the mask. The <code>==</code> operator translates simply and naturally
+ to OpenFlow.
+ </p>
+
+ <p>
+ The inequality operator <code>!=</code> yields the inverse of
+ <code>==</code> but its syntax and use are the same. Implementation of
+ the inequality operator is expensive.
+ </p>
+
+ <p>
+ The relational operators are <code><</code>, <code><=</code>, <code>></code>, and <code>>=</code>. Their
+ operands must be a field and a constant, in either order; the constant
+ must not be masked. These operators are most commonly useful for L4
+ ports, e.g. <code>tcp.src < 1024</code>. Implementation of the
+ relational operators is expensive.
+ </p>
+
+ <p>
+ The set membership operator <code>in</code>, with syntax
+ ``<code><var>field</var> in { <var>constant1</var>,
+ <var>constant2</var>,</code> ... <code>}</code>'', is syntactic sugar
+ for ``<code>(<var>field</var> == <var>constant1</var> ||
+ <var>field</var> == <var>constant2</var> || </code>...<code>)</code>''.
+ Conversely, ``<code><var>field</var> not in { <var>constant1</var>,
+ <var>constant2</var>, </code>...<code> }</code>'' is syntactic sugar
+ for ``<code>(<var>field</var> != <var>constant1</var> &&
+ <var>field</var> != <var>constant2</var> &&
+ </code>...<code>)</code>''.
+ </p>
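Since `in` and `not in` are pure syntactic sugar, a front end could desugar them before any further processing, e.g.:

```python
# Sketch of the set-membership desugaring described above; the helper
# name is illustrative, not part of OVN.
def desugar_in(field, constants, negate=False):
    op, join = ("!=", " && ") if negate else ("==", " || ")
    return "(%s)" % join.join(
        "%s %s %s" % (field, op, c) for c in constants)

print(desugar_in("tcp.dst", ["80", "443"]))
# (tcp.dst == 80 || tcp.dst == 443)
```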
+
+ <p>
+ The unary prefix operator <code>!</code> yields its operand's inverse.
+ </p>
+
+ <p>
+ The logical AND operator <code>&&</code> yields true only if
+ both of its operands are true.
+ </p>
+
+ <p>
+ The logical OR operator <code>||</code> yields true if at least one of
+ its operands is true.
+ </p>
+
+ <p>
+ Finally, the keywords <code>true</code> and <code>false</code> may also
+ be used in matching expressions. <code>true</code> is useful by itself
+ as a catch-all expression that matches every packet.
+ </p>
+
+ <p>
+ (The above is pretty ambitious. It probably makes sense to initially
+ implement only a subset of this specification. The full specification
+ is written out mainly to get an idea of what a fully general matching
+ expression language could include.)
+ </p>
+ </column>
+
+ <column name="actions">
+ <p>
+ Below, a <var>value</var> is either a <var>constant</var> or a
+ <var>field</var>. The following actions seem most likely to be useful:
+ </p>
+
+ <dl>
+ <dt><code>drop;</code></dt>
+ <dd>syntactic sugar for no actions</dd>
+
+ <dt><code>output(<var>value</var>);</code></dt>
+ <dd>output to port</dd>
+
+ <dt><code>broadcast;</code></dt>
+ <dd>output to every logical port except ingress port</dd>
+
+ <dt><code>resubmit;</code></dt>
+ <dd>execute next logical datapath table as subroutine</dd>
+
+ <dt><code>set(<var>field</var>=<var>value</var>);</code></dt>
+ <dd>set data or metadata field, or copy between fields</dd>
+ </dl>
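To illustrate the intended syntax, a tool might render an internal action list into this form roughly as follows; the tuple encoding is illustrative, not part of OVN:

```python
# Sketch: render a logical action sequence in the syntax sketched above.
def render_actions(actions):
    out = []
    for act in actions:
        if act[0] == "output":
            out.append("output(%s);" % act[1])
        elif act[0] == "set":
            out.append("set(%s=%s);" % (act[1], act[2]))
        else:  # drop, broadcast, resubmit take no arguments
            out.append(act[0] + ";")
    return " ".join(out)

print(render_actions([("set", "reg0", "1"), ("output", "outport")]))
# set(reg0=1); output(outport);
```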
+
+ <p>
+ The following are not yet well thought out:
+ </p>
+
+ <dl>
+ <dt><code>learn</code></dt>
+
+ <dt><code>conntrack</code></dt>
+
+ <dt><code>with(<var>field</var>=<var>value</var>) { <var>action</var>, </code>...<code> }</code></dt>
+ <dd>execute <var>actions</var> with temporary changes to <var>fields</var></dd>
+
+ <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>, </code>...<code> }</code></dt>
+ <dd>
+ decrement TTL; execute first set of actions if
+ successful, second set if TTL decrement fails
+ </dd>
+
+ <dt><code>icmp_reply { <var>action</var>, </code>...<code> }</code></dt>
+ <dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
+
+ <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
+ <dd>generate ARP from packet, execute <var>action</var>s</dd>
+ </dl>
+
+ <p>
+ Other actions can be added as needed
+ (e.g. <code>push_vlan</code>, <code>pop_vlan</code>,
+ <code>push_mpls</code>, <code>pop_mpls</code>).
+ </p>
+
+ <p>
+ Some of the OVN actions do not map directly to OpenFlow actions, e.g.:
+ </p>
+
+ <ul>
+ <li>
+ <code>with</code>: Implemented as <code>stack_push;
+ set(</code>...<code>); <var>actions</var>; stack_pop</code>.
+ </li>
+
+ <li>
+ <code>dec_ttl</code>: Implemented as <code>dec_ttl</code> followed
+ by the successful actions. The failure case has to be implemented by
+ ovn-controller interpreting packet-ins. It might be difficult to
+ identify the particular place in the processing pipeline in
+ <code>ovn-controller</code>; maybe some restrictions will be
+ necessary.
+ </li>
+
+ <li>
+ <code>icmp_reply</code>: Implemented by sending the packet to
+ <code>ovn-controller</code>, which generates the ICMP reply and sends
+ the packet back to <code>ovs-vswitchd</code>.
+ </li>
+ </ul>
+ </column>
+ </table>
+
+ <table name="Bindings" title="Physical-Logical Bindings">
+ <p>
+ Each row in this table identifies the physical location of a logical
+ port. Each hypervisor, via <code>ovn-controller</code>, populates this
+ table with rows for the logical ports located on that hypervisor,
+ which <code>ovn-controller</code> in turn finds out by monitoring the
+ local hypervisor's Open_vSwitch database, which identifies logical ports
+ via the conventions described in <code>IntegrationGuide.md</code>.
+ </p>
+
+ <p>
+ When a chassis shuts down gracefully, it should remove its bindings.
+ (This is not critical because resources hosted on the chassis are equally
+ unreachable regardless of whether their rows are present.) To handle the
+ case where a VM is shut down abruptly on one chassis, then brought up
+ again on a different one, <code>ovn-controller</code> must delete any
+ existing <ref table="Bindings"/> record for a logical port when it adds a
+ new one.
+ </p>
+
+ <column name="logical_port">
+ A logical port, taken from <ref key="iface-id" table="Interface"
+ column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch database's
+ <ref table="Interface" db="Open_vSwitch"/> table. OVN does not prescribe
+ a particular format for the logical port ID.
+ </column>
+
+ <column name="chassis">
+ The physical location of the logical port. To successfully identify a
+ chassis, this column must match the <ref table="Chassis" column="name"/>
+ column in some row in the <ref table="Chassis"/> table.
+ </column>
+
+ <column name="mac">
+ <p>
+ The Ethernet address or addresses used as a source address on the
+ logical port, each in the form
+ <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
+ The string <code>unknown</code> is also allowed to indicate that the
+ logical port has an unknown set of (additional) source addresses.
+ </p>
+
+ <p>
+ A VM interface would ordinarily have a single Ethernet address. A
+ gateway port might initially only have <code>unknown</code>, and then
+ add MAC addresses to the set as it learns new source addresses.
+ </p>
+ </column>
+ </table>
+</database>