1 <?xml version="1.0" encoding="utf-8"?>
2 <database name="ovn-sb" title="OVN Southbound Database">
4 This database holds logical and physical configuration and state for the
5 Open Virtual Network (OVN) system to support virtual network abstraction.
6 For an introduction to OVN, please see <code>ovn-architecture</code>(7).
10 The OVN Southbound database sits at the center of the OVN
11 architecture. It is the one component that speaks both southbound
12 directly to all the hypervisors and gateways, via
13 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>, and
14 northbound to the Cloud Management System, via <code>ovn-northd</code>.
17 <h2>Database Structure</h2>
20 The OVN Southbound database contains classes of data with
21 different properties, as described in the sections below.
24 <h3>Physical Network (PN) data</h3>
27 PN tables contain information about the chassis nodes in the system. This
28 includes all the information necessary to wire the overlay, such as IP
29 addresses, supported tunnel types, and security keys.
33 The amount of PN data is small (O(n) in the number of chassis) and it
34 changes infrequently, so it can be replicated to every chassis.
38 The PN tables consist of the <ref table="Chassis"/> table.
41 <h3>Logical Network (LN) data</h3>
44 LN tables contain the topology of logical switches and routers, ACLs,
45 firewall rules, and everything needed to describe how packets traverse a
46 logical network, represented as logical datapath flows (see Logical
47 Datapath Flows, below).
51 LN data may be large (O(n) in the number of logical ports, ACL rules,
52 etc.). Thus, to improve scaling, each chassis should receive only data
53 related to logical networks in which that chassis participates. Past
54 experience shows that in the presence of large logical networks, even
55 finer-grained partitioning of data, e.g. designing logical flows so that
56 only the chassis hosting a logical port needs related flows, pays off
57 scale-wise. (This is not necessary initially but it is worth bearing in mind.)
62 The LN is a slave of the cloud management system running northbound of OVN.
63 That CMS determines the entire OVN logical configuration and therefore the
64 LN's content at any given time is a deterministic function of the CMS's
65 configuration, although that happens indirectly via the
66 <ref db="OVN_Northbound"/> database and <code>ovn-northd</code>.
70 LN data is likely to change more quickly than PN data. This is especially
71 true in a container environment where VMs are created and destroyed (and
72 therefore added to and deleted from logical switches) quickly.
76 <ref table="Logical_Flow"/> and <ref table="Multicast_Group"/> contain LN data.
80 <h3>Logical-physical bindings</h3>
83 These tables link logical and physical components. They show the current
84 placement of logical components (such as VMs and VIFs) onto chassis, and
85 map logical entities to the values that represent them in tunnel encapsulations.
90 These tables change frequently, at least every time a VM powers up or down
91 or migrates, and especially quickly in a container environment. The
92 amount of data per VM (or VIF) is small.
96 Each chassis is authoritative about the VMs and VIFs that it hosts at any
97 given time and can efficiently flood that state to a central location, so
98 the consistency needs are minimal.
102 The <ref table="Port_Binding"/> and <ref table="Datapath_Binding"/> tables
103 contain binding data.
106 <h3>MAC bindings</h3>
109 The <ref table="MAC_Binding"/> table tracks the bindings from IP addresses
110 to Ethernet addresses that are dynamically discovered using ARP (for IPv4)
111 and neighbor discovery (for IPv6). Usually, IP-to-MAC bindings for virtual
112 machines are statically populated into the <ref table="Port_Binding"/>
113 table, so <ref table="MAC_Binding"/> is primarily used to discover bindings
114 on physical networks.
117 <h2>Common Columns</h2>
120 Some tables contain a special column named <code>external_ids</code>. This
121 column has the same form and purpose each place that it appears, so we
122 describe it here to save space later.
126 <dt><code>external_ids</code>: map of string-string pairs</dt>
128 Key-value pairs for use by the software that manages the OVN Southbound
129 database rather than by
130 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>. In
131 particular, <code>ovn-northd</code> can use key-value pairs in this
132 column to relate entities in the southbound database to higher-level
133 entities (such as entities in the OVN Northbound database). Individual
134 key-value pairs in this column may be documented in some cases to aid
135 in understanding and troubleshooting, but the reader should not assume
136 that such documentation is comprehensive.
140 <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
142 Each row in this table represents a hypervisor or gateway (a chassis) in
143 the physical network (PN). Each chassis, via
144 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>, adds
145 and updates its own row, and keeps a copy of the remaining rows to
146 determine how to reach other hypervisors.
150 When a chassis shuts down gracefully, it should remove its own row.
151 (This is not critical because resources hosted on the chassis are equally
152 unreachable regardless of whether the row is present.) If a chassis
153 shuts down permanently without removing its row, some kind of manual or
154 automatic cleanup is eventually needed; we can devise a process for that.
159 OVN does not prescribe a particular format for chassis names.
160 ovn-controller populates this column using <ref key="system-id"
161 table="Open_vSwitch" column="external_ids" db="Open_vSwitch"/>
162 in the Open_vSwitch database's <ref table="Open_vSwitch"
163 db="Open_vSwitch"/> table. ovn-controller-vtep populates this
164 column with <ref table="Physical_Switch" column="name"
165 db="hardware_vtep"/> in the hardware_vtep database's
166 <ref table="Physical_Switch" db="hardware_vtep"/> table.
169 <column name="hostname">
170 The hostname of the chassis, if applicable. ovn-controller will populate
171 this column with the hostname of the host it is running on.
172 ovn-controller-vtep will leave this column empty.
175 <group title="Common Columns">
176 The overall purpose of these columns is described under <code>Common
177 Columns</code> at the beginning of this document.
179 <column name="external_ids"/>
182 <group title="Encapsulation Configuration">
184 OVN uses encapsulation to transmit logical dataplane packets between chassis.
188 <column name="encaps">
189 Points to supported encapsulation configurations to transmit
190 logical dataplane packets to this chassis. Each entry is an <ref
191 table="Encap"/> record that describes the configuration.
195 <group title="Gateway Configuration">
197 A <dfn>gateway</dfn> is a chassis that forwards traffic between the
198 OVN-managed part of a logical network and a physical VLAN, extending a
199 tunnel-based logical network into a physical network. Gateways are
200 typically dedicated nodes that do not host VMs and are controlled
201 by <code>ovn-controller-vtep</code>.
204 <column name="vtep_logical_switches">
205 Stores all VTEP logical switch names connected by this gateway
206 chassis. A <ref table="Port_Binding"/> table entry whose
207 <ref column="options" table="Port_Binding"/>:<code>vtep-physical-switch</code>
208 equals the <ref table="Chassis"/> <ref column="name" table="Chassis"/>, and whose
209 <ref column="options" table="Port_Binding"/>:<code>vtep-logical-switch</code>
210 value is listed in <ref table="Chassis"/>
211 <ref column="vtep_logical_switches" table="Chassis"/>, is
212 associated with this <ref table="Chassis"/>.
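A minimal Python sketch of this association rule follows. The `Chassis` and `PortBinding` classes are hypothetical in-memory stand-ins for rows of the two tables (not the OVSDB IDL); only the column and option names come from this schema.

```python
from dataclasses import dataclass, field


@dataclass
class Chassis:
    name: str
    vtep_logical_switches: list[str] = field(default_factory=list)


@dataclass
class PortBinding:
    logical_port: str
    options: dict[str, str] = field(default_factory=dict)


def is_associated(pb: PortBinding, chassis: Chassis) -> bool:
    """True if this Port_Binding row is associated with this gateway Chassis.

    Both conditions must hold: options:vtep-physical-switch equals the
    chassis name, and options:vtep-logical-switch is one of the names in
    the chassis's vtep_logical_switches column.
    """
    return (
        pb.options.get("vtep-physical-switch") == chassis.name
        and pb.options.get("vtep-logical-switch") in chassis.vtep_logical_switches
    )


gateway = Chassis("br-vtep", ["ls0", "ls1"])
binding = PortBinding("lp0", {"vtep-physical-switch": "br-vtep",
                              "vtep-logical-switch": "ls1"})
print(is_associated(binding, gateway))  # True
```

A binding that names the right physical switch but a logical switch missing from `vtep_logical_switches` is not associated.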
217 <table name="Encap" title="Encapsulation Types">
219 The <ref column="encaps" table="Chassis"/> column in the <ref
220 table="Chassis"/> table refers to rows in this table to identify
221 how OVN may transmit logical dataplane packets to this chassis.
222 Each chassis, via <code>ovn-controller</code>(8) or
223 <code>ovn-controller-vtep</code>(8), adds and updates its own rows
224 and keeps a copy of the remaining rows to determine how to reach other chassis.
229 The encapsulation to use to transmit packets to this chassis.
230 Hypervisors must use either <code>geneve</code> or
231 <code>stt</code>. Gateways may use <code>vxlan</code>,
232 <code>geneve</code>, or <code>stt</code>.
235 <column name="options">
236 Options for configuring the encapsulation, e.g. IPsec parameters when
237 IPsec support is introduced. No options are currently defined.
241 The IPv4 address of the encapsulation tunnel endpoint.
245 <table name="Logical_Flow" title="Logical Network Flows">
247 Each row in this table represents one logical flow.
248 <code>ovn-northd</code> populates this table with logical flows
249 that implement the L2 and L3 topologies specified in the
250 <ref db="OVN_Northbound"/> database. Each hypervisor, via
251 <code>ovn-controller</code>, translates the logical flows into
252 OpenFlow flows specific to its hypervisor and installs them into Open vSwitch.
257 Logical flows are expressed in an OVN-specific format, described here. A
258 logical datapath flow is much like an OpenFlow flow, except that the
259 flows are written in terms of logical ports and logical datapaths instead
260 of physical ports and physical datapaths. Translation between logical
261 and physical flows helps to ensure isolation between logical datapaths.
262 (The logical flow abstraction also allows the OVN centralized
263 components to do less work, since they do not have to separately
264 compute and push out physical flows to each chassis.)
268 The default action when no flow matches is to drop packets.
271 <p><em>Architectural Logical Life Cycle of a Packet</em></p>
274 The following description focuses on the life cycle of a packet through
275 a logical datapath, ignoring physical details of the implementation.
276 Please refer to <em>Architectural Physical Life Cycle of a Packet</em> in
277 <code>ovn-architecture</code>(7) for the physical information.
281 The description here is written as if OVN itself executes these steps,
282 but in fact OVN (that is, <code>ovn-controller</code>) programs Open
283 vSwitch, via OpenFlow and OVSDB, to execute them on its behalf.
287 At a high level, OVN passes each packet through the logical datapath's
288 logical ingress pipeline, which may output the packet to one or more
289 logical ports or logical multicast groups. For each such logical output
290 port, OVN passes the packet through the datapath's logical egress
291 pipeline, which may either drop the packet or deliver it to the
292 destination. Between the two pipelines, outputs to logical multicast
293 groups are expanded into logical ports, so that the egress pipeline only
294 processes a single logical output port at a time. Between the two
295 pipelines is also where, when necessary, OVN encapsulates a packet in a
296 tunnel (or tunnels) to transmit to remote hypervisors.
300 In more detail, to start, OVN searches the <ref table="Logical_Flow"/>
301 table for a row with correct <ref column="logical_datapath"/>, a <ref
302 column="pipeline"/> of <code>ingress</code>, a <ref column="table_id"/>
303 of 0, and a <ref column="match"/> that is true for the packet. If none
304 is found, OVN drops the packet. If OVN finds more than one, it chooses
305 the match with the highest <ref column="priority"/>. Then OVN executes
306 each of the actions specified in the row's <ref column="actions"/> column,
307 in the order specified. Some actions, such as those to modify packet
308 headers, require no further details. The <code>next</code> and
309 <code>output</code> actions are special.
313 The <code>next</code> action causes the above process to be repeated
314 recursively, except that OVN searches for <ref column="table_id"/> of 1
315 instead of 0. Similarly, any <code>next</code> action in a row found in
316 that table would cause a further search for a <ref column="table_id"/> of
317 2, and so on. When recursive processing completes, flow control returns
318 to the action following <code>next</code>.
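The lookup-and-<code>next</code> procedure above can be modeled in a few lines of Python. This is an illustrative sketch under stated assumptions, not how <code>ovn-controller</code> works (it compiles logical flows into OpenFlow instead): a packet is a plain dict, each flow's match is a Python predicate rather than an OVN expression string, and an action is either the string "next" or a callable that modifies the packet. The names <code>LogicalFlow</code>, <code>lookup</code>, and <code>run_table</code> are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Union

Packet = dict  # symbol name -> value, e.g. {"inport": "vif0"}
Action = Union[str, Callable[[Packet], None]]  # "next" or a modifier


@dataclass
class LogicalFlow:
    logical_datapath: str
    pipeline: str  # "ingress" or "egress"
    table_id: int
    priority: int
    match: Callable[[Packet], bool]
    actions: list[Action] = field(default_factory=list)


def lookup(flows, datapath, pipeline, table_id, pkt) -> Optional[LogicalFlow]:
    """Highest-priority flow matching pkt in the given table, or None."""
    candidates = [
        f for f in flows
        if f.logical_datapath == datapath
        and f.pipeline == pipeline
        and f.table_id == table_id
        and f.match(pkt)
    ]
    # Ties at equal priority are undefined; max() simply picks one.
    return max(candidates, key=lambda f: f.priority, default=None)


def run_table(flows, datapath, pipeline, table_id, pkt, trace) -> None:
    flow = lookup(flows, datapath, pipeline, table_id, pkt)
    if flow is None:
        trace.append(f"{pipeline} table {table_id}: no match")
        return  # no actions run, so nothing is output (a drop)
    trace.append(f"{pipeline} table {table_id}: priority {flow.priority}")
    for action in flow.actions:
        if action == "next":
            # Search table_id + 1; control returns here when it finishes.
            run_table(flows, datapath, pipeline, table_id + 1, pkt, trace)
        else:
            action(pkt)


# Usage: a priority-100 flow with no actions drops VLAN-tagged packets;
# everything else continues to table 1 via "next".
flows = [
    LogicalFlow("dp1", "ingress", 0, 100, lambda p: p.get("vlan.present") == 1),
    LogicalFlow("dp1", "ingress", 0, 50, lambda p: True, ["next"]),
    LogicalFlow("dp1", "ingress", 1, 0, lambda p: True,
                [lambda p: p.update(outport="vif1")]),
]
pkt, trace = {"vlan.present": 0}, []
run_table(flows, "dp1", "ingress", 0, pkt, trace)
print(pkt["outport"], trace)
```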
322 The <code>output</code> action also introduces recursion. Its effect
323 depends on the current value of the <code>outport</code> field. Suppose
324 <code>outport</code> designates a logical port. First, OVN compares
325 <code>inport</code> to <code>outport</code>; if they are equal, it treats
326 the <code>output</code> as a no-op. In the common case, where they are
327 different, the packet enters the egress pipeline. This transition to the
328 egress pipeline discards register data, e.g. <code>reg0</code> ...
329 <code>reg4</code> and connection tracking state, to achieve
330 uniform behavior regardless of whether the egress pipeline is on a
331 different hypervisor (because registers aren't preserved across
332 tunnel encapsulation).
336 To execute the egress pipeline, OVN again searches the <ref
337 table="Logical_Flow"/> table for a row with correct <ref
338 column="logical_datapath"/>, a <ref column="table_id"/> of 0, a <ref
339 column="match"/> that is true for the packet, but now looking for a <ref
340 column="pipeline"/> of <code>egress</code>. If no matching row is found,
341 the output becomes a no-op. Otherwise, OVN executes the actions for the
342 matching flow (which is chosen from multiple, if necessary, as already described).
347 In the <code>egress</code> pipeline, the <code>next</code> action acts as
348 already described, except that it, of course, searches for
349 <code>egress</code> flows. The <code>output</code> action, however, now
350 directly outputs the packet to the output port (which is now fixed,
351 because <code>outport</code> is read-only within the egress pipeline).
355 The description earlier assumed that <code>outport</code> referred to a
356 logical port. If it instead designates a logical multicast group, then
357 the description above still applies, with the addition of fan-out from
358 the logical multicast group to each logical port in the group. For each
359 member of the group, OVN executes the logical pipeline as described, with
360 the logical output port replaced by the group member.
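The ingress <code>output</code> step, including multicast fan-out, the <code>inport</code>/<code>outport</code> comparison, and the discarding of registers, might be sketched in Python as below. The group table and the <code>run_egress</code> callback are hypothetical; the sketch clears only <code>reg0</code>...<code>reg4</code> and <code>ct_state</code>, following the examples given in the text.

```python
REGISTERS = ("reg0", "reg1", "reg2", "reg3", "reg4")


def ingress_output(pkt: dict, multicast_groups: dict, run_egress) -> None:
    """Model `output;` in the ingress pipeline.

    multicast_groups maps a group name to its member logical ports;
    run_egress is called once per logical output port.
    """
    outport = pkt["outport"]
    # A multicast group fans out to its members; a port stands for itself.
    for member in multicast_groups.get(outport, [outport]):
        if pkt.get("inport") == member:
            continue  # output to the input port is a no-op
        egress_pkt = dict(pkt, outport=member)  # per-port copy
        for reg in REGISTERS:
            egress_pkt.pop(reg, None)  # registers are discarded
        egress_pkt.pop("ct_state", None)  # as is connection tracking state
        run_egress(egress_pkt)
```

Each group member gets its own copy of the packet, so the egress pipeline always sees exactly one logical output port, and the original ingress packet keeps its register values.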
363 <p><em>Pipeline Stages</em></p>
366 <code>ovn-northd</code> is responsible for populating the
367 <ref table="Logical_Flow"/> table, so the stages are an
368 implementation detail and subject to change. This section
369 describes the current logical flow table.
373 The ingress pipeline consists of the following stages:
377 Port Security (Table 0): Validates the source address, drops
378 packets with a VLAN tag, and, if configured, verifies that the
379 logical port is allowed to send with the source address.
383 L2 Destination Lookup (Table 1): Forwards known unicast
384 addresses to the appropriate logical port. Unicast packets to
385 unknown hosts are forwarded to logical ports configured with the
386 special <code>unknown</code> MAC address. Broadcast and
387 multicast packets are flooded to all ports in the logical switch.
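As a rough model of the L2 Destination Lookup stage, the Python below collapses its behavior into one function. In practice <code>ovn-northd</code> expresses this stage as several logical flows at different priorities; the MAC table and port lists here are hypothetical inputs. The multicast test mirrors the <code>eth.mcast</code> predicate (<code>eth.dst[40]</code>), which also covers broadcast.

```python
def is_multicast(mac: str) -> bool:
    """eth.dst[40]: the least-significant bit of the first octet."""
    return (int(mac.split(":")[0], 16) & 1) == 1


def l2_lookup(dst_mac: str, mac_table: dict[str, str],
              unknown_ports: list[str], all_ports: list[str]) -> list[str]:
    """Return the logical output ports chosen for a destination MAC.

    mac_table maps known unicast MACs to ports; unknown_ports are the
    ports configured with the special "unknown" address.
    """
    dst = dst_mac.lower()
    if is_multicast(dst):  # broadcast ff:ff:ff:ff:ff:ff is multicast too
        return list(all_ports)
    if dst in mac_table:
        return [mac_table[dst]]
    return list(unknown_ports)
```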
392 The egress pipeline consists of the following stages:
396 ACL (Table 0): Applies any specified access control lists.
400 Port Security (Table 1): If configured, verifies that the
401 logical port is allowed to receive packets with the destination address.
406 <column name="logical_datapath">
407 The logical datapath to which the logical flow belongs.
410 <column name="pipeline">
412 The primary flows used for deciding on a packet's destination are the
413 <code>ingress</code> flows. The <code>egress</code> flows implement
414 ACLs. See <em>Architectural Logical Life Cycle of a Packet</em>, above, for details.
418 <column name="table_id">
419 The stage in the logical pipeline, analogous to an OpenFlow table number.
422 <column name="priority">
423 The flow's priority. Flows with numerically higher priority take
424 precedence over those with lower. If two logical datapath flows with the
425 same priority both match, then the one actually applied to the packet is undefined.
429 <column name="match">
431 A matching expression. OVN provides a superset of OpenFlow matching
432 capabilities, using a syntax similar to Boolean expressions in a
433 programming language.
437 The most important components of a match expression are
438 <dfn>comparisons</dfn> between <dfn>symbols</dfn> and
439 <dfn>constants</dfn>, e.g. <code>ip4.dst == 192.168.0.1</code>,
440 <code>ip.proto == 6</code>, <code>arp.op == 1</code>, <code>eth.type ==
441 0x800</code>. The logical AND operator <code>&&</code> and
442 logical OR operator <code>||</code> can combine comparisons into a larger expression.
447 Matching expressions also support parentheses for grouping, the logical
448 NOT prefix operator <code>!</code>, and literals <code>0</code> and
449 <code>1</code> to express ``false'' or ``true,'' respectively. The
450 latter is useful by itself as a catch-all expression that matches every packet.
454 <p><em>Symbols</em></p>
457 <em>Type</em>. Symbols have <dfn>integer</dfn> or <dfn>string</dfn>
458 type. Integer symbols have a <dfn>width</dfn> in bits.
462 <em>Kinds</em>. There are three kinds of symbols:
468 <dfn>Fields</dfn>. A field symbol represents a packet header or
469 metadata field. For example, a field
470 named <code>vlan.tci</code> might represent the VLAN TCI field in a packet.
475 A field symbol can have integer or string type. Integer fields can
476 be nominal or ordinal (see <em>Level of Measurement</em>, below).
483 <dfn>Subfields</dfn>. A subfield represents a subset of bits from
484 a larger field. For example, a field <code>vlan.vid</code> might
485 be defined as an alias for <code>vlan.tci[0..11]</code>. Subfields
486 are provided for syntactic convenience, because it is always
487 possible to instead refer to a subset of bits from a field explicitly.
492 Only ordinal fields (see <em>Level of Measurement</em>,
493 below) may have subfields. Subfields are always ordinal.
499 <dfn>Predicates</dfn>. A predicate is shorthand for a Boolean
500 expression. Predicates may be used much like 1-bit fields. For
501 example, <code>ip4</code> might expand to <code>eth.type ==
502 0x800</code>. Predicates are provided for syntactic convenience,
503 because it is always possible to instead specify the underlying expression.
508 A predicate whose expansion refers to any nominal field or
509 predicate (see <em>Level of Measurement</em>, below) is nominal;
510 other predicates have Boolean level of measurement.
516 <em>Level of Measurement</em>. See
517 http://en.wikipedia.org/wiki/Level_of_measurement for the statistical
518 concept on which this classification is based. There are three levels:
525 <dfn>Ordinal</dfn>. In statistics, ordinal values can be ordered
526 on a scale. OVN considers a field (or subfield) to be ordinal if
527 its bits can be examined individually. This is true for the
528 OpenFlow fields that OpenFlow or Open vSwitch makes ``maskable.''
532 Any use of an ordinal field may specify a single bit or a range of
533 bits, e.g. <code>vlan.tci[13..15]</code> refers to the PCP field
534 within the VLAN TCI, and <code>eth.dst[40]</code> refers to the
535 multicast bit in the Ethernet destination address.
539 OVN supports all the usual arithmetic relations (<code>==</code>,
540 <code>!=</code>, <code><</code>, <code><=</code>,
541 <code>></code>, and <code>>=</code>) on ordinal fields and
542 their subfields, because OVN can implement these in OpenFlow and
543 Open vSwitch as collections of bitwise tests.
549 <dfn>Nominal</dfn>. In statistics, nominal values cannot be
550 usefully compared except for equality. OpenFlow
551 port numbers, Ethernet types, and IP protocols are examples: all of
552 these are just identifiers assigned arbitrarily with no deeper
553 meaning. In OpenFlow and Open vSwitch, bits in these fields
554 generally aren't individually addressable.
558 OVN only supports arithmetic tests for equality on nominal fields,
559 because OpenFlow and Open vSwitch provide no way for a flow to
560 efficiently implement other comparisons on them. (A test for
561 inequality can be sort of built out of two flows with different
562 priorities, but OVN matching expressions always generate flows with a single priority.)
567 String fields are always nominal.
573 <dfn>Boolean</dfn>. A nominal field that has only two values, 0
574 and 1, is somewhat exceptional, since it is easy to support both
575 equality and inequality tests on such a field: either one can be
576 implemented as a test for 0 or 1.
580 Only predicates (see above) have a Boolean level of measurement.
584 This isn't a standard level of measurement.
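To illustrate how an ordinal comparison reduces to bitwise tests, the Python sketch below turns <code>x < n</code> on a <code>width</code>-bit field into a set of value/mask pairs whose union matches exactly the values below <code>n</code>. This is one standard construction, assumed here for illustration; OVN's own expression compiler may produce a different, possibly smaller, set of flows.

```python
def less_than_masks(n: int, width: int) -> list[tuple[int, int]]:
    """Return (value, mask) pairs matching exactly the x with x < n.

    For every 1-bit of n, one pair matches values that agree with n
    above that bit, have 0 in that bit, and are wildcarded below it.
    """
    if not 0 <= n < (1 << width):
        raise ValueError("n must fit in the field width")
    full = (1 << width) - 1
    pairs = []
    for bit in reversed(range(width)):  # most significant bit first
        if n & (1 << bit):
            mask = full & ~((1 << bit) - 1)  # this bit and all higher bits
            value = n & mask & ~(1 << bit)   # n's higher bits, this bit 0
            pairs.append((value, mask))
    return pairs


def matches(x: int, pairs: list[tuple[int, int]]) -> bool:
    """True if x satisfies any of the masked-equality tests."""
    return any((x & mask) == value for value, mask in pairs)


# x < 6 on a 4-bit field: 00xx (0-3) and 010x (4-5).
print(less_than_masks(6, 4))  # [(0, 12), (4, 14)]
```

One consequence is that a single relational test may need several masked matches, whereas an equality test needs only one.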
590 <em>Prerequisites</em>. Any symbol can have prerequisites, which are
591 additional conditions implied by the use of the symbol.
592 For example, the <code>icmp4.type</code> symbol might have prerequisite
593 <code>icmp4</code>, which would cause an expression <code>icmp4.type ==
594 0</code> to be interpreted as <code>icmp4.type == 0 &&
595 icmp4</code>, which would in turn expand to <code>icmp4.type == 0
596 && eth.type == 0x800 && ip.proto == 1</code> (assuming
597 <code>icmp4</code> is a predicate defined as in the list of predicates
598 below).
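A small Python sketch of this two-step process, working on expression text for brevity: it first conjoins the prerequisites of every symbol it finds, then repeatedly replaces predicate names with their definitions. Both tables are hypothetical and hold only the entries the example needs; a real implementation would parse the expression into a tree rather than use regular expressions. Each added or substituted piece is parenthesized, since <code>&&</code> and <code>||</code> may not be mixed without parentheses.

```python
import re

# Hypothetical tables containing just what the example needs.
PREREQUISITES = {"icmp4.type": "icmp4", "icmp4.code": "icmp4"}
PREDICATES = {"icmp4": "ip4 && ip.proto == 1", "ip4": "eth.type == 0x800"}

# A symbol starts with a letter or "_" at a word boundary (so the "x" in
# "0x800" is not mistaken for one) and may contain letters, digits, "_", ".".
SYMBOL = re.compile(r"\b[A-Za-z_][\w.]*")


def add_prerequisites(expr: str) -> str:
    """Conjoin the (transitive) prerequisites of every symbol in expr."""
    needed: list[str] = []
    pending = SYMBOL.findall(expr)
    while pending:
        prereq = PREREQUISITES.get(pending.pop())
        if prereq and prereq not in needed:
            needed.append(prereq)
            pending.extend(SYMBOL.findall(prereq))
    if not needed:
        return expr
    return " && ".join([f"({expr})"] + needed)


def expand_predicates(expr: str) -> str:
    """Replace predicate names by their definitions until none remain."""
    def substitute(m: re.Match) -> str:
        name = m.group()
        return f"({PREDICATES[name]})" if name in PREDICATES else name

    while True:
        expanded = SYMBOL.sub(substitute, expr)
        if expanded == expr:
            return expanded
        expr = expanded


step1 = add_prerequisites("icmp4.type == 0")
print(step1)                     # (icmp4.type == 0) && icmp4
print(expand_predicates(step1))  # (icmp4.type == 0) && ((eth.type == 0x800) && ip.proto == 1)
```

The result is equivalent to the flat expansion given in the text; the extra parentheses only make the grouping explicit.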
601 <p><em>Relational operators</em></p>
604 All of the standard relational operators <code>==</code>,
605 <code>!=</code>, <code><</code>, <code><=</code>,
606 <code>></code>, and <code>>=</code> are supported. Nominal
607 fields support only <code>==</code> and <code>!=</code>, and only in a
608 positive sense when outer <code>!</code> are taken into account,
609 e.g. given string field <code>inport</code>, <code>inport ==
610 "eth0"</code> and <code>!(inport != "eth0")</code> are acceptable, but
611 not <code>inport != "eth0"</code>.
615 The implementation of <code>==</code> (or <code>!=</code> when it is
616 negated) is more efficient than that of the other relational operators.
620 <p><em>Constants</em></p>
623 Integer constants may be expressed in decimal, hexadecimal prefixed by
624 <code>0x</code>, or as dotted-quad IPv4 addresses, IPv6 addresses in
625 their standard forms, or Ethernet addresses as colon-separated hex
626 digits. A constant in any of these forms may be followed by a slash
627 and a second constant (the mask) in the same form, to form a masked
628 constant. IPv4 and IPv6 masks may be given as integers, to express CIDR prefixes.
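For instance, <code>ip4.dst == 192.168.0.0/24</code> and <code>ip4.dst == 192.168.0.0/255.255.255.0</code> denote the same masked constant. The Python sketch below, limited to IPv4 and using only the standard <code>ipaddress</code> module, shows how either form could be turned into an integer value/mask pair; the function names are hypothetical.

```python
import ipaddress


def parse_masked_ipv4(text: str) -> tuple[int, int]:
    """Parse "a.b.c.d", "a.b.c.d/a.b.c.d", or "a.b.c.d/len" to (value, mask)."""
    addr, _, mask_text = text.partition("/")
    value = int(ipaddress.IPv4Address(addr))
    if not mask_text:
        return value, 0xFFFFFFFF  # unmasked: all 32 bits must match
    if mask_text.isdigit():
        prefix_len = int(mask_text)
        if not 0 <= prefix_len <= 32:
            raise ValueError(f"bad prefix length: {prefix_len}")
        # A /len prefix sets the top `len` bits of the mask.
        mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    else:
        mask = int(ipaddress.IPv4Address(mask_text))
    return value, mask


def masked_match(x: int, value: int, mask: int) -> bool:
    """True if x agrees with value on every bit selected by mask."""
    return (x & mask) == (value & mask)
```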
633 String constants have the same syntax as quoted strings in JSON (thus,
634 they are Unicode strings).
638 Some operators support sets of constants written inside curly braces
639 <code>{</code> ... <code>}</code>. Commas between elements of a set,
640 and after the last element, are optional. With <code>==</code>,
641 ``<code><var>field</var> == { <var>constant1</var>,
642 <var>constant2</var>,</code> ... <code>}</code>'' is syntactic sugar
643 for ``<code><var>field</var> == <var>constant1</var> ||
644 <var>field</var> == <var>constant2</var> || </code>...<code></code>''.
645 Similarly, ``<code><var>field</var> != { <var>constant1</var>,
646 <var>constant2</var>, </code>...<code> }</code>'' is equivalent to
647 ``<code><var>field</var> != <var>constant1</var> &&
648 <var>field</var> != <var>constant2</var> &&
649 </code>...<code></code>''.
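The desugaring of both set forms can be sketched in a few lines of Python. For brevity, the parser assumes unquoted constants separated by commas and/or white space (quoted string constants are not handled), and the function names are hypothetical.

```python
import re


def parse_set(text: str) -> list[str]:
    """Split "{ c1, c2 ... }" into constants; commas are optional.

    Assumes unquoted constants only (no quoted string constants).
    """
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a set in curly braces")
    return [tok for tok in re.split(r"[,\s]+", body[1:-1]) if tok]


def expand_set(field: str, op: str, constants: list[str]) -> str:
    """Desugar `field == {...}` into ||-ed tests, `field != {...}` into &&-ed."""
    if op == "==":
        joiner = " || "
    elif op == "!=":
        joiner = " && "
    else:
        raise ValueError(f"set syntax is not described for {op!r}")
    return joiner.join(f"{field} {op} {c}" for c in constants)
```

For example, `expand_set("tcp.dst", "==", parse_set("{80, 443,}"))` yields `tcp.dst == 80 || tcp.dst == 443`. When such an `||` chain is combined with other terms using `&&`, it needs surrounding parentheses, because those two operators may not be mixed without them.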
652 <p><em>Miscellaneous</em></p>
655 Comparisons may name the symbol or the constant first,
656 e.g. <code>tcp.src == 80</code> and <code>80 == tcp.src</code> are both acceptable.
661 Tests for a range may be expressed using a syntax like <code>1024 <=
662 tcp.src <= 49151</code>, which is equivalent to <code>1024 <=
663 tcp.src && tcp.src <= 49151</code>.
667 For a one-bit field or predicate, a mention of its name is equivalent
668 to <code><var>symbol</var> == 1</code>, e.g. <code>vlan.present</code>
669 is equivalent to <code>vlan.present == 1</code>. The same is true for
670 one-bit subfields, e.g. <code>vlan.tci[12]</code>. There is no
671 technical limitation to implementing the same for ordinal fields of all
672 widths, but the implementation is expensive enough that the syntax
673 parser requires writing an explicit comparison against zero to make
674 mistakes less likely, e.g. in <code>tcp.src != 0</code> the comparison
675 against 0 is required.
679 <em>Operator precedence</em> is as shown below, from highest to lowest.
680 There are two exceptions where parentheses are required even though the
681 table would suggest that they are not: <code>&&</code> and
682 <code>||</code> require parentheses when used together, and
683 <code>!</code> requires parentheses when applied to a relational
684 expression. Thus, in <code>(eth.type == 0x800 || eth.type == 0x86dd)
685 && ip.proto == 6</code> or <code>!(arp.op == 1)</code>, the
686 parentheses are mandatory.
690 <li><code>()</code></li>
691 <li><code>== != < <= > >=</code></li>
692 <li><code>!</code></li>
693 <li><code>&& ||</code></li>
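The following Python sketch shows how a parser might enforce the first exception by rejecting <code>&&</code> and <code>||</code> at the same parenthesis depth. It scans raw text and, as a simplifying assumption, ignores string constants (an operator inside a quoted string would be miscounted); the function name is hypothetical.

```python
def mixes_and_or(expr: str) -> bool:
    """True if && and || occur together within one parenthesis group."""
    ops_by_depth: dict[int, set[str]] = {}
    depth = 0
    i = 0
    while i < len(expr):
        two = expr[i:i + 2]
        if expr[i] == "(":
            depth += 1
        elif expr[i] == ")":
            ops_by_depth.pop(depth, None)  # forget the group being closed
            depth -= 1
        elif two in ("&&", "||"):
            ops = ops_by_depth.setdefault(depth, set())
            ops.add(two)
            if len(ops) == 2:
                return True
            i += 1  # skip the operator's second character
        i += 1
    return False
```

Applied to the example in the text, the parenthesized form passes while `eth.type == 0x800 || eth.type == 0x86dd && ip.proto == 6` is flagged.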
697 <em>Comments</em> may be introduced by <code>//</code>, which extends
698 to the next new-line. Comments within a line may be bracketed by
699 <code>/*</code> and <code>*/</code>. Multiline comments are not supported.
703 <p><em>Symbols</em></p>
706 Most of the symbols below have integer type. Only <code>inport</code>
707 and <code>outport</code> have string type. <code>inport</code> names a
708 logical port. Thus, its value is a <ref column="logical_port"/> name
709 from the <ref table="Port_Binding"/> table. <code>outport</code> may
710 name a logical port, as <code>inport</code>, or a logical multicast
711 group defined in the <ref table="Multicast_Group"/> table. For both
712 symbols, only names within the flow's logical datapath may be used.
716 <li><code>reg0</code>...<code>reg4</code></li>
717 <li><code>inport</code> <code>outport</code></li>
718 <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
719 <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
720 <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
721 <li><code>ip4.src</code> <code>ip4.dst</code></li>
722 <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
723 <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
724 <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
725 <li><code>udp.src</code> <code>udp.dst</code></li>
726 <li><code>sctp.src</code> <code>sctp.dst</code></li>
727 <li><code>icmp4.type</code> <code>icmp4.code</code></li>
728 <li><code>icmp6.type</code> <code>icmp6.code</code></li>
729 <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
730 <li><code>ct_mark</code> <code>ct_label</code></li>
733 <code>ct_state</code>, which has the following Boolean subfields:
736 <li><code>ct.new</code>: True for a new flow</li>
737 <li><code>ct.est</code>: True for an established flow</li>
738 <li><code>ct.rel</code>: True for a related flow</li>
739 <li><code>ct.rpl</code>: True for a reply flow</li>
740 <li><code>ct.inv</code>: True for a connection entry in a bad state</li>
743 <code>ct_state</code> and its subfields are initialized by the
744 <code>ct_next</code> action, described below.
750 The following predicates are supported:
754 <li><code>eth.bcast</code> expands to <code>eth.dst == ff:ff:ff:ff:ff:ff</code></li>
755 <li><code>eth.mcast</code> expands to <code>eth.dst[40]</code></li>
756 <li><code>vlan.present</code> expands to <code>vlan.tci[12]</code></li>
757 <li><code>ip4</code> expands to <code>eth.type == 0x800</code></li>
758 <li><code>ip4.mcast</code> expands to <code>ip4.dst[28..31] == 0xe</code></li>
759 <li><code>ip6</code> expands to <code>eth.type == 0x86dd</code></li>
760 <li><code>ip</code> expands to <code>ip4 || ip6</code></li>
761 <li><code>icmp4</code> expands to <code>ip4 && ip.proto == 1</code></li>
762 <li><code>icmp6</code> expands to <code>ip6 && ip.proto == 58</code></li>
763 <li><code>icmp</code> expands to <code>icmp4 || icmp6</code></li>
764 <li><code>ip.is_frag</code> expands to <code>ip.frag[0]</code></li>
765 <li><code>ip.later_frag</code> expands to <code>ip.frag[1]</code></li>
766 <li><code>ip.first_frag</code> expands to <code>ip.is_frag && !ip.later_frag</code></li>
767 <li><code>arp</code> expands to <code>eth.type == 0x806</code></li>
768 <li><code>nd</code> expands to <code>icmp6.type == {135, 136} && icmp6.code == 0</code></li>
769 <li><code>tcp</code> expands to <code>ip.proto == 6</code></li>
770 <li><code>udp</code> expands to <code>ip.proto == 17</code></li>
771 <li><code>sctp</code> expands to <code>ip.proto == 132</code></li>
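Several of these predicates test individual bits or bit ranges. The Python sketch below evaluates a few of them directly on integer field values, using the convention from the examples above that bit 0 is the least significant bit (so <code>eth.dst[40]</code> is the low bit of the first octet of the Ethernet address). The helper names are hypothetical, and prerequisites are not checked.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Subfield value[lo..hi], inclusive, with bit 0 least significant."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def eth_mcast(eth_dst: int) -> bool:
    return bits(eth_dst, 40, 40) == 1        # eth.dst[40]


def vlan_present(vlan_tci: int) -> bool:
    return bits(vlan_tci, 12, 12) == 1       # vlan.tci[12]


def ip4_mcast(ip4_dst: int) -> bool:
    return bits(ip4_dst, 28, 31) == 0xE      # ip4.dst[28..31] == 0xe


def frag_predicates(ip_frag: int) -> dict[str, bool]:
    is_frag = bits(ip_frag, 0, 0) == 1       # ip.frag[0]
    later_frag = bits(ip_frag, 1, 1) == 1    # ip.frag[1]
    return {
        "ip.is_frag": is_frag,
        "ip.later_frag": later_frag,
        "ip.first_frag": is_frag and not later_frag,
    }
```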
775 <column name="actions">
777 Logical datapath actions, to be executed when the logical flow
778 represented by this row is the highest-priority match.
782 Actions share lexical syntax with the <ref column="match"/> column. An
783 empty set of actions (or one that contains just white space or
784 comments), or a set of actions that consists of just
785 <code>drop;</code>, causes the matched packets to be dropped.
786 Otherwise, the column should contain a sequence of actions, each
787 terminated by a semicolon.
791 The following actions are defined:
795 <dt><code>output;</code></dt>
798 In the ingress pipeline, this action executes the
799 <code>egress</code> pipeline as a subroutine. If
800 <code>outport</code> names a logical port, the egress pipeline
801 executes once; if it is a multicast group, the egress pipeline runs
802 once for each logical port in the group.
806 In the egress pipeline, this action performs the actual
807 output to the <code>outport</code> logical port. (In the egress
808 pipeline, <code>outport</code> never names a multicast group.)
812 Output to the input port is implicitly dropped, that is,
813 <code>output</code> becomes a no-op if <code>outport</code> ==
814 <code>inport</code>. Occasionally it may be useful to override
815 this behavior, e.g. to send an ARP reply to an ARP request; to do
816 so, use <code>inport = "";</code> to set the logical input port to
817 an empty string (which should not be used as the name of any logical port).
822 <dt><code>next;</code></dt>
823 <dt><code>next(<var>table</var>);</code></dt>
825 Executes another logical datapath table as a subroutine. By default,
826 the table after the current one is executed. Specify
827 <var>table</var> to jump to a specific table in the same pipeline.
830 <dt><code><var>field</var> = <var>constant</var>;</code></dt>
833 Sets data or metadata field <var>field</var> to constant value
834 <var>constant</var>, e.g. <code>outport = "vif0";</code> to set the
835 logical output port. To set only a subset of bits in a field,
836 specify a subfield for <var>field</var> or a masked
837 <var>constant</var>, e.g. one may use <code>vlan.pcp[2] = 1;</code>
838 or <code>vlan.pcp = 4/4;</code> to set the most significant bit of the VLAN PCP.
843 Assigning to a field with prerequisites implicitly adds those
844 prerequisites to <ref column="match"/>; thus, for example, a flow
845 that sets <code>tcp.dst</code> applies only to TCP flows,
846 regardless of whether its <ref column="match"/> mentions any TCP field.
851 Not all fields are modifiable (e.g. <code>eth.type</code> and
852 <code>ip.proto</code> are read-only), and not all modifiable fields
853 may be partially modified (e.g. <code>ip.ttl</code> must be assigned
854 as a whole). The <code>outport</code> field is modifiable in the
855 <code>ingress</code> pipeline but not in the <code>egress</code> pipeline.
860 <dt><code><var>field1</var> = <var>field2</var>;</code></dt>
863 Sets data or metadata field <var>field1</var> to the value of data
864 or metadata field <var>field2</var>, e.g. <code>reg0 =
865 ip4.src;</code> copies <code>ip4.src</code> into <code>reg0</code>.
866 To modify only a subset of a field's bits, specify a subfield for
867 <var>field1</var> or <var>field2</var> or both, e.g. <code>vlan.pcp
868 = reg0[0..2];</code> copies the three least-significant bits of
869 <code>reg0</code> into the VLAN PCP.
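The bit-level effect of partial assignments can be sketched with integer arithmetic. In the Python below (hypothetical helper names; fields are plain integers with bit 0 least significant), <code>set_masked</code> models <code>field = value/mask</code> and <code>set_subfield</code> models <code>field[lo..hi] = value</code>. For example, <code>vlan.pcp[2] = 1;</code> and <code>vlan.pcp = 4/4;</code> both set bit 2 of the 3-bit PCP.

```python
def set_masked(old: int, value: int, mask: int) -> int:
    """field = value/mask: change only the bits selected by mask."""
    return (old & ~mask) | (value & mask)


def get_subfield(value: int, lo: int, hi: int) -> int:
    """Read field[lo..hi], inclusive."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def set_subfield(old: int, lo: int, hi: int, value: int) -> int:
    """field[lo..hi] = value, leaving the other bits of field unchanged."""
    width = hi - lo + 1
    mask = ((1 << width) - 1) << lo
    return set_masked(old, value << lo, mask)


pcp = 0b010
print(set_masked(pcp, 4, 4) == set_subfield(pcp, 2, 2, 1))  # True

reg0 = 0x1235
vlan_pcp = get_subfield(reg0, 0, 2)  # vlan.pcp = reg0[0..2];
```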
873 <var>field1</var> and <var>field2</var> must be the same type,
874 either both string or both integer fields. If they are both
875 integer fields, they must have the same width.
879 If <var>field1</var> or <var>field2</var> has prerequisites, they
880 are added implicitly to <ref column="match"/>. It is possible to
881 write an assignment with contradictory prerequisites, such as
882 <code>ip4.src = ip6.src[0..31];</code>, but the contradiction means
883 that a logical flow with such an assignment will never be matched.
887 <dt><code><var>field1</var> <-> <var>field2</var>;</code></dt>
890 Similar to <code><var>field1</var> = <var>field2</var>;</code>
891 except that the two values are exchanged instead of copied. Both
892 <var>field1</var> and <var>field2</var> must be modifiable.
896 <dt><code>ip.ttl--;</code></dt>
899 Decrements the IPv4 or IPv6 TTL. If this would make the TTL zero
900 or negative, then processing of the packet halts; no further
901 actions are processed. (To properly handle such cases, a
902 higher-priority flow should match on
903 <code>ip.ttl == {0, 1}</code>.)
906 <p><b>Prerequisite:</b> <code>ip</code></p>
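As a rough model of the rule above, the following Python sketch treats "processing halts" as returning <code>None</code>; the function names are illustrative only. It also shows why matching on <code>ip.ttl == {0, 1}</code> catches exactly the packets that the decrement would stop.

```python
from typing import Optional


def ttl_decrement(ttl: int) -> Optional[int]:
    """Model `ip.ttl--;`: return the new TTL, or None if processing halts."""
    new_ttl = ttl - 1
    if new_ttl > 0:
        return new_ttl
    # The TTL would become zero or negative: no further actions run.
    return None


def needs_special_handling(ttl: int) -> bool:
    """Mirror the suggested higher-priority match `ip.ttl == {0, 1}`."""
    return ttl in {0, 1}
```

For every TTL value, <code>ttl_decrement</code> returns <code>None</code> exactly when <code>needs_special_handling</code> is true, so a higher-priority flow on that match sees those packets before the decrement runs.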
909 <dt><code>ct_next;</code></dt>
912 Applies connection tracking to the flow, initializing
913 <code>ct_state</code> for matching in later tables.
914 Automatically moves on to the next table, as if followed by
919 As a side effect, IP fragments will be reassembled for matching.
920 If a fragmented packet is output, then it will be sent with any
921 overlapping fragments squashed. The connection tracking state is
922 scoped by the logical port, so overlapping addresses may be used.
923 To allow traffic related to the matched flow, execute
924 <code>ct_commit</code>.
928 It is possible to have actions follow <code>ct_next</code>,
929 but they will not have access to any of its side-effects and
930 doing so is not generally useful.
934 <dt><code>ct_commit;</code></dt>
936 Commits the flow to the connection tracking entry associated
937 with it by a previous call to <code>ct_next</code>.
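The interplay of <code>ct_next</code> and <code>ct_commit</code> can be approximated with a toy tracker that keeps one zone per logical port, which is what lets overlapping addresses coexist. This Python sketch is a simplification: it tracks only exact 5-tuples in one direction, and its <code>"new"</code>/<code>"est"</code> strings stand in for the real <code>ct_state</code> bits. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Hypothetical 5-tuple: (ip_src, ip_dst, proto, sport, dport).
FiveTuple = Tuple[str, str, int, int, int]


@dataclass
class ConnTracker:
    """Toy connection tracker with one zone per logical port."""
    zones: Dict[str, Set[FiveTuple]] = field(default_factory=dict)

    def ct_next(self, port: str, pkt: FiveTuple) -> str:
        """Return a ct_state-like label for matching in later tables."""
        committed = self.zones.get(port, set())
        return "est" if pkt in committed else "new"

    def ct_commit(self, port: str, pkt: FiveTuple) -> None:
        """Commit the connection so later packets are seen as established."""
        self.zones.setdefault(port, set()).add(pkt)
```

Because each logical port has its own zone, committing a connection on one port has no effect on an identical 5-tuple seen on another port.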
940 <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
943 Temporarily replaces the IPv4 packet being processed by an ARP
944 packet and executes each nested <var>action</var> on the ARP
945 packet. Actions following the <code>arp</code> action, if any, apply
946 to the original, unmodified packet.
950 The ARP packet that this action operates on is initialized based on
951 the IPv4 packet being processed, as follows. These are default
952 values that the nested actions will probably want to change:
956 <li><code>eth.src</code> unchanged</li>
957 <li><code>eth.dst</code> unchanged</li>
958 <li><code>eth.type = 0x0806</code></li>
959 <li><code>arp.op = 1</code> (ARP request)</li>
960 <li><code>arp.sha</code> copied from <code>eth.src</code></li>
961 <li><code>arp.spa</code> copied from <code>ip4.src</code></li>
962 <li><code>arp.tha = 00:00:00:00:00:00</code></li>
963 <li><code>arp.tpa</code> copied from <code>ip4.dst</code></li>
967 The ARP packet has the same VLAN header, if any, as the IP packet
971 <p><b>Prerequisite:</b> <code>ip4</code></p>
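A minimal Python sketch of how the <code>arp</code> action's default ARP request is derived from the IPv4 packet, following the list above. The flat <code>Packet</code> record is a hypothetical stand-in for the header fields (VLAN headers are not modeled). The immutable dataclass reflects that actions following <code>arp</code> see the original, unmodified packet.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """Hypothetical flat view of the header fields the `arp` action touches."""
    eth_src: str
    eth_dst: str
    eth_type: int
    ip4_src: str = ""
    ip4_dst: str = ""
    arp_op: int = 0
    arp_sha: str = ""
    arp_spa: str = ""
    arp_tha: str = ""
    arp_tpa: str = ""


def arp_defaults(ip_pkt: Packet) -> Packet:
    """Build the ARP request that `arp { ... };` starts from."""
    return Packet(
        eth_src=ip_pkt.eth_src,          # unchanged
        eth_dst=ip_pkt.eth_dst,          # unchanged
        eth_type=0x0806,
        arp_op=1,                        # ARP request
        arp_sha=ip_pkt.eth_src,
        arp_spa=ip_pkt.ip4_src,
        arp_tha="00:00:00:00:00:00",
        arp_tpa=ip_pkt.ip4_dst,
    )


ip_pkt = Packet("00:00:00:00:00:01", "00:00:00:00:00:02", 0x0800,
                "192.168.0.1", "192.168.0.2")
# A nested action such as `eth.dst = ff:ff:ff:ff:ff:ff;` edits only the copy.
arp_pkt = replace(arp_defaults(ip_pkt), eth_dst="ff:ff:ff:ff:ff:ff")
```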
974 <dt><code>get_arp(<var>P</var>, <var>A</var>);</code></dt>
978 <b>Parameters</b>: logical port string field <var>P</var>, 32-bit
979 IP address field <var>A</var>.
983 Looks up <var>A</var> in <var>P</var>'s ARP table. If an entry is
984 found, stores its Ethernet address in <code>eth.dst</code>,
985 otherwise stores <code>00:00:00:00:00:00</code> in
986 <code>eth.dst</code>.
989 <p><b>Example:</b> <code>get_arp(outport, ip4.dst);</code></p>
993 <code>put_arp(<var>P</var>, <var>A</var>, <var>E</var>);</code>
998 <b>Parameters</b>: logical port string field <var>P</var>, 32-bit
999 IP address field <var>A</var>, 48-bit Ethernet address field
1004 Adds or updates the entry for IP address <var>A</var> in logical
1005 port <var>P</var>'s ARP table, setting its Ethernet address to
1009 <p><b>Example:</b> <code>put_arp(inport, arp.spa, arp.sha);</code></p>
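The lookup-or-zero behavior of <code>get_arp</code> and the add-or-update behavior of <code>put_arp</code> can be sketched in Python with a dictionary keyed by (port, IPv4 address). The class below is hypothetical and only illustrates the semantics; it is not how <code>ovn-controller</code> stores ARP entries.

```python
from typing import Dict, Tuple

ZERO_MAC = "00:00:00:00:00:00"


class ArpTables:
    """Toy per-logical-port ARP tables keyed by (port, IPv4 address)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    def put_arp(self, port: str, ip: str, mac: str) -> None:
        """Add or update the entry for `ip` in `port`'s table."""
        self._entries[(port, ip)] = mac

    def get_arp(self, port: str, ip: str) -> str:
        """Return the value `get_arp` would store in eth.dst."""
        return self._entries.get((port, ip), ZERO_MAC)
```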
1014 The following actions will likely be useful later, but they have not
1015 been thought out carefully.
1019 <dt><code>icmp4 { <var>action</var>; </code>...<code> };</code></dt>
1022 Temporarily replaces the IPv4 packet being processed by an ICMPv4
1023 packet and executes each nested <var>action</var> on the ICMPv4
1024 packet. Actions following the <code>icmp4</code> action, if any,
1025 apply to the original, unmodified packet.
1029 The ICMPv4 packet that this action operates on is initialized based
1030 on the IPv4 packet being processed, as follows. These are default
1031 values that the nested actions will probably want to change.
1032 Ethernet and IPv4 fields not listed here are not changed:
1036 <li><code>ip.proto = 1</code> (ICMPv4)</li>
1037 <li><code>ip.frag = 0</code> (not a fragment)</li>
1038 <li><code>icmp4.type = 3</code> (destination unreachable)</li>
1039 <li><code>icmp4.code = 1</code> (host unreachable)</li>
1046 <p><b>Prerequisite:</b> <code>ip4</code></p>
1049 <dt><code>tcp_reset;</code></dt>
1052 This action transforms the current TCP packet according to the
1053 following pseudocode:
1060 tcp.ack = tcp.seq + length(tcp.payload);
1067 Then, the action drops all TCP options and payload data, and
1068 updates the TCP checksum.
1075 <p><b>Prerequisite:</b> <code>tcp</code></p>
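The Python sketch below is one plausible reading of this action, assuming it follows the RFC 793 rule for answering a segment with a reset: if the segment carries an ACK, the reset's sequence number is taken from it; otherwise the sequence number is 0 and the reset acknowledges <code>tcp.seq + length(tcp.payload)</code>, consistent with the line shown above. The <code>TcpSegment</code> type is hypothetical. The sketch ignores the SYN/FIN sequence space that RFC 793 counts in the segment length, and it does not model the checksum update.

```python
from dataclasses import dataclass

TCP_RST = 0x04
TCP_ACK = 0x10


@dataclass
class TcpSegment:
    """Hypothetical minimal TCP header plus payload."""
    seq: int
    ack: int
    flags: int
    payload: bytes = b""


def tcp_reset(seg: TcpSegment) -> TcpSegment:
    """Turn `seg` into a reset; the payload (and any options) are dropped."""
    if seg.flags & TCP_ACK:
        # The segment acknowledged something: the RST takes its seq from ACK.
        return TcpSegment(seq=seg.ack, ack=0, flags=TCP_RST)
    # Otherwise seq is 0 and the RST acknowledges everything received.
    ack = (seg.seq + len(seg.payload)) % 2**32
    return TcpSegment(seq=0, ack=ack, flags=TCP_RST | TCP_ACK)
```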
1080 <column name="external_ids" key="stage-name">
1081 Human-readable name for this flow's stage in the pipeline.
1084 <group title="Common Columns">
1085 The overall purpose of these columns is described under <code>Common
1086 Columns</code> at the beginning of this document.
1088 <column name="external_ids"/>
1092 <table name="Multicast_Group" title="Logical Port Multicast Groups">
1094 The rows in this table define multicast groups of logical ports.
1095 Multicast groups allow a single packet transmitted over a tunnel to a
1096 hypervisor to be delivered to multiple VMs on that hypervisor, which
1097 uses bandwidth more efficiently.
1101 Each row in this table defines a logical multicast group numbered <ref
1102 column="tunnel_key"/> within <ref column="datapath"/>, whose logical
1103 ports are listed in the <ref column="ports"/> column.
1106 <column name="datapath">
1107 The logical datapath in which the multicast group resides.
1110 <column name="tunnel_key">
1111 The value used to designate this logical egress port in tunnel
1112 encapsulations. An index forces the key to be unique within the <ref
1113 column="datapath"/>. The unusual range ensures that multicast group IDs
1114 do not overlap with logical port IDs.
1117 <column name="name">
1119 The logical multicast group's name. An index forces the name to be
1120 unique within the <ref column="datapath"/>. Logical flows in the
1121 ingress pipeline may output to the group just as for individual logical
1122 ports, by assigning the group's name to <code>outport</code> and
1123 executing an <code>output</code> action.
1127 Multicast group names and logical port names share a single namespace
1128 and thus should not overlap (but the database schema cannot enforce
1129 this). To try to avoid conflicts, <code>ovn-northd</code> uses names
1130 that begin with <code>_MC_</code>.
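To illustrate the shared namespace, the following Python sketch expands an <code>outport</code> value either into a multicast group's member ports or into a single logical port, and flags names that collide. The group name <code>_MC_demo</code> and the helper functions are hypothetical.

```python
from typing import Dict, List


def resolve_outport(outport: str, groups: Dict[str, List[str]]) -> List[str]:
    """Expand an `outport` value into the logical ports that receive the packet."""
    if outport in groups:
        return list(groups[outport])
    return [outport]


def check_namespace(port_names: List[str], groups: Dict[str, List[str]]) -> List[str]:
    """Return names used both as a logical port and as a multicast group."""
    return sorted(set(port_names) & set(groups))


groups = {"_MC_demo": ["vif0", "vif1"]}
```

Because the database schema cannot enforce disjoint names, a check such as <code>check_namespace</code> would have to live outside the schema, which is the gap the <code>_MC_</code> naming convention tries to close.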
1134 <column name="ports">
1135 The logical ports included in the multicast group. All of these ports
1136 must be in the <ref column="datapath"/> logical datapath (but the
1137 database schema cannot enforce this).
1141 <table name="Datapath_Binding" title="Physical-Logical Datapath Bindings">
1143 Each row in this table identifies physical bindings of a logical
1144 datapath. A logical datapath implements a logical pipeline among the
1145 ports in the <ref table="Port_Binding"/> table associated with it. In
1146 practice, the pipeline in a given logical datapath implements either a
1147 logical switch or a logical router.
1150 <column name="tunnel_key">
1151 The tunnel key value to which the logical datapath is bound.
1152 The <code>Tunnel Encapsulation</code> section in
1153 <code>ovn-architecture</code>(7) describes how tunnel keys are
1154 constructed for each supported encapsulation.
1157 <group title="OVN_Northbound Relationship">
1159 Each row in <ref table="Datapath_Binding"/> is associated with some
1160 logical datapath. <code>ovn-northd</code> uses these keys to track the
1161 association of a logical datapath with concepts in the <ref
1162 db="OVN_Northbound"/> database.
1165 <column name="external_ids" key="logical-switch" type='{"type": "uuid"}'>
1166 For a logical datapath that represents a logical switch,
1167 <code>ovn-northd</code> stores in this key the UUID of the
1168 corresponding <ref table="Logical_Switch" db="OVN_Northbound"/> row in
1169 the <ref db="OVN_Northbound"/> database.
1172 <column name="external_ids" key="logical-router" type='{"type": "uuid"}'>
1173 For a logical datapath that represents a logical router,
1174 <code>ovn-northd</code> stores in this key the UUID of the
1175 corresponding <ref table="Logical_Router" db="OVN_Northbound"/> row in
1176 the <ref db="OVN_Northbound"/> database.
1180 <group title="Common Columns">
1181 The overall purpose of these columns is described under <code>Common
1182 Columns</code> at the beginning of this document.
1184 <column name="external_ids"/>
1188 <table name="Port_Binding" title="Physical-Logical Port Bindings">
1190 Most rows in this table identify the physical location of a logical port.
1191 (The exceptions are logical patch ports, which do not have any physical
1196 For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code>
1197 database, <code>ovn-northd</code> creates a record in this table.
1198 <code>ovn-northd</code> populates and maintains every column except
1199 the <code>chassis</code> column, which it leaves empty in new records.
1203 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>
1204 populates the <code>chassis</code> column for the records that
1205 identify the logical ports that are located on its hypervisor/gateway,
1206 which <code>ovn-controller</code>/<code>ovn-controller-vtep</code> in
1207 turn finds out by monitoring the local hypervisor's Open_vSwitch
1208 database, which identifies logical ports via the conventions described
1209 in <code>IntegrationGuide.md</code>.
1213 When a chassis shuts down gracefully, it should clean up the
1214 <code>chassis</code> column that it had previously populated.
1215 (This is not critical because resources hosted on the chassis are equally
1216 unreachable regardless of whether their rows are present.) To handle the
1217 case where a VM is shut down abruptly on one chassis, then brought up
1218 again on a different one,
1219 <code>ovn-controller</code>/<code>ovn-controller-vtep</code> must
1220 overwrite the <code>chassis</code> column with new information.
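The claim and release rules above can be summarized in a short Python sketch. It models this table as a mapping from <code>logical_port</code> to the <code>chassis</code> name (<code>None</code> when empty). The set of locally present logical ports stands in for what <code>ovn-controller</code> learns from the local Open_vSwitch database, which this sketch does not model. All names are hypothetical.

```python
from typing import Dict, Optional, Set


def claim_ports(bindings: Dict[str, Optional[str]], local_ports: Set[str],
                chassis: str) -> None:
    """Set `chassis` for every binding whose logical port is present locally.

    Overwrites any stale value left by another chassis, which covers a VM
    that was shut down abruptly elsewhere and brought up again here.
    """
    for port in local_ports:
        if port in bindings:
            bindings[port] = chassis


def release_ports(bindings: Dict[str, Optional[str]], chassis: str) -> None:
    """Graceful shutdown: clear the chassis column this chassis populated."""
    for port, owner in bindings.items():
        if owner == chassis:
            bindings[port] = None
```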
1223 <group title="Core Features">
1224 <column name="datapath">
1225 The logical datapath to which the logical port belongs.
1228 <column name="logical_port">
1229 A logical port, taken from <ref table="Logical_Port" column="name"
1230 db="OVN_Northbound"/> in the OVN_Northbound database's <ref
1231 table="Logical_Port" db="OVN_Northbound"/> table. OVN does not
1232 prescribe a particular format for the logical port ID.
1235 <column name="chassis">
1236 The physical location of the logical port. To successfully identify a
1237 chassis, this column must refer to a <ref table="Chassis"/> record. This is
1239 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>.
1242 <column name="tunnel_key">
1244 A number that represents the logical port in the key (e.g. STT key or
1245 Geneve TLV) field carried within tunnel protocol packets.
1249 The tunnel ID must be unique within the scope of a logical datapath.
1255 The Ethernet address or addresses used as a source address on the
1256 logical port, each in the form
1257 <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
1258 The string <code>unknown</code> is also allowed to indicate that the
1259 logical port has an unknown set of (additional) source addresses.
1263 A VM interface would ordinarily have a single Ethernet address. A
1264 gateway port might initially only have <code>unknown</code>, and then
1265 add MAC addresses to the set as it learns new source addresses.
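A Python sketch of validating this column's values as described: each entry is either a colon-separated Ethernet address or the string <code>unknown</code>. Normalizing addresses to lowercase is an assumption of this sketch, not something the schema requires.

```python
import re
from typing import List, Tuple

MAC_RE = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}")


def parse_macs(values: List[str]) -> Tuple[List[str], bool]:
    """Split `mac` column entries into concrete addresses and an 'unknown' flag."""
    macs: List[str] = []
    unknown = False
    for value in values:
        if value == "unknown":
            unknown = True
        elif MAC_RE.fullmatch(value):
            macs.append(value.lower())
        else:
            raise ValueError(f"not an Ethernet address: {value!r}")
    return macs, unknown
```

A gateway port in the state described above would start as <code>parse_macs(["unknown"])</code>, returning <code>([], True)</code>, and gain concrete addresses as they are learned.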
1269 <column name="type">
1271 A type for this logical port. Logical ports can be used to model other
1272 types of connectivity into an OVN logical switch. The following types
1277 <dt>(empty string)</dt>
1278 <dd>VM (or VIF) interface.</dd>
1280 <dt><code>patch</code></dt>
1282 One of a pair of logical ports that act as if connected by a patch
1283 cable. Useful for connecting two logical datapaths, e.g. to connect
1284 a logical router to a logical switch or to another logical router.
1287 <dt><code>localnet</code></dt>
1289 A connection to a locally accessible network from each
1290 <code>ovn-controller</code> instance. A logical switch can only
1291 have a single <code>localnet</code> port attached. This is used
1292 to model direct connectivity to an existing network.
1295 <dt><code>vtep</code></dt>
1297 A port to a logical switch on a VTEP gateway chassis. In order to
1298 get this port correctly recognized by the OVN controller, the <ref
1300 table="Port_Binding"/>:<code>vtep-physical-switch</code> and <ref
1302 table="Port_Binding"/>:<code>vtep-logical-switch</code> must also
1309 <group title="Patch Options">
1311 These options apply to logical ports with <ref column="type"/> of
1315 <column name="options" key="peer">
1316 The <ref column="logical_port"/> in the <ref table="Port_Binding"/>
1317 record for the other side of the patch. The named <ref
1318 column="logical_port"/> must specify this <ref column="logical_port"/>
1319 in its own <code>peer</code> option. That is, the two patch logical
1320 ports must have reversed <ref column="logical_port"/> and
1321 <code>peer</code> values.
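The reversed-pair requirement is easy to check mechanically. This Python sketch takes a hypothetical mapping from each patch port's <code>logical_port</code> to its <code>peer</code> option and reports every port whose peer does not point back at it.

```python
from typing import Dict, List, Tuple


def check_patch_peers(peers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (port, peer) pairs whose peer does not name `port` as its peer.

    `peers` maps each patch port's logical_port to its options:peer value.
    """
    bad = []
    for port, peer in sorted(peers.items()):
        if peers.get(peer) != port:
            bad.append((port, peer))
    return bad
```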
1325 <group title="Localnet Options">
1327 These options apply to logical ports with <ref column="type"/> of
1328 <code>localnet</code>.
1331 <column name="options" key="network_name">
1332 Required. <code>ovn-controller</code> uses the configuration entry
1333 <code>ovn-bridge-mappings</code> to determine how to connect to this
1334 network. <code>ovn-bridge-mappings</code> is a list of network names
1335 mapped to a local OVS bridge that provides access to that network. An
1336 example of configuring <code>ovn-bridge-mappings</code> would be:
1338 <pre>$ ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-eth0,physnet2:br-eth1</pre>
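For illustration, a Python sketch that parses a value in this format into a network-to-bridge mapping, which could then be consulted with the port's <code>network_name</code>. This is not <code>ovn-controller</code>'s parser; tolerating surrounding whitespace and rejecting entries without a colon are assumptions.

```python
from typing import Dict


def parse_bridge_mappings(value: str) -> Dict[str, str]:
    """Parse 'physnet1:br-eth0,physnet2:br-eth1' into {network: bridge}."""
    mappings: Dict[str, str] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        network, sep, bridge = item.partition(":")
        if not sep or not network or not bridge:
            raise ValueError(f"malformed mapping: {item!r}")
        mappings[network] = bridge
    return mappings
```

With the example above, a <code>localnet</code> port whose <code>network_name</code> is <code>physnet2</code> would be reached through <code>br-eth1</code>.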
1341 When a logical switch has a <code>localnet</code> port attached,
1342 every chassis that may have a local vif attached to that logical
1343 switch must have a bridge mapping configured to reach that
1344 <code>localnet</code>. Traffic that arrives on a
1345 <code>localnet</code> port is never forwarded over a tunnel to
1351 If set, indicates that the port represents a connection to a specific
1352 VLAN on a locally accessible network. The VLAN ID is used to match
1353 incoming traffic and is also added to outgoing traffic.
1357 <group title="VTEP Options">
1359 These options apply to logical ports with <ref column="type"/> of
1363 <column name="options" key="vtep-physical-switch">
1364 Required. The name of the VTEP gateway.
1367 <column name="options" key="vtep-logical-switch">
1368 Required. A logical switch name connected by the VTEP gateway. Must
1369 be set when <ref column="type"/> is <code>vtep</code>.
1373 <group title="VMI (or VIF) Options">
1375 These options apply to logical ports with <ref column="type"/> having
1379 <column name="options" key="policing_rate">
1380 If set, indicates the maximum rate for data sent from this interface,
1381 in kbps. Data exceeding this rate is dropped.
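The schema does not say how the rate is enforced. A token bucket is a common policer design, so the Python sketch below uses one as an illustrative assumption: tokens refill at <code>policing_rate</code>, and the bucket depth plays the role of <code>policing_burst</code>. Treating 1 kbps as 1000 bit/s and expressing the burst in kilobits are further assumptions.

```python
class Policer:
    """Token-bucket policer: refills at `rate_kbps`, holds up to `burst_kb`."""

    def __init__(self, rate_kbps: float, burst_kb: float) -> None:
        self.rate_bits = rate_kbps * 1000.0    # bits per second (assumed)
        self.depth_bits = burst_kb * 1000.0    # bucket depth in bits (assumed)
        self.tokens = self.depth_bits          # start with a full bucket
        self.last = 0.0

    def allow(self, now: float, packet_bytes: int) -> bool:
        """Return True to forward the packet, False to drop it."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.depth_bits, self.tokens + elapsed * self.rate_bits)
        needed = packet_bytes * 8
        if self.tokens >= needed:
            self.tokens -= needed
            return True
        return False
```

Data sent faster than the refill rate drains the bucket, after which packets are dropped until enough tokens accumulate again.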
1384 <column name="options" key="policing_burst">
1385 If set, indicates the maximum burst size for data sent from this
1390 <group title="Nested Containers">
1392 These columns support containers nested within a VM. Specifically,
1393 they are used when <ref column="type"/> is empty and <ref
1394 column="logical_port"/> identifies the interface of a container spawned
1395 inside a VM. They are empty for containers or VMs that run directly on
1399 <column name="parent_port">
1401 <ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
1402 in the OVN_Northbound database's <ref table="Logical_Port"
1403 db="OVN_Northbound"/> table.
1408 Identifies the VLAN tag in the network traffic associated with that
1409 container's network interface.
1413 This column is used for a different purpose when <ref column="type"/>
1414 is <code>localnet</code> (see <code>Localnet Options</code>, above).
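For the nested-container case, the following Python sketch shows how the parent port and VLAN tag together could identify a container's logical port. It assumes that untagged traffic belongs to the VM itself and that traffic with an unrecognized tag is dropped. Both are illustrative assumptions, and the index type is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index over nested-container rows:
# (parent_port, VLAN tag) -> logical_port of the container interface.
NestedIndex = Dict[Tuple[str, int], str]


def container_port(index: NestedIndex, parent: str,
                   vlan: Optional[int]) -> Optional[str]:
    """Map traffic arriving on a VM's interface to a logical port, or None."""
    if vlan is None:
        return parent                     # untagged: the VM's own traffic
    return index.get((parent, vlan))      # None: unknown tag, dropped here
```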
1420 <table name="MAC_Binding" title="IP to MAC bindings">
1422 Each row in this table specifies a binding from an IP address to an
1423 Ethernet address that has been discovered through ARP (for IPv4) or
1424 neighbor discovery (for IPv6). This table is primarily used to discover
1425 bindings on physical networks, because IP-to-MAC bindings for virtual
1426 machines are usually populated statically into the <ref
1427 table="Port_Binding"/> table.
1431 This table expresses a functional relationship: <ref
1432 table="MAC_Binding"/>(<ref column="logical_port"/>, <ref column="ip"/>) =
1433 <ref column="mac"/>.
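Because the relationship is functional, the table behaves like a map from the pair (logical port, IP) to a single MAC address. The Python sketch below is an illustrative model, not <code>ovn-controller</code>'s implementation. It walks the miss, ARP request, and learn sequence outlined below using a plain dictionary, with hypothetical function names and addresses.

```python
from typing import Dict, List, Optional, Tuple

MacBindings = Dict[Tuple[str, str], str]   # (logical_port, ip) -> mac


def route(bindings: MacBindings, port: str, ip: str,
          sent: List[str]) -> Optional[str]:
    """Forward to `ip` via `port`, or emit an ARP request and drop the packet."""
    mac = bindings.get((port, ip))
    if mac is None:
        sent.append(f"ARP who-has {ip} on {port}")   # the IP packet is dropped
        return None
    return mac


def on_arp_reply(bindings: MacBindings, port: str, spa: str, sha: str) -> None:
    """What put_arp(inport, arp.spa, arp.sha) records: insert or overwrite."""
    bindings[(port, spa)] = sha
```

Overwriting on a repeated key keeps at most one MAC per (logical port, IP) pair, which is the functional relationship stated above.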
1437 In outline, the lifetime of a logical router's MAC binding looks like
1443 On hypervisor 1, a logical router determines that a packet should be
1444 forwarded to IP address <var>A</var> on one of its router ports. It
1445 uses its logical flow table to determine that <var>A</var> lacks a
1446 static IP-to-MAC binding and the <code>get_arp</code> action to
1447 determine that it lacks a dynamic IP-to-MAC binding.
1451 Using an OVN logical <code>arp</code> action, the logical router
1452 generates and sends a broadcast ARP request to the router port. It
1453 drops the IP packet.
1457 The logical switch attached to the router port delivers the ARP request
1458 to all of its ports. (It might make sense to deliver it only to ports
1459 that have no static IP-to-MAC bindings, but this could also be
1460 surprising behavior.)
1464 A host or VM on hypervisor 2 (which might be the same as hypervisor 1)
1465 attached to the logical switch owns the IP address in question. It
1466 composes an ARP reply and unicasts it to the logical router port's
1471 The logical switch delivers the ARP reply to the logical router port.
1475 The logical router flow table executes a <code>put_arp</code> action.
1476 To record the IP-to-MAC binding, <code>ovn-controller</code> adds a row
1477 to the <ref table="MAC_Binding"/> table.
1481 On hypervisor 1, <code>ovn-controller</code> receives the updated <ref
1482 table="MAC_Binding"/> table from the OVN southbound database. The next
1483 packet destined to <var>A</var> through the logical router is sent
1484 directly to the bound Ethernet address.
1488 <column name="logical_port">
1489 The logical port on which the binding was discovered.
1493 The bound IP address.
1497 The Ethernet address to which the IP is bound.