1 <?xml version="1.0" encoding="utf-8"?>
2 <database name="ovn-sb" title="OVN Southbound Database">
4 This database holds logical and physical configuration and state for the
5 Open Virtual Network (OVN) system to support virtual network abstraction.
6 For an introduction to OVN, please see <code>ovn-architecture</code>(7).
10 The OVN Southbound database sits at the center of the OVN
11 architecture. It is the one component that speaks both southbound
12 directly to all the hypervisors and gateways, via
13 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>, and
14 northbound to the Cloud Management System, via <code>ovn-northd</code>:
17 <h2>Database Structure</h2>
20 The OVN Southbound database contains classes of data with
21 different properties, as described in the sections below.
24 <h3>Physical Network (PN) data</h3>
27 PN tables contain information about the chassis nodes in the system. This
28 contains all the information necessary to wire the overlay, such as IP
29 addresses, supported tunnel types, and security keys.
33 The amount of PN data is small (O(n) in the number of chassis) and it
34 changes infrequently, so it can be replicated to every chassis.
38 The <ref table="Chassis"/> table comprises the PN tables.
41 <h3>Logical Network (LN) data</h3>
44 LN tables contain the topology of logical switches and routers, ACLs,
45 firewall rules, and everything needed to describe how packets traverse a
46 logical network, represented as logical datapath flows (see Logical
47 Datapath Flows, below).
51 LN data may be large (O(n) in the number of logical ports, ACL rules,
52 etc.). Thus, to improve scaling, each chassis should receive only data
53 related to logical networks in which that chassis participates. Past
54 experience shows that in the presence of large logical networks, even
55 finer-grained partitioning of data, e.g. designing logical flows so that
56 only the chassis hosting a logical port needs related flows, pays off
57 scale-wise. (This is not necessary initially but it is worth bearing in mind.)
62 The LN is a slave of the cloud management system running northbound of OVN.
63 That CMS determines the entire OVN logical configuration and therefore the
64 LN's content at any given time is a deterministic function of the CMS's
65 configuration, although that happens indirectly via the
66 <ref db="OVN_Northbound"/> database and <code>ovn-northd</code>.
70 LN data is likely to change more quickly than PN data. This is especially
71 true in a container environment where VMs are created and destroyed (and
72 therefore added to and deleted from logical switches) quickly.
76 <ref table="Logical_Flow"/> and <ref table="Multicast_Group"/> contain LN data.
80 <h3>Logical-physical bindings</h3>
83 These tables link logical and physical components. They show the current
84 placement of logical components (such as VMs and VIFs) onto chassis, and
85 map logical entities to the values that represent them in tunnel
90 These tables change frequently, at least every time a VM powers up or down
91 or migrates, and especially quickly in a container environment. The
92 amount of data per VM (or VIF) is small.
96 Each chassis is authoritative about the VMs and VIFs that it hosts at any
97 given time and can efficiently flood that state to a central location, so
98 the consistency needs are minimal.
102 The <ref table="Port_Binding"/> and <ref table="Datapath_Binding"/> tables
103 contain binding data.
106 <h3>MAC bindings</h3>
109 The <ref table="MAC_Binding"/> table tracks the bindings from IP addresses
110 to Ethernet addresses that are dynamically discovered using ARP (for IPv4)
111 and neighbor discovery (for IPv6). Usually, IP-to-MAC bindings for virtual
112 machines are statically populated into the <ref table="Port_Binding"/>
113 table, so <ref table="MAC_Binding"/> is primarily used to discover bindings
114 on physical networks.
117 <h2>Common Columns</h2>
120 Some tables contain a special column named <code>external_ids</code>. This
121 column has the same form and purpose each place that it appears, so we
122 describe it here to save space later.
126 <dt><code>external_ids</code>: map of string-string pairs</dt>
128 Key-value pairs for use by the software that manages the OVN Southbound
129 database rather than by
130 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>. In
131 particular, <code>ovn-northd</code> can use key-value pairs in this
132 column to relate entities in the southbound database to higher-level
133 entities (such as entities in the OVN Northbound database). Individual
134 key-value pairs in this column may be documented in some cases to aid
135 in understanding and troubleshooting, but the reader should not mistake
136 such documentation as comprehensive.
140 <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
142 Each row in this table represents a hypervisor or gateway (a chassis) in
143 the physical network (PN). Each chassis, via
144 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>, adds
145 and updates its own row, and keeps a copy of the remaining rows to
146 determine how to reach other hypervisors.
150 When a chassis shuts down gracefully, it should remove its own row.
151 (This is not critical because resources hosted on the chassis are equally
152 unreachable regardless of whether the row is present.) If a chassis
153 shuts down permanently without removing its row, some kind of manual or
154 automatic cleanup is eventually needed; we can devise a process for that
159 OVN does not prescribe a particular format for chassis names.
160 ovn-controller populates this column using <ref key="system-id"
161 table="Open_vSwitch" column="external_ids" db="Open_vSwitch"/>
162 in the Open_vSwitch database's <ref table="Open_vSwitch"
163 db="Open_vSwitch"/> table. ovn-controller-vtep populates this
164 column with <ref table="Physical_Switch" column="name"
165 db="hardware_vtep"/> in the hardware_vtep database's
166 <ref table="Physical_Switch" db="hardware_vtep"/> table.
169 <column name="hostname">
170 The hostname of the chassis, if applicable. ovn-controller will populate
171 this column with the hostname of the host it is running on.
172 ovn-controller-vtep will leave this column empty.
175 <column name="external_ids" key="ovn-bridge-mappings">
176 <code>ovn-controller</code> populates this key with the set of bridge
177 mappings it has been configured to use. Other applications should treat
178 this key as read-only. See <code>ovn-controller</code>(8) for more
182 <group title="Common Columns">
183 The overall purpose of these columns is described under <code>Common
184 Columns</code> at the beginning of this document.
186 <column name="external_ids"/>
189 <group title="Encapsulation Configuration">
191 OVN uses encapsulation to transmit logical dataplane packets
195 <column name="encaps">
196 Points to supported encapsulation configurations to transmit
197 logical dataplane packets to this chassis. Each entry is a <ref
198 table="Encap"/> record that describes the configuration.
202 <group title="Gateway Configuration">
204 A <dfn>gateway</dfn> is a chassis that forwards traffic between the
205 OVN-managed part of a logical network and a physical VLAN, extending a
206 tunnel-based logical network into a physical network. Gateways are
207 typically dedicated nodes that do not host VMs and will be controlled
208 by <code>ovn-controller-vtep</code>.
211 <column name="vtep_logical_switches">
212 Stores the names of all VTEP logical switches connected by this gateway
213 chassis. A <ref table="Port_Binding"/> table entry whose
214 <ref column="options" table="Port_Binding"/>:<code>vtep-physical-switch</code>
215 equals this chassis's <ref column="name" table="Chassis"/>, and whose
216 <ref column="options" table="Port_Binding"/>:<code>vtep-logical-switch</code>
217 value appears in this chassis's
218 <ref column="vtep_logical_switches" table="Chassis"/>, is
219 associated with this <ref table="Chassis"/>.
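The association rule above can be read as a simple predicate. The following is a minimal sketch, assuming that rows are represented as plain Python dictionaries; the function name <code>binding_belongs_to_chassis</code> and all sample values are hypothetical and are not part of OVN.

```python
def binding_belongs_to_chassis(binding, chassis):
    """Return True if a Port_Binding row is associated with a gateway chassis.

    Both arguments are hypothetical dictionary stand-ins for database rows.
    """
    options = binding.get("options", {})
    return (
        options.get("vtep-physical-switch") == chassis["name"]
        and options.get("vtep-logical-switch") in chassis["vtep_logical_switches"]
    )


# Illustrative rows only; the names are made up for this example.
chassis = {"name": "br-vtep", "vtep_logical_switches": {"lswitch0", "lswitch1"}}
binding = {"options": {"vtep-physical-switch": "br-vtep",
                       "vtep-logical-switch": "lswitch0"}}
other = {"options": {"vtep-physical-switch": "br-vtep",
                     "vtep-logical-switch": "lswitch9"}}

print(binding_belongs_to_chassis(binding, chassis))
print(binding_belongs_to_chassis(other, chassis))
```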
224 <table name="Encap" title="Encapsulation Types">
226 The <ref column="encaps" table="Chassis"/> column in the <ref
227 table="Chassis"/> table refers to rows in this table to identify
228 how OVN may transmit logical dataplane packets to this chassis.
229 Each chassis, via <code>ovn-controller</code>(8) or
230 <code>ovn-controller-vtep</code>(8), adds and updates its own rows
231 and keeps a copy of the remaining rows to determine how to reach
236 The encapsulation to use to transmit packets to this chassis.
237 Hypervisors must use either <code>geneve</code> or
238 <code>stt</code>. Gateways may use <code>vxlan</code>,
239 <code>geneve</code>, or <code>stt</code>.
242 <column name="options">
243 Options for configuring the encapsulation, e.g. IPsec parameters when
244 IPsec support is introduced. No options are currently defined.
248 The IPv4 address of the encapsulation tunnel endpoint.
252 <table name="Logical_Flow" title="Logical Network Flows">
254 Each row in this table represents one logical flow.
255 <code>ovn-northd</code> populates this table with logical flows
256 that implement the L2 and L3 topologies specified in the
257 <ref db="OVN_Northbound"/> database. Each hypervisor, via
258 <code>ovn-controller</code>, translates the logical flows into
259 OpenFlow flows specific to its hypervisor and installs them into
264 Logical flows are expressed in an OVN-specific format, described here. A
265 logical datapath flow is much like an OpenFlow flow, except that the
266 flows are written in terms of logical ports and logical datapaths instead
267 of physical ports and physical datapaths. Translation between logical
268 and physical flows helps to ensure isolation between logical datapaths.
269 (The logical flow abstraction also allows the OVN centralized
270 components to do less work, since they do not have to separately
271 compute and push out physical flows to each chassis.)
275 The default action when no flow matches is to drop packets.
278 <p><em>Architectural Logical Life Cycle of a Packet</em></p>
281 The following description focuses on the life cycle of a packet through
282 a logical datapath, ignoring physical details of the implementation.
283 Please refer to <em>Architectural Physical Life Cycle of a Packet</em> in
284 <code>ovn-architecture</code>(7) for the physical information.
288 The description here is written as if OVN itself executes these steps,
289 but in fact OVN (that is, <code>ovn-controller</code>) programs Open
290 vSwitch, via OpenFlow and OVSDB, to execute them on its behalf.
294 At a high level, OVN passes each packet through the logical datapath's
295 logical ingress pipeline, which may output the packet to one or more
296 logical ports or logical multicast groups. For each such logical output
297 port, OVN passes the packet through the datapath's logical egress
298 pipeline, which may either drop the packet or deliver it to the
299 destination. Between the two pipelines, outputs to logical multicast
300 groups are expanded into logical ports, so that the egress pipeline only
301 processes a single logical output port at a time. Between the two
302 pipelines is also where, when necessary, OVN encapsulates a packet in a
303 tunnel (or tunnels) to transmit to remote hypervisors.
307 In more detail, to start, OVN searches the <ref table="Logical_Flow"/>
308 table for a row with correct <ref column="logical_datapath"/>, a <ref
309 column="pipeline"/> of <code>ingress</code>, a <ref column="table_id"/>
310 of 0, and a <ref column="match"/> that is true for the packet. If none
311 is found, OVN drops the packet. If OVN finds more than one, it chooses
312 the match with the highest <ref column="priority"/>. Then OVN executes
313 each of the actions specified in the row's <ref column="actions"/> column,
314 in the order specified. Some actions, such as those to modify packet
315 headers, require no further details. The <code>next</code> and
316 <code>output</code> actions are special.
320 The <code>next</code> action causes the above process to be repeated
321 recursively, except that OVN searches for <ref column="table_id"/> of 1
322 instead of 0. Similarly, any <code>next</code> action in a row found in
323 that table would cause a further search for a <ref column="table_id"/> of
324 2, and so on. When recursive processing completes, flow control returns
325 to the action following <code>next</code>.
329 The <code>output</code> action also introduces recursion. Its effect
330 depends on the current value of the <code>outport</code> field. Suppose
331 <code>outport</code> designates a logical port. First, OVN compares
332 <code>inport</code> to <code>outport</code>; if they are equal, it treats
333 the <code>output</code> as a no-op. In the common case, where they are
334 different, the packet enters the egress pipeline. This transition to the
335 egress pipeline discards register data, e.g. <code>reg0</code> ...
336 <code>reg4</code> and connection tracking state, to achieve
337 uniform behavior regardless of whether the egress pipeline is on a
338 different hypervisor (because registers aren't preserved across
339 tunnel encapsulation).
343 To execute the egress pipeline, OVN again searches the <ref
344 table="Logical_Flow"/> table for a row with correct <ref
345 column="logical_datapath"/>, a <ref column="table_id"/> of 0, a <ref
346 column="match"/> that is true for the packet, but now looking for a <ref
347 column="pipeline"/> of <code>egress</code>. If no matching row is found,
348 the output becomes a no-op. Otherwise, OVN executes the actions for the
349 matching flow (which is chosen from multiple, if necessary, as already described).
354 In the <code>egress</code> pipeline, the <code>next</code> action acts as
355 already described, except that it, of course, searches for
356 <code>egress</code> flows. The <code>output</code> action, however, now
357 directly outputs the packet to the output port (which is now fixed,
358 because <code>outport</code> is read-only within the egress pipeline).
362 The description earlier assumed that <code>outport</code> referred to a
363 logical port. If it instead designates a logical multicast group, then
364 the description above still applies, with the addition of fan-out from
365 the logical multicast group to each logical port in the group. For each
366 member of the group, OVN executes the logical pipeline as described, with
367 the logical output port replaced by the group member.
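The life cycle described above can be sketched as a small interpreter. This is only an illustration under simplifying assumptions: packets are dictionaries, matches and actions are Python callables rather than parsed expressions, and the names <code>Flow</code>, <code>Datapath</code>, <code>next_table</code>, <code>set_field</code>, and <code>output</code> are hypothetical. It is not how <code>ovn-controller</code> is implemented, since OVN programs Open vSwitch to perform these steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Flow:
    pipeline: str                    # "ingress" or "egress"
    table_id: int
    priority: int
    match: Callable[[dict], bool]    # stands in for the match column
    actions: List[Callable]          # stands in for the actions column


class Datapath:
    def __init__(self, flows, multicast_groups=None):
        self.flows = flows
        self.multicast_groups = multicast_groups or {}
        self.delivered = []          # output ports reached by the egress pipeline

    def run_table(self, pkt, pipeline, table_id):
        hits = [f for f in self.flows
                if f.pipeline == pipeline and f.table_id == table_id
                and f.match(pkt)]
        if not hits:
            return                   # ingress: drop; egress: output is a no-op
        # Highest priority wins; ties are undefined, here the first one is used.
        flow = max(hits, key=lambda f: f.priority)
        for action in flow.actions:
            action(self, pkt, pipeline, table_id)


def next_table(dp, pkt, pipeline, table_id):
    """The next action: run the following table, then return to the caller."""
    dp.run_table(pkt, pipeline, table_id + 1)


def set_field(name, value):
    def action(dp, pkt, pipeline, table_id):
        pkt[name] = value
    return action


def output(dp, pkt, pipeline, table_id):
    if pipeline == "egress":
        dp.delivered.append(pkt["outport"])
        return
    # Ingress: fan out multicast groups, then run egress once per port.
    ports = dp.multicast_groups.get(pkt["outport"], [pkt["outport"]])
    for port in ports:
        if port == pkt["inport"]:
            continue                 # output to the input port is a no-op
        # Registers are discarded on the way into the egress pipeline.
        egress_pkt = {k: v for k, v in pkt.items() if not k.startswith("reg")}
        egress_pkt["outport"] = port
        dp.run_table(egress_pkt, "egress", 0)


def always(pkt):
    return True


flows = [
    Flow("ingress", 0, 100, always, [next_table]),
    Flow("ingress", 1, 50, lambda p: p["eth.dst"] == "00:00:00:00:00:02",
         [set_field("outport", "p2"), output]),
    Flow("ingress", 1, 10, always, [set_field("outport", "flood"), output]),
    Flow("egress", 0, 0, always, [output]),
]
dp = Datapath(flows, {"flood": ["p1", "p2", "p3"]})
dp.run_table({"inport": "p1", "eth.dst": "00:00:00:00:00:02"}, "ingress", 0)
dp.run_table({"inport": "p1", "eth.dst": "ff:ff:ff:ff:ff:ff"}, "ingress", 0)
print(dp.delivered)
```

In this sketch the first packet is delivered to <code>p2</code> only, and the second is flooded to <code>p2</code> and <code>p3</code> but not back to its input port <code>p1</code>.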
370 <p><em>Pipeline Stages</em></p>
373 <code>ovn-northd</code> is responsible for populating the
374 <ref table="Logical_Flow"/> table, so the stages are an
375 implementation detail and subject to change. This section
376 describes the current logical flow table.
380 The ingress pipeline consists of the following stages:
384 Port Security (Table 0): Validates the source address, drops
385 packets with a VLAN tag, and, if configured, verifies that the
386 logical port is allowed to send with the source address.
390 L2 Destination Lookup (Table 1): Forwards known unicast
391 addresses to the appropriate logical port. Unicast packets to
392 unknown hosts are forwarded to logical ports configured with the
393 special <code>unknown</code> MAC address. Broadcast and
394 multicast packets are flooded to all ports in the logical switch.
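The L2 Destination Lookup stage described above amounts to a three-way decision. The following sketch assumes Ethernet addresses are held as 48-bit integers and that the MAC table and port sets are ordinary Python containers; the function names and sample addresses are hypothetical.

```python
def is_multicast(eth_dst):
    """Bit 40 of a 48-bit Ethernet address is the multicast bit."""
    return (eth_dst >> 40) % 2 == 1


def l2_lookup(eth_dst, mac_to_port, unknown_ports, all_ports):
    """Return the logical output ports for a destination address.

    Excluding the input port is left to the output action, as described
    elsewhere in this document.
    """
    if is_multicast(eth_dst):        # broadcast also has the multicast bit set
        return sorted(all_ports)
    if eth_dst in mac_to_port:       # known unicast
        return [mac_to_port[eth_dst]]
    return sorted(unknown_ports)     # unknown unicast


# Illustrative data only.
mac_to_port = {0x0A0000000001: "vm1", 0x0A0000000002: "vm2"}
unknown_ports = {"uplink"}
all_ports = {"vm1", "vm2", "uplink"}

print(l2_lookup(0x0A0000000002, mac_to_port, unknown_ports, all_ports))
print(l2_lookup(0x0A00000000FF, mac_to_port, unknown_ports, all_ports))
print(l2_lookup(0xFFFFFFFFFFFF, mac_to_port, unknown_ports, all_ports))
```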
399 The egress pipeline consists of the following stages:
403 ACL (Table 0): Applies any specified access control lists.
407 Port Security (Table 1): If configured, verifies that the
408 logical port is allowed to receive packets with the destination
413 <column name="logical_datapath">
414 The logical datapath to which the logical flow belongs.
417 <column name="pipeline">
419 The primary flows used for deciding on a packet's destination are the
420 <code>ingress</code> flows. The <code>egress</code> flows implement
421 ACLs. See <em>Architectural Logical Life Cycle of a Packet</em>, above, for details.
425 <column name="table_id">
426 The stage in the logical pipeline, analogous to an OpenFlow table number.
429 <column name="priority">
430 The flow's priority. Flows with numerically higher priority take
431 precedence over those with lower. If two logical datapath flows with the
432 same priority both match, then the one actually applied to the packet is undefined.
436 <column name="match">
438 A matching expression. OVN provides a superset of OpenFlow matching
439 capabilities, using a syntax similar to Boolean expressions in a
440 programming language.
444 The most important components of a match expression are
445 <dfn>comparisons</dfn> between <dfn>symbols</dfn> and
446 <dfn>constants</dfn>, e.g. <code>ip4.dst == 192.168.0.1</code>,
447 <code>ip.proto == 6</code>, <code>arp.op == 1</code>, <code>eth.type ==
448 0x800</code>. The logical AND operator <code>&amp;&amp;</code> and
449 logical OR operator <code>||</code> can combine comparisons into a larger expression.
454 Matching expressions also support parentheses for grouping, the logical
455 NOT prefix operator <code>!</code>, and literals <code>0</code> and
456 <code>1</code> to express ``false'' or ``true,'' respectively. The
457 latter is useful by itself as a catch-all expression that matches every packet.
461 <p><em>Symbols</em></p>
464 <em>Type</em>. Symbols have <dfn>integer</dfn> or <dfn>string</dfn>
465 type. Integer symbols have a <dfn>width</dfn> in bits.
469 <em>Kinds</em>. There are three kinds of symbols:
475 <dfn>Fields</dfn>. A field symbol represents a packet header or
476 metadata field. For example, a field
477 named <code>vlan.tci</code> might represent the VLAN TCI field in a
482 A field symbol can have integer or string type. Integer fields can
483 be nominal or ordinal (see <em>Level of Measurement</em>,
490 <dfn>Subfields</dfn>. A subfield represents a subset of bits from
491 a larger field. For example, a field <code>vlan.vid</code> might
492 be defined as an alias for <code>vlan.tci[0..11]</code>. Subfields
493 are provided for syntactic convenience, because it is always
494 possible to instead refer to a subset of bits from a field
499 Only ordinal fields (see <em>Level of Measurement</em>,
500 below) may have subfields. Subfields are always ordinal.
506 <dfn>Predicates</dfn>. A predicate is shorthand for a Boolean
507 expression. Predicates may be used much like 1-bit fields. For
508 example, <code>ip4</code> might expand to <code>eth.type ==
509 0x800</code>. Predicates are provided for syntactic convenience,
510 because it is always possible to instead specify the underlying
515 A predicate whose expansion refers to any nominal field or
516 predicate (see <em>Level of Measurement</em>, below) is nominal;
517 other predicates have Boolean level of measurement.
523 <em>Level of Measurement</em>. See
524 http://en.wikipedia.org/wiki/Level_of_measurement for the statistical
525 concept on which this classification is based. There are three
532 <dfn>Ordinal</dfn>. In statistics, ordinal values can be ordered
533 on a scale. OVN considers a field (or subfield) to be ordinal if
534 its bits can be examined individually. This is true for the
535 OpenFlow fields that OpenFlow or Open vSwitch makes ``maskable.''
539 Any use of an ordinal field may specify a single bit or a range of
540 bits, e.g. <code>vlan.tci[13..15]</code> refers to the PCP field
541 within the VLAN TCI, and <code>eth.dst[40]</code> refers to the
542 multicast bit in the Ethernet destination address.
546 OVN supports all the usual arithmetic relations (<code>==</code>,
547 <code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
548 <code>&gt;</code>, and <code>&gt;=</code>) on ordinal fields and
549 their subfields, because OVN can implement these in OpenFlow and
550 Open vSwitch as collections of bitwise tests.
556 <dfn>Nominal</dfn>. In statistics, nominal values cannot be
557 usefully compared except for equality. This is true of OpenFlow
558 port numbers, Ethernet types, and IP protocols are examples: all of
559 these are just identifiers assigned arbitrarily with no deeper
560 meaning. In OpenFlow and Open vSwitch, bits in these fields
561 generally aren't individually addressable.
565 OVN only supports arithmetic tests for equality on nominal fields,
566 because OpenFlow and Open vSwitch provide no way for a flow to
567 efficiently implement other comparisons on them. (A test for
568 inequality can be approximated by two flows with different
569 priorities, but OVN matching expressions always generate flows with a single priority.)
574 String fields are always nominal.
580 <dfn>Boolean</dfn>. A nominal field that has only two values, 0
581 and 1, is somewhat exceptional, since it is easy to support both
582 equality and inequality tests on such a field: either one can be
583 implemented as a test for 0 or 1.
587 Only predicates (see above) have a Boolean level of measurement.
591 This isn't a standard level of measurement.
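The bit-range notation used for ordinal fields above, such as <code>vlan.tci[13..15]</code> and <code>eth.dst[40]</code>, can be illustrated with ordinary shifts and masks. This is a minimal sketch, assuming fields are held as Python integers with bit 0 as the least significant bit; the helper name <code>subfield</code> and the sample values are hypothetical.

```python
def subfield(value, lo, hi=None):
    """Return bits lo..hi (inclusive) of value; a single bit if hi is omitted."""
    if hi is None:
        hi = lo
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


tci = 0xB064                      # sample VLAN TCI: PCP 5, bit 12 set, VID 100
print(subfield(tci, 13, 15))      # vlan.tci[13..15], the PCP
print(subfield(tci, 12))          # vlan.tci[12]
print(subfield(tci, 0, 11))       # vlan.tci[0..11], the VID

mcast_mac = 0x01005E000001        # sample IPv4 multicast Ethernet address
ucast_mac = 0x0A0000000001        # sample unicast Ethernet address
print(subfield(mcast_mac, 40))    # eth.dst[40], the multicast bit
print(subfield(ucast_mac, 40))
```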
597 <em>Prerequisites</em>. Any symbol can have prerequisites, which are
598 additional conditions implied by the use of the symbol.
599 For example, the <code>icmp4.type</code> symbol might have prerequisite
600 <code>icmp4</code>, which would cause an expression <code>icmp4.type ==
601 0</code> to be interpreted as <code>icmp4.type == 0 &amp;&amp;
602 icmp4</code>, which would in turn expand to <code>icmp4.type == 0
603 &amp;&amp; eth.type == 0x800 &amp;&amp; ip.proto == 1</code> (assuming
604 <code>icmp4</code> is a predicate defined as suggested under
605 <em>Kinds</em> above).
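The prerequisite expansion described above can be sketched as plain text substitution. This sketch assumes the predicate definitions listed later in this document (<code>icmp4</code> as <code>ip4 &amp;&amp; ip.proto == 1</code>, and <code>ip4</code> as <code>eth.type == 0x800</code>); the tables and function names are hypothetical, and a real expander would work on a parsed expression and add parentheses where <code>||</code> is involved.

```python
import re

# Hypothetical tables, limited to the symbols used in the example.
PREDICATES = {
    "ip4": "eth.type == 0x800",
    "icmp4": "ip4 && ip.proto == 1",
}
PREREQUISITES = {
    "icmp4.type": "icmp4",
}

# A symbol is a dotted identifier that does not start inside a number.
SYMBOL = re.compile(r"(?<![0-9A-Za-z_.])[A-Za-z_][A-Za-z0-9_.]*")


def add_prerequisites(expr):
    """Append the prerequisite of every symbol that has one."""
    prereqs = []
    for symbol in SYMBOL.findall(expr):
        prereq = PREREQUISITES.get(symbol)
        if prereq and prereq not in prereqs:
            prereqs.append(prereq)
    return " && ".join([expr] + prereqs)


def expand_predicates(expr):
    """Replace predicate names by their definitions until none remains."""
    while True:
        expanded = SYMBOL.sub(
            lambda m: PREDICATES.get(m.group(0), m.group(0)), expr)
        if expanded == expr:
            return expr
        expr = expanded


step1 = add_prerequisites("icmp4.type == 0")
step2 = expand_predicates(step1)
print(step1)
print(step2)
```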
608 <p><em>Relational operators</em></p>
611 All of the standard relational operators <code>==</code>,
612 <code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
613 <code>&gt;</code>, and <code>&gt;=</code> are supported. Nominal
614 fields support only <code>==</code> and <code>!=</code>, and only in a
615 positive sense when outer <code>!</code> are taken into account,
616 e.g. given string field <code>inport</code>, <code>inport ==
617 "eth0"</code> and <code>!(inport != "eth0")</code> are acceptable, but
618 not <code>inport != "eth0"</code>.
622 The implementation of <code>==</code> (or <code>!=</code> when it is
623 negated) is more efficient than that of the other relational operators.
627 <p><em>Constants</em></p>
630 Integer constants may be expressed in decimal, hexadecimal prefixed by
631 <code>0x</code>, or as dotted-quad IPv4 addresses, IPv6 addresses in
632 their standard forms, or Ethernet addresses as colon-separated hex
633 digits. A constant in any of these forms may be followed by a slash
634 and a second constant (the mask) in the same form, to form a masked
635 constant. IPv4 and IPv6 masks may be given as integers, to express CIDR prefixes.
640 String constants have the same syntax as quoted strings in JSON (thus,
641 they are Unicode strings).
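To illustrate masked integer constants, the following sketch converts a prefix-style IPv4 constant into a value and mask and then applies the usual masked comparison. It uses Python's standard <code>ipaddress</code> module; the helper names are hypothetical, and the comparison rule shown (compare only the bits set in the mask) is the conventional meaning of a masked match rather than a statement about OVN internals.

```python
import ipaddress


def cidr_to_value_mask(cidr):
    """Turn a constant such as 192.168.0.0/24 into integer (value, mask)."""
    net = ipaddress.ip_network(cidr, strict=False)
    return int(net.network_address), int(net.netmask)


def masked_equal(field, value, mask):
    """Compare only the bits selected by mask."""
    return (field & mask) == (value & mask)


value, mask = cidr_to_value_mask("192.168.0.0/24")
inside = int(ipaddress.ip_address("192.168.0.77"))
outside = int(ipaddress.ip_address("192.168.1.1"))

print(ipaddress.ip_address(mask))
print(masked_equal(inside, value, mask))
print(masked_equal(outside, value, mask))
```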
645 Some operators support sets of constants written inside curly braces
646 <code>{</code> ... <code>}</code>. Commas between elements of a set,
647 and after the last element, are optional. With <code>==</code>,
648 ``<code><var>field</var> == { <var>constant1</var>,
649 <var>constant2</var>,</code> ... <code>}</code>'' is syntactic sugar
650 for ``<code><var>field</var> == <var>constant1</var> ||
651 <var>field</var> == <var>constant2</var> || </code>...<code></code>''.
652 Similarly, ``<code><var>field</var> != { <var>constant1</var>,
653 <var>constant2</var>, </code>...<code> }</code>'' is equivalent to
654 ``<code><var>field</var> != <var>constant1</var> &amp;&amp;
655 <var>field</var> != <var>constant2</var> &amp;&amp;
656 </code>...<code></code>''.
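The set shorthand above can be written out mechanically. A minimal sketch, in which the function name <code>expand_set</code> is hypothetical and constants are formatted with Python's default string conversion:

```python
def expand_set(field, op, constants):
    """Expand 'field op { c1, c2, ... }' into its long form."""
    if op == "==":
        joiner = " || "   # equal to any member of the set
    elif op == "!=":
        joiner = " && "   # different from every member of the set
    else:
        raise ValueError("only == and != are covered by this sketch")
    return joiner.join(f"{field} {op} {c}" for c in constants)


print(expand_set("tcp.dst", "==", [80, 443]))
print(expand_set("tcp.dst", "!=", [80, 443]))
```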
659 <p><em>Miscellaneous</em></p>
662 Comparisons may name the symbol or the constant first,
663 e.g. <code>tcp.src == 80</code> and <code>80 == tcp.src</code> are both acceptable.
668 Tests for a range may be expressed using a syntax like <code>1024 &lt;=
669 tcp.src &lt;= 49151</code>, which is equivalent to <code>1024 &lt;=
670 tcp.src &amp;&amp; tcp.src &lt;= 49151</code>.
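Earlier, this document notes that relational tests on ordinal fields can be implemented as collections of bitwise tests. The sketch below shows one well-known way to do that for a range: cover it with aligned power-of-two blocks, each of which is a single value/mask match. This illustrates the general technique only; it is an assumption, not a description of the algorithm OVN actually uses, and the function names are hypothetical.

```python
def range_to_masked_matches(lo, hi, width):
    """Cover lo..hi (inclusive) with (value, mask) pairs for a width-bit field."""
    full = (1 << width) - 1
    matches = []
    while lo <= hi:
        # Largest power-of-two block that starts at lo and is aligned on lo.
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it no longer overshoots hi.
        while lo + size - 1 > hi:
            size >>= 1
        matches.append((lo, full & ~(size - 1)))
        lo += size
    return matches


def in_range(value, matches):
    """True if value satisfies at least one of the masked matches."""
    return any((value & mask) == want for want, mask in matches)


matches = range_to_masked_matches(1024, 49151, 16)
for want, mask in matches:
    print(f"tcp.src == {want:#06x}/{mask:#06x}")
```

For the range <code>1024..49151</code> on a 16-bit field this produces six masked matches, which is why a range test is more expensive than a single equality test.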
674 For a one-bit field or predicate, a mention of its name is equivalent
675 to <code><var>symbol</var> == 1</code>, e.g. <code>vlan.present</code>
676 is equivalent to <code>vlan.present == 1</code>. The same is true for
677 one-bit subfields, e.g. <code>vlan.tci[12]</code>. There is no
678 technical limitation to implementing the same for ordinal fields of all
679 widths, but the implementation is expensive enough that the syntax
680 parser requires writing an explicit comparison against zero to make
681 mistakes less likely, e.g. in <code>tcp.src != 0</code> the comparison
682 against 0 is required.
686 <em>Operator precedence</em> is as shown below, from highest to lowest.
687 There are two exceptions where parentheses are required even though the
688 table would suggest that they are not: <code>&amp;&amp;</code> and
689 <code>||</code> require parentheses when used together, and
690 <code>!</code> requires parentheses when applied to a relational
691 expression. Thus, in <code>(eth.type == 0x800 || eth.type == 0x86dd)
692 &amp;&amp; ip.proto == 6</code> or <code>!(arp.op == 1)</code>, the
693 parentheses are mandatory.
697 <li><code>()</code></li>
698 <li><code>== != &lt; &lt;= &gt; &gt;=</code></li>
699 <li><code>!</code></li>
700 <li><code>&amp;&amp; ||</code></li>
704 <em>Comments</em> may be introduced by <code>//</code>, which extends
705 to the next new-line. Comments within a line may be bracketed by
706 <code>/*</code> and <code>*/</code>. Multiline comments are not supported.
710 <p><em>Symbols</em></p>
713 Most of the symbols below have integer type. Only <code>inport</code>
714 and <code>outport</code> have string type. <code>inport</code> names a
715 logical port. Thus, its value is a <ref column="logical_port"/> name
716 from the <ref table="Port_Binding"/> table. <code>outport</code> may
717 name a logical port, as <code>inport</code>, or a logical multicast
718 group defined in the <ref table="Multicast_Group"/> table. For both
719 symbols, only names within the flow's logical datapath may be used.
723 <li><code>reg0</code>...<code>reg4</code></li>
724 <li><code>inport</code> <code>outport</code></li>
725 <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
726 <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
727 <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
728 <li><code>ip4.src</code> <code>ip4.dst</code></li>
729 <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
730 <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
731 <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
732 <li><code>udp.src</code> <code>udp.dst</code></li>
733 <li><code>sctp.src</code> <code>sctp.dst</code></li>
734 <li><code>icmp4.type</code> <code>icmp4.code</code></li>
735 <li><code>icmp6.type</code> <code>icmp6.code</code></li>
736 <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
737 <li><code>ct_mark</code> <code>ct_label</code></li>
740 <code>ct_state</code>, which has the following Boolean subfields:
743 <li><code>ct.new</code>: True for a new flow</li>
744 <li><code>ct.est</code>: True for an established flow</li>
745 <li><code>ct.rel</code>: True for a related flow</li>
746 <li><code>ct.rpl</code>: True for a reply flow</li>
747 <li><code>ct.inv</code>: True for a connection entry in a bad state</li>
750 <code>ct_state</code> and its subfields are initialized by the
751 <code>ct_next</code> action, described below.
757 The following predicates are supported:
761 <li><code>eth.bcast</code> expands to <code>eth.dst == ff:ff:ff:ff:ff:ff</code></li>
762 <li><code>eth.mcast</code> expands to <code>eth.dst[40]</code></li>
763 <li><code>vlan.present</code> expands to <code>vlan.tci[12]</code></li>
764 <li><code>ip4</code> expands to <code>eth.type == 0x800</code></li>
765 <li><code>ip4.mcast</code> expands to <code>ip4.dst[28..31] == 0xe</code></li>
766 <li><code>ip6</code> expands to <code>eth.type == 0x86dd</code></li>
767 <li><code>ip</code> expands to <code>ip4 || ip6</code></li>
768 <li><code>icmp4</code> expands to <code>ip4 &amp;&amp; ip.proto == 1</code></li>
769 <li><code>icmp6</code> expands to <code>ip6 &amp;&amp; ip.proto == 58</code></li>
770 <li><code>icmp</code> expands to <code>icmp4 || icmp6</code></li>
771 <li><code>ip.is_frag</code> expands to <code>ip.frag[0]</code></li>
772 <li><code>ip.later_frag</code> expands to <code>ip.frag[1]</code></li>
773 <li><code>ip.first_frag</code> expands to <code>ip.is_frag &amp;&amp; !ip.later_frag</code></li>
774 <li><code>arp</code> expands to <code>eth.type == 0x806</code></li>
775 <li><code>nd</code> expands to <code>icmp6.type == {135, 136} &amp;&amp; icmp6.code == 0</code></li>
776 <li><code>tcp</code> expands to <code>ip.proto == 6</code></li>
777 <li><code>udp</code> expands to <code>ip.proto == 17</code></li>
778 <li><code>sctp</code> expands to <code>ip.proto == 132</code></li>
782 <column name="actions">
784 Logical datapath actions, to be executed when the logical flow
785 represented by this row is the highest-priority match.
789 Actions share lexical syntax with the <ref column="match"/> column. An
790 empty set of actions (or one that contains just white space or
791 comments), or a set of actions that consists of just
792 <code>drop;</code>, causes the matched packets to be dropped.
793 Otherwise, the column should contain a sequence of actions, each
794 terminated by a semicolon.
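The rule above, a sequence of semicolon-terminated actions where an empty sequence or a lone <code>drop;</code> drops the packet, can be sketched as follows. This is a simplified illustration, not the OVN parser: it understands quoted strings but does not strip comments, and the function names are hypothetical.

```python
def split_actions(actions):
    """Split an actions string on semicolons that are outside quoted strings."""
    result, current = [], []
    in_string = escaped = False
    for ch in actions:
        if in_string:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch == ";":
            text = "".join(current).strip()
            if text:
                result.append(text)
            current = []
        else:
            current.append(ch)
    if "".join(current).strip():
        raise ValueError("action not terminated by a semicolon")
    return result


def is_drop(actions):
    """True for an empty action list or one that consists of just drop."""
    return split_actions(actions) in ([], ["drop"])


print(split_actions('outport = "a;b"; output;'))
print(is_drop("  "), is_drop(" drop; "), is_drop("next;"))
```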
798 The following actions are defined:
802 <dt><code>output;</code></dt>
805 In the ingress pipeline, this action executes the
806 <code>egress</code> pipeline as a subroutine. If
807 <code>outport</code> names a logical port, the egress pipeline
808 executes once; if it is a multicast group, the egress pipeline runs
809 once for each logical port in the group.
813 In the egress pipeline, this action performs the actual
814 output to the <code>outport</code> logical port. (In the egress
815 pipeline, <code>outport</code> never names a multicast group.)
819 Output to the input port is implicitly dropped, that is,
820 <code>output</code> becomes a no-op if <code>outport</code> ==
821 <code>inport</code>. Occasionally it may be useful to override
822 this behavior, e.g. to send an ARP reply to an ARP request; to do
823 so, use <code>inport = "";</code> to set the logical input port to
824 an empty string (which should not be used as the name of any
829 <dt><code>next;</code></dt>
830 <dt><code>next(<var>table</var>);</code></dt>
832 Executes another logical datapath table as a subroutine. By default,
833 the table after the current one is executed. Specify
834 <var>table</var> to jump to a specific table in the same pipeline.
837 <dt><code><var>field</var> = <var>constant</var>;</code></dt>
840 Sets data or metadata field <var>field</var> to constant value
841 <var>constant</var>, e.g. <code>outport = "vif0";</code> to set the
842 logical output port. To set only a subset of bits in a field,
843 specify a subfield for <var>field</var> or a masked
844 <var>constant</var>, e.g. one may use <code>vlan.pcp[2] = 1;</code>
845 or <code>vlan.pcp = 4/4;</code> to set the most significant bit of the VLAN PCP.
850 Assigning to a field with prerequisites implicitly adds those
851 prerequisites to <ref column="match"/>; thus, for example, a flow
852 that sets <code>tcp.dst</code> applies only to TCP flows,
853 regardless of whether its <ref column="match"/> mentions any TCP
858 Not all fields are modifiable (e.g. <code>eth.type</code> and
859 <code>ip.proto</code> are read-only), and not all modifiable fields
860 may be partially modified (e.g. <code>ip.ttl</code> must be assigned
861 as a whole). The <code>outport</code> field is modifiable in the
862 <code>ingress</code> pipeline but not in the <code>egress</code> pipeline.
867 <dt><code><var>field1</var> = <var>field2</var>;</code></dt>
870 Sets data or metadata field <var>field1</var> to the value of data
871 or metadata field <var>field2</var>, e.g. <code>reg0 =
872 ip4.src;</code> copies <code>ip4.src</code> into <code>reg0</code>.
873 To modify only a subset of a field's bits, specify a subfield for
874 <var>field1</var> or <var>field2</var> or both, e.g. <code>vlan.pcp
875 = reg0[0..2];</code> copies the least-significant bits of
876 <code>reg0</code> into the VLAN PCP.
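<p>
As a rough illustration of the subfield copy described above, the following
Python sketch models <code>vlan.pcp = reg0[0..2];</code> on plain integers.
The helper names <code>get_bits</code> and <code>set_bits</code> are
hypothetical and not part of OVN; the sketch uses integer arithmetic, which
is equivalent to the usual shift-and-mask approach.
</p>

```python
def get_bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive, bit 0 least significant) of value."""
    width = hi - lo + 1
    return (value // 2**lo) % 2**width


def set_bits(value: int, lo: int, hi: int, bits: int) -> int:
    """Return value with bits lo..hi replaced by bits; other bits are kept."""
    width = hi - lo + 1
    cleared = value - get_bits(value, lo, hi) * 2**lo
    return cleared + (bits % 2**width) * 2**lo


# Model of "vlan.pcp = reg0[0..2];" with a 3-bit PCP field (assumed values).
reg0 = 0b10110101
vlan_pcp = set_bits(0, 0, 2, get_bits(reg0, 0, 2))
```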
880 <var>field1</var> and <var>field2</var> must be the same type,
881 either both string or both integer fields. If they are both
882 integer fields, they must have the same width.
886 If <var>field1</var> or <var>field2</var> has prerequisites, they
887 are added implicitly to <ref column="match"/>. It is possible to
888 write an assignment with contradictory prerequisites, such as
889 <code>ip4.src = ip6.src[0..31];</code>, but the contradiction means
890 that a logical flow with such an assignment will never be matched.
894 <dt><code><var>field1</var> <-> <var>field2</var>;</code></dt>
897 Similar to <code><var>field1</var> = <var>field2</var>;</code>
898 except that the two values are exchanged instead of copied. Both
899 <var>field1</var> and <var>field2</var> must be modifiable.
903 <dt><code>ip.ttl--;</code></dt>
906 Decrements the IPv4 or IPv6 TTL. If this would make the TTL zero
907 or negative, then processing of the packet halts; no further
908 actions are processed. (To properly handle such cases, a
909 higher-priority flow should match on
910 <code>ip.ttl == {0, 1};</code>.)
913 <p><b>Prerequisite:</b> <code>ip</code></p>
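<p>
The halting behavior of <code>ip.ttl--;</code> can be sketched in Python as
follows. This is only a model of the description above, not OVN code; the
function name <code>decrement_ttl</code> and the use of <code>None</code>
to signal that processing halts are assumptions made for illustration.
</p>

```python
def decrement_ttl(ttl: int):
    """Return the new TTL, or None if processing of the packet halts."""
    if ttl in (0, 1):
        # Decrementing would make the TTL zero or negative, so no
        # further actions are processed for this packet.
        return None
    return ttl - 1
```

<p>
The values that halt processing here are the same ones a higher-priority
flow would match with <code>ip.ttl == {0, 1};</code>.
</p>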
916 <dt><code>ct_next;</code></dt>
919 Apply connection tracking to the flow, initializing
920 <code>ct_state</code> for matching in later tables.
921 Automatically moves on to the next table, as if followed by
926 As a side effect, IP fragments will be reassembled for matching.
927 If a fragmented packet is output, then it will be sent with any
928 overlapping fragments squashed. The connection tracking state is
929 scoped by the logical port, so overlapping addresses may be used.
930 To allow traffic related to the matched flow, execute
931 <code>ct_commit</code>.
935 It is possible to have actions follow <code>ct_next</code>,
936 but they will not have access to any of its side effects, so
937 doing so is not generally useful.
941 <dt><code>ct_commit;</code></dt>
944 Commit the flow to the connection tracking entry associated
945 with it by a previous call to <code>ct_next</code>.
948 Note that if you want processing to continue in the next table,
949 you must execute the <code>next</code> action after
950 <code>ct_commit</code>.
954 <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
957 Temporarily replaces the IPv4 packet being processed by an ARP
958 packet and executes each nested <var>action</var> on the ARP
959 packet. Actions following the <code>arp</code> action, if any, apply
960 to the original, unmodified packet.
964 The ARP packet that this action operates on is initialized based on
965 the IPv4 packet being processed, as follows. These are default
966 values that the nested actions will probably want to change:
970 <li><code>eth.src</code> unchanged</li>
971 <li><code>eth.dst</code> unchanged</li>
972 <li><code>eth.type = 0x0806</code></li>
973 <li><code>arp.op = 1</code> (ARP request)</li>
974 <li><code>arp.sha</code> copied from <code>eth.src</code></li>
975 <li><code>arp.spa</code> copied from <code>ip4.src</code></li>
976 <li><code>arp.tha = 00:00:00:00:00:00</code></li>
977 <li><code>arp.tpa</code> copied from <code>ip4.dst</code></li>
981 The ARP packet has the same VLAN header, if any, as the IP packet
985 <p><b>Prerequisite:</b> <code>ip4</code></p>
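<p>
The default initialization listed above can be sketched in Python by
treating a packet as a dictionary of field names to values. The function
name <code>arp_from_ipv4</code>, the dictionary representation, and the
sample addresses are assumptions for illustration only.
</p>

```python
def arp_from_ipv4(pkt: dict) -> dict:
    """Build the default ARP request fields from an IPv4 packet's fields."""
    return {
        "eth.src": pkt["eth.src"],        # unchanged
        "eth.dst": pkt["eth.dst"],        # unchanged
        "eth.type": 0x0806,               # ARP
        "arp.op": 1,                      # ARP request
        "arp.sha": pkt["eth.src"],        # copied from eth.src
        "arp.spa": pkt["ip4.src"],        # copied from ip4.src
        "arp.tha": "00:00:00:00:00:00",
        "arp.tpa": pkt["ip4.dst"],        # copied from ip4.dst
    }


# Hypothetical IPv4 packet fields.
ip_pkt = {
    "eth.src": "0a:00:00:00:00:01",
    "eth.dst": "0a:00:00:00:00:02",
    "ip4.src": "192.0.2.1",
    "ip4.dst": "192.0.2.7",
}
arp_pkt = arp_from_ipv4(ip_pkt)
```

<p>
Nested actions would then typically overwrite some of these defaults, for
example <code>eth.dst</code>.
</p>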
988 <dt><code>get_arp(<var>P</var>, <var>A</var>);</code></dt>
992 <b>Parameters</b>: logical port string field <var>P</var>, 32-bit
993 IP address field <var>A</var>.
997 Looks up <var>A</var> in <var>P</var>'s ARP table. If an entry is
998 found, stores its Ethernet address in <code>eth.dst</code>,
999 otherwise stores <code>00:00:00:00:00:00</code> in
1000 <code>eth.dst</code>.
1003 <p><b>Example:</b> <code>get_arp(outport, ip4.dst);</code></p>
1007 <code>put_arp(<var>P</var>, <var>A</var>, <var>E</var>);</code>
1012 <b>Parameters</b>: logical port string field <var>P</var>, 32-bit
1013 IP address field <var>A</var>, 48-bit Ethernet address field
1018 Adds or updates the entry for IP address <var>A</var> in logical
1019 port <var>P</var>'s ARP table, setting its Ethernet address to
1023 <p><b>Example:</b> <code>put_arp(inport, arp.spa, arp.sha);</code></p>
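<p>
Taken together, <code>get_arp</code> and <code>put_arp</code> behave like a
lookup table keyed by logical port and IP address. The Python sketch below
models only that behavior; the in-memory dictionary and the sample port
name and addresses are hypothetical stand-ins, not how OVN stores bindings.
</p>

```python
ZERO_MAC = "00:00:00:00:00:00"

# Hypothetical in-memory stand-in for the per-port ARP tables,
# keyed by (logical port, IP address).
arp_table = {}


def put_arp(port: str, ip: str, mac: str) -> None:
    """Add or update the entry for ip in port's ARP table."""
    arp_table[(port, ip)] = mac


def get_arp(port: str, ip: str) -> str:
    """Return the value that would be stored in eth.dst."""
    return arp_table.get((port, ip), ZERO_MAC)


put_arp("lrp0", "192.0.2.7", "0a:00:00:00:00:07")
hit = get_arp("lrp0", "192.0.2.7")
miss = get_arp("lrp0", "192.0.2.99")
```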
1028 The following actions will likely be useful later, but they have not
1029 been thought out carefully.
1033 <dt><code>icmp4 { <var>action</var>; </code>...<code> };</code></dt>
1036 Temporarily replaces the IPv4 packet being processed by an ICMPv4
1037 packet and executes each nested <var>action</var> on the ICMPv4
1038 packet. Actions following the <code>icmp4</code> action, if any,
1039 apply to the original, unmodified packet.
1043 The ICMPv4 packet that this action operates on is initialized based
1044 on the IPv4 packet being processed, as follows. These are default
1045 values that the nested actions will probably want to change.
1046 Ethernet and IPv4 fields not listed here are not changed:
1050 <li><code>ip.proto = 1</code> (ICMPv4)</li>
1051 <li><code>ip.frag = 0</code> (not a fragment)</li>
1052 <li><code>icmp4.type = 3</code> (destination unreachable)</li>
1053 <li><code>icmp4.code = 1</code> (host unreachable)</li>
1060 <p><b>Prerequisite:</b> <code>ip4</code></p>
1063 <dt><code>tcp_reset;</code></dt>
1066 This action transforms the current TCP packet according to the
1067 following pseudocode:
1074 tcp.ack = tcp.seq + length(tcp.payload);
1081 Then, the action drops all TCP options and payload data, and
1082 updates the TCP checksum.
1089 <p><b>Prerequisite:</b> <code>tcp</code></p>
1094 <column name="external_ids" key="stage-name">
1095 Human-readable name for this flow's stage in the pipeline.
1098 <group title="Common Columns">
1099 The overall purpose of these columns is described under <code>Common
1100 Columns</code> at the beginning of this document.
1102 <column name="external_ids"/>
1106 <table name="Multicast_Group" title="Logical Port Multicast Groups">
1108 The rows in this table define multicast groups of logical ports.
1109 Multicast groups allow a single packet transmitted over a tunnel to a
1110 hypervisor to be delivered to multiple VMs on that hypervisor, which
1111 uses bandwidth more efficiently.
1115 Each row in this table defines a logical multicast group numbered <ref
1116 column="tunnel_key"/> within <ref column="datapath"/>, whose logical
1117 ports are listed in the <ref column="ports"/> column.
1120 <column name="datapath">
1121 The logical datapath in which the multicast group resides.
1124 <column name="tunnel_key">
1125 The value used to designate this logical egress port in tunnel
1126 encapsulations. An index forces the key to be unique within the <ref
1127 column="datapath"/>. The unusual range ensures that multicast group IDs
1128 do not overlap with logical port IDs.
1131 <column name="name">
1133 The logical multicast group's name. An index forces the name to be
1134 unique within the <ref column="datapath"/>. Logical flows in the
1135 ingress pipeline may output to the group just as for individual logical
1136 ports, by assigning the group's name to <code>outport</code> and
1137 executing an <code>output</code> action.
1141 Multicast group names and logical port names share a single namespace
1142 and thus should not overlap (but the database schema cannot enforce
1143 this). To try to avoid conflicts, <code>ovn-northd</code> uses names
1144 that begin with <code>_MC_</code>.
1148 <column name="ports">
1149 The logical ports included in the multicast group. All of these ports
1150 must be in the <ref column="datapath"/> logical datapath (but the
1151 database schema cannot enforce this).
1155 <table name="Datapath_Binding" title="Physical-Logical Datapath Bindings">
1157 Each row in this table identifies physical bindings of a logical
1158 datapath. A logical datapath implements a logical pipeline among the
1159 ports in the <ref table="Port_Binding"/> table associated with it. In
1160 practice, the pipeline in a given logical datapath implements either a
1161 logical switch or a logical router.
1164 <column name="tunnel_key">
1165 The tunnel key value to which the logical datapath is bound.
1166 The <code>Tunnel Encapsulation</code> section in
1167 <code>ovn-architecture</code>(7) describes how tunnel keys are
1168 constructed for each supported encapsulation.
1171 <group title="OVN_Northbound Relationship">
1173 Each row in <ref table="Datapath_Binding"/> is associated with some
1174 logical datapath. <code>ovn-northd</code> uses these keys to track the
1175 association of a logical datapath with concepts in the <ref
1176 db="OVN_Northbound"/> database.
1179 <column name="external_ids" key="logical-switch" type='{"type": "uuid"}'>
1180 For a logical datapath that represents a logical switch,
1181 <code>ovn-northd</code> stores in this key the UUID of the
1182 corresponding <ref table="Logical_Switch" db="OVN_Northbound"/> row in
1183 the <ref db="OVN_Northbound"/> database.
1186 <column name="external_ids" key="logical-router" type='{"type": "uuid"}'>
1187 For a logical datapath that represents a logical router,
1188 <code>ovn-northd</code> stores in this key the UUID of the
1189 corresponding <ref table="Logical_Router" db="OVN_Northbound"/> row in
1190 the <ref db="OVN_Northbound"/> database.
1194 <group title="Common Columns">
1195 The overall purpose of these columns is described under <code>Common
1196 Columns</code> at the beginning of this document.
1198 <column name="external_ids"/>
1202 <table name="Port_Binding" title="Physical-Logical Port Bindings">
1204 Most rows in this table identify the physical location of a logical port.
1205 (The exceptions are logical patch ports, which do not have any physical location.)
1210 For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code>
1211 database, <code>ovn-northd</code> creates a record in this table.
1212 <code>ovn-northd</code> populates and maintains every column except
1213 the <code>chassis</code> column, which it leaves empty in new records.
1217 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>
1218 populates the <code>chassis</code> column for the records that
1219 identify the logical ports located on its hypervisor/gateway.
1220 <code>ovn-controller</code>/<code>ovn-controller-vtep</code> finds
1221 these ports by monitoring the local hypervisor's Open_vSwitch
1222 database, which identifies logical ports via the conventions described
1223 in <code>IntegrationGuide.md</code>. (The exceptions are
1224 <code>Port_Binding</code> records with <code>type</code> of
1225 <code>gateway</code>, whose locations are identified by
1226 <code>ovn-northd</code> via the <code>options:gateway-chassis</code>
1227 column in this table. <code>ovn-controller</code> is still responsible
1228 for populating the <code>chassis</code> column.)
1232 When a chassis shuts down gracefully, it should clean up the
1233 <code>chassis</code> column that it previously had populated.
1234 (This is not critical because resources hosted on the chassis are equally
1235 unreachable regardless of whether their rows are present.) To handle the
1236 case where a VM is shut down abruptly on one chassis, then brought up
1237 again on a different one,
1238 <code>ovn-controller</code>/<code>ovn-controller-vtep</code> must
1239 overwrite the <code>chassis</code> column with new information.
1242 <group title="Core Features">
1243 <column name="datapath">
1244 The logical datapath to which the logical port belongs.
1247 <column name="logical_port">
1248 A logical port, taken from <ref table="Logical_Port" column="name"
1249 db="OVN_Northbound"/> in the OVN_Northbound database's <ref
1250 table="Logical_Port" db="OVN_Northbound"/> table. OVN does not
1251 prescribe a particular format for the logical port ID.
1254 <column name="chassis">
1255 The physical location of the logical port. To successfully identify a
1256 chassis, this column must be a <ref table="Chassis"/> record. This is populated by
1258 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>.
1261 <column name="tunnel_key">
1263 A number that represents the logical port in the key (e.g. STT key or
1264 Geneve TLV) field carried within tunnel protocol packets.
1268 The tunnel ID must be unique within the scope of a logical datapath.
1274 The Ethernet address or addresses used as a source address on the
1275 logical port, each in the form
1276 <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
1277 The string <code>unknown</code> is also allowed to indicate that the
1278 logical port has an unknown set of (additional) source addresses.
1282 A VM interface would ordinarily have a single Ethernet address. A
1283 gateway port might initially only have <code>unknown</code>, and then
1284 add MAC addresses to the set as it learns new source addresses.
1288 <column name="type">
1290 A type for this logical port. Logical ports can be used to model other
1291 types of connectivity into an OVN logical switch. The following types
1296 <dt>(empty string)</dt>
1297 <dd>VM (or VIF) interface.</dd>
1299 <dt><code>patch</code></dt>
1301 One of a pair of logical ports that act as if connected by a patch
1302 cable. Useful for connecting two logical datapaths, e.g. to connect
1303 a logical router to a logical switch or to another logical router.
1306 <dt><code>gateway</code></dt>
1308 One of a pair of logical ports that act as if connected by a patch
1309 cable across multiple chassis. Useful for connecting a logical
1310 switch with a Gateway router (which is only resident on a
1311 particular chassis).
1314 <dt><code>localnet</code></dt>
1316 A connection to a locally accessible network from each
1317 <code>ovn-controller</code> instance. A logical switch can only
1318 have a single <code>localnet</code> port attached. This is used
1319 to model direct connectivity to an existing network.
1322 <dt><code>vtep</code></dt>
1324 A port to a logical switch on a VTEP gateway chassis. In order to
1325 get this port correctly recognized by the OVN controller, the <ref
1327 table="Port_Binding"/>:<code>vtep-physical-switch</code> and <ref
1329 table="Port_Binding"/>:<code>vtep-logical-switch</code> must also
1336 <group title="Patch Options">
1338 These options apply to logical ports with <ref column="type"/> of
1342 <column name="options" key="peer">
1343 The <ref column="logical_port"/> in the <ref table="Port_Binding"/>
1344 record for the other side of the patch. The named <ref
1345 column="logical_port"/> must specify this <ref column="logical_port"/>
1346 in its own <code>peer</code> option. That is, the two patch logical
1347 ports must have reversed <ref column="logical_port"/> and
1348 <code>peer</code> values.
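<p>
The symmetry requirement on <code>peer</code> can be checked mechanically.
The following Python sketch assumes a hypothetical dictionary that maps each
patch port's <code>logical_port</code> to its <code>peer</code> option; the
function name and port names are illustrative only.
</p>

```python
def peers_consistent(bindings: dict) -> bool:
    """Check that every port's peer names that port as its own peer.

    bindings maps logical_port to peer for patch ports (hypothetical layout).
    """
    return all(bindings.get(peer) == port for port, peer in bindings.items())


good = {"lrp0": "lsp0", "lsp0": "lrp0"}   # reversed pair: consistent
bad = {"lrp0": "lsp0", "lsp0": "lrp1"}    # lsp0 points elsewhere
```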
1352 <group title="Gateway Options">
1354 These options apply to logical ports with <ref column="type"/> of
1355 <code>gateway</code>.
1358 <column name="options" key="peer">
1359 The <ref column="logical_port"/> in the <ref table="Port_Binding"/>
1360 record for the other side of the <code>gateway</code> port. The named <ref
1361 column="logical_port"/> must specify this <ref column="logical_port"/>
1362 in its own <code>peer</code> option. That is, the two <code>gateway</code>
1363 logical ports must have reversed <ref column="logical_port"/> and
1364 <code>peer</code> values.
1367 <column name="options" key="gateway-chassis">
1368 The <code>chassis</code> in which the port resides.
1372 <group title="Localnet Options">
1374 These options apply to logical ports with <ref column="type"/> of
1375 <code>localnet</code>.
1378 <column name="options" key="network_name">
1379 Required. <code>ovn-controller</code> uses the configuration entry
1380 <code>ovn-bridge-mappings</code> to determine how to connect to this
1381 network. <code>ovn-bridge-mappings</code> is a list of network names
1382 mapped to a local OVS bridge that provides access to that network. An
1383 example of configuring <code>ovn-bridge-mappings</code> would be:
1385 <pre>$ ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-eth0,physnet2:br-eth1</pre>
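<p>
To make the format of <code>ovn-bridge-mappings</code> concrete, the Python
sketch below parses such a value into a mapping from network name to bridge
name. It is a minimal illustration, not the parser that
<code>ovn-controller</code> uses; the function name
<code>parse_bridge_mappings</code> and its error handling are assumptions.
</p>

```python
def parse_bridge_mappings(value: str) -> dict:
    """Parse "net1:br1,net2:br2" into {"net1": "br1", "net2": "br2"}."""
    mappings = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        network, sep, bridge = item.partition(":")
        if not sep or not network or not bridge:
            raise ValueError(f"malformed mapping: {item!r}")
        mappings[network] = bridge
    return mappings


mappings = parse_bridge_mappings("physnet1:br-eth0,physnet2:br-eth1")
```

<p>
With the example value above, a <code>localnet</code> port whose
<code>network_name</code> is <code>physnet1</code> would be reached through
bridge <code>br-eth0</code>.
</p>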
1388 When a logical switch has a <code>localnet</code> port attached,
1389 every chassis that may have a local vif attached to that logical
1390 switch must have a bridge mapping configured to reach that
1391 <code>localnet</code>. Traffic that arrives on a
1392 <code>localnet</code> port is never forwarded over a tunnel to
1398 If set, indicates that the port represents a connection to a specific
1399 VLAN on a locally accessible network. The VLAN ID is used to match
1400 incoming traffic and is also added to outgoing traffic.
1404 <group title="VTEP Options">
1406 These options apply to logical ports with <ref column="type"/> of
1410 <column name="options" key="vtep-physical-switch">
1411 Required. The name of the VTEP gateway.
1414 <column name="options" key="vtep-logical-switch">
1415 Required. A logical switch name connected by the VTEP gateway. Must
1416 be set when <ref column="type"/> is <code>vtep</code>.
1420 <group title="VMI (or VIF) Options">
1422 These options apply to logical ports with <ref column="type"/> having
1426 <column name="options" key="policing_rate">
1427 If set, indicates the maximum rate for data sent from this interface,
1428 in kbps. Data exceeding this rate is dropped.
1431 <column name="options" key="policing_burst">
1432 If set, indicates the maximum burst size for data sent from this
1437 <group title="Nested Containers">
1439 These columns support containers nested within a VM. Specifically,
1440 they are used when <ref column="type"/> is empty and <ref
1441 column="logical_port"/> identifies the interface of a container spawned
1442 inside a VM. They are empty for containers or VMs that run directly on
1446 <column name="parent_port">
1448 <ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
1449 in the OVN_Northbound database's <ref table="Logical_Port"
1450 db="OVN_Northbound"/> table.
1455 Identifies the VLAN tag in the network traffic associated with that
1456 container's network interface.
1460 This column is used for a different purpose when <ref column="type"/>
1461 is <code>localnet</code> (see <code>Localnet Options</code>, above).
1467 <table name="MAC_Binding" title="IP to MAC bindings">
1469 Each row in this table specifies a binding from an IP address to an
1470 Ethernet address that has been discovered through ARP (for IPv4) or
1471 neighbor discovery (for IPv6). This table is primarily used to discover
1472 bindings on physical networks, because IP-to-MAC bindings for virtual
1473 machines are usually populated statically into the <ref
1474 table="Port_Binding"/> table.
1478 This table expresses a functional relationship: <ref
1479 table="MAC_Binding"/>(<ref column="logical_port"/>, <ref column="ip"/>) =
1480 <ref column="mac"/>.
1484 In outline, the lifetime of a logical router's MAC binding looks like
1490 On hypervisor 1, a logical router determines that a packet should be
1491 forwarded to IP address <var>A</var> on one of its router ports. It
1492 uses its logical flow table to determine that <var>A</var> lacks a
1493 static IP-to-MAC binding and the <code>get_arp</code> action to
1494 determine that it lacks a dynamic IP-to-MAC binding.
1498 Using an OVN logical <code>arp</code> action, the logical router
1499 generates and sends a broadcast ARP request to the router port. It
1500 drops the IP packet.
1504 The logical switch attached to the router port delivers the ARP request
1505 to all of its ports. (It might make sense to deliver it only to ports
1506 that have no static IP-to-MAC bindings, but this could also be
1507 surprising behavior.)
1511 A host or VM on hypervisor 2 (which might be the same as hypervisor 1)
1512 attached to the logical switch owns the IP address in question. It
1513 composes an ARP reply and unicasts it to the logical router port's
1518 The logical switch delivers the ARP reply to the logical router port.
1522 The logical router flow table executes a <code>put_arp</code> action.
1523 To record the IP-to-MAC binding, <code>ovn-controller</code> adds a row
1524 to the <ref table="MAC_Binding"/> table.
1528 On hypervisor 1, <code>ovn-controller</code> receives the updated <ref
1529 table="MAC_Binding"/> table from the OVN southbound database. The next
1530 packet destined to <var>A</var> through the logical router is sent
1531 directly to the bound Ethernet address.
1535 <column name="logical_port">
1536 The logical port on which the binding was discovered.
1540 The bound IP address.
1544 The Ethernet address to which the IP is bound.