<?xml version="1.0" encoding="utf-8"?>
<database name="ovn-sb" title="OVN Southbound Database">
This database holds logical and physical configuration and state for the
Open Virtual Network (OVN) system to support virtual network abstraction.
For an introduction to OVN, please see <code>ovn-architecture</code>(7).
The OVN Southbound database sits at the center of the OVN
architecture. It is the one component that speaks both southbound
directly to all the hypervisors and gateways, via
<code>ovn-controller</code>, and northbound to the Cloud Management
System, via <code>ovn-northd</code>.
<h2>Database Structure</h2>

The OVN Southbound database contains three classes of data with
different properties, as described in the sections below.

<h3>Physical Network (PN) data</h3>

PN tables contain information about the chassis nodes in the system. This
includes all the information necessary to wire the overlay, such as IP
addresses, supported tunnel types, and security keys.

The amount of PN data is small (O(n) in the number of chassis) and it
changes infrequently, so it can be replicated to every chassis.

The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the
PN tables.
<h3>Logical Network (LN) data</h3>

LN tables contain the topology of logical switches and routers, ACLs,
firewall rules, and everything needed to describe how packets traverse a
logical network, represented as logical datapath flows (see Logical
Datapath Flows, below).

LN data may be large (O(n) in the number of logical ports, ACL rules,
etc.). Thus, to improve scaling, each chassis should receive only data
related to logical networks in which that chassis participates. Past
experience shows that in the presence of large logical networks, even
finer-grained partitioning of data, e.g. designing logical flows so that
only the chassis hosting a logical port needs related flows, pays off
scale-wise. (This is not necessary initially but it is worth bearing in
mind.)
The LN is controlled by the cloud management system (CMS) running
northbound of OVN. That CMS determines the entire OVN logical
configuration, and therefore the LN's content at any given time is a
deterministic function of the CMS's configuration, although that happens
indirectly via the OVN Northbound DB and <code>ovn-northd</code>.
LN data is likely to change more quickly than PN data. This is especially
true in a container environment where VMs are created and destroyed (and
therefore added to and deleted from logical switches) quickly.

The <ref table="Pipeline"/> table is currently the only LN table.
<h3>Bindings data</h3>

The Binding tables contain the current placement of logical components
(such as VMs and VIFs) onto chassis and the bindings between logical ports

Bindings change frequently, at least every time a VM powers up or down
or migrates, and especially quickly in a container environment. The
amount of data per VM (or VIF) is small.

Each chassis is authoritative about the VMs and VIFs that it hosts at any
given time and can efficiently flood that state to a central location, so
the consistency needs are minimal.

The <ref table="Binding"/> table is currently the only binding data.
<table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
Each row in this table represents a hypervisor or gateway (a chassis) in
the physical network (PN). Each chassis, via
<code>ovn-controller</code>, adds and updates its own row, and keeps a
copy of the remaining rows to determine how to reach other hypervisors.

When a chassis shuts down gracefully, it should remove its own row.
(This is not critical because resources hosted on the chassis are equally
unreachable regardless of whether the row is present.) If a chassis
shuts down permanently without removing its row, some kind of manual or
automatic cleanup is eventually needed; we can devise a process for that

A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does
not prescribe a particular format for chassis names.
<group title="Encapsulation Configuration">
OVN uses encapsulation to transmit logical dataplane packets

<column name="encaps">
Points to supported encapsulation configurations to transmit
logical dataplane packets to this chassis. Each entry is a <ref
table="Encap"/> record that describes the configuration.

<group title="Gateway Configuration">
A <dfn>gateway</dfn> is a chassis that forwards traffic between a
logical network and a physical VLAN. Gateways are typically dedicated
nodes that do not host VMs.

<column name="gateway_ports">
Maps from the name of a gateway port, which is typically a physical
port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref
table="Gateway"/> record that describes the details of the gatewaying
<table name="Encap" title="Encapsulation Types">
The <ref column="encaps" table="Chassis"/> column in the <ref
table="Chassis"/> table refers to rows in this table to identify
how OVN may transmit logical dataplane packets to this chassis.
Each chassis, via <code>ovn-controller</code>(8), adds and updates
its own rows and keeps a copy of the remaining rows to determine
how to reach other chassis.

The encapsulation to use to transmit packets to this chassis.
Hypervisors must use either <code>geneve</code> or
<code>stt</code>. Gateways may use <code>vxlan</code>,
<code>geneve</code>, or <code>stt</code>.

<column name="options">
Options for configuring the encapsulation, e.g. IPsec parameters when
IPsec support is introduced. No options are currently defined.

The IPv4 address of the encapsulation tunnel endpoint.
<table name="Gateway" title="Physical Network Gateway Ports">
The <ref column="gateway_ports" table="Chassis"/> column in the <ref
table="Chassis"/> table refers to rows in this table to connect a chassis
port to a gateway function. Each row in this table describes the logical
networks to which a gateway port is attached. Each chassis, via
<code>ovn-controller</code>(8), adds and updates its own rows, if any
(since most chassis are not gateways), and keeps a copy of the remaining
rows to determine how to reach other chassis.

<column name="vlan_map">
Maps from a VLAN ID to a logical port name. Thus, each named logical
port corresponds to one VLAN on the gateway port.

<column name="attached_port">
The name of the gateway port in the chassis's Open vSwitch integration
<table name="Pipeline" title="Logical Network Pipeline">
Each row in this table represents one logical flow. The cloud management
system, via its OVN integration, populates this table with logical flows
that implement the L2 and L3 topology specified in the CMS configuration.
Each hypervisor, via <code>ovn-controller</code>, translates the logical
flows into OpenFlow flows specific to its hypervisor and installs them

Logical flows are expressed in an OVN-specific format, described here. A
logical datapath flow is much like an OpenFlow flow, except that the
flows are written in terms of logical ports and logical datapaths instead
of physical ports and physical datapaths. Translation between logical
and physical flows helps to ensure isolation between logical datapaths.
(The logical flow abstraction also allows the CMS to do less work, since
it does not have to separately compute and push out physical flows to each

The default action when no flow matches is to drop packets.
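<p>
  As a rough illustration only, a single logical flow might be populated
  along the following lines. The datapath UUID placeholder, the port name
  <code>vif1</code>, and the MAC address are hypothetical values invented
  for this sketch; the column names are those documented below.
</p>

<pre>
logical_datapath: &lt;UUID of a logical switch&gt;
table_id:         0
priority:         100
match:            eth.dst == 00:00:00:00:00:01
actions:          outport = "vif1"; output;
</pre>

<p>
  If this were the only flow on the datapath, a packet with any other
  destination address would match nothing and so be dropped by default.
</p>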
<column name="logical_datapath">
The logical datapath to which the logical flow belongs. A logical
datapath implements a logical pipeline among the ports in the <ref
table="Binding"/> table associated with it. (No table represents a
logical datapath.) In practice, the pipeline in a given logical datapath
implements either a logical switch or a logical router, and
<code>ovn-northd</code> reuses the UUIDs for those logical entities from
the <code>OVN_Northbound</code> database for logical datapaths.

<column name="table_id">
The stage in the logical pipeline, analogous to an OpenFlow table number.

<column name="priority">
The flow's priority. Flows with numerically higher priority take
precedence over those with lower. If two logical datapath flows with the
same priority both match, then the one actually applied to the packet is
undefined.
<column name="match">
A matching expression. OVN provides a superset of OpenFlow matching
capabilities, using a syntax similar to Boolean expressions in a
programming language.

The most important components of a match expression are
<dfn>comparisons</dfn> between <dfn>symbols</dfn> and
<dfn>constants</dfn>, e.g. <code>ip4.dst == 192.168.0.1</code>,
<code>ip.proto == 6</code>, <code>arp.op == 1</code>, <code>eth.type ==
0x800</code>. The logical AND operator <code>&amp;&amp;</code> and
logical OR operator <code>||</code> can combine comparisons into a
larger expression.

Matching expressions also support parentheses for grouping, the logical
NOT prefix operator <code>!</code>, and literals <code>0</code> and
<code>1</code> to express ``false'' or ``true,'' respectively. The
latter is useful by itself as a catch-all expression that matches every
packet.
<p><em>Symbols</em></p>

<em>Type</em>. Symbols have <dfn>integer</dfn> or <dfn>string</dfn>
type. Integer symbols have a <dfn>width</dfn> in bits.

<em>Kinds</em>. There are three kinds of symbols:

<dfn>Fields</dfn>. A field symbol represents a packet header or
metadata field. For example, a field
named <code>vlan.tci</code> might represent the VLAN TCI field in a
packet.

A field symbol can have integer or string type. Integer fields can
be nominal or ordinal (see <em>Level of Measurement</em>,
below).

<dfn>Subfields</dfn>. A subfield represents a subset of bits from
a larger field. For example, a field <code>vlan.vid</code> might
be defined as an alias for <code>vlan.tci[0..11]</code>. Subfields
are provided for syntactic convenience, because it is always
possible to instead refer to a subset of bits from a field
directly.

Only ordinal fields (see <em>Level of Measurement</em>,
below) may have subfields. Subfields are always ordinal.
<dfn>Predicates</dfn>. A predicate is shorthand for a Boolean
expression. Predicates may be used much like 1-bit fields. For
example, <code>ip4</code> might expand to <code>eth.type ==
0x800</code>. Predicates are provided for syntactic convenience,
because it is always possible to instead specify the underlying
expression directly.

A predicate whose expansion refers to any nominal field or
predicate (see <em>Level of Measurement</em>, below) is nominal;
other predicates have Boolean level of measurement.

<em>Level of Measurement</em>. See
http://en.wikipedia.org/wiki/Level_of_measurement for the statistical
concept on which this classification is based. There are three
levels:
<dfn>Ordinal</dfn>. In statistics, ordinal values can be ordered
on a scale. OVN considers a field (or subfield) to be ordinal if
its bits can be examined individually. This is true for the
OpenFlow fields that OpenFlow or Open vSwitch makes ``maskable.''

Any use of an ordinal field may specify a single bit or a range of
bits, e.g. <code>vlan.tci[13..15]</code> refers to the PCP field
within the VLAN TCI, and <code>eth.dst[40]</code> refers to the
multicast bit in the Ethernet destination address.
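<p>
  For instance, assuming the bit-range notation behaves as described
  above, comparisons such as the following could test those bits
  directly. This is a sketch with arbitrary values, not output from a
  running system.
</p>

<pre>
vlan.tci[13..15] == 5    // PCP bits of the VLAN TCI equal 5
eth.dst[40] == 1         // multicast bit of the destination address is set
</pre>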
OVN supports all the usual arithmetic relations (<code>==</code>,
<code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
<code>&gt;</code>, and <code>&gt;=</code>) on ordinal fields and
their subfields, because OVN can implement these in OpenFlow and
Open vSwitch as collections of bitwise tests.
<dfn>Nominal</dfn>. In statistics, nominal values cannot be
usefully compared except for equality. OpenFlow
port numbers, Ethernet types, and IP protocols are examples: all of
these are just identifiers assigned arbitrarily with no deeper
meaning. In OpenFlow and Open vSwitch, bits in these fields
generally aren't individually addressable.

OVN only supports arithmetic tests for equality on nominal fields,
because OpenFlow and Open vSwitch provide no way for a flow to
efficiently implement other comparisons on them. (A test for
inequality can be approximated with two flows with different
priorities, but OVN matching expressions always generate flows with

String fields are always nominal.
<dfn>Boolean</dfn>. A nominal field that has only two values, 0
and 1, is somewhat exceptional, since it is easy to support both
equality and inequality tests on such a field: either one can be
implemented as a test for 0 or 1.

Only predicates (see above) have a Boolean level of measurement.

This isn't a standard level of measurement.
<em>Prerequisites</em>. Any symbol can have prerequisites, which are
additional conditions implied by the use of the symbol. For example,
the <code>icmp4.type</code> symbol might have prerequisite
<code>icmp4</code>, which would cause an expression <code>icmp4.type ==
0</code> to be interpreted as <code>icmp4.type == 0 &amp;&amp;
icmp4</code>, which would in turn expand to <code>icmp4.type == 0
&amp;&amp; eth.type == 0x800 &amp;&amp; ip.proto == 1</code> (assuming
<code>icmp4</code> is a predicate defined as suggested under
<em>Kinds</em> above).
<p><em>Relational operators</em></p>

All of the standard relational operators <code>==</code>,
<code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
<code>&gt;</code>, and <code>&gt;=</code> are supported. Nominal
fields support only <code>==</code> and <code>!=</code>, and only in a
positive sense when outer <code>!</code> are taken into account,
e.g. given string field <code>inport</code>, <code>inport ==
"eth0"</code> and <code>!(inport != "eth0")</code> are acceptable, but
not <code>inport != "eth0"</code>.

The implementation of <code>==</code> (or <code>!=</code> when it is
negated) is more efficient than that of the other relational
operators.
<p><em>Constants</em></p>

Integer constants may be expressed in decimal, hexadecimal prefixed by
<code>0x</code>, or as dotted-quad IPv4 addresses, IPv6 addresses in
their standard forms, or Ethernet addresses as colon-separated hex
digits. A constant in any of these forms may be followed by a slash
and a second constant (the mask) in the same form, to form a masked
constant. IPv4 and IPv6 masks may be given as integers, to express

String constants have the same syntax as quoted strings in JSON (thus,
they are Unicode strings). String constants are used for naming
logical ports. Thus, the useful values are <ref
column="logical_port"/> names from the <ref table="Binding"/> and
<ref table="Gateway"/> tables in a logical flow's <ref
column="logical_datapath"/>.
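<p>
  The constant forms described above might be used as in the following
  sketch. The addresses and the port name <code>vif0</code> are made-up
  values, and each line is an independent expression.
</p>

<pre>
eth.type == 0x800                              // hexadecimal
ip.proto == 6                                  // decimal
ip4.src == 192.168.0.0/255.255.0.0             // dotted quad with a mask
eth.dst == 01:00:00:00:00:00/01:00:00:00:00:00 // Ethernet address with a mask
inport == "vif0"                               // string constant
</pre>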
Some operators support sets of constants written inside curly braces
<code>{</code> ... <code>}</code>. Commas between elements of a set,
and after the last element, are optional. With <code>==</code>,
``<code><var>field</var> == { <var>constant1</var>,
<var>constant2</var>,</code> ... <code>}</code>'' is syntactic sugar
for ``<code><var>field</var> == <var>constant1</var> ||
<var>field</var> == <var>constant2</var> || </code>...<code></code>''.
Similarly, ``<code><var>field</var> != { <var>constant1</var>,
<var>constant2</var>, </code>...<code> }</code>'' is equivalent to
``<code><var>field</var> != <var>constant1</var> &amp;&amp;
<var>field</var> != <var>constant2</var> &amp;&amp;
</code>...<code></code>''.
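<p>
  As an illustration of the expansion just described, each pair of
  expressions below should be equivalent. The port numbers are arbitrary,
  and the second set omits the optional comma.
</p>

<pre>
tcp.dst == {80, 443}
tcp.dst == 80 || tcp.dst == 443

tcp.dst != {80 443}
tcp.dst != 80 &amp;&amp; tcp.dst != 443
</pre>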
<p><em>Miscellaneous</em></p>

Comparisons may name the symbol or the constant first,
e.g. <code>tcp.src == 80</code> and <code>80 == tcp.src</code> are both
acceptable.

Tests for a range may be expressed using a syntax like <code>1024 &lt;=
tcp.src &lt;= 49151</code>, which is equivalent to <code>1024 &lt;=
tcp.src &amp;&amp; tcp.src &lt;= 49151</code>.
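<p>
  A range test can presumably be combined with other comparisons in the
  usual way. In the following sketch the values are arbitrary, and the
  parentheses are added only for clarity; the two lines are intended to
  be equivalent.
</p>

<pre>
ip.proto == 17 &amp;&amp; (1024 &lt;= udp.dst &lt;= 49151)
ip.proto == 17 &amp;&amp; (1024 &lt;= udp.dst &amp;&amp; udp.dst &lt;= 49151)
</pre>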
For a one-bit field or predicate, a mention of its name is equivalent
to <code><var>symbol</var> == 1</code>, e.g. <code>vlan.present</code>
is equivalent to <code>vlan.present == 1</code>. The same is true for
one-bit subfields, e.g. <code>vlan.tci[12]</code>. There is no
technical limitation to implementing the same for ordinal fields of all
widths, but the implementation is expensive enough that the syntax
parser requires writing an explicit comparison against zero to make
mistakes less likely, e.g. in <code>tcp.src != 0</code> the comparison
against 0 is required.
<em>Operator precedence</em> is as shown below, from highest to lowest.
There are two exceptions where parentheses are required even though the
table would suggest that they are not: <code>&amp;&amp;</code> and
<code>||</code> require parentheses when used together, and
<code>!</code> requires parentheses when applied to a relational
expression. Thus, in <code>(eth.type == 0x800 || eth.type == 0x86dd)
&amp;&amp; ip.proto == 6</code> or <code>!(arp.op == 1)</code>, the
parentheses are mandatory.

<li><code>()</code></li>
<li><code>== != &lt; &lt;= &gt; &gt;=</code></li>
<li><code>!</code></li>
<li><code>&amp;&amp; ||</code></li>
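<p>
  Under the rules above, one would expect the first two expressions in
  the following sketch to be accepted and the last two to be rejected for
  lack of parentheses. This illustrates the stated rules and is not
  output from an actual parser.
</p>

<pre>
(eth.type == 0x800 || eth.type == 0x86dd) &amp;&amp; ip.proto == 6
!(arp.op == 1)

eth.type == 0x800 || eth.type == 0x86dd &amp;&amp; ip.proto == 6
!arp.op == 1
</pre>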
<em>Comments</em> may be introduced by <code>//</code>, which extends
to the next new-line. Comments within a line may be bracketed by
<code>/*</code> and <code>*/</code>. Multiline comments are not
supported.
<p><em>Symbols</em></p>

<code>metadata</code> <code>reg0</code> ... <code>reg7</code>
<code>xreg0</code> ... <code>xreg3</code>
<li><code>inport</code> <code>outport</code> <code>queue</code></li>
<li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
<li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
<li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
<li><code>ip4.src</code> <code>ip4.dst</code></li>
<li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
<li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
<li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
<li><code>udp.src</code> <code>udp.dst</code></li>
<li><code>sctp.src</code> <code>sctp.dst</code></li>
<li><code>icmp4.type</code> <code>icmp4.code</code></li>
<li><code>icmp6.type</code> <code>icmp6.code</code></li>
<li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
<column name="actions">
Logical datapath actions, to be executed when the logical flow
represented by this row is the highest-priority match.

Actions share lexical syntax with the <ref column="match"/> column. An
empty set of actions (or one that contains just white space or
comments), or a set of actions that consists of just
<code>drop;</code>, causes the matched packets to be dropped.
Otherwise, the column should contain a sequence of actions, each
terminated by a semicolon.
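<p>
  For example, using the actions listed below and a hypothetical logical
  port named <code>vif0</code>, a value for this column might read as
  follows. This is a sketch of the syntax only.
</p>

<pre>
outport = "vif0"; output;
</pre>

<p>
  By the rule above, a value of just <code>drop;</code> and an empty value
  would both cause the matched packets to be dropped.
</p>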
The following actions will be initially supported:

<dt><code>output;</code></dt>
Outputs the packet to the logical port currently designated by
<code>outport</code>. Output to the ingress port is implicitly
dropped, that is, <code>output</code> becomes a no-op if
<code>outport</code> == <code>inport</code>.

<dt><code>next;</code></dt>
Executes the next logical datapath table as a subroutine.

<dt><code><var>field</var> = <var>constant</var>;</code></dt>
Sets data or metadata field <var>field</var> to constant value
<var>constant</var>, e.g. <code>outport = "vif0";</code> to set the
logical output port. Assigning to a field with prerequisites
implicitly adds those prerequisites to <ref column="match"/>; thus,
for example, a flow that sets <code>tcp.dst</code> applies only to
TCP flows, regardless of whether its <ref column="match"/> mentions
any TCP field. To set only a subset of bits in a field,
<var>field</var> may be a subfield or <var>constant</var> may be
masked, e.g. <code>vlan.pcp[2] = 1;</code> and <code>vlan.pcp =
4/4;</code> both set the most significant bit of the VLAN PCP. Not
all fields are modifiable (e.g. <code>eth.type</code> and
<code>ip.proto</code> are read-only), and not all modifiable fields
may be partially modified (e.g. <code>ip.ttl</code> must be assigned as
a whole).
The following actions will likely be useful later, but they have not
been thought out carefully.

<dt><code><var>field1</var> = <var>field2</var>;</code></dt>
Extends the assignment action to allow copying between fields.

<dt><code>learn</code></dt>
<dt><code>conntrack</code></dt>
<dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>};</code></dt>
decrement TTL; execute first set of actions if
successful, second set if TTL decrement fails
<dt><code>icmp_reply { <var>action</var>, </code>...<code> };</code></dt>
<dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
<dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
<dd>generate ARP from packet, execute <var>action</var>s</dd>
<table name="Binding" title="Physical-Logical Bindings">
Each row in this table identifies the physical location of a logical
port.

For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code>
database, <code>ovn-northd</code> creates a record in this table.
<code>ovn-northd</code> populates and maintains every column except
the <code>chassis</code> column, which it leaves empty in new records.

<code>ovn-controller</code> populates the <code>chassis</code> column
for the records that identify the logical ports that are located on its
hypervisor, which <code>ovn-controller</code> in turn finds out by
monitoring the local hypervisor's Open_vSwitch database, which
identifies logical ports via the conventions described in
<code>IntegrationGuide.md</code>.

When a chassis shuts down gracefully, it should clean up the
<code>chassis</code> column that it had previously populated.
(This is not critical because resources hosted on the chassis are equally
unreachable regardless of whether their rows are present.) To handle the
case where a VM is shut down abruptly on one chassis, then brought up
again on a different one, <code>ovn-controller</code> must overwrite the
<code>chassis</code> column with new information.
<column name="logical_datapath">
The logical datapath to which the logical port belongs. A logical
datapath implements a logical pipeline via logical flows in the <ref
table="Pipeline"/> table. (No table represents a logical datapath.)

<column name="logical_port">
A logical port, taken from <ref table="Logical_Port" column="name"
db="OVN_Northbound"/> in the OVN_Northbound database's
<ref table="Logical_Port" db="OVN_Northbound"/> table. OVN does not
prescribe a particular format for the logical port ID.

<column name="tunnel_key">
A number that represents the logical port in the key (e.g. VXLAN VNI or
STT key) field carried within tunnel protocol packets. (This avoids
wasting space for a whole UUID in tunneled packets. It also allows OVN
to support encapsulations that cannot fit an entire UUID in their

Tunnel ID 0 is reserved for internal use within OVN.
<column name="parent_port">
For containers created inside a VM, this is taken from
<ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
in the OVN_Northbound database's <ref table="Logical_Port"
db="OVN_Northbound"/> table. It is left empty if
<ref column="logical_port"/> belongs to a VM or a container created
in the hypervisor.

When <ref column="logical_port"/> identifies the interface of a container
spawned inside a VM, this column identifies the VLAN tag in
the network traffic associated with that container's network interface.
It is left empty if <ref column="logical_port"/> belongs to a VM or a
container created in the hypervisor.
<column name="chassis">
The physical location of the logical port. To successfully identify a
chassis, this column must be a <ref table="Chassis"/> record. This is
populated by <code>ovn-controller</code>.

The Ethernet address or addresses used as a source address on the
logical port, each in the form
<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
The string <code>unknown</code> is also allowed to indicate that the
logical port has an unknown set of (additional) source addresses.

A VM interface would ordinarily have a single Ethernet address. A
gateway port might initially only have <code>unknown</code>, and then
add MAC addresses to the set as it learns new source addresses.