<?xml version="1.0" encoding="utf-8"?>
<database name="ovn-sb" title="OVN Southbound Database">
This database holds logical and physical configuration and state for the
Open Virtual Network (OVN) system to support virtual network abstraction.
For an introduction to OVN, please see <code>ovn-architecture</code>(7).
The OVN Southbound database sits at the center of the OVN
architecture. It is the one component that speaks both southbound
directly to all the hypervisors and gateways, via
<code>ovn-controller</code>, and northbound to the Cloud Management
System, via <code>ovn-northd</code>:
<h2>Database Structure</h2>
The OVN Southbound database contains three classes of data with
different properties, as described in the sections below.
<h3>Physical Network (PN) data</h3>
PN tables contain information about the chassis nodes in the system. This
contains all the information necessary to wire the overlay, such as IP
addresses, supported tunnel types, and security keys.
The amount of PN data is small (O(n) in the number of chassis) and it
changes infrequently, so it can be replicated to every chassis.
The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the
PN tables.
<h3>Logical Network (LN) data</h3>
LN tables contain the topology of logical switches and routers, ACLs,
firewall rules, and everything needed to describe how packets traverse a
logical network, represented as logical datapath flows (see Logical
Datapath Flows, below).
LN data may be large (O(n) in the number of logical ports, ACL rules,
etc.). Thus, to improve scaling, each chassis should receive only data
related to logical networks in which that chassis participates. Past
experience shows that in the presence of large logical networks, even
finer-grained partitioning of data, e.g. designing logical flows so that
only the chassis hosting a logical port needs related flows, pays off
scale-wise. (This is not necessary initially but it is worth bearing in
mind.)
The LN is a slave of the cloud management system running northbound of OVN.
That CMS determines the entire OVN logical configuration and therefore the
LN's content at any given time is a deterministic function of the CMS's
configuration, although that happens indirectly via the OVN Northbound DB
and <code>ovn-northd</code>.
LN data is likely to change more quickly than PN data. This is especially
true in a container environment where VMs are created and destroyed (and
therefore added to and deleted from logical switches) quickly.
The <ref table="Pipeline"/> table is currently the only LN table.
<h3>Bindings data</h3>
The Bindings tables contain the current placement of logical components
(such as VMs and VIFs) onto chassis and the bindings between logical ports
Bindings change frequently, at least every time a VM powers up or down
or migrates, and especially quickly in a container environment. The
amount of data per VM (or VIF) is small.
Each chassis is authoritative about the VMs and VIFs that it hosts at any
given time and can efficiently flood that state to a central location, so
the consistency needs are minimal.
The <ref table="Bindings"/> table is currently the only Bindings table.
<table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
Each row in this table represents a hypervisor or gateway (a chassis) in
the physical network (PN). Each chassis, via
<code>ovn-controller</code>, adds and updates its own row, and keeps a
copy of the remaining rows to determine how to reach other hypervisors.
When a chassis shuts down gracefully, it should remove its own row.
(This is not critical because resources hosted on the chassis are equally
unreachable regardless of whether the row is present.) If a chassis
shuts down permanently without removing its row, some kind of manual or
automatic cleanup is eventually needed; we can devise a process for that
A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does
not prescribe a particular format for chassis names.
<group title="Encapsulation Configuration">
OVN uses encapsulation to transmit logical dataplane packets
<column name="encaps">
Points to supported encapsulation configurations to transmit
logical dataplane packets to this chassis. Each entry is a <ref
table="Encap"/> record that describes the configuration.
<group title="Gateway Configuration">
A <dfn>gateway</dfn> is a chassis that forwards traffic between a
logical network and a physical VLAN. Gateways are typically dedicated
nodes that do not host VMs.
<column name="gateway_ports">
Maps from the name of a gateway port, which is typically a physical
port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref
table="Gateway"/> record that describes the details of the gatewaying
<table name="Encap" title="Encapsulation Types">
The <ref column="encaps" table="Chassis"/> column in the <ref
table="Chassis"/> table refers to rows in this table to identify
how OVN may transmit logical dataplane packets to this chassis.
Each chassis, via <code>ovn-controller</code>(8), adds and updates
its own rows and keeps a copy of the remaining rows to determine
how to reach other chassis.
The encapsulation to use to transmit packets to this chassis.
Examples include <code>geneve</code>, <code>vxlan</code>, and
<column name="options">
Options for configuring the encapsulation, e.g. IPsec parameters when
IPsec support is introduced. No options are currently defined.
The IPv4 address of the encapsulation tunnel endpoint.
<table name="Gateway" title="Physical Network Gateway Ports">
The <ref column="gateway_ports" table="Chassis"/> column in the <ref
table="Chassis"/> table refers to rows in this table to connect a chassis
port to a gateway function. Each row in this table describes the logical
networks to which a gateway port is attached. Each chassis, via
<code>ovn-controller</code>(8), adds and updates its own rows, if any
(since most chassis are not gateways), and keeps a copy of the remaining
rows to determine how to reach other chassis.
<column name="vlan_map">
Maps from a VLAN ID to a logical port name. Thus, each named logical
port corresponds to one VLAN on the gateway port.
<column name="attached_port">
The name of the gateway port in the chassis's Open vSwitch integration
<table name="Pipeline" title="Logical Network Pipeline">
Each row in this table represents one logical flow. The cloud management
system, via its OVN integration, populates this table with logical flows
that implement the L2 and L3 topology specified in the CMS configuration.
Each hypervisor, via <code>ovn-controller</code>, translates the logical
flows into OpenFlow flows specific to its hypervisor and installs them
Logical flows are expressed in an OVN-specific format, described here. A
logical datapath flow is much like an OpenFlow flow, except that the
flows are written in terms of logical ports and logical datapaths instead
of physical ports and physical datapaths. Translation between logical
and physical flows helps to ensure isolation between logical datapaths.
(The logical flow abstraction also allows the CMS to do less work, since
it does not have to separately compute and push out physical flows to each
The default action when no flow matches is to drop packets.
<column name="logical_datapath">
The logical datapath to which the logical flow belongs. A logical
datapath implements a logical pipeline among the ports in the <ref
table="Bindings"/> table associated with it. (No table represents a
logical datapath.) In practice, the pipeline in a given logical datapath
implements either a logical switch or a logical router, and
<code>ovn-northd</code> reuses the UUIDs for those logical entities from
the <code>OVN_Northbound</code> database for logical datapaths.
<column name="table_id">
The stage in the logical pipeline, analogous to an OpenFlow table number.
<column name="priority">
The flow's priority. Flows with numerically higher priority take
precedence over those with lower. If two logical datapath flows with the
same priority both match, then the one actually applied to the packet is
<column name="match">
A matching expression. OVN provides a superset of OpenFlow matching
capabilities, using a syntax similar to Boolean expressions in a
programming language.
The most important components of a match expression are
<dfn>comparisons</dfn> between <dfn>symbols</dfn> and
<dfn>constants</dfn>, e.g. <code>ip4.dst == 192.168.0.1</code>,
<code>ip.proto == 6</code>, <code>arp.op == 1</code>, <code>eth.type ==
0x800</code>. The logical AND operator <code>&amp;&amp;</code> and
logical OR operator <code>||</code> can combine comparisons into a
Matching expressions also support parentheses for grouping, the logical
NOT prefix operator <code>!</code>, and literals <code>0</code> and
<code>1</code> to express ``false'' or ``true,'' respectively. The
latter is useful by itself as a catch-all expression that matches every
<p><em>Symbols</em></p>
<em>Type</em>. Symbols have <dfn>integer</dfn> or <dfn>string</dfn>
type. Integer symbols have a <dfn>width</dfn> in bits.
<em>Kinds</em>. There are three kinds of symbols:
<dfn>Fields</dfn>. A field symbol represents a packet header or
metadata field. For example, a field
named <code>vlan.tci</code> might represent the VLAN TCI field in a
A field symbol can have integer or string type. Integer fields can
be nominal or ordinal (see <em>Level of Measurement</em>,
<dfn>Subfields</dfn>. A subfield represents a subset of bits from
a larger field. For example, a field <code>vlan.vid</code> might
be defined as an alias for <code>vlan.tci[0..11]</code>. Subfields
are provided for syntactic convenience, because it is always
possible to instead refer to a subset of bits from a field
Only ordinal fields (see <em>Level of Measurement</em>,
below) may have subfields. Subfields are always ordinal.
<dfn>Predicates</dfn>. A predicate is shorthand for a Boolean
expression. Predicates may be used much like 1-bit fields. For
example, <code>ip4</code> might expand to <code>eth.type ==
0x800</code>. Predicates are provided for syntactic convenience,
because it is always possible to instead specify the underlying
A predicate whose expansion refers to any nominal field or
predicate (see <em>Level of Measurement</em>, below) is nominal;
other predicates have Boolean level of measurement.
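<p>
As an illustration of how a subfield selects bits from a larger field, the
following Python sketch extracts a bit range from an integer. The helper
name <code>subfield</code> and the convention that bit 0 is the least
significant bit are assumptions made for this example; they are not part of
OVN itself.
</p>

```python
def subfield(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value, with bit 0 least significant."""
    if not 0 <= lo <= hi:
        raise ValueError("expected 0 <= lo <= hi")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# A hypothetical 16-bit VLAN TCI whose top three bits are 5 and whose
# low twelve bits are 100.
tci = (5 << 13) | 100

vid = subfield(tci, 0, 11)    # corresponds to vlan.tci[0..11], i.e. vlan.vid
top3 = subfield(tci, 13, 15)  # corresponds to vlan.tci[13..15]
```

<p>
Because a subfield is only a mask and shift over its parent field, an alias
such as <code>vlan.vid</code> adds no expressive power; it only saves
typing.
</p>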
<em>Level of Measurement</em>. See
http://en.wikipedia.org/wiki/Level_of_measurement for the statistical
concept on which this classification is based. There are three
<dfn>Ordinal</dfn>. In statistics, ordinal values can be ordered
on a scale. OVN considers a field (or subfield) to be ordinal if
its bits can be examined individually. This is true for the
OpenFlow fields that OpenFlow or Open vSwitch makes ``maskable.''
Any use of an ordinal field may specify a single bit or a range of
bits, e.g. <code>vlan.tci[13..15]</code> refers to the PCP field
within the VLAN TCI, and <code>eth.dst[40]</code> refers to the
multicast bit in the Ethernet destination address.
OVN supports all the usual arithmetic relations (<code>==</code>,
<code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
<code>&gt;</code>, and <code>&gt;=</code>) on ordinal fields and
their subfields, because OVN can implement these in OpenFlow and
Open vSwitch as collections of bitwise tests.
<dfn>Nominal</dfn>. In statistics, nominal values cannot be
usefully compared except for equality. OpenFlow
port numbers, Ethernet types, and IP protocols are examples: all of
these are just identifiers assigned arbitrarily with no deeper
meaning. In OpenFlow and Open vSwitch, bits in these fields
generally aren't individually addressable.
OVN only supports arithmetic tests for equality on nominal fields,
because OpenFlow and Open vSwitch provide no way for a flow to
efficiently implement other comparisons on them. (A test for
inequality can be sort of built out of two flows with different
priorities, but OVN matching expressions always generate flows with
String fields are always nominal.
<dfn>Boolean</dfn>. A nominal field that has only two values, 0
and 1, is somewhat exceptional, since it is easy to support both
equality and inequality tests on such a field: either one can be
implemented as a test for 0 or 1.
Only predicates (see above) have a Boolean level of measurement.
This isn't a standard level of measurement.
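<p>
To make the remark about ordinal fields more concrete, the Python sketch
below shows one way a <code>&lt;</code> comparison on a maskable field can
be rewritten as a collection of value/mask tests. The function names are
hypothetical, and this is not necessarily the decomposition that OVN itself
uses; it only demonstrates that such a decomposition exists.
</p>

```python
from typing import List, Tuple


def less_than_tests(constant: int, width: int) -> List[Tuple[int, int]]:
    """Return (value, mask) pairs such that x < constant exactly when
    (x & mask) == value holds for at least one pair."""
    if not 0 <= constant < (1 << width):
        raise ValueError("constant does not fit in the field")
    full = (1 << width) - 1
    tests = []
    for bit in range(width - 1, -1, -1):
        if constant & (1 << bit):
            # Bits above `bit` must equal the constant's bits, `bit` itself
            # must be 0, and every lower bit is wildcarded.
            mask = full & ~((1 << bit) - 1)
            value = constant & mask & ~(1 << bit)
            tests.append((value, mask))
    return tests


def matches_any(x: int, tests: List[Tuple[int, int]]) -> bool:
    """Evaluate the disjunction of the bitwise tests against x."""
    return any((x & mask) == value for value, mask in tests)


# A 16-bit field compared against 1024 needs a single masked test.
tests_1024 = less_than_tests(1024, 16)
```

<p>
The number of tests equals the number of 1-bits in the constant, so a
comparison can expand into several flows where an equality test needs
only one.
</p>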
<em>Prerequisites</em>. Any symbol can have prerequisites, which are
additional conditions implied by the use of the symbol. For example,
the <code>icmp4.type</code> symbol might have prerequisite
<code>icmp4</code>, which would cause an expression <code>icmp4.type ==
0</code> to be interpreted as <code>icmp4.type == 0 &amp;&amp;
icmp4</code>, which would in turn expand to <code>icmp4.type == 0
&amp;&amp; eth.type == 0x800 &amp;&amp; ip4.proto == 1</code> (assuming
<code>icmp4</code> is a predicate defined as suggested under
<em>Kinds</em> above).
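<p>
The expansion described above can be sketched in a few lines of Python.
The two lookup tables and the function name are hypothetical, and the
sketch works on plain strings for brevity, whereas a real implementation
would presumably operate on a parsed expression.
</p>

```python
# Hypothetical tables for illustration only.
PREREQUISITES = {"icmp4.type": "icmp4"}
PREDICATES = {"icmp4": "eth.type == 0x800 && ip4.proto == 1"}


def add_prerequisite(symbol: str, comparison: str) -> str:
    """Append the prerequisite of `symbol`, if any, to `comparison`,
    replacing a predicate name by its expansion."""
    prerequisite = PREREQUISITES.get(symbol)
    if prerequisite is None:
        return comparison
    expansion = PREDICATES.get(prerequisite, prerequisite)
    return f"{comparison} && {expansion}"


expanded = add_prerequisite("icmp4.type", "icmp4.type == 0")
```

<p>
A symbol without an entry in the prerequisite table is returned unchanged.
</p>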
<p><em>Relational operators</em></p>
All of the standard relational operators <code>==</code>,
<code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
<code>&gt;</code>, and <code>&gt;=</code> are supported. Nominal
fields support only <code>==</code> and <code>!=</code>, and only in a
positive sense when outer <code>!</code> are taken into account,
e.g. given string field <code>inport</code>, <code>inport ==
"eth0"</code> and <code>!(inport != "eth0")</code> are acceptable, but
not <code>inport != "eth0"</code>.
The implementation of <code>==</code> (or <code>!=</code> when it is
negated) is more efficient than that of the other relational
operators.
<p><em>Constants</em></p>
Integer constants may be expressed in decimal, hexadecimal prefixed by
<code>0x</code>, or as dotted-quad IPv4 addresses, IPv6 addresses in
their standard forms, or Ethernet addresses as colon-separated hex
digits. A constant in any of these forms may be followed by a slash
and a second constant (the mask) in the same form, to form a masked
constant. IPv4 and IPv6 masks may be given as integers, to express
String constants have the same syntax as quoted strings in JSON (thus,
they are Unicode strings). String constants are used for naming
logical ports. Thus, the useful values are <ref
column="logical_port"/> names from the <ref table="Bindings"/> and
<ref table="Gateway"/> tables in a logical flow's <ref
column="logical_datapath"/>.
Some operators support sets of constants written inside curly braces
<code>{</code> ... <code>}</code>. Commas between elements of a set,
and after the last element, are optional. With <code>==</code>,
``<code><var>field</var> == { <var>constant1</var>,
<var>constant2</var>,</code> ... <code>}</code>'' is syntactic sugar
for ``<code><var>field</var> == <var>constant1</var> ||
<var>field</var> == <var>constant2</var> || </code>...<code></code>''.
Similarly, ``<code><var>field</var> != { <var>constant1</var>,
<var>constant2</var>, </code>...<code> }</code>'' is equivalent to
``<code><var>field</var> != <var>constant1</var> &amp;&amp;
<var>field</var> != <var>constant2</var> &amp;&amp;
</code>...<code></code>''.
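<p>
The set notation is purely syntactic sugar, which the following Python
sketch illustrates by expanding a set comparison into the equivalent chain
of comparisons. The function name <code>expand_set</code> is hypothetical,
and the sketch manipulates strings rather than a parsed expression.
</p>

```python
from typing import Sequence


def expand_set(field: str, op: str, constants: Sequence[str]) -> str:
    """Expand `field op { c1, c2, ... }` into a chain of comparisons."""
    if op == "==":
        joiner = " || "   # equal to any member of the set
    elif op == "!=":
        joiner = " && "   # different from every member of the set
    else:
        raise ValueError("this sketch only handles == and !=")
    if not constants:
        raise ValueError("empty set")
    return joiner.join(f"{field} {op} {c}" for c in constants)


any_of = expand_set("tcp.dst", "==", ["80", "443"])
none_of = expand_set("tcp.dst", "!=", ["80", "443"])
```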
<p><em>Miscellaneous</em></p>
Comparisons may name the symbol or the constant first,
e.g. <code>tcp.src == 80</code> and <code>80 == tcp.src</code> are both
acceptable.
Tests for a range may be expressed using a syntax like <code>1024 &lt;=
tcp.src &lt;= 49151</code>, which is equivalent to <code>1024 &lt;=
tcp.src &amp;&amp; tcp.src &lt;= 49151</code>.
For a one-bit field or predicate, a mention of its name is equivalent
to <code><var>symbol</var> == 1</code>, e.g. <code>vlan.present</code>
is equivalent to <code>vlan.present == 1</code>. The same is true for
one-bit subfields, e.g. <code>vlan.tci[12]</code>. There is no
technical limitation to implementing the same for ordinal fields of all
widths, but the implementation is expensive enough that the syntax
parser requires writing an explicit comparison against zero to make
mistakes less likely, e.g. in <code>tcp.src != 0</code> the comparison
against 0 is required.
<em>Operator precedence</em> is as shown below, from highest to lowest.
There are two exceptions where parentheses are required even though the
table would suggest that they are not: <code>&amp;&amp;</code> and
<code>||</code> require parentheses when used together, and
<code>!</code> requires parentheses when applied to a relational
expression. Thus, in <code>(eth.type == 0x800 || eth.type == 0x86dd)
&amp;&amp; ip.proto == 6</code> or <code>!(arp.op == 1)</code>, the
parentheses are mandatory.
<li><code>()</code></li>
<li><code>== != &lt; &lt;= &gt; &gt;=</code></li>
<li><code>!</code></li>
<li><code>&amp;&amp; ||</code></li>
<em>Comments</em> may be introduced by <code>//</code>, which extends
to the next new-line. Comments within a line may be bracketed by
<code>/*</code> and <code>*/</code>. Multiline comments are not
supported.
<p><em>Symbols</em></p>
<code>metadata</code> <code>reg0</code> ... <code>reg7</code>
<code>xreg0</code> ... <code>xreg3</code>
<li><code>inport</code> <code>outport</code> <code>queue</code></li>
<li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
<li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
<li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
<li><code>ip4.src</code> <code>ip4.dst</code></li>
<li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
<li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
<li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
<li><code>udp.src</code> <code>udp.dst</code></li>
<li><code>sctp.src</code> <code>sctp.dst</code></li>
<li><code>icmp4.type</code> <code>icmp4.code</code></li>
<li><code>icmp6.type</code> <code>icmp6.code</code></li>
<li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
<column name="actions">
Logical datapath actions, to be executed when the logical flow
represented by this row is the highest-priority match.
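<p>
As a rough model of how a row's actions come to be selected, the Python
sketch below picks the highest-priority matching flow within one pipeline
stage and falls back to dropping when nothing matches. The
<code>Flow</code> type, the <code>lookup</code> function, and the
representation of a packet as a dictionary are all assumptions made for
this example. When several matching flows share the top priority, the
sketch returns whichever comes first in the list, which is an arbitrary
choice of the sketch only.
</p>

```python
from typing import Callable, Dict, List, NamedTuple

Packet = Dict[str, int]


class Flow(NamedTuple):
    table_id: int
    priority: int
    match: Callable[[Packet], bool]  # stands in for a parsed match expression
    actions: str


def lookup(flows: List[Flow], table_id: int, packet: Packet) -> str:
    """Return the actions of the highest-priority matching flow in a stage."""
    candidates = [f for f in flows
                  if f.table_id == table_id and f.match(packet)]
    if not candidates:
        return "drop;"  # default when no flow matches
    return max(candidates, key=lambda f: f.priority).actions


flows = [
    Flow(0, 0, lambda p: True, "drop;"),
    Flow(0, 100, lambda p: p.get("ip.proto") == 6, "next;"),
]
tcp_packet = {"eth.type": 0x800, "ip.proto": 6}
udp_packet = {"eth.type": 0x800, "ip.proto": 17}
```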
Actions share lexical syntax with the <ref column="match"/> column. An
empty set of actions (or one that contains just white space or
comments), or a set of actions that consists of just
<code>drop;</code>, causes the matched packets to be dropped.
Otherwise, the column should contain a sequence of actions, each
terminated by a semicolon.
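<p>
The rules above can be approximated with a deliberately naive Python
sketch that strips comments, splits on semicolons, and treats an empty
result or a lone <code>drop</code> as ``drop the packet.'' The function
name <code>split_actions</code> is hypothetical. The sketch does not
verify that the last action is terminated by a semicolon, and it would
mishandle a semicolon or comment marker inside a quoted string or a
brace-delimited block.
</p>

```python
import re
from typing import List

# "//" comments run to the end of the line; "/* ... */" comments stay
# within a single line because "." does not match a newline here.
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/")


def split_actions(text: str) -> List[str]:
    """Return the list of actions, or [] if the packet is to be dropped."""
    without_comments = _COMMENT.sub(" ", text)
    actions = [a.strip() for a in without_comments.split(";")]
    actions = [a for a in actions if a]
    if not actions or actions == ["drop"]:
        return []
    return actions


example = split_actions('outport = "lp1"; output; /* done */')
```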
The following actions will be initially supported:
<dt><code>output;</code></dt>
Outputs the packet to the logical port currently designated by
<code>outport</code>. Output to the ingress port is implicitly
dropped, that is, <code>output</code> becomes a no-op if
<code>outport</code> == <code>inport</code>.
<dt><code>next;</code></dt>
Executes the next logical datapath table as a subroutine.
<dt><code><var>field</var> = <var>constant</var>;</code></dt>
Sets data or metadata field <var>field</var> to constant value
The following actions will likely be useful later, but they have not
been thought out carefully.
<dt><code><var>field1</var> = <var>field2</var>;</code></dt>
Extends the assignment action to allow copying between fields.
<dt><code>learn</code></dt>
<dt><code>conntrack</code></dt>
<dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>};</code></dt>
decrement TTL; execute first set of actions if
successful, second set if TTL decrement fails
<dt><code>icmp_reply { <var>action</var>, </code>...<code> };</code></dt>
<dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
<dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
<dd>generate ARP from packet, execute <var>action</var>s</dd>
<table name="Bindings" title="Physical-Logical Bindings">
Each row in this table identifies the physical location of a logical
For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code>
database, <code>ovn-northd</code> creates a record in this table.
<code>ovn-northd</code> populates and maintains every column except
the <code>chassis</code> column, which it leaves empty in new records.
<code>ovn-controller</code> populates the <code>chassis</code> column
for the records that identify the logical ports that are located on its
hypervisor, which <code>ovn-controller</code> in turn finds out by
monitoring the local hypervisor's Open_vSwitch database, which
identifies logical ports via the conventions described in
<code>IntegrationGuide.md</code>.
When a chassis shuts down gracefully, it should clean up the
<code>chassis</code> column that it previously had populated.
(This is not critical because resources hosted on the chassis are equally
unreachable regardless of whether their rows are present.) To handle the
case where a VM is shut down abruptly on one chassis, then brought up
again on a different one, <code>ovn-controller</code> must overwrite the
<code>chassis</code> column with new information.
<column name="logical_datapath">
The logical datapath to which the logical port belongs. A logical
datapath implements a logical pipeline via logical flows in the <ref
table="Pipeline"/> table. (No table represents a logical datapath.)
<column name="logical_port">
A logical port, taken from <ref table="Logical_Port" column="name"
db="OVN_Northbound"/> in the OVN_Northbound database's
<ref table="Logical_Port" db="OVN_Northbound"/> table. OVN does not
prescribe a particular format for the logical port ID.
<column name="parent_port">
For containers created inside a VM, this is taken from
<ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
in the OVN_Northbound database's <ref table="Logical_Port"
db="OVN_Northbound"/> table. It is left empty if
<ref column="logical_port"/> belongs to a VM or a container created
When <ref column="logical_port"/> identifies the interface of a container
spawned inside a VM, this column identifies the VLAN tag in
the network traffic associated with that container's network interface.
It is left empty if <ref column="logical_port"/> belongs to a VM or a
container created in the hypervisor.
<column name="chassis">
The physical location of the logical port. To successfully identify a
chassis, this column must match the <ref table="Chassis" column="name"/>
column in some row in the <ref table="Chassis"/> table. This is
populated by <code>ovn-controller</code>.
The Ethernet address or addresses used as a source address on the
logical port, each in the form
<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
The string <code>unknown</code> is also allowed to indicate that the
logical port has an unknown set of (additional) source addresses.
A VM interface would ordinarily have a single Ethernet address. A
gateway port might initially only have <code>unknown</code>, and then
add MAC addresses to the set as it learns new source addresses.