<?xml version="1.0" encoding="utf-8"?>
<database name="ovn-sb" title="OVN Southbound Database">
4 This database holds logical and physical configuration and state for the
5 Open Virtual Network (OVN) system to support virtual network abstraction.
6 For an introduction to OVN, please see <code>ovn-architecture</code>(7).
10 The OVN Southbound database sits at the center of the OVN
11 architecture. It is the one component that speaks both southbound
12 directly to all the hypervisors and gateways, via
13 <code>ovn-controller</code>, and northbound to the Cloud Management
14 System, via <code>ovn-nbd</code>:
17 <h2>Database Structure</h2>
20 The OVN Southbound database contains three classes of data with
21 different properties, as described in the sections below.
24 <h3>Physical Network (PN) data</h3>
27 PN tables contain information about the chassis nodes in the system. This
28 contains all the information necessary to wire the overlay, such as IP
29 addresses, supported tunnel types, and security keys.
33 The amount of PN data is small (O(n) in the number of chassis) and it
34 changes infrequently, so it can be replicated to every chassis.
38 The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the
42 <h3>Logical Network (LN) data</h3>
45 LN tables contain the topology of logical switches and routers, ACLs,
46 firewall rules, and everything needed to describe how packets traverse a
47 logical network, represented as logical datapath flows (see Logical
48 Datapath Flows, below).
52 LN data may be large (O(n) in the number of logical ports, ACL rules,
53 etc.). Thus, to improve scaling, each chassis should receive only data
54 related to logical networks in which that chassis participates. Past
55 experience shows that in the presence of large logical networks, even
56 finer-grained partitioning of data, e.g. designing logical flows so that
57 only the chassis hosting a logical port needs related flows, pays off
58 scale-wise. (This is not necessary initially but it is worth bearing in
63 The LN is a slave of the cloud management system running northbound of OVN.
64 That CMS determines the entire OVN logical configuration and therefore the
65 LN's content at any given time is a deterministic function of the CMS's
66 configuration, although that happens indirectly via the OVN Northbound DB
67 and <code>ovn-nbd</code>.
71 LN data is likely to change more quickly than PN data. This is especially
72 true in a container environment where VMs are created and destroyed (and
73 therefore added to and deleted from logical switches) quickly.
77 The <ref table="Pipeline"/> table is currently the only LN table.
80 <h3>Bindings data</h3>
83 The Bindings tables contain the current placement of logical components
84 (such as VMs and VIFs) onto chassis and the bindings between logical ports
89 Bindings change frequently, at least every time a VM powers up or down
90 or migrates, and especially quickly in a container environment. The
91 amount of data per VM (or VIF) is small.
95 Each chassis is authoritative about the VMs and VIFs that it hosts at any
96 given time and can efficiently flood that state to a central location, so
97 the consistency needs are minimal.
101 The <ref table="Bindings"/> table is currently the only Bindings table.
104 <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
106 Each row in this table represents a hypervisor or gateway (a chassis) in
107 the physical network (PN). Each chassis, via
108 <code>ovn-controller</code>, adds and updates its own row, and keeps a
109 copy of the remaining rows to determine how to reach other hypervisors.
113 When a chassis shuts down gracefully, it should remove its own row.
114 (This is not critical because resources hosted on the chassis are equally
115 unreachable regardless of whether the row is present.) If a chassis
116 shuts down permanently without removing its row, some kind of manual or
117 automatic cleanup is eventually needed; we can devise a process for that
122 A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
123 column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
124 database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does
125 not prescribe a particular format for chassis names.
128 <group title="Encapsulation Configuration">
130 OVN uses encapsulation to transmit logical dataplane packets
134 <column name="encaps">
135 Points to supported encapsulation configurations to transmit
136 logical dataplane packets to this chassis. Each entry is a <ref
137 table="Encap"/> record that describes the configuration.
141 <group title="Gateway Configuration">
143 A <dfn>gateway</dfn> is a chassis that forwards traffic between a
144 logical network and a physical VLAN. Gateways are typically dedicated
145 nodes that do not host VMs.
148 <column name="gateway_ports">
149 Maps from the name of a gateway port, which is typically a physical
150 port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref
151 table="Gateway"/> record that describes the details of the gatewaying
157 <table name="Encap" title="Encapsulation Types">
159 The <ref column="encaps" table="Chassis"/> column in the <ref
160 table="Chassis"/> table refers to rows in this table to identify
161 how OVN may transmit logical dataplane packets to this chassis.
162 Each chassis, via <code>ovn-controller</code>(8), adds and updates
163 its own rows and keeps a copy of the remaining rows to determine
164 how to reach other chassis.
168 The encapsulation to use to transmit packets to this chassis.
169 Examples include <code>geneve</code>, <code>vxlan</code>, and
173 <column name="options">
174 Options for configuring the encapsulation, e.g. IPsec parameters when
175 IPsec support is introduced. No options are currently defined.
179 The IPv4 address of the encapsulation tunnel endpoint.
183 <table name="Gateway" title="Physical Network Gateway Ports">
185 The <ref column="gateway_ports" table="Chassis"/> column in the <ref
186 table="Chassis"/> table refers to rows in this table to connect a chassis
187 port to a gateway function. Each row in this table describes the logical
188 networks to which a gateway port is attached. Each chassis, via
189 <code>ovn-controller</code>(8), adds and updates its own rows, if any
190 (since most chassis are not gateways), and keeps a copy of the remaining
191 rows to determine how to reach other chassis.
194 <column name="vlan_map">
195 Maps from a VLAN ID to a logical port name. Thus, each named logical
196 port corresponds to one VLAN on the gateway port.
199 <column name="attached_port">
200 The name of the gateway port in the chassis's Open vSwitch integration
205 <table name="Pipeline" title="Logical Network Pipeline">
207 Each row in this table represents one logical flow. The cloud management
208 system, via its OVN integration, populates this table with logical flows
209 that implement the L2 and L3 topology specified in the CMS configuration.
210 Each hypervisor, via <code>ovn-controller</code>, translates the logical
211 flows into OpenFlow flows specific to its hypervisor and installs them
Logical flows are expressed in an OVN-specific format, described here. A
logical datapath flow is much like an OpenFlow flow, except that the
flows are written in terms of logical ports and logical datapaths instead
of physical ports and physical datapaths. Translation between logical
and physical flows helps to ensure isolation between logical datapaths.
(The logical flow abstraction also allows the CMS to do less work, since
it does not have to separately compute and push out physical
flows to each chassis.)
227 The default action when no flow matches is to drop packets.
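As a rough illustration of these lookup semantics, the following Python sketch selects the actions for a packet in one logical table: among the flows that match, the one with the numerically highest priority wins, and a packet that matches no flow is dropped. The <code>Flow</code> class, the <code>lookup</code> function, the port names, and the use of Python callables in place of OVN matching expressions are all assumptions made for this example; they are not part of OVN.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, object]


@dataclass
class Flow:
    """Hypothetical in-memory stand-in for one row of the Pipeline table."""
    table_id: int
    priority: int
    match: Callable[[Packet], bool]  # stand-in for an OVN matching expression
    actions: str


def lookup(flows: List[Flow], table_id: int, packet: Packet) -> str:
    """Return the actions of the highest-priority matching flow in a table."""
    candidates = [f for f in flows
                  if f.table_id == table_id and f.match(packet)]
    if not candidates:
        # Default action when no flow matches: drop the packet.
        return "drop;"
    # If several matching flows share the top priority, max() returns the
    # first one in list order; that choice is an artifact of this sketch.
    return max(candidates, key=lambda f: f.priority).actions


flows = [
    Flow(0, 100, lambda p: p.get("eth.type") == 0x806, "broadcast;"),
    Flow(0, 50, lambda p: True, "resubmit;"),
    Flow(1, 10, lambda p: p.get("inport") == "lp1", 'output("lp2");'),
]
```

With these sample flows, an ARP packet in table 0 hits the priority-100 flow, any other packet in table 0 falls through to the catch-all flow, and a packet in table 1 from an unknown port is dropped.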
230 <column name="table_id">
231 The stage in the logical pipeline, analogous to an OpenFlow table number.
234 <column name="priority">
235 The flow's priority. Flows with numerically higher priority take
236 precedence over those with lower. If two logical datapath flows with the
237 same priority both match, then the one actually applied to the packet is
241 <column name="match">
243 A matching expression. OVN provides a superset of OpenFlow matching
244 capabilities, using a syntax similar to Boolean expressions in a
245 programming language.
The most important components of a match expression are
<dfn>comparisons</dfn> between <dfn>symbols</dfn> and
<dfn>constants</dfn>, e.g. <code>ip4.dst == 192.168.0.1</code>,
<code>ip.proto == 6</code>, <code>arp.op == 1</code>, <code>eth.type ==
0x800</code>. The logical AND operator <code>&&</code> and
logical OR operator <code>||</code> can combine comparisons into a
259 Matching expressions also support parentheses for grouping, the logical
260 NOT prefix operator <code>!</code>, and literals <code>0</code> and
261 <code>1</code> to express ``false'' or ``true,'' respectively. The
262 latter is useful by itself as a catch-all expression that matches every
266 <p><em>Symbols</em></p>
269 <em>Type</em>. Symbols have <dfn>integer</dfn> or <dfn>string</dfn>
270 type. Integer symbols have a <dfn>width</dfn> in bits.
274 <em>Kinds</em>. There are three kinds of symbols:
280 <dfn>Fields</dfn>. A field symbol represents a packet header or
281 metadata field. For example, a field
282 named <code>vlan.tci</code> might represent the VLAN TCI field in a
287 A field symbol can have integer or string type. Integer fields can
288 be nominal or ordinal (see <em>Level of Measurement</em>,
295 <dfn>Subfields</dfn>. A subfield represents a subset of bits from
296 a larger field. For example, a field <code>vlan.vid</code> might
297 be defined as an alias for <code>vlan.tci[0..11]</code>. Subfields
298 are provided for syntactic convenience, because it is always
299 possible to instead refer to a subset of bits from a field
304 Only ordinal fields (see <em>Level of Measurement</em>,
305 below) may have subfields. Subfields are always ordinal.
311 <dfn>Predicates</dfn>. A predicate is shorthand for a Boolean
312 expression. Predicates may be used much like 1-bit fields. For
313 example, <code>ip4</code> might expand to <code>eth.type ==
314 0x800</code>. Predicates are provided for syntactic convenience,
315 because it is always possible to instead specify the underlying
320 A predicate whose expansion refers to any nominal field or
321 predicate (see <em>Level of Measurement</em>, below) is nominal;
322 other predicates have Boolean level of measurement.
328 <em>Level of Measurement</em>. See
329 http://en.wikipedia.org/wiki/Level_of_measurement for the statistical
330 concept on which this classification is based. There are three
337 <dfn>Ordinal</dfn>. In statistics, ordinal values can be ordered
338 on a scale. OVN considers a field (or subfield) to be ordinal if
339 its bits can be examined individually. This is true for the
340 OpenFlow fields that OpenFlow or Open vSwitch makes ``maskable.''
Any use of an ordinal field may specify a single bit or a range of
bits, e.g. <code>vlan.tci[13..15]</code> refers to the PCP field
within the VLAN TCI, and <code>eth.dst[40]</code> refers to the
multicast bit in the Ethernet destination address.
351 OVN supports all the usual arithmetic relations (<code>==</code>,
352 <code>!=</code>, <code><</code>, <code><=</code>,
353 <code>></code>, and <code>>=</code>) on ordinal fields and
354 their subfields, because OVN can implement these in OpenFlow and
355 Open vSwitch as collections of bitwise tests.
<dfn>Nominal</dfn>. In statistics, nominal values cannot be
usefully compared except for equality. OpenFlow
port numbers, Ethernet types, and IP protocols are examples: all of
these are just identifiers assigned arbitrarily with no deeper
meaning. In OpenFlow and Open vSwitch, bits in these fields
generally aren't individually addressable.
370 OVN only supports arithmetic tests for equality on nominal fields,
371 because OpenFlow and Open vSwitch provide no way for a flow to
372 efficiently implement other comparisons on them. (A test for
373 inequality can be sort of built out of two flows with different
374 priorities, but OVN matching expressions always generate flows with
379 String fields are always nominal.
385 <dfn>Boolean</dfn>. A nominal field that has only two values, 0
386 and 1, is somewhat exceptional, since it is easy to support both
387 equality and inequality tests on such a field: either one can be
388 implemented as a test for 0 or 1.
392 Only predicates (see above) have a Boolean level of measurement.
396 This isn't a standard level of measurement.
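The bit-range notation available on ordinal fields can be read as ordinary shift-and-mask arithmetic. The Python sketch below illustrates that reading for <code>vlan.tci[13..15]</code>, <code>vlan.tci[0..11]</code>, and <code>eth.dst[40]</code>. The helper name <code>subfield</code> and the sample field values are assumptions made for this example.

```python
def subfield(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi of value, inclusive, where bit 0 is the least
    significant bit (the reading assumed here for field[lo..hi])."""
    if lo < 0 or hi < lo:
        raise ValueError("expected 0 <= lo <= hi")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Sample VLAN TCI: PCP = 5 in bits 13..15, bit 12 set, VID = 100 in bits 0..11.
tci = (5 << 13) | (1 << 12) | 100

pcp = subfield(tci, 13, 15)      # vlan.tci[13..15]
vid = subfield(tci, 0, 11)       # vlan.tci[0..11]

# Sample Ethernet destination 01:00:5e:00:00:01 as a 48-bit integer.
eth_dst = 0x01005E000001
multicast_bit = subfield(eth_dst, 40, 40)   # eth.dst[40]
```

A comparison such as <code>vlan.tci[13..15] == 5</code> then amounts to a masked match on the underlying field, which is why this notation is limited to fields whose bits can be examined individually.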
<em>Prerequisites</em>. Any symbol can have prerequisites, which are
additional conditions implied by the use of the symbol. For example,
the <code>icmp4.type</code> symbol might have prerequisite
<code>icmp4</code>, which would cause an expression <code>icmp4.type ==
0</code> to be interpreted as <code>icmp4.type == 0 &&
icmp4</code>, which would in turn expand to <code>icmp4.type == 0
&& eth.type == 0x800 && ip4.proto == 1</code> (assuming
<code>icmp4</code> is a predicate defined as suggested under
<em>Kinds</em> above).
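The two expansion steps in this example can be sketched in Python as plain string rewriting. The tables <code>PREREQUISITES</code> and <code>PREDICATES</code> and both function names are hypothetical, and the sketch handles only flat <code>&&</code> conjunctions; a real parser would operate on an expression tree.

```python
# Hypothetical symbol tables mirroring the example in the text.
PREREQUISITES = {"icmp4.type": "icmp4"}
PREDICATES = {"icmp4": "eth.type == 0x800 && ip4.proto == 1"}


def add_prerequisite(symbol: str, comparison: str) -> str:
    """Append the prerequisite implied by using symbol, if it has one."""
    prereq = PREREQUISITES.get(symbol)
    if prereq is None:
        return comparison
    return f"{comparison} && {prereq}"


def expand_predicates(expression: str) -> str:
    """Replace each bare predicate term in a flat && conjunction."""
    terms = [term.strip() for term in expression.split("&&")]
    return " && ".join(PREDICATES.get(term, term) for term in terms)


with_prereq = add_prerequisite("icmp4.type", "icmp4.type == 0")
expanded = expand_predicates(with_prereq)
```

Here <code>with_prereq</code> is the intermediate form that names the predicate, and <code>expanded</code> is the form written purely in terms of fields.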
413 <p><em>Relational operators</em></p>
416 All of the standard relational operators <code>==</code>,
417 <code>!=</code>, <code><</code>, <code><=</code>,
418 <code>></code>, and <code>>=</code> are supported. Nominal
419 fields support only <code>==</code> and <code>!=</code>, and only in a
420 positive sense when outer <code>!</code> are taken into account,
421 e.g. given string field <code>inport</code>, <code>inport ==
422 "eth0"</code> and <code>!(inport != "eth0")</code> are acceptable, but
423 not <code>inport != "eth0"</code>.
The implementation of <code>==</code> (or <code>!=</code> when it is
negated) is more efficient than that of the other relational
432 <p><em>Constants</em></p>
435 Integer constants may be expressed in decimal, hexadecimal prefixed by
436 <code>0x</code>, or as dotted-quad IPv4 addresses, IPv6 addresses in
437 their standard forms, or Ethernet addresses as colon-separated hex
438 digits. A constant in any of these forms may be followed by a slash
439 and a second constant (the mask) in the same form, to form a masked
440 constant. IPv4 and IPv6 masks may be given as integers, to express
String constants have the same syntax as quoted strings in JSON (thus,
they are Unicode strings). String constants are used for naming
logical ports. Thus, the useful values are <ref
column="logical_port"/> names from the <ref table="Bindings"/> and
<ref table="Gateway"/> tables.
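To make the integer constant forms concrete, the following Python sketch converts each listed form (decimal, <code>0x</code>-prefixed hexadecimal, dotted-quad IPv4, IPv6, and colon-separated Ethernet) to a plain integer, and splits a masked constant at the slash. The function names are assumptions for this example, the order in which the forms are tried is a choice made here rather than documented OVN behavior, and the sketch only handles a mask written in the same form as the value.

```python
import ipaddress
import re

# Six colon-separated groups of one or two hex digits.
ETH_RE = re.compile(r"^[0-9a-fA-F]{1,2}(:[0-9a-fA-F]{1,2}){5}$")


def parse_constant(text: str) -> int:
    """Convert one integer constant to its numeric value."""
    text = text.strip()
    if ETH_RE.match(text):
        value = 0
        for octet in text.split(":"):
            value = (value << 8) | int(octet, 16)
        return value
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text, 10)
    # Anything else is tried as an IPv4 or IPv6 address; ip_address()
    # raises ValueError if the text is neither.
    return int(ipaddress.ip_address(text))


def parse_masked(text: str):
    """Split 'value/mask' and return (value, mask); mask is None if absent."""
    value, slash, mask = text.partition("/")
    return parse_constant(value), (parse_constant(mask) if slash else None)
```

For instance, <code>parse_masked("192.168.0.0/255.255.255.0")</code> yields the address and mask as two 32-bit integers, ready for a bitwise comparison.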
Some operators support sets of constants written inside curly braces
<code>{</code> ... <code>}</code>. Commas between elements of a set,
and after the last element, are optional. With <code>==</code>,
``<code><var>field</var> == { <var>constant1</var>,
<var>constant2</var>,</code> ... <code>}</code>'' is syntactic sugar
for ``<code><var>field</var> == <var>constant1</var> ||
<var>field</var> == <var>constant2</var> || </code>...<code></code>''.
Similarly, ``<code><var>field</var> != { <var>constant1</var>,
<var>constant2</var>, </code>...<code> }</code>'' is equivalent to
``<code><var>field</var> != <var>constant1</var> &&
<var>field</var> != <var>constant2</var> &&
</code>...<code></code>''.
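The desugaring of set comparisons can be sketched as a small Python helper that produces the equivalent expression text. The function name <code>expand_set</code> is an assumption for this example, and the constants are passed as an already-parsed list rather than read from the curly-brace syntax.

```python
from typing import Iterable


def expand_set(field: str, op: str, constants: Iterable[str]) -> str:
    """Rewrite a comparison against a set of constants as a chain of
    single comparisons: == joins with ||, != joins with &&."""
    if op == "==":
        joiner = " || "
    elif op == "!=":
        joiner = " && "
    else:
        raise ValueError("only == and != are handled in this sketch")
    return joiner.join(f"{field} {op} {constant}" for constant in constants)


any_of = expand_set("tcp.dst", "==", ["80", "443"])
none_of = expand_set("tcp.dst", "!=", ["80", "443"])
```

The first call corresponds to <code>tcp.dst == { 80, 443 }</code> and the second to <code>tcp.dst != { 80, 443 }</code>.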
467 <p><em>Miscellaneous</em></p>
470 Comparisons may name the symbol or the constant first,
471 e.g. <code>tcp.src == 80</code> and <code>80 == tcp.src</code> are both
476 Tests for a range may be expressed using a syntax like <code>1024 <=
477 tcp.src <= 49151</code>, which is equivalent to <code>1024 <=
478 tcp.src && tcp.src <= 49151</code>.
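A minimal Python sketch of this rewrite follows. It recognizes only the <code>&lt;=</code> form shown above with whitespace-separated operands; the function name <code>expand_range</code> and the regular expression are assumptions for this example, not the OVN parser.

```python
import re

# lower <= field <= upper, with operands that contain no whitespace.
RANGE_RE = re.compile(r"^\s*(\S+)\s*<=\s*(\S+)\s*<=\s*(\S+)\s*$")


def expand_range(expression: str) -> str:
    """Rewrite 'lo <= field <= hi' as 'lo <= field && field <= hi'."""
    found = RANGE_RE.match(expression)
    if found is None:
        raise ValueError(f"not a range test: {expression!r}")
    lo, field, hi = found.groups()
    return f"{lo} <= {field} && {field} <= {hi}"


expanded_range = expand_range("1024 <= tcp.src <= 49151")
```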
For a one-bit field or predicate, a mention of its name is equivalent
to <code><var>symbol</var> == 1</code>, e.g. <code>vlan.present</code>
is equivalent to <code>vlan.present == 1</code>. The same is true for
one-bit subfields, e.g. <code>vlan.tci[12]</code>. There is no
technical limitation to implementing the same for ordinal fields of all
widths, but the implementation is expensive enough that the syntax
parser requires writing an explicit comparison against zero to make
mistakes less likely, e.g. in <code>tcp.src != 0</code> the comparison
against 0 is required.
494 <em>Operator precedence</em> is as shown below, from highest to lowest.
495 There are two exceptions where parentheses are required even though the
496 table would suggest that they are not: <code>&&</code> and
497 <code>||</code> require parentheses when used together, and
498 <code>!</code> requires parentheses when applied to a relational
499 expression. Thus, in <code>(eth.type == 0x800 || eth.type == 0x86dd)
500 && ip.proto == 6</code> or <code>!(arp.op == 1)</code>, the
501 parentheses are mandatory.
505 <li><code>()</code></li>
506 <li><code>== != < <= > >=</code></li>
507 <li><code>!</code></li>
508 <li><code>&& ||</code></li>
512 <em>Comments</em> may be introduced by <code>//</code>, which extends
513 to the next new-line. Comments within a line may be bracketed by
514 <code>/*</code> and <code>*/</code>. Multiline comments are not
518 <p><em>Symbols</em></p>
522 <code>metadata</code> <code>reg0</code> ... <code>reg7</code>
523 <code>xreg0</code> ... <code>xreg3</code>
525 <li><code>inport</code> <code>outport</code> <code>queue</code></li>
526 <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
527 <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
528 <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
529 <li><code>ip4.src</code> <code>ip4.dst</code></li>
530 <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
531 <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
532 <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
533 <li><code>udp.src</code> <code>udp.dst</code></li>
534 <li><code>sctp.src</code> <code>sctp.dst</code></li>
535 <li><code>icmp4.type</code> <code>icmp4.code</code></li>
536 <li><code>icmp6.type</code> <code>icmp6.code</code></li>
537 <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
542 <column name="actions">
544 Below, a <var>value</var> is either a <var>constant</var> or a
545 <var>field</var>. The following actions seem most likely to be useful:
549 <dt><code>drop;</code></dt>
550 <dd>syntactic sugar for no actions</dd>
552 <dt><code>output(<var>value</var>);</code></dt>
553 <dd>output to port</dd>
555 <dt><code>broadcast;</code></dt>
556 <dd>output to every logical port except ingress port</dd>
558 <dt><code>resubmit;</code></dt>
559 <dd>execute next logical datapath table as subroutine</dd>
561 <dt><code>set(<var>field</var>=<var>value</var>);</code></dt>
562 <dd>set data or metadata field, or copy between fields</dd>
The following are not well thought out:
570 <dt><code>learn</code></dt>
572 <dt><code>conntrack</code></dt>
574 <dt><code>with(<var>field</var>=<var>value</var>) { <var>action</var>, </code>...<code> }</code></dt>
575 <dd>execute <var>actions</var> with temporary changes to <var>fields</var></dd>
577 <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>}</code></dt>
579 decrement TTL; execute first set of actions if
580 successful, second set if TTL decrement fails
583 <dt><code>icmp_reply { <var>action</var>, </code>...<code> }</code></dt>
584 <dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
586 <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
587 <dd>generate ARP from packet, execute <var>action</var>s</dd>
591 Other actions can be added as needed
592 (e.g. <code>push_vlan</code>, <code>pop_vlan</code>,
593 <code>push_mpls</code>, <code>pop_mpls</code>).
597 Some of the OVN actions do not map directly to OpenFlow actions, e.g.:
602 <code>with</code>: Implemented as <code>stack_push;
603 set(</code>...<code>); <var>actions</var>; stack_pop</code>.
<code>dec_ttl</code>: Implemented as <code>dec_ttl</code> followed
by the successful actions. The failure case has to be implemented by
<code>ovn-controller</code> interpreting packet-ins. It might be difficult to
identify the particular place in the processing pipeline in
<code>ovn-controller</code>; maybe some restrictions will be
616 <code>icmp_reply</code>: Implemented by sending the packet to
617 <code>ovn-controller</code>, which generates the ICMP reply and sends
618 the packet back to <code>ovs-vswitchd</code>.
624 <table name="Bindings" title="Physical-Logical Bindings">
626 Each row in this table identifies the physical location of a logical
For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code>
database, <code>ovn-nbd</code> creates a record in this table.
<code>ovn-nbd</code> populates and maintains every column except
the <code>chassis</code> column, which it leaves empty in new records.
638 <code>ovn-controller</code> populates the <code>chassis</code> column
639 for the records that identify the logical ports that are located on its
640 hypervisor, which <code>ovn-controller</code> in turn finds out by
641 monitoring the local hypervisor's Open_vSwitch database, which
642 identifies logical ports via the conventions described in
643 <code>IntegrationGuide.md</code>.
When a chassis shuts down gracefully, it should clean up the
<code>chassis</code> column that it previously populated.
(This is not critical because resources hosted on the chassis are equally
unreachable regardless of whether their rows are present.) To handle the
case where a VM is shut down abruptly on one chassis, then brought up
again on a different one, <code>ovn-controller</code> must overwrite the
<code>chassis</code> column with new information.
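The division of responsibility for the <code>chassis</code> column can be sketched in Python with an in-memory table. The <code>Binding</code> class, the three function names, and the chassis names are hypothetical; the sketch models only who writes the column and when, not the database protocol.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Binding:
    """Hypothetical stand-in for one row of the Bindings table."""
    logical_port: str
    chassis: Optional[str] = None   # left empty in new records


bindings: Dict[str, Binding] = {}


def nbd_create(logical_port: str) -> None:
    """Role of ovn-nbd: create the row without a chassis."""
    bindings[logical_port] = Binding(logical_port)


def controller_claim(chassis: str, logical_port: str) -> None:
    """Role of ovn-controller: record that the port is local, overwriting
    any stale value left by a chassis that went away abruptly."""
    bindings[logical_port].chassis = chassis


def controller_release(chassis: str, logical_port: str) -> None:
    """Graceful shutdown: clear the column only if this chassis set it."""
    row = bindings[logical_port]
    if row.chassis == chassis:
        row.chassis = None
```

If a VM moves from <code>hv1</code> to <code>hv2</code> without a graceful shutdown, the claim from <code>hv2</code> simply replaces the old value, and a late release from <code>hv1</code> has no effect.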
656 <column name="logical_port">
657 A logical port, taken from <ref table="Logical_Port" column="name"
658 db="OVN_Northbound"/> in the OVN_Northbound database's
659 <ref table="Logical_Port" db="OVN_Northbound"/> table. OVN does not
660 prescribe a particular format for the logical port ID.
663 <column name="parent_port">
664 For containers created inside a VM, this is taken from
665 <ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
666 in the OVN_Northbound database's <ref table="Logical_Port"
667 db="OVN_Northbound"/> table. It is left empty if
668 <ref column="logical_port"/> belongs to a VM or a container created
673 When <ref column="logical_port"/> identifies the interface of a container
674 spawned inside a VM, this column identifies the VLAN tag in
675 the network traffic associated with that container's network interface.
676 It is left empty if <ref column="logical_port"/> belongs to a VM or a
677 container created in the hypervisor.
680 <column name="chassis">
681 The physical location of the logical port. To successfully identify a
682 chassis, this column must match the <ref table="Chassis" column="name"/>
683 column in some row in the <ref table="Chassis"/> table. This is
684 populated by <code>ovn-controller</code>.
689 The Ethernet address or addresses used as a source address on the
690 logical port, each in the form
691 <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
692 The string <code>unknown</code> is also allowed to indicate that the
693 logical port has an unknown set of (additional) source addresses.
697 A VM interface would ordinarily have a single Ethernet address. A
698 gateway port might initially only have <code>unknown</code>, and then
699 add MAC addresses to the set as it learns new source addresses.