1 <?xml version="1.0" encoding="utf-8"?>
2 <database name="ovn-sb" title="OVN Southbound Database">
4 This database holds logical and physical configuration and state for the
5 Open Virtual Network (OVN) system to support virtual network abstraction.
6 For an introduction to OVN, please see <code>ovn-architecture</code>(7).
10 The OVN Southbound database sits at the center of the OVN
11 architecture. It is the one component that speaks both southbound
12 directly to all the hypervisors and gateways, via
13 <code>ovn-controller</code>, and northbound to the Cloud Management
14 System, via <code>ovn-nbd</code>:
17 <h2>Database Structure</h2>
20 The OVN Southbound database contains three classes of data with
21 different properties, as described in the sections below.
24 <h3>Physical Network (PN) data</h3>
27 PN tables contain information about the chassis nodes in the system. This
28 contains all the information necessary to wire the overlay, such as IP
29 addresses, supported tunnel types, and security keys.
33 The amount of PN data is small (O(n) in the number of chassis) and it
34 changes infrequently, so it can be replicated to every chassis.
38 The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the PN tables.
42 <h3>Logical Network (LN) data</h3>
45 LN tables contain the topology of logical switches and routers, ACLs,
46 firewall rules, and everything needed to describe how packets traverse a
47 logical network, represented as logical datapath flows (see Logical
48 Datapath Flows, below).
52 LN data may be large (O(n) in the number of logical ports, ACL rules,
53 etc.). Thus, to improve scaling, each chassis should receive only data
54 related to logical networks in which that chassis participates. Past
55 experience shows that in the presence of large logical networks, even
56 finer-grained partitioning of data, e.g. designing logical flows so that
57 only the chassis hosting a logical port needs related flows, pays off
58 scale-wise. (This is not necessary initially, but it is worth bearing in mind.)
63 The LN is driven entirely by the cloud management system (CMS) running northbound of OVN.
64 That CMS determines the entire OVN logical configuration, and therefore the
65 LN's content at any given time is a deterministic function of the CMS's
66 configuration, although that happens indirectly via the OVN Northbound DB
67 and <code>ovn-nbd</code>.
71 LN data is likely to change more quickly than PN data. This is especially
72 true in a container environment where VMs are created and destroyed (and
73 therefore added to and deleted from logical switches) quickly.
77 The <ref table="Pipeline"/> table is currently the only LN table.
80 <h3>Bindings data</h3>
83 The Bindings tables contain the current placement of logical components
84 (such as VMs and VIFs) onto chassis and the bindings between logical ports
89 Bindings change frequently, at least every time a VM powers up or down
90 or migrates, and especially quickly in a container environment. The
91 amount of data per VM (or VIF) is small.
95 Each chassis is authoritative about the VMs and VIFs that it hosts at any
96 given time and can efficiently flood that state to a central location, so
97 the consistency needs are minimal.
101 The <ref table="Bindings"/> table is currently the only Bindings table.
104 <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
106 Each row in this table represents a hypervisor or gateway (a chassis) in
107 the physical network (PN). Each chassis, via
108 <code>ovn-controller</code>, adds and updates its own row, and keeps a
109 copy of the remaining rows to determine how to reach other hypervisors.
113 When a chassis shuts down gracefully, it should remove its own row.
114 (This is not critical because resources hosted on the chassis are equally
115 unreachable regardless of whether the row is present.) If a chassis
116 shuts down permanently without removing its row, some kind of manual or
117 automatic cleanup is eventually needed; we can devise a process for that
122 A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
123 column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
124 database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does
125 not prescribe a particular format for chassis names.
128 <group title="Encapsulation Configuration">
130 OVN uses encapsulation to transmit logical dataplane packets between chassis.
134 <column name="encaps">
135 Points to supported encapsulation configurations to transmit
136 logical dataplane packets to this chassis. Each entry is a <ref
137 table="Encap"/> record that describes the configuration.
141 <group title="Gateway Configuration">
143 A <dfn>gateway</dfn> is a chassis that forwards traffic between a
144 logical network and a physical VLAN. Gateways are typically dedicated
145 nodes that do not host VMs.
148 <column name="gateway_ports">
149 Maps from the name of a gateway port, which is typically a physical
150 port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref
151 table="Gateway"/> record that describes the details of the gatewaying function.
157 <table name="Encap" title="Encapsulation Types">
159 The <ref column="encaps" table="Chassis"/> column in the <ref
160 table="Chassis"/> table refers to rows in this table to identify
161 how OVN may transmit logical dataplane packets to this chassis.
162 Each chassis, via <code>ovn-controller</code>(8), adds and updates
163 its own rows and keeps a copy of the remaining rows to determine
164 how to reach other chassis.
168 The encapsulation to use to transmit packets to this chassis.
169 Examples include <code>geneve</code>, <code>vxlan</code>, and <code>stt</code>.
173 <column name="options">
174 Options for configuring the encapsulation, e.g. IPsec parameters when
175 IPsec support is introduced. No options are currently defined.
179 The IPv4 address of the encapsulation tunnel endpoint.
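<p>
The following Python sketch shows how one chassis might use another chassis's <ref column="encaps" table="Chassis"/> rows to choose a tunnel. The <code>Encap</code> and <code>Chassis</code> classes only mirror the columns described here, and the order in <code>PREFERENCE</code> is a hypothetical local policy; this schema does not prescribe one.
</p>

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class Encap:
    type: str                                    # e.g. "geneve", "vxlan", "stt"
    ip: str                                      # IPv4 address of the tunnel endpoint
    options: dict = field(default_factory=dict)  # no options are currently defined

@dataclass
class Chassis:
    name: str
    encaps: list  # the Encap rows this chassis advertises

# Hypothetical local policy: most preferred encapsulation first.
PREFERENCE = ("geneve", "stt", "vxlan")

def pick_tunnel(remote: Chassis, preference=PREFERENCE) -> Optional[Tuple[str, str]]:
    """Return (encap type, endpoint IP) for the best encap the remote chassis offers."""
    offered = {e.type: e for e in remote.encaps}
    for encap_type in preference:
        if encap_type in offered:
            return encap_type, offered[encap_type].ip
    return None  # no common encapsulation, so the remote chassis is unreachable

hv2 = Chassis("hv2", [Encap("vxlan", "192.0.2.2"), Encap("geneve", "192.0.2.2")])
print(pick_tunnel(hv2))  # ('geneve', '192.0.2.2')
```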
183 <table name="Gateway" title="Physical Network Gateway Ports">
185 The <ref column="gateway_ports" table="Chassis"/> column in the <ref
186 table="Chassis"/> table refers to rows in this table to connect a chassis
187 port to a gateway function. Each row in this table describes the logical
188 networks to which a gateway port is attached. Each chassis, via
189 <code>ovn-controller</code>(8), adds and updates its own rows, if any
190 (since most chassis are not gateways), and keeps a copy of the remaining
191 rows to determine how to reach other chassis.
194 <column name="vlan_map">
195 Maps from a VLAN ID to a logical port name. Thus, each named logical
196 port corresponds to one VLAN on the gateway port.
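<p>
A minimal sketch of how a gateway could apply <ref column="vlan_map"/> in both directions, assuming the map is available as a Python dict from VLAN ID to logical port name. Because each named logical port corresponds to one VLAN, the reverse lookup is unambiguous. The function names and port names are hypothetical.
</p>

```python
from typing import Dict, Optional

def logical_port_for_vlan(vlan_map: Dict[int, str], vid: int) -> Optional[str]:
    """Ingress: map a frame's VLAN ID on the gateway port to a logical port."""
    return vlan_map.get(vid)  # None: this VLAN is not attached to any logical port

def vlan_for_logical_port(vlan_map: Dict[int, str], logical_port: str) -> Optional[int]:
    """Egress: find the VLAN ID to use for a logical port's traffic."""
    for vid, name in vlan_map.items():
        if name == logical_port:
            return vid
    return None

# Hypothetical vlan_map for one gateway port.
vlan_map = {100: "lp-tenant-a", 200: "lp-tenant-b"}
print(logical_port_for_vlan(vlan_map, 100))            # lp-tenant-a
print(vlan_for_logical_port(vlan_map, "lp-tenant-b"))  # 200
```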
199 <column name="attached_port">
200 The name of the gateway port in the chassis's Open vSwitch integration bridge.
205 <table name="Pipeline" title="Logical Network Pipeline">
207 Each row in this table represents one logical flow. The cloud management
208 system, via its OVN integration, populates this table with logical flows
209 that implement the L2 and L3 topology specified in the CMS configuration.
210 Each hypervisor, via <code>ovn-controller</code>, translates the logical
211 flows into OpenFlow flows specific to its hypervisor and installs them into Open vSwitch.
216 Logical flows are expressed in an OVN-specific format, described here. A
217 logical datapath flow is much like an OpenFlow flow, except that the
218 flows are written in terms of logical ports and logical datapaths instead
219 of physical ports and physical datapaths. Translation between logical
220 and physical flows helps to ensure isolation between logical datapaths.
221 (The logical flow abstraction also allows the CMS to do less work, since
222 it does not have to separately compute and push out physical
223 flows to each chassis.)
227 The default action when no flow matches is to drop packets.
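<p>
The lookup rule can be sketched in Python as follows, assuming each flow's <ref column="match"/> has already been compiled into a predicate over a dict of field values. Within one <ref column="table_id"/>, the matching flow with the highest <ref column="priority"/> wins, and a result of <code>None</code> stands for the default drop. The <code>LogicalFlow</code> class is illustrative, not an OVN data structure.
</p>

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class LogicalFlow:
    table_id: int
    priority: int
    match: Callable[[dict], bool]  # stands in for a parsed match expression
    actions: str

def lookup(flows: List[LogicalFlow], table_id: int, packet: dict) -> Optional[LogicalFlow]:
    """Return the highest-priority flow in table_id whose match accepts packet.

    None means that no flow matched, so the packet is dropped. If several
    matching flows share the top priority, this sketch keeps the first one
    in the list, which is an arbitrary choice.
    """
    best = None
    for flow in flows:
        if flow.table_id == table_id and flow.match(packet):
            if best is None or flow.priority > best.priority:
                best = flow
    return best

flows = [
    LogicalFlow(0, 50, lambda p: p.get("eth.type") == 0x0800, "resubmit;"),
    LogicalFlow(0, 100, lambda p: p.get("tcp.dst") == 22, "drop;"),
]
print(lookup(flows, 0, {"eth.type": 0x0800, "tcp.dst": 22}).actions)  # drop;
print(lookup(flows, 0, {"eth.type": 0x0806}))                         # None
```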
230 <column name="table_id">
231 The stage in the logical pipeline, analogous to an OpenFlow table number.
234 <column name="priority">
235 The flow's priority. Flows with numerically higher priority take
236 precedence over those with lower. If two logical datapath flows with the
237 same priority both match, then the one actually applied to the packet is
241 <column name="match">
243 A matching expression. OVN provides a superset of OpenFlow matching
244 capabilities, using a syntax similar to Boolean expressions in a
245 programming language.
249 Matching expressions have two important kinds of primary expression:
250 <dfn>fields</dfn> and <dfn>constants</dfn>. A field names a piece of
251 data or metadata. The supported fields are:
256 <code>metadata</code> <code>reg0</code> ... <code>reg7</code>
257 <code>xreg0</code> ... <code>xreg3</code>
259 <li><code>inport</code> <code>outport</code> <code>queue</code></li>
260 <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
261 <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
262 <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
263 <li><code>ip4.src</code> <code>ip4.dst</code></li>
264 <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
265 <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
266 <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
267 <li><code>udp.src</code> <code>udp.dst</code></li>
268 <li><code>sctp.src</code> <code>sctp.dst</code></li>
269 <li><code>icmp4.type</code> <code>icmp4.code</code></li>
270 <li><code>icmp6.type</code> <code>icmp6.code</code></li>
271 <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
275 Subfields may be addressed using a <code>[]</code> suffix,
276 e.g. <code>tcp.src[0..7]</code> refers to the low 8 bits of the TCP
277 source port. A subfield may be used in any context a field is allowed.
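<p>
A subfield can be read with a shift and a mask, and a comparison against a subfield becomes a masked match on the whole field. This Python sketch assumes the <code>[lo..hi]</code> bounds are inclusive, as the <code>tcp.src[0..7]</code> example (8 bits) implies; the helper names are hypothetical.
</p>

```python
def read_subfield(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value, i.e. field[lo..hi]."""
    if not 0 <= lo <= hi:
        raise ValueError("need 0 <= lo <= hi")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

def subfield_match(lo: int, hi: int, wanted: int):
    """Rewrite field[lo..hi] == wanted as a (value, mask) pair on the whole field."""
    if not 0 <= lo <= hi:
        raise ValueError("need 0 <= lo <= hi")
    width = hi - lo + 1
    if wanted >> width:
        raise ValueError("constant does not fit in the subfield")
    mask = ((1 << width) - 1) << lo
    return wanted << lo, mask

tcp_src = 8080                             # 0x1f90
print(hex(read_subfield(tcp_src, 0, 7)))   # 0x90, the low 8 bits
print(subfield_match(8, 15, 0x1F))         # (7936, 65280), i.e. 0x1f00/0xff00
```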
281 Some fields have prerequisites. OVN implicitly adds clauses to satisfy
282 these. For example, <code>arp.op == 1</code> is equivalent to
283 <code>eth.type == 0x0806 && arp.op == 1</code>, and
284 <code>tcp.src == 80</code> is equivalent to <code>(eth.type == 0x0800
285 || eth.type == 0x86dd) && ip.proto == 6 && tcp.src == 80</code>.
290 Most fields have integer values. Integer constants may be expressed in
291 several forms: decimal integers, hexadecimal integers prefixed by
292 <code>0x</code>, dotted-quad IPv4 addresses, IPv6 addresses in their
293 standard forms, and Ethernet addresses as colon-separated hex
294 digits. A constant in any of these forms may be followed by a slash
295 and a second constant (the mask) in the same form, to form a masked
296 constant. IPv4 and IPv6 masks may be given as integers, to express CIDR prefixes.
301 The <code>inport</code> and <code>outport</code> fields have string
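302 values. The useful values are logical port names, as found in the <ref table="Bindings" column="logical_port"/> column of the
303 <ref table="Bindings"/> table and in the <ref table="Gateway" column="vlan_map"/> column of the <ref table="Gateway"/> table.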
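<p>
The implicit prerequisite clauses described earlier in this column (for example, <code>arp.op == 1</code> becoming <code>eth.type == 0x0806 && arp.op == 1</code>) can be sketched as a textual rewrite. The Python table below is hypothetical and covers only the two examples given; a full implementation would need prerequisites for every field.
</p>

```python
# Hypothetical prerequisite table covering only the two examples in the text.
# The tcp entry already bundles the ip.proto test with the IP eth.type tests.
PREREQS = {
    "arp": "eth.type == 0x0806",
    "tcp": "(eth.type == 0x0800 || eth.type == 0x86dd) && ip.proto == 6",
}

def add_prereqs(field: str, clause: str) -> str:
    """Prefix a clause on `field` with the clauses its protocol requires."""
    prereq = PREREQS.get(field.split(".", 1)[0])
    return clause if prereq is None else f"{prereq} && {clause}"

print(add_prereqs("tcp.src", "tcp.src == 80"))
# (eth.type == 0x0800 || eth.type == 0x86dd) && ip.proto == 6 && tcp.src == 80
```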
307 The available operators, from highest to lowest precedence, are:
311 <li><code>()</code></li>
312 <li><code>== != < <= > >= in not in</code></li>
313 <li><code>!</code></li>
314 <li><code>&&</code></li>
315 <li><code>||</code></li>
319 The <code>()</code> operator is used for grouping.
323 The equality operator <code>==</code> is the most important operator.
324 Its operands must be a field and an optionally masked constant, in
325 either order. The <code>==</code> operator yields true when the
326 field's value equals the constant's value for all the bits included in
327 the mask. The <code>==</code> operator translates simply and naturally to OpenFlow.
332 The inequality operator <code>!=</code> yields the inverse of
333 <code>==</code> but its syntax and use are the same. Implementation of
334 the inequality operator is expensive.
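<p>
The bit-level meaning of <code>==</code> and <code>!=</code> with a masked constant, including an integer IPv4 mask such as <code>/24</code>, can be sketched in Python as below. This only illustrates the semantics; it says nothing about how OVN translates either operator into OpenFlow, and the helper names are hypothetical.
</p>

```python
import ipaddress

def ip4(text: str) -> int:
    """Dotted-quad IPv4 address to integer."""
    return int(ipaddress.IPv4Address(text))

def cidr_mask(prefix_len: int, bits: int = 32) -> int:
    """An integer mask such as /24 as a bit mask (bits=32 for IPv4, 128 for IPv6)."""
    if not 0 <= prefix_len <= bits:
        raise ValueError("prefix length out of range")
    return ((1 << prefix_len) - 1) << (bits - prefix_len)

def masked_eq(field_value: int, const: int, mask: int) -> bool:
    """field == const/mask: true if the values agree on every bit set in mask."""
    return (field_value & mask) == (const & mask)

def masked_ne(field_value: int, const: int, mask: int) -> bool:
    """field != const/mask: the inverse of ==."""
    return not masked_eq(field_value, const, mask)

# ip4.src == 10.0.0.0/24
print(masked_eq(ip4("10.0.0.5"), ip4("10.0.0.0"), cidr_mask(24)))  # True
print(masked_eq(ip4("10.0.1.5"), ip4("10.0.0.0"), cidr_mask(24)))  # False
```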
338 The relational operators are <code><</code>, <code><=</code>, <code>></code>, and <code>>=</code>. Their
339 operands must be a field and a constant, in either order; the constant
340 must not be masked. These operators are most commonly useful for L4
341 ports, e.g. <code>tcp.src < 1024</code>. Implementation of the
342 relational operators is expensive.
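<p>
One way to see why range comparisons are costly: OpenFlow matches a field against a value and an optional bitwise mask, not against a range, so a relational operator has to become one or more masked equality matches. The Python sketch below shows a standard decomposition into aligned power-of-two blocks; it illustrates the cost and is not taken from OVN. <code>tcp.src < 1024</code> needs a single match, while <code>tcp.src < 1000</code> needs six.
</p>

```python
def range_to_masks(lo: int, hi: int, width: int = 16):
    """Cover the inclusive range [lo, hi] of a width-bit field with (value, mask) pairs."""
    full = (1 << width) - 1
    pairs = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo and ends at or before hi.
        size = lo & -lo if lo else 1 << width
        while lo + size - 1 > hi:
            size >>= 1
        pairs.append((lo, full & ~(size - 1)))
        lo += size
    return pairs

def relational_to_masks(op: str, const: int, width: int = 16):
    """Rewrite `field op const` (op is <, <=, > or >=) as masked equality matches."""
    full = (1 << width) - 1
    lo, hi = {
        "<": (0, const - 1),
        "<=": (0, const),
        ">": (const + 1, full),
        ">=": (const, full),
    }[op]
    return range_to_masks(lo, hi, width) if lo <= hi else []

print(relational_to_masks("<", 1024))       # [(0, 64512)], i.e. 0/0xfc00
print(len(relational_to_masks("<", 1000)))  # 6
```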
346 The set membership operator <code>in</code>, with syntax
347 ``<code><var>field</var> in { <var>constant1</var>,
348 <var>constant2</var>,</code> ... <code>}</code>'', is syntactic sugar
349 for ``<code>(<var>field</var> == <var>constant1</var> ||
350 <var>field</var> == <var>constant2</var> || </code>...<code>)</code>''.
351 Conversely, ``<code><var>field</var> not in { <var>constant1</var>,
352 <var>constant2</var>, </code>...<code> }</code>'' is syntactic sugar
353 for ``<code>(<var>field</var> != <var>constant1</var> &&
354 <var>field</var> != <var>constant2</var> &&
355 </code>...<code>)</code>''.
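<p>
Because <code>in</code> and <code>not in</code> are pure syntactic sugar, the rewrite can be done textually before any other processing. A minimal Python sketch, with a hypothetical function name; rejecting an empty set is an assumption, since the text does not say what <code>{ }</code> would mean.
</p>

```python
from typing import Sequence

def desugar_in(field: str, constants: Sequence[str], negate: bool = False) -> str:
    """Expand `field in {...}` (or `field not in {...}` if negate) into ==/!= clauses."""
    if not constants:
        # Assumption: the meaning of an empty set is unspecified, so reject it.
        raise ValueError("set must contain at least one constant")
    op, joiner = ("!=", " && ") if negate else ("==", " || ")
    return "(" + joiner.join(f"{field} {op} {c}" for c in constants) + ")"

print(desugar_in("tcp.dst", ["80", "443"]))
# (tcp.dst == 80 || tcp.dst == 443)
print(desugar_in("tcp.dst", ["80", "443"], negate=True))
# (tcp.dst != 80 && tcp.dst != 443)
```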
359 The unary prefix operator <code>!</code> yields its operand's inverse.
363 The logical AND operator <code>&&</code> yields true only if
364 both of its operands are true.
368 The logical OR operator <code>||</code> yields true if at least one of
369 its operands is true.
373 Finally, the keywords <code>true</code> and <code>false</code> may also
374 be used in matching expressions. <code>true</code> is useful by itself
375 as a catch-all expression that matches every packet.
379 (The above is pretty ambitious. It probably makes sense to initially
380 implement only a subset of this specification. The full specification
381 is written out mainly to get an idea of what a fully general matching
382 expression language could include.)
386 <column name="actions">
388 Below, a <var>value</var> is either a <var>constant</var> or a
389 <var>field</var>. The following actions seem most likely to be useful:
393 <dt><code>drop;</code></dt>
394 <dd>syntactic sugar for no actions</dd>
396 <dt><code>output(<var>value</var>);</code></dt>
397 <dd>output to port</dd>
399 <dt><code>broadcast;</code></dt>
400 <dd>output to every logical port except the ingress port</dd>
402 <dt><code>resubmit;</code></dt>
403 <dd>execute the next logical datapath table as a subroutine</dd>
405 <dt><code>set(<var>field</var>=<var>value</var>);</code></dt>
406 <dd>set data or metadata field, or copy between fields</dd>
410 The following are not well thought out:
414 <dt><code>learn</code></dt>
416 <dt><code>conntrack</code></dt>
418 <dt><code>with(<var>field</var>=<var>value</var>) { <var>action</var>, </code>...<code> }</code></dt>
419 <dd>execute <var>actions</var> with temporary changes to <var>fields</var></dd>
421 <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>, </code>...<code> }</code></dt>
423 decrement TTL; execute first set of actions if
424 successful, second set if TTL decrement fails
427 <dt><code>icmp_reply { <var>action</var>, </code>...<code> }</code></dt>
428 <dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
430 <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
431 <dd>generate ARP from packet, execute <var>action</var>s</dd>
435 Other actions can be added as needed
436 (e.g. <code>push_vlan</code>, <code>pop_vlan</code>,
437 <code>push_mpls</code>, <code>pop_mpls</code>).
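<p>
The core actions above can be modeled with a small interpreter. This Python sketch assumes actions arrive already parsed into tuples and that a packet is a dict of field values; the hypothetical <code>Field</code> wrapper marks a <var>value</var> that names a field rather than a constant. <code>resubmit;</code> is left out because it depends on the surrounding pipeline, and <code>with</code> is modeled by saving the affected fields and restoring them afterwards. All names here are illustrative.
</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str  # a value that names a field instead of giving a constant

_ABSENT = object()  # marks a field that was unset before a `with`

def resolve(value, packet):
    return packet[value.name] if isinstance(value, Field) else value

def run_actions(actions, packet, ports, out):
    """Apply pre-parsed actions to packet (a dict) and append outputs to out.

    Supported forms: ("drop",), ("output", value), ("broadcast",),
    ("set", field, value) and ("with", {field: value, ...}, [actions]).
    ports lists every logical port of the datapath.
    """
    for act in actions:
        kind = act[0]
        if kind == "drop":
            pass  # syntactic sugar for no actions
        elif kind == "output":
            out.append((resolve(act[1], packet), dict(packet)))
        elif kind == "broadcast":
            for port in ports:
                if port != packet.get("inport"):
                    out.append((port, dict(packet)))
        elif kind == "set":
            packet[act[1]] = resolve(act[2], packet)
        elif kind == "with":
            saved = {f: packet.get(f, _ABSENT) for f in act[1]}
            for f, v in act[1].items():
                packet[f] = resolve(v, packet)
            run_actions(act[2], packet, ports, out)
            for f, old in saved.items():  # undo the temporary changes
                if old is _ABSENT:
                    packet.pop(f, None)
                else:
                    packet[f] = old
        else:
            raise ValueError(f"unsupported action: {kind}")
    return out

pkt = {"inport": "lp1", "outport": "lp2"}
print(run_actions([("set", "reg0", 5), ("output", Field("outport"))], pkt, ["lp1", "lp2", "lp3"], []))
# [('lp2', {'inport': 'lp1', 'outport': 'lp2', 'reg0': 5})]
```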
441 Some of the OVN actions do not map directly to OpenFlow actions, e.g.:
446 <code>with</code>: Implemented as <code>stack_push;
447 set(</code>...<code>); <var>actions</var>; stack_pop</code>.
451 <code>dec_ttl</code>: Implemented as <code>dec_ttl</code> followed
452 by the successful actions. The failure case has to be implemented by
453 <code>ovn-controller</code> interpreting packet-ins. It might be difficult to
454 identify the particular place in the processing pipeline in
455 <code>ovn-controller</code>; maybe some restrictions will be
460 <code>icmp_reply</code>: Implemented by sending the packet to
461 <code>ovn-controller</code>, which generates the ICMP reply and sends
462 the packet back to <code>ovs-vswitchd</code>.
468 <table name="Bindings" title="Physical-Logical Bindings">
470 Each row in this table identifies the physical location of a logical port.
475 For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code>
476 database, <code>ovn-nbd</code> creates a record in this table.
477 <code>ovn-nbd</code> populates and maintains every column except
478 the <code>chassis</code> column, which it leaves empty in new records.
482 <code>ovn-controller</code> populates the <code>chassis</code> column
483 for the records that identify the logical ports located on its
484 hypervisor. <code>ovn-controller</code> learns which ports these are by
485 monitoring the local hypervisor's Open_vSwitch database, which
486 identifies logical ports via the conventions described in
487 <code>IntegrationGuide.md</code>.
491 When a chassis shuts down gracefully, it should clean up the
492 <code>chassis</code> column that it previously had populated.
493 (This is not critical because resources hosted on the chassis are equally
494 unreachable regardless of whether their rows are present.) To handle the
495 case where a VM is shut down abruptly on one chassis, then brought up
496 again on a different one, <code>ovn-controller</code> must overwrite the
497 <code>chassis</code> column with new information.
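<p>
The claim and release behavior described above can be sketched as two Python functions, assuming Bindings rows are plain dicts, an empty <code>chassis</code> column is represented as <code>None</code>, and the set of locally hosted logical ports has already been discovered from the local Open_vSwitch database. These helpers are hypothetical and are not part of <code>ovn-controller</code>.
</p>

```python
from typing import Iterable, List

def claim_local_ports(bindings: List[dict], local_ports: Iterable[str],
                      chassis_name: str) -> List[dict]:
    """Set chassis to this chassis for every locally hosted logical port.

    Rows already claimed by another chassis are overwritten, which covers a
    VM that was shut down abruptly elsewhere and brought up again here.
    Returns the rows that changed.
    """
    local = set(local_ports)
    changed = []
    for row in bindings:
        if row["logical_port"] in local and row["chassis"] != chassis_name:
            row["chassis"] = chassis_name
            changed.append(row)
    return changed

def release_on_shutdown(bindings: List[dict], chassis_name: str) -> None:
    """Graceful shutdown: empty the chassis column in rows this chassis populated."""
    for row in bindings:
        if row["chassis"] == chassis_name:
            row["chassis"] = None

rows = [
    {"logical_port": "vm1", "chassis": None},   # new row, left empty by ovn-nbd
    {"logical_port": "vm2", "chassis": "hv1"},  # VM that moved from hv1 to hv2
    {"logical_port": "vm3", "chassis": "hv1"},
]
claim_local_ports(rows, {"vm1", "vm2"}, "hv2")
print([r["chassis"] for r in rows])  # ['hv2', 'hv2', 'hv1']
```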
500 <column name="logical_port">
501 A logical port, taken from <ref table="Logical_Port" column="name"
502 db="OVN_Northbound"/> in the OVN_Northbound database's
503 <ref table="Logical_Port" db="OVN_Northbound"/> table. OVN does not
504 prescribe a particular format for the logical port ID.
507 <column name="parent_port">
508 For containers created inside a VM, this is taken from
509 <ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
510 in the OVN_Northbound database's <ref table="Logical_Port"
511 db="OVN_Northbound"/> table. It is left empty if
512 <ref column="logical_port"/> belongs to a VM or a container created in the hypervisor.
517 When <ref column="logical_port"/> identifies the interface of a container
518 spawned inside a VM, this column identifies the VLAN tag in
519 the network traffic associated with that container's network interface.
520 It is left empty if <ref column="logical_port"/> belongs to a VM or a
521 container created in the hypervisor.
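<p>
To illustrate how <ref column="parent_port"/> and the VLAN tag fit together, this Python sketch maps traffic arriving on a VM's interface to a logical port. It assumes rows are dicts, that the column holding the VLAN tag is named <code>tag</code>, and that untagged traffic belongs to the VM's own logical port; the function name is hypothetical.
</p>

```python
from typing import List, Optional

def logical_port_for_traffic(bindings: List[dict], vif_port: str,
                             vlan_tag: Optional[int] = None) -> Optional[str]:
    """Map traffic on a VM's interface (logical port vif_port) to a logical port.

    Assumption: untagged traffic belongs to the VM itself; tagged traffic
    belongs to the container whose parent_port and tag both match.
    """
    if vlan_tag is None:
        return vif_port
    for row in bindings:
        if row.get("parent_port") == vif_port and row.get("tag") == vlan_tag:
            return row["logical_port"]
    return None  # no container inside this VM uses the tag

rows = [
    {"logical_port": "vm1"},
    {"logical_port": "web", "parent_port": "vm1", "tag": 42},
    {"logical_port": "db", "parent_port": "vm1", "tag": 43},
]
print(logical_port_for_traffic(rows, "vm1", 43))  # db
```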
524 <column name="chassis">
525 The physical location of the logical port. To successfully identify a
526 chassis, this column must match the <ref table="Chassis" column="name"/>
527 column in some row in the <ref table="Chassis"/> table. This is
528 populated by <code>ovn-controller</code>.
533 The Ethernet address or addresses used as a source address on the
534 logical port, each in the form
535 <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
536 The string <code>unknown</code> is also allowed to indicate that the
537 logical port has an unknown set of (additional) source addresses.
541 A VM interface would ordinarily have a single Ethernet address. A
542 gateway port might initially have only <code>unknown</code>, and then
543 add MAC addresses to the set as it learns new source addresses.
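<p>
A minimal Python check of the address format described above, plus one possible reading of <code>unknown</code> when filtering source addresses. The helper names are hypothetical, and treating <code>unknown</code> as "accept any additional source address" is an assumption about how the column would be enforced.
</p>

```python
import re
from typing import Iterable

_MAC = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}")

def valid_entry(entry: str) -> bool:
    """True for an xx:xx:xx:xx:xx:xx address or the string 'unknown'."""
    return entry == "unknown" or _MAC.fullmatch(entry) is not None

def source_allowed(entries: Iterable[str], src: str) -> bool:
    """Assumed reading: 'unknown' admits any additional source address."""
    entries = list(entries)
    if "unknown" in entries:
        return True
    return src.lower() in {e.lower() for e in entries}

print(valid_entry("00:1b:21:3c:4d:5e"), valid_entry("00-1b-21-3c-4d-5e"))  # True False
```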