- Each row in this table represents one logical flow. The cloud management
- system, via its OVN integration, populates this table with logical flows
- that implement the L2 and L3 topology specified in the CMS configuration.
- Each hypervisor, via ovn-controller
, translates the logical
- flows into OpenFlow flows specific to its hypervisor and installs them
- into Open vSwitch.
+ Each row in this table represents one logical flow.
+ ovn-northd
populates this table with logical flows
+ that implement the L2 and L3 topologies specified in the
+ database. Each hypervisor, via
+ ovn-controller
, translates the logical flows into
+ OpenFlow flows specific to its hypervisor and installs them into
+ Open vSwitch.
@@ -219,23 +231,160 @@
flows are written in terms of logical ports and logical datapaths instead
of physical ports and physical datapaths. Translation between logical
and physical flows helps to ensure isolation between logical datapaths.
- (The logical flow abstraction also allows the CMS to do less work, since
- it does not have to separately compute and push out physical flows to each
- chassis.)
+ (The logical flow abstraction also allows the OVN centralized
+ components to do less work, since they do not have to separately
+ compute and push out physical flows to each chassis.)
The default action when no flow matches is to drop packets.
+ Architectural Logical Life Cycle of a Packet
+
+
+ The following description focuses on the life cycle of a packet through
+ a logical datapath, ignoring physical details of the implementation.
+ Please refer to Architectural Physical Life Cycle of a Packet in
+ ovn-architecture
(7) for the physical information.
+
+
+
+ The description here is written as if OVN itself executes these steps,
+ but in fact OVN (that is, ovn-controller
) programs Open
+ vSwitch, via OpenFlow and OVSDB, to execute them on its behalf.
+
+
+
+ At a high level, OVN passes each packet through the logical datapath's
+ logical ingress pipeline, which may output the packet to one or more
+ logical ports or logical multicast groups. For each such logical output
+ port, OVN passes the packet through the datapath's logical egress
+ pipeline, which may either drop the packet or deliver it to the
+ destination. Between the two pipelines, outputs to logical multicast
+ groups are expanded into logical ports, so that the egress pipeline only
+ processes a single logical output port at a time. Between the two
+ pipelines is also where, when necessary, OVN encapsulates a packet in a
+ tunnel (or tunnels) to transmit to remote hypervisors.
+
+
+
+ In more detail, to start, OVN searches the logical flow
+ table for a row with the correct logical datapath, a pipeline of ingress,
+ a table ID of 0, and a match that is true for the packet. If none
+ is found, OVN drops the packet. If OVN finds more than one, it chooses
+ the match with the highest priority. Then OVN executes
+ each of the actions specified in the row's actions column,
+ in the order specified. Some actions, such as those to modify packet
+ headers, require no further details. The next
and
+ output
actions are special.
+
+
+
+ The next
action causes the above process to be repeated
+ recursively, except that OVN searches for a table ID of 1
+ instead of 0. Similarly, any next
action in a row found in
+ that table would cause a further search for a table ID of
+ 2, and so on. When recursive processing completes, flow control returns
+ to the action following next
.
+
+
+
+ The output
action also introduces recursion. Its effect
+ depends on the current value of the outport
field. Suppose
+ outport
designates a logical port. First, OVN compares
+ inport
to outport
; if they are equal, it treats
+ the output
as a no-op. In the common case, where they are
+ different, the packet enters the egress pipeline. This transition to the
+ egress pipeline discards register data, e.g. reg0
...
+ reg4
and connection tracking state, to achieve
+ uniform behavior regardless of whether the egress pipeline is on a
+ different hypervisor (because registers aren't preserved across
+ tunnel encapsulation).
+
+
+
+ To execute the egress pipeline, OVN again searches the logical flow
+ table for a row with the correct logical datapath, a table ID of 0, and
+ a match that is true for the packet, but now looking for a pipeline of
+ egress
. If no matching row is found,
+ the output becomes a no-op. Otherwise, OVN executes the actions for the
+ matching flow (which is chosen from multiple, if necessary, as already
+ described).
+
+
+
+ In the egress
pipeline, the next
action acts as
+ already described, except that it, of course, searches for
+ egress
flows. The output
action, however, now
+ directly outputs the packet to the output port (which is now fixed,
+ because outport
is read-only within the egress pipeline).
+
+
+
+ The description earlier assumed that outport
referred to a
+ logical port. If it instead designates a logical multicast group, then
+ the description above still applies, with the addition of fan-out from
+ the logical multicast group to each logical port in the group. For each
+ member of the group, OVN executes the logical pipeline as described, with
+ the logical output port replaced by the group member.
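
The life cycle described above can be sketched in a few lines of Python. This is only an illustrative model, not how OVN is implemented: the `Flow` and `Engine` classes, the action helpers, and the port and group names are all hypothetical, and packets are modeled as plain dictionaries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Flow:
    pipeline: str                  # "ingress" or "egress"
    table_id: int
    priority: int
    match: Callable[[dict], bool]  # predicate over packet fields
    actions: List[Callable]        # executed in the order specified


class Engine:
    """Hypothetical model of one logical datapath."""

    def __init__(self, flows: List[Flow],
                 groups: Optional[Dict[str, List[str]]] = None):
        self.flows = flows
        self.groups = groups or {}      # multicast group name -> member ports
        self.delivered: List[str] = []  # ports that received a packet

    def run_table(self, pipeline: str, table_id: int, pkt: dict) -> None:
        hits = [f for f in self.flows
                if f.pipeline == pipeline and f.table_id == table_id
                and f.match(pkt)]
        if not hits:
            return  # no matching flow: drop (ingress) or no-op (egress)
        flow = max(hits, key=lambda f: f.priority)
        for action in flow.actions:
            action(self, pkt, pipeline, table_id)


def next_action(engine, pkt, pipeline, table_id):
    # Recurse into the following table of the same pipeline.
    engine.run_table(pipeline, table_id + 1, pkt)


def output_action(engine, pkt, pipeline, table_id):
    if pipeline == "egress":
        engine.delivered.append(pkt["outport"])  # actual delivery
        return
    # Ingress: expand a multicast group, then run egress once per port.
    for port in engine.groups.get(pkt["outport"], [pkt["outport"]]):
        if port == pkt["inport"]:
            continue  # output to the input port is a no-op
        # Registers and connection tracking state are discarded here.
        egress_pkt = {k: v for k, v in pkt.items()
                      if not k.startswith(("reg", "ct"))}
        egress_pkt["outport"] = port
        engine.run_table("egress", 0, egress_pkt)


def assign(field, value):
    def action(engine, pkt, pipeline, table_id):
        pkt[field] = value
    return action


always = lambda pkt: True
flows = [
    Flow("ingress", 0, 50, always, [next_action]),
    Flow("ingress", 1, 50, lambda p: p["eth.dst"] == "aa",
         [assign("outport", "p2"), output_action]),
    Flow("ingress", 1, 0, always, [assign("outport", "flood"), output_action]),
    Flow("egress", 0, 0, always, [next_action]),
    Flow("egress", 1, 0, always, [output_action]),
]
engine = Engine(flows, {"flood": ["p1", "p2", "p3"]})
engine.run_table("ingress", 0, {"inport": "p1", "eth.dst": "aa", "reg0": 7})
engine.run_table("ingress", 0, {"inport": "p1", "eth.dst": "zz"})
print(engine.delivered)  # ['p2', 'p2', 'p3']
```

The first packet matches the higher-priority unicast flow and reaches only `p2`; the second falls through to the flood flow and reaches every group member except its own input port.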
+
+
+ Pipeline Stages
+
+
+ ovn-northd
is responsible for populating the
+ logical flow table, so the stages are an
+ implementation detail and subject to change. This section
+ describes the current logical flow table.
+
+
+
+ The ingress pipeline consists of the following stages:
+
+
+ -
+ Port Security (Table 0): Validates the source address, drops
+ packets with a VLAN tag, and, if configured, verifies that the
+ logical port is allowed to send with the source address.
+
+
+ -
+ L2 Destination Lookup (Table 1): Forwards known unicast
+ addresses to the appropriate logical port. Unicast packets to
+ unknown hosts are forwarded to logical ports configured with the
+ special
unknown
MAC address. Broadcast and
+ multicast packets are flooded to all ports in the logical switch.
+
+
+
+
+ The egress pipeline consists of the following stages:
+
+
+ -
+ ACL (Table 0): Applies any specified access control lists.
+
+
+ -
+ Port Security (Table 1): If configured, verifies that the
+ logical port is allowed to receive packets with the destination
+ address.
+
+
+
- The logical datapath to which the logical flow belongs. A logical
- datapath implements a logical pipeline among the ports in the table associated with it. (No table represents a
- logical datapath.) In practice, the pipeline in a given logical datapath
- implements either a logical switch or a logical router, and
- ovn-northd
reuses the UUIDs for those logical entities from
- the OVN_Northbound
for logical datapaths.
+ The logical datapath to which the logical flow belongs.
+
+
+
+
+ The primary flows used for deciding on a packet's destination are the
+ ingress
flows. The egress
flows implement
+ ACLs. See Logical Life Cycle of a Packet, above, for details.
+
@@ -454,11 +603,7 @@
String constants have the same syntax as quoted strings in JSON (thus,
- they are Unicode strings). String constants are used for naming
- logical ports. Thus, the useful values are names from the and
- tables in a logical flow's .
+ they are Unicode strings).
@@ -529,12 +674,19 @@
Symbols
+
+ Most of the symbols below have integer type. Only inport
+ and outport
have string type. inport
names a
+ logical port. Thus, its value is a name
+ from the table. outport
may
+ name a logical port, as inport
, or a logical multicast
+ group defined in the table. For both
+ symbols, only names within the flow's logical datapath may be used.
+
+
- -
-
metadata
reg0
... reg7
- xreg0
... xreg3
-
- inport
outport
queue
+ reg0
...reg4
+ inport
outport
eth.src
eth.dst
eth.type
vlan.tci
vlan.vid
vlan.pcp
vlan.present
ip.proto
ip.dscp
ip.ecn
ip.ttl
ip.frag
@@ -547,96 +699,426 @@
icmp4.type
icmp4.code
icmp6.type
icmp6.code
nd.target
nd.sll
nd.tll
+ -
+
+ ct_state
, which has the following Boolean subfields:
+
+
+ ct.new
: True for a new flow
+ ct.est
: True for an established flow
+ ct.rel
: True for a related flow
+ ct.rpl
: True for a reply flow
+ ct.inv
: True for a connection entry in a bad state
+
+
+ ct_state
and its subfields are initialized by the
+ ct_next
action, described below.
+
+
+
+ The following predicates are supported:
+
+
+
+ eth.bcast
expands to eth.dst == ff:ff:ff:ff:ff:ff
+ eth.mcast
expands to eth.dst[40]
+ vlan.present
expands to vlan.tci[12]
+ ip4
expands to eth.type == 0x800
+ ip4.mcast
expands to ip4.dst[28..31] == 0xe
+ ip6
expands to eth.type == 0x86dd
+ ip
expands to ip4 || ip6
+ icmp4
expands to ip4 && ip.proto == 1
+ icmp6
expands to ip6 && ip.proto == 58
+ icmp
expands to icmp4 || icmp6
+ ip.is_frag
expands to ip.frag[0]
+ ip.later_frag
expands to ip.frag[1]
+ ip.first_frag
expands to ip.is_frag && !ip.later_frag
+ arp
expands to eth.type == 0x806
+ nd
expands to icmp6.type == {135, 136} && icmp6.code == 0
+ tcp
expands to ip.proto == 6
+ udp
expands to ip.proto == 17
+ sctp
expands to ip.proto == 132
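
To illustrate how the predicates above nest, the following sketch expands a predicate name into its underlying field comparisons by plain text substitution. It is a toy model under stated assumptions: the `PREDICATES` dictionary simply copies the list above, and the `expand` helper and its added parentheses are hypothetical, not part of OVN's expression parser.

```python
import re

# Copied from the predicate list above.
PREDICATES = {
    "eth.bcast": "eth.dst == ff:ff:ff:ff:ff:ff",
    "eth.mcast": "eth.dst[40]",
    "vlan.present": "vlan.tci[12]",
    "ip4": "eth.type == 0x800",
    "ip4.mcast": "ip4.dst[28..31] == 0xe",
    "ip6": "eth.type == 0x86dd",
    "ip": "ip4 || ip6",
    "icmp4": "ip4 && ip.proto == 1",
    "icmp6": "ip6 && ip.proto == 58",
    "icmp": "icmp4 || icmp6",
    "ip.is_frag": "ip.frag[0]",
    "ip.later_frag": "ip.frag[1]",
    "ip.first_frag": "ip.is_frag && !ip.later_frag",
    "arp": "eth.type == 0x806",
    "nd": "icmp6.type == {135, 136} && icmp6.code == 0",
    "tcp": "ip.proto == 6",
    "udp": "ip.proto == 17",
    "sctp": "ip.proto == 132",
}

# A symbol is a dotted identifier that does not start inside a number.
_SYMBOL = re.compile(r"(?<![\w.])[A-Za-z_][\w.]*")


def expand(expr: str) -> str:
    """Recursively replace predicate names with parenthesized expansions."""
    def repl(m):
        name = m.group(0)
        if name in PREDICATES:
            return "(" + expand(PREDICATES[name]) + ")"
        return name  # ordinary field or constant fragment: leave as is
    return _SYMBOL.sub(repl, expr)


print(expand("icmp4"))          # ((eth.type == 0x800) && ip.proto == 1)
print(expand("ip.first_frag"))  # ((ip.frag[0]) && !(ip.frag[1]))
```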
+
- Logical datapath actions, to be executed when the logical flow
- represented by this row is the highest-priority match.
+ Logical datapath actions, to be executed when the logical flow
+ represented by this row is the highest-priority match.
- Actions share lexical syntax with the column. An
- empty set of actions (or one that contains just white space or
- comments), or a set of actions that consists of just
- drop;
, causes the matched packets to be dropped.
- Otherwise, the column should contain a sequence of actions, each
- terminated by a semicolon.
+ Actions share lexical syntax with the match column. An
+ empty set of actions (or one that contains just white space or
+ comments), or a set of actions that consists of just
+ drop;
, causes the matched packets to be dropped.
+ Otherwise, the column should contain a sequence of actions, each
+ terminated by a semicolon.
- The following actions will be initially supported:
+ The following actions are defined:
output;
-
- Outputs the packet to the logical port current designated by
-
outport
. Output to the ingress port is implicitly
- dropped, that is, output
becomes a no-op if
- outport
== inport
.
-
+
+ In the ingress pipeline, this action executes the
+ egress
pipeline as a subroutine. If
+ outport
names a logical port, the egress pipeline
+ executes once; if it is a multicast group, the egress pipeline runs
+ once for each logical port in the group.
+
+
+
+ In the egress pipeline, this action performs the actual
+ output to the outport
logical port. (In the egress
+ pipeline, outport
never names a multicast group.)
+
+
+
+ Output to the input port is implicitly dropped, that is,
+ output
becomes a no-op if outport
==
+ inport
. Occasionally it may be useful to override
+ this behavior, e.g. to send an ARP reply to an ARP request; to do
+ so, use inport = "";
to set the logical input port to
+ an empty string (which should not be used as the name of any
+ logical port).
+
+
next;
+ next(table);
-
- Executes the next logical datapath table as a subroutine.
-
+ Executes another logical datapath table as a subroutine. By default,
+ the table after the current one is executed. Specify
+ table to jump to a specific table in the same pipeline.
+
field = constant;
-
- Sets data or metadata field field to constant value
- constant, e.g.
outport = "vif0";
to set the
- logical output port. Assigning to a field with prerequisites
- implicitly adds those prerequisites to ; thus,
- for example, a flow that sets tcp.dst
applies only to
- TCP flows, regardless of whether its mentions
- any TCP field. To set only a subset of bits in a field,
- field may be a subfield or constant may be
- masked, e.g. vlan.pcp[2] = 1;
and vlan.pcp =
- 4/4;
both set the most sigificant bit of the VLAN PCP. Not
- all fields are modifiable (e.g. eth.type
and
- ip.proto
are read-only), and not all modifiable fields
- may be partially modified (e.g. ip.ttl
must assigned as
- a whole).
-
+
+ Sets data or metadata field field to constant value
+ constant, e.g. outport = "vif0";
to set the
+ logical output port. To set only a subset of bits in a field,
+ specify a subfield for field or a masked
+ constant, e.g. one may use vlan.pcp[2] = 1;
+ or vlan.pcp = 4/4;
to set the most significant bit of
+ the VLAN PCP.
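
The two forms of partial assignment are equivalent at the bit level. A minimal sketch, assuming the 3-bit VLAN PCP is held in a Python integer; the helper names `set_masked` and `set_subfield` are hypothetical:

```python
def set_masked(old: int, value: int, mask: int) -> int:
    """Model of `field = value/mask;`: only bits set in mask change."""
    return (old & ~mask) | (value & mask)


def set_subfield(old: int, lo: int, hi: int, value: int) -> int:
    """Model of `field[lo..hi] = value;` as a masked assignment."""
    mask = ((1 << (hi - lo + 1)) - 1) << lo
    return set_masked(old, value << lo, mask)


pcp = 0b001
# vlan.pcp[2] = 1;  and  vlan.pcp = 4/4;  give the same result.
assert set_subfield(pcp, 2, 2, 1) == set_masked(pcp, 4, 4) == 0b101
```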
+
+
+
+ Assigning to a field with prerequisites implicitly adds those
+ prerequisites to the match; thus, for example, a flow
+ that sets tcp.dst
applies only to TCP flows,
+ regardless of whether its match mentions any TCP
+ field.
+
+
+
+ Not all fields are modifiable (e.g. eth.type
and
+ ip.proto
are read-only), and not all modifiable fields
+ may be partially modified (e.g. ip.ttl
must be assigned
+ as a whole). The outport
field is modifiable in the
+ ingress
pipeline but not in the egress
+ pipeline.
+
+
+
+ field1 = field2;
+ -
+
+ Sets data or metadata field field1 to the value of data
+ or metadata field field2, e.g. reg0 =
+ ip4.src;
copies ip4.src
into reg0
.
+ To modify only a subset of a field's bits, specify a subfield for
+ field1 or field2 or both, e.g. vlan.pcp
+ = reg0[0..2];
copies the least-significant bits of
+ reg0
into the VLAN PCP.
+
+
+
+ field1 and field2 must be the same type,
+ either both string or both integer fields. If they are both
+ integer fields, they must have the same width.
+
+
+
+ If field1 or field2 has prerequisites, they
+ are added implicitly to the match. It is possible to
+ write an assignment with contradictory prerequisites, such as
+ ip4.src = ip6.src[0..31];
, but the contradiction means
+ that a logical flow with such an assignment will never be matched.
+
+
+
+ field1 <-> field2;
+ -
+
+ Similar to field1 = field2;
+ except that the two values are exchanged instead of copied. Both
+ field1 and field2 must be modifiable.
+
+
+
+ ip.ttl--;
+ -
+
+ Decrements the IPv4 or IPv6 TTL. If this would make the TTL zero
+ or negative, then processing of the packet halts; no further
+ actions are processed. (To properly handle such cases, a
+ higher-priority flow should match on
+ ip.ttl == {0, 1};
.)
+
+
+ Prerequisite: ip
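
The halting behavior of the TTL decrement can be sketched as follows. This is an illustrative model only: the `dec_ttl` helper and its boolean return value are hypothetical, and leaving the TTL untouched when processing halts is an assumption, since the text does not say what happens to the field in that case.

```python
def dec_ttl(pkt: dict) -> bool:
    """Model of `ip.ttl--;`. Returns False if processing must halt."""
    if pkt["ip.ttl"] - 1 <= 0:
        return False  # TTL would reach zero: no further actions run
    pkt["ip.ttl"] -= 1
    return True


pkt = {"ip.ttl": 2}
assert dec_ttl(pkt) and pkt["ip.ttl"] == 1
assert not dec_ttl(pkt)  # a TTL of 1 cannot be decremented further
```

This is why the text suggests a higher-priority flow matching a TTL of 0 or 1: such packets never get past the decrement.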
+
+
+ ct_next;
+ -
+
+ Apply connection tracking to the flow, initializing
+ ct_state
for matching in later tables.
+ Automatically moves on to the next table, as if followed by
+ next
.
+
+
+
+ As a side effect, IP fragments will be reassembled for matching.
+ If a fragmented packet is output, then it will be sent with any
+ overlapping fragments squashed. The connection tracking state is
+ scoped by the logical port, so overlapping addresses may be used.
+ To allow traffic related to the matched flow, execute
+ ct_commit
.
+
+
+
+ It is possible to have actions follow ct_next
,
+ but they will not have access to any of its side effects, so doing
+ so is not generally useful.
+
+
+
+ ct_commit;
+ -
+ Commit the flow to the connection tracking entry associated
+ with it by a previous call to
ct_next
.
+
- The following actions will likely be useful later, but they have not
- been thought out carefully.
+ The following actions will likely be useful later, but they have not
+ been thought out carefully.
- field1 = field2;
- -
- Extends the assignment action to allow copying between fields.
-
- learn
+ arp { action;
... };
+ -
+
+ Temporarily replaces the IPv4 packet being processed by an ARP
+ packet and executes each nested action on the ARP
+ packet. Actions following the arp action, if any, apply
+ to the original, unmodified packet.
+
+
+
+ The ARP packet that this action operates on is initialized based on
+ the IPv4 packet being processed, as follows. These are default
+ values that the nested actions will probably want to change:
+
- conntrack
+
+ eth.src
unchanged
+ eth.dst
unchanged
+ eth.type = 0x0806
+ arp.op = 1
(ARP request)
+ arp.sha
copied from eth.src
+ arp.spa
copied from ip4.src
+ arp.tha = 00:00:00:00:00:00
+ arp.tpa
copied from ip4.dst
+
+
+ Prerequisite: ip4
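
The default initialization listed above can be written out as a small function. This is a sketch under stated assumptions: packets are modeled as dictionaries keyed by field name, and the `arp_from_ip4` helper and the sample addresses are hypothetical.

```python
def arp_from_ip4(pkt: dict) -> dict:
    """Build the default ARP packet that nested actions start from."""
    return {
        "eth.src": pkt["eth.src"],       # unchanged
        "eth.dst": pkt["eth.dst"],       # unchanged
        "eth.type": 0x0806,
        "arp.op": 1,                     # ARP request
        "arp.sha": pkt["eth.src"],
        "arp.spa": pkt["ip4.src"],
        "arp.tha": "00:00:00:00:00:00",
        "arp.tpa": pkt["ip4.dst"],
    }


ip_pkt = {"eth.src": "00:00:00:00:00:01", "eth.dst": "00:00:00:00:00:02",
          "eth.type": 0x0800, "ip4.src": "10.0.0.1", "ip4.dst": "10.0.0.2"}
arp_pkt = arp_from_ip4(ip_pkt)
assert arp_pkt["arp.tpa"] == "10.0.0.2"
assert ip_pkt["eth.type"] == 0x0800  # the original packet is untouched
```

Returning a new dictionary mirrors the statement that actions following the arp action apply to the original, unmodified packet.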
+
- dec_ttl { action,
... } { action;
...};
+ icmp4 { action;
... };
-
- decrement TTL; execute first set of actions if
- successful, second set if TTL decrement fails
+
+ Temporarily replaces the IPv4 packet being processed by an ICMPv4
+ packet and executes each nested action on the ICMPv4
+ packet. Actions following the icmp4 action, if any,
+ apply to the original, unmodified packet.
+
+
+
+ The ICMPv4 packet that this action operates on is initialized based
+ on the IPv4 packet being processed, as follows. These are default
+ values that the nested actions will probably want to change.
+ Ethernet and IPv4 fields not listed here are not changed:
+
+
+
+ ip.proto = 1
(ICMPv4)
+ ip.frag = 0
(not a fragment)
+ icmp4.type = 3
(destination unreachable)
+ icmp4.code = 1
(host unreachable)
+
+
+
+ Details TBD.
+
+
+ Prerequisite: ip4
- icmp_reply { action,
... };
- - generate ICMP reply from packet, execute actions
+ tcp_reset;
+ -
+
+ This action transforms the current TCP packet according to the
+ following pseudocode:
+
+
+
+if (tcp.ack) {
+ tcp.seq = tcp.ack;
+} else {
+ tcp.ack = tcp.seq + length(tcp.payload);
+ tcp.seq = 0;
+}
+tcp.flags = RST;
+
- arp { action,
... }
- - generate ARP from packet, execute actions
+
+ Then, the action drops all TCP options and payload data, and
+ updates the TCP checksum.
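
For illustration, the pseudocode above can be made runnable as follows. This is a sketch under stated assumptions: the packet is a dictionary with hypothetical keys, the test on tcp.ack is read as "the ACK flag is set", sequence arithmetic wraps at 32 bits, and the checksum update is not modeled.

```python
def tcp_reset(pkt: dict) -> dict:
    """Return a reset segment derived from pkt, per the pseudocode above."""
    out = dict(pkt)
    if pkt["ack_flag"]:  # assumption: tcp.ack tests the ACK flag
        out["seq"] = pkt["ack"]
    else:
        out["ack"] = (pkt["seq"] + len(pkt["payload"])) % 2**32
        out["seq"] = 0
    out["flags"] = {"RST"}
    out["options"] = b""  # all TCP options are dropped
    out["payload"] = b""  # payload data is dropped
    return out


syn = {"ack_flag": False, "seq": 1000, "ack": 0,
       "flags": {"SYN"}, "options": b"\x02\x04\x05\xb4", "payload": b""}
rst = tcp_reset(syn)
assert (rst["seq"], rst["ack"], rst["flags"]) == (0, 1000, {"RST"})
```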
+
+
+
+ Details TBD.
+
+
+ Prerequisite: tcp
+
+
+
+ Human-readable name for this flow's stage in the pipeline.
+
+
+
+ The overall purpose of these columns is described under Common
+ Columns
at the beginning of this document.
+
+
+
-