It appears that miniflow_extract() in emc_processing() spends a lot of
cycles waiting for the packet's data to be read.
Prefetching the next packet's data while parsing removes this delay.
For a single-flow pipeline the throughput improves by ~10%. With a
more realistic pipeline the change has a much smaller effect (~0.5%
improvement).
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
+        if (i != cnt - 1) {
+            /* Prefetch next packet data */
+            OVS_PREFETCH(dp_packet_data(packets[i+1]));
+        }
+
         miniflow_extract(packets[i], &key.mf);
         key.len = 0; /* Not computed yet. */
         key.hash = dpif_netdev_packet_get_rss_hash(packets[i], &key.mf);
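The prefetch-ahead pattern used in the hunk above can be illustrated in isolation. The following is a minimal, standalone sketch and not the OVS code: `PREFETCH`, `struct pkt`, `parse_packet()` and `process_batch()` are hypothetical stand-ins for `OVS_PREFETCH`, `struct dp_packet`, `miniflow_extract()` and the loop in `emc_processing()`. It assumes a GCC- or Clang-compatible compiler for `__builtin_prefetch` and falls back to a no-op elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OVS_PREFETCH.  __builtin_prefetch is a
 * GCC/Clang builtin; on other compilers this sketch does nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void) (addr))
#endif

/* Hypothetical packet type, much simpler than OVS's struct dp_packet. */
struct pkt {
    const uint8_t *data;
    size_t len;
};

/* Hypothetical stand-in for the per-packet parsing step: it only sums
 * the payload bytes so that the packet data is actually read. */
static uint32_t
parse_packet(const struct pkt *p)
{
    uint32_t sum = 0;

    for (size_t j = 0; j < p->len; j++) {
        sum += p->data[j];
    }
    return sum;
}

/* Processes a batch of packets.  While packet i is being parsed, a
 * prefetch for packet i + 1's data has already been issued, so the
 * memory access can overlap with the parsing work. */
static uint32_t
process_batch(struct pkt *const pkts[], size_t cnt)
{
    uint32_t total = 0;

    assert(pkts != NULL || cnt == 0);
    for (size_t i = 0; i < cnt; i++) {
        if (i + 1 < cnt) {
            /* Prefetch next packet data; skipped for the last packet. */
            PREFETCH(pkts[i + 1]->data);
        }
        total += parse_packet(pkts[i]);
    }
    return total;
}
```

The prefetch is only a hint, so the result of `process_batch()` is the same with or without it; any benefit depends on the workload, as the numbers in the commit message suggest. The guard is written as `i + 1 < cnt` here so that it also stays safe for an unsigned `cnt` of zero, whereas the patch uses `i != cnt - 1` inside a loop that only runs when `cnt` is nonzero.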