cascardo/linux.git
8 years agonet/mlx5: E-Switch, Add SR-IOV (FDB) support
Saeed Mahameed [Tue, 1 Dec 2015 16:03:20 +0000 (18:03 +0200)]
net/mlx5: E-Switch, Add SR-IOV (FDB) support

Enabling E-Switch SRIOV for nvfs+1 vports.

Create E-Switch FDB for L2 UC/MC mac steering between VFs/PF and
external vport (Uplink).

FDB contains forwarding rules such as:
UC MAC0 -> vport0(PF).
UC MAC1 -> vport1.
UC MAC2 -> vport2.
MC MACX -> vport0, vport2, Uplink.
MC MACY -> vport1, Uplink.

For unmatched traffic FDB has the following default rules:
Unmached Traffic (src vport != Uplink) -> Uplink.
Unmached Traffic (src vport == Uplink) -> vport0(PF).

FDB rules population:
Each NIC vport (VF) will notify E-Switch manager of its UC/MC vport
context changes via modify vport context command, which will be
translated to an event that will be handled by E-Switch manager (PF)
which will update FDB table accordingly.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: E-Switch, Introduce FDB hardware capabilities
Saeed Mahameed [Tue, 1 Dec 2015 16:03:19 +0000 (18:03 +0200)]
net/mlx5: E-Switch, Introduce FDB hardware capabilities

Define needed hardware structures and capabilities needed
for E-Switch FDB flow tables and read them on driver load.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Introducing E-Switch and l2 table
Saeed Mahameed [Tue, 1 Dec 2015 16:03:18 +0000 (18:03 +0200)]
net/mlx5: Introducing E-Switch and l2 table

E-Switch is the software entity that represents and manages ConnectX4
inter-HCA ethernet l2 switching.

E-Switch has its own Virtual Ports, each Vport/vNIC/VF can be
connected to the device through a vport of an e-switch.

Each e-switch is managed by one vNIC identified by
HCA_CAP.vport_group_manager (usually it is the PF/vport[0]),
and its main responsibility is to forward each packet to the
right vport.

e-Switch needs to manage its own l2-table and FDB tables.

L2 table is a flow table that is managed by FW, it is needed for
Multi-host (Multi PF) configuration for inter HCA switching between
PFs.

FDB table is a flow table that is totally managed by e-Switch driver,
its main responsibility is to switch packets between e-Swtich internal
vports and uplink vport that belong to the same.

This patch introduces only e-Swtich l2 table management, FDB managemnt
will come later when ethernet SRIOV/VFs will be enabled.

preperation for ethernet sriov and l2 table management.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Write vlan list into vport context
Saeed Mahameed [Tue, 1 Dec 2015 16:03:17 +0000 (18:03 +0200)]
net/mlx5e: Write vlan list into vport context

Each Vport/vNIC must notify underlying e-Switch layer
for vlan table changes in-order to update SR-IOV FDB tables.

We do that at vlan_rx_add_vid and vlan_rx_kill_vid ndos.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Write UC/MC list and promisc mode into vport context
Saeed Mahameed [Tue, 1 Dec 2015 16:03:16 +0000 (18:03 +0200)]
net/mlx5e: Write UC/MC list and promisc mode into vport context

Each Vport/vNIC must notify underlying e-Switch layer
for UC/MC list and promisc mode updates, in-order to update
l2 tables and SR-IOV FDB tables.

We do that at set_rx_mode ndo.

preperation for ethernet-SRIOV and l2 table management.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Introduce access functions to modify/query vport vlans
Saeed Mahameed [Tue, 1 Dec 2015 16:03:15 +0000 (18:03 +0200)]
net/mlx5: Introduce access functions to modify/query vport vlans

Those functions are needed to notify the upcoming L2 table and SR-IOV
E-Switch(FDB) manager(PF), of the NIC vport (vf) vlan table changes.

preperation for ethernet sriov and l2 table management.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Introduce access functions to modify/query vport promisc mode
Saeed Mahameed [Tue, 1 Dec 2015 16:03:14 +0000 (18:03 +0200)]
net/mlx5: Introduce access functions to modify/query vport promisc mode

Those functions are needed to notify the upcoming SR-IOV
E-Switch(FDB) manager(PF), of the NIC vport (vf) promisc mode changes.

Preperation for ethernet sriov and l2 table management.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Introduce access functions to modify/query vport state
Saeed Mahameed [Tue, 1 Dec 2015 16:03:13 +0000 (18:03 +0200)]
net/mlx5: Introduce access functions to modify/query vport state

In preparation for SR-IOV we add here an API to enable each e-switch
manager (PF) to configure its VFs link states in e-switch

preparation for ethernet sriov.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Introduce access functions to modify/query vport mac lists
Saeed Mahameed [Tue, 1 Dec 2015 16:03:12 +0000 (18:03 +0200)]
net/mlx5: Introduce access functions to modify/query vport mac lists

Those functions are needed to notify the upcoming L2 table and SR-IOV
E-Switch(FDB) manager(PF), of the NIC vport (vf) UC/MC mac lists
changes.

preperation for ethernet sriov and l2 table management.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Update access functions to Query/Modify vport MAC address
Saeed Mahameed [Tue, 1 Dec 2015 16:03:11 +0000 (18:03 +0200)]
net/mlx5: Update access functions to Query/Modify vport MAC address

In preparation for SR-IOV we add here an API to enable each e-switch
client (PF/VF) to configure its L2 MAC addresses and for the e-switch
manager (usually the PF) to access them in order to be able to
configure them into the e-switch.
Therefore we now pass vport num parameter to
mlx5_query_nic_vport_context, so PF can access other vports contexts.

preperation for ethernet sriov and l2 table management.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Add HW capabilities and structs for SR-IOV E-Switch
Saeed Mahameed [Tue, 1 Dec 2015 16:03:10 +0000 (18:03 +0200)]
net/mlx5: Add HW capabilities and structs for SR-IOV E-Switch

Update HCA capabilities and HW struct to include needed
capabilities for upcoming Ethernet Switch (SR-IOV E-Switch).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5_core: Add base sriov support
Eli Cohen [Tue, 1 Dec 2015 16:03:09 +0000 (18:03 +0200)]
net/mlx5_core: Add base sriov support

This patch adds SRIOV base support for mlx5 supported devices. The same
driver is used for both PFs and VFs; VFs are identified by the driver
through the flag MLX5_PCI_DEV_IS_VF added to the pci table entries.
Virtual functions are created as usual through writing a value to the
sriov_numvs sysfs file of the PF device. Upon instantiating VFs, they will
all be probed by the driver on the hypervisor. One can gracefully unbind
them through /sys/bus/pci/drivers/mlx5_core/unbind.

mlx5_wait_for_vf_pages() was added to ensure that when a VF dies without
executing proper teardown, the hypervisor driver waits till all of the
pages that were allocated at the hypervisor to maintain its operation
are returned.

In order for the VF to be operational, the PF needs to call enable_hca
for it. This can be done before the VFs are created through a call to
pci_enable_sriov.

If the there are VFs assigned to a VMs when the driver of the PF is
unloaded, all the VF will experience system error and PF driver unloads
cleanly; in this case pci_disable_sriov is not called and the devices
will show when running lspci. Once the PF driver is reloaded, it will
sync its data structures which maintain state on its VFs.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5_core: Modify enable/disable hca functions
Eli Cohen [Tue, 1 Dec 2015 16:03:08 +0000 (18:03 +0200)]
net/mlx5_core: Modify enable/disable hca functions

Modify these functions to have func_id argument to state which device we
are referring to. This is done as a preparation for SRIOV support where
a PF driver needs to control its virtual functions.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoalx: remove pointless assignment
Jarod Wilson [Mon, 30 Nov 2015 22:33:36 +0000 (17:33 -0500)]
alx: remove pointless assignment

Reasonably sure this doesn't serve any purpose.

CC: Jay Cliburn <jcliburn@gmail.com>
CC: Chris Snook <chris.snook@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bonding-team-offload'
David S. Miller [Thu, 3 Dec 2015 16:49:30 +0000 (11:49 -0500)]
Merge branch 'bonding-team-offload'

Jiri Pirko says:

====================
bonding/team offload + mlxsw implementation

This patchset introduces needed infrastructure for link aggregation
offload - for both team and bonding. It also implements the offload
in mlxsw driver.

Particulary, this patchset introduces possibility for upper driver
(bond/team/bridge/..) to pass type-specific info down to notifier listeners.
Info is passed along with NETDEV_CHANGEUPPER/NETDEV_PRECHANGEUPPER
notifiers. Listeners (drivers of netdevs being enslaved) can react
accordingly.

Other extension is for run-time use. This patchset introduces
new netdev notifier type - NETDEV_CHANGELOWERSTATE. Along with this
notification, the upper driver (bond/team/bridge/..) can pass some
information about lower device change, particulary link-up and
TX-enabled states. Listeners (drivers of netdevs being enslaved)
can react accordingly.

The last part of the patchset is implementation of LAG offload in mlxsw,
using both previously introduced infrastructre extensions.

Note that bond-speficic (and ugly) NETDEV_BONDING_INFO used by mlx4
can be removed and mlx4 can use the extensions this patchset adds.
I plan to convert it and get rid of NETDEV_BONDING_INFO in
a follow-up patchset.

v2->v3:
- one small fix in patch 1
v1->v2:
- added patch 1 and 2 per Andy's request
- couple of more or less cosmetic changes described in couple other patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Implement LAG tx enabled lower state change
Jiri Pirko [Thu, 3 Dec 2015 11:12:30 +0000 (12:12 +0100)]
mlxsw: spectrum: Implement LAG tx enabled lower state change

Enabling/disabling TX on a LAG port means enabling/disabling distribution
in our HW.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Implement FDB add/remove/dump for LAG
Jiri Pirko [Thu, 3 Dec 2015 11:12:29 +0000 (12:12 +0100)]
mlxsw: spectrum: Implement FDB add/remove/dump for LAG

Implement FDB offloading for lagged ports, including learning LAG FDB
entries, adding/removing static FDB entries and dumping existing LAG FDB
entries.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Implement LAG port join/leave
Jiri Pirko [Thu, 3 Dec 2015 11:12:28 +0000 (12:12 +0100)]
mlxsw: spectrum: Implement LAG port join/leave

Implement basic procedures for joining/leaving port to/from LAG. That
includes HW setup of collector, core LAG mapping setup.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: reg: Add definition of LAG unicast record for SFN register
Jiri Pirko [Thu, 3 Dec 2015 11:12:27 +0000 (12:12 +0100)]
mlxsw: reg: Add definition of LAG unicast record for SFN register

LAG-related records have specific format in SFN register.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: reg: Add definition of LAG unicast record for SFD register
Jiri Pirko [Thu, 3 Dec 2015 11:12:26 +0000 (12:12 +0100)]
mlxsw: reg: Add definition of LAG unicast record for SFD register

LAG-related records have specific format in SFD register.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: reg: Add link aggregation configuration registers definitions
Jiri Pirko [Thu, 3 Dec 2015 11:12:25 +0000 (12:12 +0100)]
mlxsw: reg: Add link aggregation configuration registers definitions

Add definitions of SLDR, SLCR2, SLCOR registers that are used to
configure LAG.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: pci: Implement LAG processing for received packets
Jiri Pirko [Thu, 3 Dec 2015 11:12:24 +0000 (12:12 +0100)]
mlxsw: pci: Implement LAG processing for received packets

Completion queue element for receive queue provides information if the
packet was received via LAG port. Extract this info and pass it along
to core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: core: Add support for packets received from LAG port
Jiri Pirko [Thu, 3 Dec 2015 11:12:23 +0000 (12:12 +0100)]
mlxsw: core: Add support for packets received from LAG port

Lower layer (pci) has information if the packet is received via LAG port.
If that is the case, it fills up rx_info accordingly. However upper
layer does not care about lag_id/port_index for received packets so
convert it to local_port before passing it up. For that conversion, lag
mapping array is introduced. Upper layer is responsible for setting up
the mapping according to what is set in HW.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Add set_rx_mode ndo stub
Jiri Pirko [Thu, 3 Dec 2015 11:12:22 +0000 (12:12 +0100)]
mlxsw: spectrum: Add set_rx_mode ndo stub

Add just a stub for now. This allows to pass check in dev_ifsioc,
SIOCADDMULTI and SIOCDELMULTI cases. Teamd is using these to add LACP
slow MAC.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobonding: set inactive flags on release
Jiri Pirko [Thu, 3 Dec 2015 11:12:21 +0000 (12:12 +0100)]
bonding: set inactive flags on release

Be correct and symmetric to enslave and set inactive flags during release.
That gives LAG offload drivers - lower state change listeners - possibility
to do proper cleanup.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobonding: implement lower state change propagation
Jiri Pirko [Thu, 3 Dec 2015 11:12:20 +0000 (12:12 +0100)]
bonding: implement lower state change propagation

Let netdev notifier listeners know about link and slave state change.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobonding: allow notifications for bond_set_slave_link_state
Jiri Pirko [Thu, 3 Dec 2015 11:12:19 +0000 (12:12 +0100)]
bonding: allow notifications for bond_set_slave_link_state

Similar to state notifications.

We allow caller to indicate if the notification should happen now or later,
depending on if he holds rtnl mutex or not. Introduce bond_slave_link_notify
function (similar to bond_slave_state_notify) which is later on called
with rtnl mutex and goes over slaves and executes delayed notification.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoteam: implement lower state change propagation
Jiri Pirko [Thu, 3 Dec 2015 11:12:18 +0000 (12:12 +0100)]
team: implement lower state change propagation

Let netdev notifier listeners know about link-up and port-enable state
changes.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoteam: rtnl_lock for options set
Jiri Pirko [Thu, 3 Dec 2015 11:12:17 +0000 (12:12 +0100)]
team: rtnl_lock for options set

During options set, there will be needed to hold rtnl_mutex in order to
safely call netdev notifiers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: introduce lower state changed info structure for LAG lowers
Jiri Pirko [Thu, 3 Dec 2015 11:12:16 +0000 (12:12 +0100)]
net: introduce lower state changed info structure for LAG lowers

This is shared info structure for bonding and team. Serves to pass down
info about link state and port activity to notification listeners.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: introduce change lower state notifier
Jiri Pirko [Thu, 3 Dec 2015 11:12:15 +0000 (12:12 +0100)]
net: introduce change lower state notifier

When lower device like bonding slave, team/bridge port, etc changes its
state, it is useful for others to notice this change. Currently this is
implemented specificly for bonding as NETDEV_BONDING_INFO notifier. This
patch aims to replace this specific usage and make this more generic to
be used for all upper-lower devices.

Introduce NETDEV_CHANGELOWERSTATE netdev notifier type and
netdev_lower_state_changed() helper.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobonding: fill-up LAG changeupper info struct and pass it along
Jiri Pirko [Thu, 3 Dec 2015 11:12:14 +0000 (12:12 +0100)]
bonding: fill-up LAG changeupper info struct and pass it along

Initialize netdev_lag_upper_info structure by TX type according to
current bonding mode and pass it along via netdev_master_upper_dev_link.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoteam: fill-up LAG changeupper info struct and pass it along
Jiri Pirko [Thu, 3 Dec 2015 11:12:13 +0000 (12:12 +0100)]
team: fill-up LAG changeupper info struct and pass it along

Initialize netdev_lag_upper_info structure by TX type according to
current team mode and pass it along via netdev_master_upper_dev_link.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add info struct for LAG changeupper
Jiri Pirko [Thu, 3 Dec 2015 11:12:12 +0000 (12:12 +0100)]
net: add info struct for LAG changeupper

This struct will be shared by bonding and team to pass internal
information to notifier listeners.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add possibility to pass information about upper device via notifier
Jiri Pirko [Thu, 3 Dec 2015 11:12:11 +0000 (12:12 +0100)]
net: add possibility to pass information about upper device via notifier

Sometimes the drivers and other code would find it handy to know some
internal information about upper device being changed. So allow upper-code
to pass information down to notifier listeners during linking.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: propagate upper priv via netdev_master_upper_dev_link
Jiri Pirko [Thu, 3 Dec 2015 11:12:10 +0000 (12:12 +0100)]
net: propagate upper priv via netdev_master_upper_dev_link

Eliminate netdev_master_upper_dev_link_private and pass priv directly as
a parameter of netdev_master_upper_dev_link.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add netif_is_lag_port helper
Jiri Pirko [Thu, 3 Dec 2015 11:12:09 +0000 (12:12 +0100)]
net: add netif_is_lag_port helper

Some code does not mind if a device is bond slave or team port and treats
them the same, as generic LAG ports.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add netif_is_lag_master helper
Jiri Pirko [Thu, 3 Dec 2015 11:12:08 +0000 (12:12 +0100)]
net: add netif_is_lag_master helper

Some code does not mind if the master is bond or team and treats them
the same, as generic LAG.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add netif_is_team_port helper
Jiri Pirko [Thu, 3 Dec 2015 11:12:07 +0000 (12:12 +0100)]
net: add netif_is_team_port helper

Similar to other helpers, caller can use this to find out if device is
team port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add netif_is_team_master helper
Jiri Pirko [Thu, 3 Dec 2015 11:12:06 +0000 (12:12 +0100)]
net: add netif_is_team_master helper

Similar to other helpers, caller can use this to find out if device is
team master.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobonding: add 802.3ad support for 100G speeds
Jiri Pirko [Thu, 3 Dec 2015 11:12:05 +0000 (12:12 +0100)]
bonding: add 802.3ad support for 100G speeds

Similar to other speeds, add 100G to bonding 802.3ad code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: Add support for CHANGEUPPER notifier error injection
Ido Schimmel [Thu, 3 Dec 2015 11:12:04 +0000 (12:12 +0100)]
net: Add support for CHANGEUPPER notifier error injection

Since CHANGEUPPER can now fail, add support for it in the newly
introduced netdev notifier error injection infrastructure.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: Check CHANGEUPPER notifier return value
Ido Schimmel [Thu, 3 Dec 2015 11:12:03 +0000 (12:12 +0100)]
net: Check CHANGEUPPER notifier return value

switchdev drivers reflect the newly requested topology to hardware when
CHANGEUPPER is received, after software links were already formed.
However, the operation can fail and user will not be notified, as the
return value of the notifier is not checked.

Add this check and rollback software links if necessary.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoi40e: Fix i40e_print_features() VEB mode output
Joe Perches [Thu, 3 Dec 2015 12:20:57 +0000 (04:20 -0800)]
i40e: Fix i40e_print_features() VEB mode output

Commit 7fd89545f337 ("i40e: remove BUG_ON from feature string building")
added defective output when I40E_FLAG_VEB_MODE_ENABLED was set in
function i40e_print_features.

Fix it.

Miscellanea:

- Remove unnecessary string variable
- Add space before not after fixed strings
- Use kmalloc not kzalloc
- Don't initialize i to 0, use result of first snprintf

Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Thu, 3 Dec 2015 16:43:32 +0000 (11:43 -0500)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-12-03

This series contains updates to ixgbe and ixgbevf only.

Mark cleans up ixgbe_init_phy_ops_x550em, since this was designed to
initialize function pointers only and moves the KR PHY reset to the
ixgbe_setup_internal_phy_t_x550em which was designed to detect which
mode the PHY operates in and set it up.  Added the new thermal alarm
type support used with newer X550EM_x devices.  Fixed both ixgbe and
ixgbevf to use a private work queue to avoid hangs, which would
possibly occur when creating and destroying many VFS repeatedly.
Updated ixgbe PTP implementation to accommodate X550EM_x devices,
which handle clocking differently.  Fixed specification violations
in the datasheet, which was reported by Dan Streetman.  Fixed ixgbe
to check for and handle IPv6 extended headers so that Tx checksum
offload can be done, which was reported by Tom Herbert.  Fixed ixgbe
link issue for some systems with X540 or X550 by only inhibiting the
turning PHY power off when manageability is present.

Alex Duyck refactors the MAC address configuration code, which in
turns fixes an issue where once 63 entries had been used, you could no
longer add additional filters.  Updated ixgbe to use __dev_uc_sync
which also resolved an issue in which you could not remove an FDB
address without having to reset the port.  Updated the ixgbe driver
to make use of all the free RAR entries for FDB use if needed.

v2: updated patch 13 to "Alex Duyck Approved" version, in the original
    submission, I had grabbed a previous version of the patch and did not
    catch it was superseded by a later version
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoixgbevf: Handle extended IPv6 headers in Tx path
Mark Rustad [Thu, 19 Nov 2015 21:56:30 +0000 (13:56 -0800)]
ixgbevf: Handle extended IPv6 headers in Tx path

Check for and handle IPv6 extended headers so that Tx checksum
offload can be done. Also use skb_checksum_help for unexpected
cases. Thanks to Tom Herbert for noticing these problems. Thanks
to Alexander Duyck for seeing how to coalesce the error handling
into one location.

Reported-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Always turn PHY power on when requested
Mark Rustad [Thu, 5 Nov 2015 19:02:14 +0000 (11:02 -0800)]
ixgbe: Always turn PHY power on when requested

Instead of inhibiting PHY power control when manageability is
present, only inhibit turning PHY power off when manageability
is present. Consequently, PHY power will always be turned on when
requested. Without this patch, some systems with X540 or X550
devices in some conditions will never get link.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Handle extended IPv6 headers in Tx path
Mark Rustad [Wed, 18 Nov 2015 17:21:28 +0000 (09:21 -0800)]
ixgbe: Handle extended IPv6 headers in Tx path

Check for and handle IPv6 extended headers so that Tx checksum
offload can be done. Also use skb_checksum_help for unexpected
cases. Thanks to Tom Herbert for noticing these problems. Thanks
to Alexander Duyck for recognizing problems with the first version
of this patch and recognizing how to coalesce error conditions
into a single location.

Reported-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Save VF info and take references
Mark Rustad [Fri, 30 Oct 2015 22:29:34 +0000 (15:29 -0700)]
ixgbe: Save VF info and take references

Save VF device pointers and take references to speed accesses used
to monitor the device behavior to avoid slot resets. The saved
information avoids lock contention during the search used to access
each of the VFs.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Wait for master disable to be set
Mark Rustad [Tue, 27 Oct 2015 20:23:23 +0000 (13:23 -0700)]
ixgbe: Wait for master disable to be set

According to the datasheets, the driver should wait for the master
disable bit to read as being set before checking the status
register for master disable.

Reported-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Correct spec violations by waiting after reset
Mark Rustad [Tue, 27 Oct 2015 20:23:14 +0000 (13:23 -0700)]
ixgbe: Correct spec violations by waiting after reset

The ixgbe driver was violating the specification in the datasheet
by not waiting 1ms before checking for the reset bit clearing. This
is called out for devices supported by ixgbe, so implement the
required delay.

Reported-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Update PTP to support X550EM_x devices
Mark Rustad [Tue, 27 Oct 2015 16:58:07 +0000 (09:58 -0700)]
ixgbe: Update PTP to support X550EM_x devices

The X550EM_x devices handle clocking differently, so update the
PTP implementation to accommodate them. This involves significant
changes to ixgbe's PTP code to accommodate the new range of
behaviors including things like non-power-of-2 clock wrapping.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Allow FDB entries access to more RAR filters
Alexander Duyck [Thu, 22 Oct 2015 23:26:42 +0000 (16:26 -0700)]
ixgbe: Allow FDB entries access to more RAR filters

This change makes it so that we allow the PF to make use of all free RAR
entries for FDB use if needed.

Previously the code limited us to 16 unicast entries, however this was
shared between MACVLAN which wasn't limited and the FDB code which was.  So
instead of treating the FDB code as a second class citizen I have updated
it so that it has access to just as many entries as the MACVLAN filters.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Use __dev_uc_sync and __dev_uc_unsync for unicast addresses
Alexander Duyck [Thu, 22 Oct 2015 23:26:36 +0000 (16:26 -0700)]
ixgbe: Use __dev_uc_sync and __dev_uc_unsync for unicast addresses

This change replaces the ixgbe_write_uc_addr_list call in ixgbe_set_rx_mode
with a call to __dev_uc_sync instead.  This works much better with the MAC
addr list code that was already in place and solves an issue in which you
couldn't remove an FDB address without having to reset the port.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Refactor MAC address configuration code
Alexander Duyck [Thu, 22 Oct 2015 23:26:30 +0000 (16:26 -0700)]
ixgbe: Refactor MAC address configuration code

In the process of tracking down a memory leak when adding/removing FDB
entries I had to go through the MAC address configuration code for ixgbe.
In the process of doing so I found a number of issues that impacted
readability and performance.  This change updates the code in general to
clean it up so it becomes clear what each step is doing.  From what I can
tell there a couple of bugs cleaned up in this code.

First is the fact that the MAC addresses were being double counted for the
PF.  As a result once entries up to 63 had been used you could no longer
add additional filters.

A simple test case for this:
  for i in `seq 0 96`
  do
    ip link add link ens8 name mv$i type macvlan
    ip link set dev mv$i up
  done

Test script:
  ethregs -s 0:8.0 | grep -e "RAH" | grep 8000....$

When things are working correctly RAL/H registers 1 - 97 will be consumed.
In the failing case it will stop at 63 and prevent any further filters from
being added.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbevf: Minor cleanups
Mark Rustad [Thu, 22 Oct 2015 00:21:20 +0000 (17:21 -0700)]
ixgbevf: Minor cleanups

Make some minor cleanups, such as simplifying return paths, deleting
unneeded initializations, return values more directly and so forth.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbevf: Use a private workqueue to avoid certain possible hangs
Mark Rustad [Thu, 22 Oct 2015 00:21:15 +0000 (17:21 -0700)]
ixgbevf: Use a private workqueue to avoid certain possible hangs

Use a private workqueue to avoid hangs that were otherwise possible
when performing stress tests, such as creating and destroying many
VFS repeatedly.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Use private workqueue to avoid certain possible hangs
Mark Rustad [Thu, 22 Oct 2015 00:21:10 +0000 (17:21 -0700)]
ixgbe: Use private workqueue to avoid certain possible hangs

Use a private workqueue to avoid hangs that were otherwise possible
when performing stress tests, such as creating and destroying many
VFS repeatedly.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Add support for newer thermal alarm
Mark Rustad [Mon, 19 Oct 2015 16:22:14 +0000 (09:22 -0700)]
ixgbe: Add support for newer thermal alarm

The newer copper PHY implementation used with newer X550EM_x
devices uses a different thermal alarm type than the earlier
one. Make changes to support both types.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Prevent KR PHY reset in ixgbe_init_phy_ops_x550em
Mark Rustad [Fri, 16 Oct 2015 20:27:49 +0000 (13:27 -0700)]
ixgbe: Prevent KR PHY reset in ixgbe_init_phy_ops_x550em

This patch removes KR PHY reset from ixgbe_init_phy_ops_x550em,
since this function is meant to initialize function pointers for
the detected PHY type. Internal PHY reset was moved to
ixgbe_setup_internal_phy_t_x550em which will now detect which
mode the internal PHY operates in and set it up as required.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agosfc: use ALIGN macro for aligning frame sizes
Jarod Wilson [Mon, 30 Nov 2015 22:12:21 +0000 (17:12 -0500)]
sfc: use ALIGN macro for aligning frame sizes

Don't open-code it.

CC: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: suppress too verbose messages in tcp_send_ack()
Eric Dumazet [Mon, 30 Nov 2015 16:57:28 +0000 (08:57 -0800)]
tcp: suppress too verbose messages in tcp_send_ack()

If tcp_send_ack() can not allocate skb, we properly handle this
and setup a timer to try later.

Use __GFP_NOWARN to avoid polluting syslog in the case host is
under memory pressure, so that pertinent messages are not lost under
a flood of useless information.

sk_gfp_atomic() can use its gfp_mask argument (all callers currently
were using GFP_ATOMIC before this patch)

We rename sk_gfp_atomic() to sk_gfp_mask() to clearly express this
function now takes into account its second argument (gfp_mask)

Note that when tcp_transmit_skb() is called with clone_it set to false,
we do not attempt memory allocations, so can pass a 0 gfp_mask, which
most compilers can emit faster than a non zero or constant value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'hv_netvsc-less-headroom'
David S. Miller [Thu, 3 Dec 2015 04:43:48 +0000 (23:43 -0500)]
Merge branch 'hv_netvsc-less-headroom'

Merge branch 'hv_netvsc-less-headroom'

K. Y. Srinivasan says:

====================
hv_netvsc: Eliminate the additional head room

In an attempt to avoid having to allocate memory on the send path, the netvsc
driver was requesting additional head room so that both rndis header and the
netvsc packet (the state that had to persist) could be placed in the skb.
Since the amount of head room requested was exceeding the default head room
as set in LL_MAX_HEADER, we were forcing a reallocation of skb.

With this patch-set, I have reduced the size of the netvsc packet to less
than 20 bytes and with this reduction we don't need to ask for any additional
headroom. We place the rndis header in the skb head room and we place the
netvsc packet in control buffer area in the skb.

V2:  - Addressed  review comments:
     - Eliminated more fields from netvsc packet structure.

V3:  - Fixed a typo in patch: hv_netvsc: Don't ask for additional head room in the skb.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate vlan_tci from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:19 +0000 (16:43 -0800)]
hv_netvsc: Eliminate vlan_tci from struct hv_netvsc_packet

Eliminate vlan_tci from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate status from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:18 +0000 (16:43 -0800)]
hv_netvsc: Eliminate status from struct hv_netvsc_packet

Eliminate status from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate xmit_more from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:17 +0000 (16:43 -0800)]
hv_netvsc: Eliminate xmit_more from struct hv_netvsc_packet

Eliminate xmit_more from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate completion_func from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:16 +0000 (16:43 -0800)]
hv_netvsc: Eliminate completion_func from struct hv_netvsc_packet

Eliminate completion_func from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate is_data_pkt from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:15 +0000 (16:43 -0800)]
hv_netvsc: Eliminate is_data_pkt from struct hv_netvsc_packet

Eliminate is_data_pkt from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate send_completion_tid from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:14 +0000 (16:43 -0800)]
hv_netvsc: Eliminate send_completion_tid from struct hv_netvsc_packet

Eliminate send_completion_tid from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate page_buf from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:13 +0000 (16:43 -0800)]
hv_netvsc: Eliminate page_buf from struct hv_netvsc_packet

Eliminate page_buf from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: remove locking in netvsc_send()
Vitaly Kuznetsov [Wed, 2 Dec 2015 00:43:12 +0000 (16:43 -0800)]
hv_netvsc: remove locking in netvsc_send()

Packet scheduler guarantees there won't be multiple senders for the same
queue and as we use q_idx for multi_send_data the spinlock is redundant.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: move subchannel existence check to netvsc_select_queue()
Vitaly Kuznetsov [Wed, 2 Dec 2015 00:43:11 +0000 (16:43 -0800)]
hv_netvsc: move subchannel existence check to netvsc_select_queue()

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Don't ask for additional head room in the skb
KY Srinivasan [Wed, 2 Dec 2015 00:43:10 +0000 (16:43 -0800)]
hv_netvsc: Don't ask for additional head room in the skb

The rndis header is 116 bytes big and can be placed in the default
head room that will be available in the skb. Since the netvsc packet
is less than 48 bytes, we can use the skb control buffer
for the netvsc packet. With these changes we don't need to
ask for additional head room.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate send_completion_ctx from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:09 +0000 (16:43 -0800)]
hv_netvsc: Eliminate send_completion_ctx from struct hv_netvsc_packet

Eliminate send_completion_ctx from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate send_completion from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:08 +0000 (16:43 -0800)]
hv_netvsc: Eliminate send_completion from struct hv_netvsc_packet

Eliminate send_completion from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminatte the data field from struct hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:07 +0000 (16:43 -0800)]
hv_netvsc: Eliminatte the data field from struct hv_netvsc_packet

Eliminatte the data field from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate rndis_msg pointer from hv_netvsc_packet structure
KY Srinivasan [Wed, 2 Dec 2015 00:43:06 +0000 (16:43 -0800)]
hv_netvsc: Eliminate rndis_msg pointer from hv_netvsc_packet structure

Eliminate rndis_msg pointer from hv_netvsc_packet structure.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Eliminate the channel field in hv_netvsc_packet structure
KY Srinivasan [Wed, 2 Dec 2015 00:43:05 +0000 (16:43 -0800)]
hv_netvsc: Eliminate the channel field in hv_netvsc_packet structure

Eliminate the channel field in hv_netvsc_packet structure.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Rearrange the hv_negtvsc_packet to be space efficient
KY Srinivasan [Wed, 2 Dec 2015 00:43:04 +0000 (16:43 -0800)]
hv_netvsc: Rearrange the hv_negtvsc_packet to be space efficient

Rearrange the elements of struct hv_negtvsc_packet for optimal layout -
eliminate unnecessary padding.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Resize some of the variables in hv_netvsc_packet
KY Srinivasan [Wed, 2 Dec 2015 00:43:03 +0000 (16:43 -0800)]
hv_netvsc: Resize some of the variables in hv_netvsc_packet

As part of reducing the size of the hv_netvsc_packet, resize some of the
variables based on their usage.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 2 Dec 2015 20:32:50 +0000 (15:32 -0500)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-12-01

This series contains updates to i40e and i40evf only.

Helin adds new fields to i40e_vsi to store user configured RSS config data
and the code to use it.  Also renamed RSS items to clarify functionality
and scope to users.  Fixed a confusing kernel message of enabling RSS size
by reporting it together with the hardware maximum RSS size.

Anjali fixes the issue of forcing writeback too often causing us to not
benefit from NAPI.

Jesse adds a prefetch for data early in the transmit path to help immensely
for pktgen and forwarding workloads.  Fixed the i40e driver that was
possibly sleeping inside critical section of code.

Carolyn fixes an issue where adminq init failures always provided a message
that NVM was newer than expected, when this is not always the case for
init_adminq failures.  Fixed by adding a check for that specific error
condition and a different helpful message otherwise.

Mitch fixes error message by telling the user which VF is being naughty,
rather than making them guess.  Updated the queue_vector array from a
statically-sized member of the adapter structure, to a dynamically-allocated
and -sized array.  This reduces the size of the adapter structure and allows
us to support any number of queue vectors in the future without changing the
code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoi40e: remove unused argument
Jesse Brandeburg [Fri, 6 Nov 2015 01:01:02 +0000 (17:01 -0800)]
i40e: remove unused argument

With the final edition of the patches to remove sleeps from
the driver's entry points, the grab_rtnl argument is no
longer needed, so partially revert the commit that added it.

Change-ID: Ib9778476242586cc9e58b670f5f48d415cb59003
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: fix: do not sleep in netdev_ops
Jesse Brandeburg [Fri, 6 Nov 2015 01:01:01 +0000 (17:01 -0800)]
i40e: fix: do not sleep in netdev_ops

The driver was being called by VLAN, bonding, teaming operations
that expected to be able to hold locks like rcu_read_lock().

This causes the driver to be held to the requirement to not sleep,
and was found by the kernel debug options for checking sleep
inside critical section, and the locking validator.

Change-ID: Ibc68c835f5ffa8ffe0638ffe910a66fc5649a7f7
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: Bump i40e version to 1.4.4 and i40evf to 1.4.1
Catherine Sullivan [Mon, 26 Oct 2015 23:44:41 +0000 (19:44 -0400)]
i40e/i40evf: Bump i40e version to 1.4.4 and i40evf to 1.4.1

Bump.

Change-ID: I00ebbb2e5e5572f947502b8f6db4d94f666d6b14
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: allocate ring structs dynamically
Mitch Williams [Mon, 26 Oct 2015 23:44:40 +0000 (19:44 -0400)]
i40evf: allocate ring structs dynamically

Instead of awkwardly keeping a fixed array of pointers in the adapter
struct and then allocating ring structs individually, just keep a single
pointer and allocate a single blob for the arrays. This simplifies code,
shrinks the adapter structure, and future-proofs the driver by not
limiting the number of rings we can handle.

Change-ID: I31334ff911a6474954232cfe4bc98ccca3c769ff
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: allocate queue vectors dynamically
Mitch Williams [Mon, 26 Oct 2015 23:44:39 +0000 (19:44 -0400)]
i40evf: allocate queue vectors dynamically

Change the queue_vector array from a statically-sized member of the
adapter structure to a dynamically-allocated and -sized array.

This reduces the size of the adapter structure, and allows us to support
any number of queue vectors in the future without changing the code.

Change-ID: I08dc622cb2f2ad01e832e51c1ad9b86524730693
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: quoth the VF driver, Nevermore
Mitch Williams [Mon, 26 Oct 2015 23:44:38 +0000 (19:44 -0400)]
i40evf: quoth the VF driver, Nevermore

If, upon a midnight dreary, the PF returns ERR_PARAM when the VF is
requesting resources, that's fatal. Either the firmware or NVM is badly,
badly misconfigured, or this VF has been disabled due to a previous VF
driver sending a bunch of bogus messages.

Either way, there is no recovery from this. Don't ponder weak and weary,
just quit.

Change-ID: I09d9f16cc4ee7fec3b57646a289d33838c1c5bf5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: make error message more useful
Mitch Williams [Mon, 26 Oct 2015 23:44:37 +0000 (19:44 -0400)]
i40e: make error message more useful

If we get an invalid message from a VF, we should tell the user which VF
is being naughty, rather than making them guess.

Change-ID: I9252cef7baea3d8584043ed6ff12619a94e2f99c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: fix confusing message
Helin Zhang [Mon, 26 Oct 2015 23:44:36 +0000 (19:44 -0400)]
i40e: fix confusing message

This patch fixes the confusing kernel message of enabled RSS size,
by reporting it together with the hardware maximum RSS size.

Change-ID: I64864dbfbc13beccc180a7871680def1f3d5a339
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Update error messaging
Carolyn Wyborny [Mon, 26 Oct 2015 23:44:35 +0000 (19:44 -0400)]
i40e: Update error messaging

This patch fixes an issue where adminq init failures always provided
a message that NVM was newer than expected.  This is not always the
case for init_adminq failures. Without this patch, if adminq init
fails for any reason, newer NVM message would be given.  This
problem is fixed by adding  a check for that specific error
condition and a different hopefully helpful message otherwise.

Change-ID: Iaeaebee4e398989eae40bb70f943ab66a3a521a5
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: add new fields to store user configuration of RSS
Helin Zhang [Mon, 26 Oct 2015 23:44:34 +0000 (19:44 -0400)]
i40evf: add new fields to store user configuration of RSS

This patch adds new fields to i40e_vsi to store user configured
RSS config data and code to use it.

Change-ID: Ic5d3db8d9df52182b560248f8cdca9c5c7546879
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: create a generic get RSS function
Helin Zhang [Mon, 26 Oct 2015 23:44:33 +0000 (19:44 -0400)]
i40evf: create a generic get RSS function

There are two ways to get RSS, this patch implements two functions
with the same input parameters, and creates a more generic function
for getting RSS configuration.

Change-ID: I12d3b712c21455d47dd0a5aae58fc9b7c680db59
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: create a generic config RSS function
Helin Zhang [Tue, 27 Oct 2015 20:15:06 +0000 (16:15 -0400)]
i40evf: create a generic config RSS function

There are two ways to configure RSS, this patch adjusts those two
functions with the same input parameters, and creates a more
generic function for configuring RSS.

Change-ID: Iace73bdeba4831909979bef221011060ab327f71
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: rename VF adapter specific RSS function
Helin Zhang [Mon, 26 Oct 2015 23:44:31 +0000 (19:44 -0400)]
i40evf: rename VF adapter specific RSS function

This patch renames old VF adapter specific RSS function to clarify
its scope.

Change-ID: Ie5253083a44c677ebb7709a8a3a18402ad2dc6a6
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: prefetch skb data on transmit
Jesse Brandeburg [Mon, 26 Oct 2015 23:44:30 +0000 (19:44 -0400)]
i40e/i40evf: prefetch skb data on transmit

Issue a prefetch for data early in the transmit path.
This should not be generally needed for Tx traffic, but
it helps immensely for pktgen workloads and should help
for forwarding workloads as well.

Change-ID: Iefee870c20599e0c4240e1d8637e4f16b625f83a
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: Fix RS bit update in Tx path and disable force WB workaround
Anjali Singhai Jain [Mon, 26 Oct 2015 23:44:29 +0000 (19:44 -0400)]
i40e/i40evf: Fix RS bit update in Tx path and disable force WB workaround

This patch fixes the issue of forcing WB too often causing us to not
benefit from NAPI.

Without this patch we were forcing WB/arming interrupt too often taking
away the benefits of NAPI and causing a performance impact.

With this patch we disable force WB in the clean routine for X710
and XL710 adapters. X722 adapters do not enable interrupt to force
a WB and benefit from WB_ON_ITR and hence force WB is left enabled
for those adapters.
For XL710 and X710 adapters if we have less than 4 packets pending
a software Interrupt triggered from service task will force a WB.

This patch also changes the conditions for setting RS bit as described
in code comments. This optimizes when the HW does a tail bump amd when
it does a WB. It also optimizes when we do a wmb.

Change-ID: Id831e1ae7d3e2ec3f52cd0917b41ce1d22d75d9d
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: rename rss_size to alloc_rss_size in i40e_pf
Helin Zhang [Mon, 26 Oct 2015 23:44:28 +0000 (19:44 -0400)]
i40e: rename rss_size to alloc_rss_size in i40e_pf

This patch renames rss_size to alloc_rss_size in i40e_pf, which is
clearer and avoids confusion. It also adds comments to the other
related structure members to help clarify usage.

Change-ID: Ia90090609d006ab589cb639975bb8a0af795d16f
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: add new fields to store user configuration
Helin Zhang [Mon, 26 Oct 2015 23:44:27 +0000 (19:44 -0400)]
i40e: add new fields to store user configuration

This patch adds new fields to i40e_vsi to store user configured
RSS config data and code to use it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Change-ID: I73886469dca9e9f6b16d842182a87f3f4009f95d
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoisdn: remove spellcaster driver
Arnd Bergmann [Mon, 30 Nov 2015 10:34:09 +0000 (11:34 +0100)]
isdn: remove spellcaster driver

The 'sc' ISDN driver relies on using readl() to access ISA I/O memory.
This has been deprecated and produced warnings since linux-2.3.23,
disabled by default since 2.4.10 and finally removed in 2.6.5.

I found this because the compiling the driver for ARM produces
a warning:

In file included from ../drivers/isdn/sc/includes.h:8:0,
                 from ../drivers/isdn/sc/init.c:13:
../arch/arm/include/asm/io.h:115:21: note: expected 'const volatile void *' but argument is of type 'long unsigned int'

It is pretty clear that this driver has not been used for a long time
and there is no point fixing it now, so let's remove it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agostmmac: support Reg_9 to get HW level information
Giuseppe CAVALLARO [Mon, 30 Nov 2015 10:33:10 +0000 (11:33 +0100)]
stmmac: support Reg_9 to get HW level information

For GMAC newer than 3.40a there is a new register (Reg_9) that provides the
status of all modules of the transmit and receive paths and FIFO status.
These can be exposed via ethtool.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>