cascardo/linux.git
Dean Luick [Thu, 28 Jul 2016 19:21:14 +0000 (15:21 -0400)]
IB/hfi1: Validate SDMA user request index

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:13 +0000 (15:21 -0400)]
IB/hfi1: Use the same capability state for all shared contexts

Save the current capability state at user context creation
time.  Report this saved value for all shared contexts.

Also get rid of the unnecessary hfi1_get_base_kinfo function.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 19:21:12 +0000 (15:21 -0400)]
IB/hfi1: Prevent null pointer dereference

If a context has not been assigned or assignment failed, pq may be NULL.
Move the unregister within the protection of the null check.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:37 +0000 (12:27 -0400)]
IB/hfi1: Rename TID mmu_rb_* functions

Clarify the names of the TID mmu functions.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:36 +0000 (12:27 -0400)]
IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()

Checking if the rb tree is empty is redundant with the while loop which is
emptying the rb tree.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:35 +0000 (12:27 -0400)]
IB/hfi1: Restructure hfi1_file_open

Rearrange the file open call in prep for new changes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:34 +0000 (12:27 -0400)]
IB/hfi1: Make iovec loop index easy to understand

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:33 +0000 (12:27 -0400)]
IB/hfi1: Use "false" not 0

For bool parameters, "false" should be used instead of 0.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:32 +0000 (12:27 -0400)]
IB/hfi1: Remove unused sub-context parameter

subctxt is not used, so remove it.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:31 +0000 (12:27 -0400)]
IB/hfi1: Consolidate __mmu_rb_remove and hfi1_mmu_rb_remove

__mmu_rb_remove was called in only 1 place which was a very simple
call site.  Combine this function into its caller.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:30 +0000 (12:27 -0400)]
IB/hfi1: Always expect ops functions

Remove, insert, and invalidate are always provided.  No
need to test.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:29 +0000 (12:27 -0400)]
IB/hfi1: Add parameter names to callback declarations

This makes it more clear what these functions are
operating on.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:28 +0000 (12:27 -0400)]
IB/hfi1: Add parameter names to function declarations

Adding parameter names to function declarations makes it clearer
what those parameters do.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:27 +0000 (12:27 -0400)]
IB/hfi1: Remove unused function hfi1_mmu_rb_search

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:26 +0000 (12:27 -0400)]
IB/hfi1: Remove unused uctxt->subpid and uctxt->pid

These are no longer needed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:25 +0000 (12:27 -0400)]
IB/hfi1: Fix minor format error

The opening brace of a function should be on the next line.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:09:40 +0000 (21:09 -0400)]
IB/hfi1: Expand reported serial number

Expand the serial number space by using more bits
from the GUID.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:08:42 +0000 (21:08 -0400)]
IB/hfi1: Allow for non-double word multiple message sizes for user SDMA

The driver pads non-double word multiple message sizes but it doesn't
account for this padding when the packet length is calculated. Also, the
data length is miscalculated for message sizes less than 4 bytes due to
the bit representation in LRH. And there's a check for non-double word
multiple message sizes that prevents these messages from being sent.
This patch fixes length miscalculations and enables the functionality to
send non-double word multiple message sizes.
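
The padding arithmetic described above can be sketched as follows. This is a minimal illustration, not the driver's code: the helper names pad_bytes() and padded_len() are hypothetical, and a double word is assumed to be 4 bytes.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the hfi1 driver: number of pad
 * bytes needed to bring a payload up to a 4-byte (double word) multiple.
 */
static inline uint32_t pad_bytes(uint32_t payload_len)
{
	return (4u - (payload_len & 3u)) & 3u;
}

/* Length that has to be accounted for once the padding is included. */
static inline uint32_t padded_len(uint32_t payload_len)
{
	return payload_len + pad_bytes(payload_len);
}
```

A 5-byte payload needs 3 pad bytes here, so a packet length computed from the unpadded size would be 3 bytes short.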

Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:07:36 +0000 (21:07 -0400)]
IB/rdmavt: Eliminate redundant opcode test in mr ref clear

The use of the specific opcode test is redundant since
all ack entry users correctly manipulate the mr pointer
to selectively trigger the reference clearing.

The overly specific test hinders the use of implementation
specific operations.

The change needs to get rid of the union to ensure that
an atomic value is not seen as an MR pointer.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:06:15 +0000 (21:06 -0400)]
IB/hfi1: Handle kzalloc failure in init_pervl_scs

Checking the return value of the memory allocation call in
init_pervl_scs() was missed.  Recently the kmalloc() was changed to
kzalloc() which identified the problem.

While fixing this issue 2 other bugs were noticed.  First, the array
being allocated is accessed in the nomem path which can be reached before
it is allocated.  Second, kernel_send_context was not released on error.
Fix both of these by creating a more common memory unwind label structure.
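
The unwind label structure mentioned above follows a common C pattern, sketched here in user space. This is an illustration under assumptions: struct ctx_tables, init_tables() and the field names are hypothetical stand-ins, and calloc()/free() take the place of the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's allocations. */
struct ctx_tables {
	int *send_ctxts;	/* stand-in for kernel_send_context */
	int *vl_map;		/* stand-in for the per-VL array */
};

/* Returns 0 on success, -1 if any allocation fails. */
static int init_tables(struct ctx_tables *t, size_t n_ctxts, size_t n_vls)
{
	t->send_ctxts = calloc(n_ctxts, sizeof(*t->send_ctxts));
	if (!t->send_ctxts)
		goto bail;

	t->vl_map = calloc(n_vls, sizeof(*t->vl_map));
	if (!t->vl_map)
		goto free_ctxts;

	return 0;

	/* Labels run in reverse allocation order, so each error path
	 * releases only what was allocated before the failure. */
free_ctxts:
	free(t->send_ctxts);
	t->send_ctxts = NULL;
bail:
	return -1;
}
```

Because each label releases only what was allocated before it, no error path touches an array that does not exist yet, and the earlier allocation is not leaked.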

Fixes: 35f6befc8441 ("staging/rdma/hfi1: Add qp to send context mapping for PIO")
Reported-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dasaratharaman Chandramouli [Mon, 25 Jul 2016 20:40:40 +0000 (13:40 -0700)]
IB/qib, IB/hfi1: Fix grh creation in ud loopback

Instead of copying the actual GRH of type struct ib_grh, existing code
copies the struct ib_global_route into the sge. This patch fixes that
and constructs the actual GRH from ib_global_route and copies the GRH
into the sge.
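
Constructing a GRH from route attributes can be sketched as below. This is a user-space illustration, not the patch itself: struct route_sketch, struct grh_sketch and build_grh() are hypothetical stand-ins for struct ib_global_route, struct ib_grh and the driver code. The field layout assumed is the InfiniBand one: a 4-bit version (6), an 8-bit traffic class and a 20-bit flow label packed into the first 32-bit word, and a next header value of 0x1B.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define GRH_VERSION	6u	/* IP version field carried in the GRH */
#define GRH_NEXT_HDR	0x1Bu	/* next header value for an IB transport header */

/* Hypothetical stand-in for the route attributes (struct ib_global_route). */
struct route_sketch {
	uint8_t dgid[16];
	uint32_t flow_label;
	uint8_t hop_limit;
	uint8_t traffic_class;
};

/* Hypothetical stand-in for the on-the-wire header (struct ib_grh). */
struct grh_sketch {
	uint32_t version_tclass_flow;	/* big endian */
	uint16_t paylen;		/* big endian */
	uint8_t next_hdr;
	uint8_t hop_limit;
	uint8_t sgid[16];
	uint8_t dgid[16];
};

/* Build the header from the route instead of copying the route itself. */
static void build_grh(struct grh_sketch *grh, const struct route_sketch *rt,
		      const uint8_t sgid[16], uint16_t paylen)
{
	grh->version_tclass_flow = htonl((GRH_VERSION << 28) |
					 ((uint32_t)rt->traffic_class << 20) |
					 (rt->flow_label & 0xFFFFFu));
	grh->paylen = htons(paylen);
	grh->next_hdr = GRH_NEXT_HDR;
	grh->hop_limit = rt->hop_limit;
	memcpy(grh->sgid, sgid, sizeof(grh->sgid));
	memcpy(grh->dgid, rt->dgid, sizeof(grh->dgid));
}
```

The two structures have different layouts, which is why copying the route attributes directly into the sge does not give the receiver a valid GRH.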

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dasaratharaman Chandramouli [Mon, 25 Jul 2016 20:40:34 +0000 (13:40 -0700)]
IB/hfi1: Use hdr2sc function to calculate 5-bit SC

The interface is used to compute the 5-bit SC field from the
LRH and the RHF bits. Modify code to use the interface instead.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dasaratharaman Chandramouli [Mon, 25 Jul 2016 20:40:28 +0000 (13:40 -0700)]
IB/hfi1: Cleanup UD packet handler.

Cleanup hfi1_ud_rcv to not have to look at the packet
header fields multiple times. The fields are looked up
once and used throughout the function. Also fix sc
computation when validating MAD packets.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Mon, 25 Jul 2016 20:40:22 +0000 (13:40 -0700)]
IB/hfi1: Rename hfi1_pio_header to hfi1_sdma_header.

hfi1_pio_header should really be called hfi1_sdma_header
as it is only used for sdma transmits.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dasaratharaman Chandramouli [Mon, 25 Jul 2016 20:40:16 +0000 (13:40 -0700)]
IB/hfi1: Rename struct ahg_ib_header to struct hfi1_ahg_info

struct ahg_ib_header has no header specific information.
Rename it to struct hfi1_ahg_info

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dasaratharaman Chandramouli [Mon, 25 Jul 2016 20:40:10 +0000 (13:40 -0700)]
IB/hfi1: Remove unused elements from struct ahg_ib_header

sde and hfi1_ib_header are not used anymore.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Mon, 25 Jul 2016 20:40:03 +0000 (13:40 -0700)]
IB/hfi1: Reset QSFP on every run through channel tuning

Active QSFP cables were reset only every alternate iteration of the
channel tuning algorithm instead of every iteration due to incorrect
reset of the flag that controlled QSFP reset, resulting in using stale
QSFP status in the channel tuning algorithm.

Fixes: 8ebd4cf1852a ("Add active and optical cable support")
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Mon, 25 Jul 2016 20:39:57 +0000 (13:39 -0700)]
IB/hfi1: Ignore QSFP interrupts until power stabilizes

Some QSFP cables assert the interrupt line as a side effect of module
plug-in and power up. This causes the SerDes and QSFP tuning algorithm
to begin cable initialization by reading the QSFP memory map over I2C,
which fails. This patch ignores any interrupt line assertion until
the module has completed power up and voltage rails have stabilized,
which can take a maximum of 500 ms per the SFF-8679 specification.
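
The gating described above can be sketched as a simple time comparison. This is a hedged illustration: struct qsfp_state, its power_on_ms field and qsfp_irq_ignored() are hypothetical names, and timestamps are assumed to be monotonic milliseconds with now_ms never earlier than power_on_ms.

```c
#include <stdbool.h>
#include <stdint.h>

#define QSFP_PWR_UP_MS 500u	/* maximum power-up time cited from SFF-8679 */

/* Hypothetical state: when the module was plugged in or powered on. */
struct qsfp_state {
	uint64_t power_on_ms;
};

/*
 * True when an interrupt seen at now_ms should be ignored because the
 * module may still be powering up.
 */
static bool qsfp_irq_ignored(const struct qsfp_state *q, uint64_t now_ms)
{
	return now_ms - q->power_on_ms < QSFP_PWR_UP_MS;
}
```

With a power-on time of 1000 ms, interrupts up to 1499 ms are dropped and the first one handled is at 1500 ms or later.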

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Mon, 25 Jul 2016 20:39:51 +0000 (13:39 -0700)]
IB/hfi1: Disable external device configuration requests

QSFP CDR enablement is now controlled by determining power class
and the configuration file. We disable the DC 8051 from requesting
enablement or disabling of TX and RX CDRs by removing the code
that allowed the DC 8051 to request changes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:39:45 +0000 (13:39 -0700)]
IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled

Hanging has been observed while writing a file over NFSoRDMA. Dmesg on
the server contains messages like these:

[  931.992501] svcrdma: Error -22 posting RDMA_READ
[  952.076879] svcrdma: Error -22 posting RDMA_READ
[  982.154127] svcrdma: Error -22 posting RDMA_READ
[ 1012.235884] svcrdma: Error -22 posting RDMA_READ
[ 1042.319194] svcrdma: Error -22 posting RDMA_READ

Here is why:

With the base memory management extension enabled, FRMR is used instead
of FMR. The xprtrdma server issues each RDMA read request as the following
bundle:

(1) IB_WR_REG_MR, signaled;
(2) IB_WR_RDMA_READ, signaled;
(3) IB_WR_LOCAL_INV, signaled & fencing.

These requests are signaled. In order to generate completion, the fast
register work request is processed by the hfi1 send engine after being
posted to the work queue, and the corresponding lkey is not valid until
the request is processed. However, the rdmavt driver validates lkey when
the RDMA read request is posted and thus it fails immediately with error
-EINVAL (-22).

This patch changes the work flow of local operations (fast register and
local invalidate) so that fast register work requests are always
processed immediately to ensure that the corresponding lkey is valid
when subsequent work requests are posted. Local invalidate requests are
processed immediately if fencing is not required and no previous local
invalidate request is pending.

To allow completion generation for signaled local operations that have
been processed before posting to the work queue, an internal send flag
RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag
and only generates completion for such requests.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 25 Jul 2016 20:39:39 +0000 (13:39 -0700)]
IB/hfi1: Add the capability for reserved operations

This fix allows for support of in-kernel reserved operations
without impacting the ULP user.

The low level driver can register a non-zero value which
will be transparently added to the send queue size and hidden
from the ULP in every respect.

ULP post sends will never see a full queue due to a reserved
post send and reserved operations will never exceed that
registered value.

The s_avail will continue to track the ULP swqe availability
and the difference between the reserved value and the reserved
in use will track reserved availability.
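
The accounting described above can be sketched with two independent counters. This is a simplified, single-threaded illustration: struct sq_acct, sq_init() and sq_take() are hypothetical names, only s_avail appears in the commit text, and the real code would need atomic or locked updates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical send queue accounting; only s_avail is named in the text. */
struct sq_acct {
	uint32_t s_avail;	 /* slots still available to the ULP */
	uint32_t reserved_total; /* value registered by the low level driver */
	uint32_t reserved_used;	 /* reserved slots currently in use */
};

static void sq_init(struct sq_acct *sq, uint32_t ulp_size, uint32_t reserved)
{
	/* The queue is sized ulp_size + reserved; the ULP only sees ulp_size. */
	sq->s_avail = ulp_size;
	sq->reserved_total = reserved;
	sq->reserved_used = 0;
}

/* Claim one slot; returns false when the relevant pool is exhausted. */
static bool sq_take(struct sq_acct *sq, bool reserved_op)
{
	if (reserved_op) {
		if (sq->reserved_used >= sq->reserved_total)
			return false;
		sq->reserved_used++;
		return true;
	}
	if (sq->s_avail == 0)
		return false;
	sq->s_avail--;
	return true;
}
```

Since the two pools never borrow from each other, a reserved post cannot make the queue look full to the ULP, and reserved operations cannot exceed the registered value.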

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Grzegorz Heldt [Mon, 25 Jul 2016 20:39:33 +0000 (13:39 -0700)]
IB/hfi1: Fix trace message units

Trace shows incorrect amount of allocated memory.
Fix trace to display memory in KB.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Heldt <grzegorz.heldt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Tadeusz Struk [Mon, 25 Jul 2016 20:39:27 +0000 (13:39 -0700)]
IB/hfi1: Add sysfs entry to override SDMA interrupt affinity

Add sysfs entry to allow user to override affinity for SDMA
engine interrupts.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Mon, 25 Jul 2016 20:39:21 +0000 (13:39 -0700)]
IB/hfi1: Add static PCIe Gen3 CTLE tuning

Enhance the PCIe Gen3 recipe to support static CTLE tuning,
and add a switch to choose between static and dynamic
approaches.  Make discrete chips default to static CTLE
tuning.

Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:39:14 +0000 (13:39 -0700)]
IB/hfi1: Fix "suspicious rcu_dereference_check() usage" warnings

This fixes the following warnings with PROVE_LOCKING and PROVE_RCU
enabled in the kernel:

case (1):
[ INFO: suspicious RCU usage. ]
drivers/infiniband/hw/hfi1/init.c:532
suspicious rcu_dereference_check() usage!

case (2):
[ INFO: suspicious RCU usage. ]
drivers/infiniband/hw/hfi1/hfi.h:1624
suspicious rcu_dereference_check() usage!

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:39:08 +0000 (13:39 -0700)]
IB/rdmavt: Add missing spin_lock_init call for rdi->n_cqs_lock

This fixes the following warning with a PROVE_LOCKING enabled kernel:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 15 PID: 12286 Comm: modprobe Not tainted 4.7.0-rc5.prove_rcu+ #1
Hardware name: Intel Corporation S2600WT2R/S2600WT2R,
......
Call Trace:
[<ffffffff8139ec0d>] dump_stack+0x85/0xc8
[<ffffffff810eb765>] register_lock_class+0x415/0x4b0
[<ffffffff810ede1c>] ? __lock_acquire+0x40c/0x1960
[<ffffffff810edaa9>] __lock_acquire+0x99/0x1960
[<ffffffff8120ab62>] ? find_vmap_area+0x42/0x60
[<ffffffff8120ab39>] ? find_vmap_area+0x19/0x60
[<ffffffff810ef9d3>] lock_acquire+0xd3/0x200
[<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffff81763391>] _raw_spin_lock+0x31/0x40
[<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffffa049d598>] rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffff810ead46>] ? static_obj+0x36/0x50
[<ffffffffa0469e39>] ib_alloc_cq+0x49/0x180 [ib_core]
[<ffffffffa047bed4>] ib_mad_init_device+0x204/0x6d0 [ib_core]
[<ffffffff810e968f>] ? up_write+0x1f/0x40
[<ffffffffa046e2c0>] ib_register_device+0x3d0/0x510 [ib_core]
[<ffffffffa0752410>] ? read_cc_setting_bin+0x200/0x200 [hfi1]
[<ffffffff810ead46>] ? static_obj+0x36/0x50
[<ffffffff810eb888>] ? lockdep_init_map+0x88/0x200
[<ffffffffa049cbff>] rvt_register_device+0x17f/0x320 [rdmavt]
[<ffffffffa0766caa>] hfi1_register_ib_device+0x6ca/0x7c0 [hfi1]
[<ffffffffa0733de4>] init_one+0x2b4/0x430 [hfi1]
[<ffffffff813e40a5>] local_pci_probe+0x45/0xa0
[<ffffffff813e5110>] ? pci_match_device+0xe0/0x110
[<ffffffff813e550c>] pci_device_probe+0xfc/0x140
[<ffffffff814daee9>] driver_probe_device+0x239/0x460
[<ffffffff814db1dd>] __driver_attach+0xcd/0xf0
[<ffffffff814db110>] ? driver_probe_device+0x460/0x460
[<ffffffff814d89b3>] bus_for_each_dev+0x73/0xc0
[<ffffffff814da74e>] driver_attach+0x1e/0x20
[<ffffffff814da1b3>] bus_add_driver+0x1d3/0x290
[<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1]
[<ffffffff814dbf60>] driver_register+0x60/0xe0
[<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1]
[<ffffffff813e39d0>] __pci_register_driver+0x60/0x70
[<ffffffffa04cc2aa>] hfi1_mod_init+0x196/0x1fe [hfi1]
[<ffffffff81002190>] do_one_initcall+0x50/0x190
[<ffffffff8110be72>] ? rcu_read_lock_sched_held+0x62/0x70
[<ffffffff8122d4aa>] ? kmem_cache_alloc_trace+0x23a/0x2a0
[<ffffffff811c1881>] ? do_init_module+0x27/0x1dc
[<ffffffff811c18ba>] do_init_module+0x60/0x1dc
[<ffffffff811360cc>] load_module+0x132c/0x1ac0
[<ffffffff81132c40>] ? __symbol_put+0x60/0x60
[<ffffffff8133e50d>] ? ima_post_read_file+0x3d/0x80

Cc: Stable <stable@vger.kernel.org> # 4.6+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Mon, 25 Jul 2016 20:39:02 +0000 (13:39 -0700)]
IB/hfi1: Read all firmware versions

Read the versions of the SBus, PCIe SerDes, and Fabric SerDes
firmware at driver load time.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Mon, 25 Jul 2016 20:38:56 +0000 (13:38 -0700)]
IB/hfi1: Explain state complete frame details

When link up fails in LNI, the local and peer state complete
frames are reported as numbers.  Explain what the values mean
so the operator can better diagnose the problem.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Mon, 25 Jul 2016 20:38:50 +0000 (13:38 -0700)]
IB/hfi1: Modify the default number of kernel receive contexts

Currently, the default number of kernel receive contexts is set to the
number of NUMA nodes on the system plus one for control context. However,
the systems that have a single socket and/or have NUMA disabled in the BIOS
will have only one receive context by default. This patch would ensure that
by default there will be at least two kernel receive contexts plus one for
control context regardless of the number of NUMA nodes on the system. The
user can override the default number of kernel receive contexts with the
krcvqs module parameter.
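
The default described above can be sketched as a small calculation. This is an illustration under assumptions: default_rcv_contexts() is a hypothetical helper, krcvqs is treated as a single count where 0 means "not set", and a user-supplied value is assumed to exclude the control context.

```c
/*
 * Hypothetical helper, not the driver's code: total kernel receive
 * contexts, including the control context.
 */
static unsigned int default_rcv_contexts(unsigned int numa_nodes,
					 unsigned int krcvqs)
{
	unsigned int kernel_ctxts;

	if (krcvqs)
		kernel_ctxts = krcvqs;	/* user override */
	else
		kernel_ctxts = numa_nodes < 2 ? 2 : numa_nodes;

	return kernel_ctxts + 1;	/* plus one for the control context */
}
```

A single-socket system with no override gets three contexts in this sketch, two kernel receive contexts plus the control context.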

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:38:43 +0000 (13:38 -0700)]
IB/hfi1: Add support for extended memory management

Advertise and add the capability of handling all aspects of IBTA extended
memory management support in post send.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:38:37 +0000 (13:38 -0700)]
IB/hfi1: Work request processing for fast register mr and invalidate

In order to support extended memory management, add send side
processing of work requests of type IB_WR_REG_MR, IB_WR_LOCAL_INV, and
IB_WR_SEND_WITH_INV. The first two are local operations and are supported
for both RC and UC. Send with invalidate is only supported for RC because
the corresponding IB opcodes are not defined for UC.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:38:31 +0000 (13:38 -0700)]
IB/hfi1: Handle send with invalidate opcode in the RC recv path

As part of enabling extended memory management support, add the processing
of the RC send with invalidate.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:38:25 +0000 (13:38 -0700)]
IB/rdmavt: Handle local operations in post send

Some work requests are local operations, such as IB_WR_REG_MR and
IB_WR_LOCAL_INV. They differ from non-local operations in that:

(1) Local operations can be processed immediately without being posted
to the send queue if neither fencing nor completion generation is needed.
However, to ensure correct ordering, once a local operation is posted to
the work queue due to fencing or completion requirement, all subsequent
local operations must also be posted to the work queue until all the
local operations on the work queue have completed.

(2) Local operations don't send packets over the wire and thus don't
need (and shouldn't update) the packet sequence numbers.

Define a new flag bit for the post send table to identify local
operations.

Add a new field to the QP structure to track the number of local
operations on the send queue to determine if direct processing of new
local operations should be enabled/disabled.
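
The ordering rule in (1) and the counter described above can be sketched as follows. This is a single-threaded illustration: struct qp_local, local_ops_pending and the three functions are hypothetical names, not the rdmavt definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-QP state: local operations currently on the send queue. */
struct qp_local {
	uint32_t local_ops_pending;
};

/* A local operation may bypass the queue only if nothing forces queuing. */
static bool local_op_direct(const struct qp_local *qp, bool fence,
			    bool signaled)
{
	if (fence || signaled)
		return false;
	return qp->local_ops_pending == 0;
}

/* Returns true if processed immediately, false if it was queued. */
static bool post_local_op(struct qp_local *qp, bool fence, bool signaled)
{
	if (local_op_direct(qp, fence, signaled))
		return true;
	qp->local_ops_pending++;	/* later local ops must queue too */
	return false;
}

/* Called when a queued local operation completes. */
static void complete_local_op(struct qp_local *qp)
{
	if (qp->local_ops_pending)
		qp->local_ops_pending--;
}
```

Once one fenced or signaled local operation is queued, every later local operation is queued behind it until the counter drains to zero, which preserves ordering.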

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:38:19 +0000 (13:38 -0700)]
IB/rdmavt: Add mechanism to invalidate MR keys

In order to support extended memory management, add the mechanism to
invalidate MR keys. This includes a flag "lkey_invalid" in the MR data
structure that is to be checked when validating access to the MR via
the associated key, and two utility functions to perform fast memory
registration and memory key invalidate operations.
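
How such a flag can interact with key validation is sketched below. Only the flag name lkey_invalid comes from the commit text; struct mr_sketch and the three functions are hypothetical and much simpler than the real MR structure and utility functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced MR state. */
struct mr_sketch {
	uint32_t lkey;
	bool lkey_invalid;	/* set until the key is registered */
};

/* Fast registration installs the key and marks it valid. */
static void mr_fast_reg(struct mr_sketch *mr, uint32_t key)
{
	mr->lkey = key;
	mr->lkey_invalid = false;
}

/* Invalidation succeeds only for the key that is currently installed. */
static bool mr_invalidate(struct mr_sketch *mr, uint32_t key)
{
	if (mr->lkey != key)
		return false;
	mr->lkey_invalid = true;
	return true;
}

/* Access checks must reject a matching key that has been invalidated. */
static bool mr_access_ok(const struct mr_sketch *mr, uint32_t key)
{
	return mr->lkey == key && !mr->lkey_invalid;
}
```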

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Mon, 25 Jul 2016 20:38:13 +0000 (13:38 -0700)]
IB/rdmavt: Add support for ib_map_mr_sg

This implements the device specific function needed by the verbs
API function ib_map_mr_sg().

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Mon, 25 Jul 2016 20:38:07 +0000 (13:38 -0700)]
IB/hfi1: Pull FECN/BECN processing to a common place

There were multiple places where FECN/BECN processing was
being done for the different types of QPs. All of that code
was very similar, which meant that it could be pulled into
a single function used by the different QP types.

To retain the performance in the fastpath, the common code
starts with an inline function, which only calls the slow
path if the packet has any of the [FB]ECN bits set.
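
The inline check in front of a slow path can be sketched as follows. This is an illustration under assumptions: the bit positions of FECN and BECN within the second BTH word are assumed for the example, and process_ecn(), process_ecn_slowpath() and slow_path_calls are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the second BTH word, for illustration only. */
#define FECN_BIT (1u << 31)
#define BECN_BIT (1u << 30)

static unsigned int slow_path_calls;	/* counts slow path entries */

/* Slow path: shared by all QP types; reports whether FECN was set. */
static bool process_ecn_slowpath(uint32_t bth1)
{
	slow_path_calls++;
	return (bth1 & FECN_BIT) != 0;
}

/* Fast path: a single mask test, so the common case costs almost nothing. */
static inline bool process_ecn(uint32_t bth1)
{
	if (!(bth1 & (FECN_BIT | BECN_BIT)))
		return false;
	return process_ecn_slowpath(bth1);
}
```

Packets without either bit never leave the inline function, so only marked packets pay for the call into the common code.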

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Tymoteusz Kielan [Mon, 25 Jul 2016 20:38:01 +0000 (13:38 -0700)]
IB/hfi1: Fix to fully initialize send context area

While handling buffer control MAD, partially initialized
dd->kernel_send_context area may cause potential dereference
of uninitialized pointers. Fix by using kzalloc_node()
instead of kmalloc_node().

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix integrity errors counter value calculation
Jakub Pawlak [Mon, 25 Jul 2016 20:37:54 +0000 (13:37 -0700)]
IB/hfi1: Fix integrity errors counter value calculation

PMA should not sum TX and RX replay counts when reporting
local link integrity errors. Fixed by removing C_DC_TX_REPLAY
counter from calculation of the link integrity errors counter
value.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Use new driver specific post send table
Mike Marciniszyn [Fri, 1 Jul 2016 23:02:24 +0000 (16:02 -0700)]
IB/rdmavt: Use new driver specific post send table

Change rvt_post_one_wr to use the new table mechanism for
post send.

Validate that each low level driver specifies the table.

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/qib: Add qib post send table
Mike Marciniszyn [Fri, 1 Jul 2016 23:02:18 +0000 (16:02 -0700)]
IB/qib: Add qib post send table

Add initial table for table driven post_send support.

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add hfi1 post send tables
Mike Marciniszyn [Fri, 1 Jul 2016 23:02:13 +0000 (16:02 -0700)]
IB/hfi1: Add hfi1 post send tables

Add initial table for table driven post_send support.

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Add data structures and routines for table driven post send
Mike Marciniszyn [Fri, 1 Jul 2016 23:02:07 +0000 (16:02 -0700)]
IB/rdmavt: Add data structures and routines for table driven post send

Add flexibility for driver dependent operations in post send
because different drivers will have differing post send
operation support.

This includes data structure definitions to support a table
driven scheme along with the necessary validation routine
using the new table.
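
A rough sketch of what a table-driven validation step could look like.
The opcodes, QP types, field names, and table contents below are all
hypothetical; they only illustrate the idea of a per-driver table
consulted by a common validation routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EX_BIT(n) (1u << (n))

/* Hypothetical opcodes and QP types; a real driver has many more. */
enum ex_opcode { EX_OP_SEND, EX_OP_RDMA_WRITE, EX_OP_ATOMIC, EX_OP_MAX };
enum ex_qp_type { EX_QPT_UD, EX_QPT_UC, EX_QPT_RC };

struct ex_op_params {
	size_t length;            /* work request size; 0 marks an unsupported opcode */
	unsigned int qpt_support; /* bitmask of QP types allowed to post this opcode */
};

/* Each driver would supply its own table; this one is made up. */
static const struct ex_op_params ex_post_parms[EX_OP_MAX] = {
	[EX_OP_SEND] = {
		.length = 64,
		.qpt_support = EX_BIT(EX_QPT_UD) | EX_BIT(EX_QPT_UC) | EX_BIT(EX_QPT_RC),
	},
	[EX_OP_RDMA_WRITE] = {
		.length = 96,
		.qpt_support = EX_BIT(EX_QPT_UC) | EX_BIT(EX_QPT_RC),
	},
	/* EX_OP_ATOMIC stays zeroed: this hypothetical driver does not support it. */
};

/* Validate an opcode against the table; returns the request length or -EINVAL. */
int ex_post_one_wr_params(const struct ex_op_params *table,
			  enum ex_opcode op, enum ex_qp_type qpt)
{
	assert(table != NULL);
	if ((unsigned int)op >= EX_OP_MAX)
		return -EINVAL;
	if (table[op].length == 0)
		return -EINVAL;
	if (!(table[op].qpt_support & EX_BIT(qpt)))
		return -EINVAL;
	return (int)table[op].length;
}
```

Because unsupported opcodes are simply absent from a driver's table, the
common code needs no driver-specific conditionals.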

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Correct receive packet handler assignment
Jakub Pawlak [Fri, 1 Jul 2016 23:02:02 +0000 (16:02 -0700)]
IB/hfi1: Correct receive packet handler assignment

Do not process a received packet when its opcode is accepted
by the QP but no handler is defined for that type of packet.
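
The guard can be pictured with a small dispatch table. This is only a
sketch: the table size, handler type, and function names are
hypothetical and do not mirror the driver's receive path.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NUM_OPCODES 4   /* hypothetical; real opcode spaces are larger */

typedef void (*ex_rcv_handler)(int *processed);

static void ex_handle_send(int *processed)
{
	(*processed)++;
}

/* Sparse dispatch table: opcodes without an entry stay NULL. */
static const ex_rcv_handler ex_handlers[EX_NUM_OPCODES] = {
	[0] = ex_handle_send,
	[2] = ex_handle_send,
};

/* Returns 1 if the packet was handled, 0 if it was dropped. */
int ex_dispatch(unsigned int opcode, int *processed)
{
	assert(processed != NULL);
	if (opcode >= EX_NUM_OPCODES || ex_handlers[opcode] == NULL)
		return 0;   /* drop instead of calling through a NULL pointer */
	ex_handlers[opcode](processed);
	return 1;
}
```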

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Improve SDMA engine assignment for user SDMA
Jianxin Xiong [Fri, 1 Jul 2016 23:01:56 +0000 (16:01 -0700)]
IB/hfi1: Improve SDMA engine assignment for user SDMA

Currently each user context is assigned a single SDMA engine
based on the VL, context id, and subcontext id. That means for
MPI applications, each rank can only use one SDMA engine for
all messages. This may create unwanted backup for independent
messages going to different destinations upon congestion at one
destination.

This patch adds the packet "dlid" to the formula of SDMA engine
selection for user SDMA requests. A simple hash table is used
to maintain even distribution among the available SDMA engines
regardless of how the "dlid" values are distributed.
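
One plausible reading of the scheme, sketched under stated assumptions:
each new "dlid" is given the next engine in round-robin order and the
pairing is remembered in a small open-addressing table. The table size,
struct, and function names are hypothetical, and the driver's actual
data structure may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_TABLE_SIZE 64   /* hypothetical table size */

struct ex_dlid_map {
	uint32_t dlid[EX_TABLE_SIZE];
	unsigned int engine[EX_TABLE_SIZE];
	unsigned char used[EX_TABLE_SIZE];
	unsigned int next_engine;   /* round-robin cursor over the engines */
	unsigned int num_engines;
};

/* Return the engine for a dlid, assigning one on first use. */
unsigned int ex_engine_for_dlid(struct ex_dlid_map *m, uint32_t dlid)
{
	unsigned int slot;

	assert(m != NULL && m->num_engines > 0);
	slot = dlid % EX_TABLE_SIZE;
	for (unsigned int probe = 0; probe < EX_TABLE_SIZE; probe++) {
		unsigned int i = (slot + probe) % EX_TABLE_SIZE;

		if (m->used[i] && m->dlid[i] == dlid)
			return m->engine[i];
		if (!m->used[i]) {
			/* First message to this destination: take the next engine. */
			m->used[i] = 1;
			m->dlid[i] = dlid;
			m->engine[i] = m->next_engine;
			m->next_engine = (m->next_engine + 1) % m->num_engines;
			return m->engine[i];
		}
	}
	/* Table full: fall back to a plain modulo so a result is always returned. */
	return dlid % m->num_engines;
}
```

Assigning engines in arrival order rather than by the numeric value of
the "dlid" keeps the spread even when the values themselves are clustered.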

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove TWSI references
Dean Luick [Fri, 1 Jul 2016 23:01:50 +0000 (16:01 -0700)]
IB/hfi1: Remove TWSI references

Remove the TWSI code.  The driver now uses the kernel's built-in
i2c bit bus module.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Use built-in i2c bit-shift bus adapter
Dean Luick [Wed, 6 Jul 2016 21:28:52 +0000 (17:28 -0400)]
IB/hfi1: Use built-in i2c bit-shift bus adapter

Use built-in i2c bit-shift bus adapter to control the
i2c busses on the chip.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Refine user process affinity algorithm
Sebastian Sanchez [Mon, 25 Jul 2016 14:54:57 +0000 (07:54 -0700)]
IB/hfi1: Refine user process affinity algorithm

When performing process affinity recommendations for MPI ranks, the current
algorithm doesn't take into account multiple HFI units. Also, real
cores and HT cores are not distinguished from one another. Therefore,
all HT cores are recommended to be assigned first within the local NUMA
node before recommending the assignments of cores in other NUMA nodes.
It's ideal to assign all real cores across all NUMA nodes first, then all
HT 1 cores, then all HT 2 cores, and so on to balance CPU workload. CPU
cores in other NUMA nodes could be running interrupt handlers, and this is
not taken into account.

To balance the CPU workload for user processes, the following
recommendation algorithm is used:

 For each user process that is opening a context on HFI Y:
  a) If all cores are assigned to user processes, start assignments all
     over from the first core
  b) Assign real cores first, then HT cores (First set of HT cores on
     all physical cores, then second set of HT cores, and, so on) in the
     following order:

       1. Same NUMA node as HFI Y and not running an IRQ handler
       2. Same NUMA node as HFI Y and running an IRQ handler
       3. Different NUMA node to HFI Y and not running an IRQ handler
       4. Different NUMA node to HFI Y and running an IRQ handler
  c) Mark core as assigned in the global affinity structure. As user
     processes are done, remove core assignments from global affinity
     structure.
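
The steps above can be sketched as follows. This assumes the HT level is
the primary sort key and the four NUMA/IRQ cases break ties within a
level, which is how the motivation paragraph reads; the struct and
function names are hypothetical and the real code works on CPU masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_core {
	int numa_node;
	int ht_level;    /* 0 = real core, 1 = first HT sibling, and so on */
	bool runs_irq;   /* an IRQ handler is affined to this core */
	bool assigned;   /* already recommended to a user process */
};

/* Rank within one HT level: 0..3 corresponds to items 1..4 of step b. */
static int ex_class(const struct ex_core *c, int hfi_node)
{
	return (c->numa_node != hfi_node) * 2 + (c->runs_irq ? 1 : 0);
}

/* Return the index of the recommended core and mark it assigned (step c). */
size_t ex_recommend_core(struct ex_core *cores, size_t n, int hfi_node)
{
	size_t best = n;

	assert(cores != NULL && n > 0);
	for (int pass = 0; pass < 2 && best == n; pass++) {
		for (size_t i = 0; i < n; i++) {
			if (cores[i].assigned)
				continue;
			if (best == n ||
			    cores[i].ht_level < cores[best].ht_level ||
			    (cores[i].ht_level == cores[best].ht_level &&
			     ex_class(&cores[i], hfi_node) <
			     ex_class(&cores[best], hfi_node)))
				best = i;
		}
		if (best == n)   /* step a: every core is taken, start over */
			for (size_t i = 0; i < n; i++)
				cores[i].assigned = false;
	}
	cores[best].assigned = true;
	return best;
}

/* Called when a user process is done (second half of step c). */
void ex_release_core(struct ex_core *cores, size_t idx)
{
	cores[idx].assigned = false;
}
```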

This implementation allows an arbitrary number of HT cores and provides
support for multiple HFIs.

This is being included in the kernel rather than user space due to the
fact that user space has no way of knowing the CPU recommendations for
contexts running as part of other jobs.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Reserve and collapse CPU cores for contexts
Sebastian Sanchez [Mon, 25 Jul 2016 14:54:48 +0000 (07:54 -0700)]
IB/hfi1: Reserve and collapse CPU cores for contexts

Kernel receive queues oversubscribe CPU cores on multi-HFI systems.
To prevent this, the kernel receive queues are separated onto
different cores, and the SDMA engine interrupts are constrained to
a lesser number of cores.

hfi1s_on_numa_node*krcvqs is the number of CPU cores that are
reserved for kernel receive queues for all HFIs. Each HFI initializes
its kernel receive queues to one of the reserved CPU cores. If there
ends up being 0 CPU cores leftover for SDMA engines, use the same
CPU cores as receive contexts.
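
The core budget can be written out as a small calculation. This is a
sketch only: it ignores the core set aside for general and control
contexts, clamps the reservation to the node size as an assumption, and
uses hypothetical struct and function names.

```c
#include <assert.h>

struct ex_core_split {
	unsigned int rcv_cores;    /* cores reserved for kernel receive queues */
	unsigned int sdma_cores;   /* cores available to SDMA engine interrupts */
};

struct ex_core_split ex_split_cores(unsigned int node_cores,
				    unsigned int hfi1s_on_numa_node,
				    unsigned int krcvqs)
{
	struct ex_core_split s;
	unsigned int reserved = hfi1s_on_numa_node * krcvqs;

	assert(node_cores > 0);
	if (reserved >= node_cores) {
		/* Nothing left over: SDMA interrupts share the receive cores. */
		s.rcv_cores = node_cores;
		s.sdma_cores = node_cores;
	} else {
		s.rcv_cores = reserved;
		s.sdma_cores = node_cores - reserved;
	}
	return s;
}
```

For example, two HFIs with two kernel receive queues each on a 14-core
node reserve 4 cores and leave 10 for SDMA engine interrupts.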

In addition, general and control contexts are assigned to their own
CPU core, however, both types of contexts tend to have low traffic.
To save CPU cores, collapse general and control contexts to one CPU
core for all HFI units. This change prevents SDMA engine interrupts
from wrapping around general contexts.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add global structure for affinity assignments
Dennis Dalessandro [Mon, 25 Jul 2016 14:52:36 +0000 (07:52 -0700)]
IB/hfi1: Add global structure for affinity assignments

When HFI units get initialized, they each use their own mask copy for
affinity assignments. On a multi-HFI system, affinity assignments
overbook CPU cores as each HFI doesn't have knowledge of affinity
assignments for other HFI units. Therefore, some CPU cores are never
used for interrupt handlers in systems with a high number of CPU cores
per NUMA node.

For multi-HFI systems, SDMA engine interrupt assignments start all over
from the first CPU in the local NUMA node after the first HFI
initialization. This change allows assignments to continue where the
last HFI unit left off.

Add global structure for affinity assignments for multiple HFIs to share
affinity mask.
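
A minimal sketch of the shared state, assuming a single cursor per NUMA
node is enough to show the idea. The struct and function names are
hypothetical, and the real structure holds CPU masks, not a counter.

```c
#include <assert.h>
#include <stddef.h>

/* One shared cursor per NUMA node instead of a private copy per HFI. */
struct ex_node_affinity {
	unsigned int num_cpus;   /* CPUs in the local NUMA node */
	unsigned int next_cpu;   /* where the previous HFI stopped */
};

/* Hand out the next CPU for an SDMA interrupt, wrapping at the node's end. */
unsigned int ex_next_sdma_cpu(struct ex_node_affinity *node)
{
	unsigned int cpu;

	assert(node != NULL && node->num_cpus > 0);
	cpu = node->next_cpu;
	node->next_cpu = (node->next_cpu + 1) % node->num_cpus;
	return cpu;
}
```

Because every HFI on the node advances the same cursor, the second HFI
continues after the first one's last CPU and does not restart at CPU 0.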

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add counter to track unsupported packets drop
Jakub Pawlak [Fri, 1 Jul 2016 23:01:22 +0000 (16:01 -0700)]
IB/hfi1: Add counter to track unsupported packets drop

Add sw counter to track dropped unsupported packets.
Report unsupported packets drop as the RcvError.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add VL XmitDiscards counters to the opapmaquery
Jakub Pawlak [Fri, 1 Jul 2016 23:01:17 +0000 (16:01 -0700)]
IB/hfi1: Add VL XmitDiscards counters to the opapmaquery

Add per VL XmitDiscards counters to the opapmaquery
status and error response.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix trace sparse errors
Mike Marciniszyn [Fri, 1 Jul 2016 23:01:11 +0000 (16:01 -0700)]
IB/hfi1: Fix trace sparse errors

Fix sparse errors by making sure the fast assign destinations
are host cpu typed.

For the void __iomem *, just make the field match source
data.

Fix a bug where the hw_free trace printed the pointer vs.
the dereferenced value.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Separate tracepoints into specific headers
Sebastian Sanchez [Fri, 1 Jul 2016 23:01:06 +0000 (16:01 -0700)]
IB/hfi1: Separate tracepoints into specific headers

The ftrace infrastructure used to evaluate the TRACE_SYSTEM
macro on every DEFINE_EVENT() macro. Now the TRACE_SYSTEM
macro only gets evaluated when trace/define_trace.h is
included, so the group event information is lost. This was
introduced in
commit acd388fd3af3 ("tracing: Give system name a pointer")
Therefore, each system tracepoint must be on its own file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix typo
Tadeusz Struk [Fri, 1 Jul 2016 23:01:00 +0000 (16:01 -0700)]
IB/hfi1: Fix typo

Fix a copy and paste typo in comment.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove unnecessary done label in hfi1_write_iter
Ira Weiny [Fri, 1 Jul 2016 23:00:55 +0000 (16:00 -0700)]
IB/hfi1: Remove unnecessary done label in hfi1_write_iter

Simple code clean up of hfi1_write_iter.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Clean up port state structure definition
Ira Weiny [Fri, 1 Jul 2016 23:00:49 +0000 (16:00 -0700)]
IB/hfi1: Clean up port state structure definition

The definition of port state changed mid development and the
old structure was kept accidentally.  Remove this dead code.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoLinux 4.7 v4.7
Linus Torvalds [Sun, 24 Jul 2016 19:23:50 +0000 (12:23 -0700)]
Linux 4.7

7 years agoMerge tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client
Linus Torvalds [Sun, 24 Jul 2016 01:00:31 +0000 (10:00 +0900)]
Merge tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a long-standing bug in the incremental osdmap handling code
  that caused misdirected requests, tagged for stable

  The tag is signed with a brand new key - Sage is on vacation and I
  didn't anticipate this"

* tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client:
  libceph: apply new_state before new_up_client on incrementals

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 23 Jul 2016 06:44:31 +0000 (15:44 +0900)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix memory leak in nftables, from Liping Zhang.

 2) Need to check result of vlan_insert_tag() in batman-adv otherwise we
    risk NULL skb derefs, from Sven Eckelmann.

 3) Check for dev_alloc_skb() failures in cfg80211, from Gregory
    Greenman.

 4) Handle properly when we have ppp_unregister_channel() happening in
    parallel with ppp_connect_channel(), from WANG Cong.

 5) Fix DCCP deadlock, from Eric Dumazet.

 6) Bail out properly in UDP if sk_filter() truncates the packet to be
    smaller than even the space that the protocol headers need.  From
    Michal Kubecek.

 7) Similarly for rose, dccp, and sctp, from Willem de Bruijn.

 8) Make TCP challenge ACKs less predictable, from Eric Dumazet.

 9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  packet: propagate sock_cmsg_send() error
  net/mlx5e: Fix del vxlan port command buffer memset
  packet: fix second argument of sock_tx_timestamp()
  net: switchdev: change ageing_time type to clock_t
  Update maintainer for EHEA driver.
  net/mlx4_en: Add resilience in low memory systems
  net/mlx4_en: Move filters cleanup to a proper location
  sctp: load transport header after sk_filter
  net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
  net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
  net: nb8800: Fix SKB leak in nb8800_receive()
  et131x: Fix logical vs bitwise check in et131x_tx_timeout()
  vlan: use a valid default mtu value for vlan over macsec
  net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
  mlxsw: spectrum: Prevent invalid ingress buffer mapping
  mlxsw: spectrum: Prevent overwrite of DCB capability fields
  mlxsw: spectrum: Don't emit errors when PFC is disabled
  mlxsw: spectrum: Indicate support for autonegotiation
  mlxsw: spectrum: Force link training according to admin state
  r8152: add MODULE_VERSION
  ...

7 years agoMerge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszer...
Linus Torvalds [Sat, 23 Jul 2016 05:25:02 +0000 (14:25 +0900)]
Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "This contains a fix for a potential crash/corruption issue and another
  where the suid/sgid bits weren't cleared on write"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: verify upper dentry in ovl_remove_and_whiteout()
  ovl: Copy up underlying inode's ->i_mode to overlay inode
  ovl: handle ATTR_KILL*

7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 23 Jul 2016 03:54:20 +0000 (12:54 +0900)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "Five fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  pps: do not crash when failed to register
  tools/vm/slabinfo: fix an unintentional printf
  testing/radix-tree: fix a macro expansion bug
  radix-tree: fix radix_tree_iter_retry() for tagged iterators.
  mm: memcontrol: fix cgroup creation failure after many small jobs

7 years agoMerge tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied...
Linus Torvalds [Sat, 23 Jul 2016 03:51:52 +0000 (12:51 +0900)]
Merge tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux

Pull intel kabylake drm fixes from Dave Airlie:
 "As mentioned Intel has gathered all the Kabylake fixes from -next,
  which we've enabled in 4.7 for the first time, these are pretty much
  limited in scope to only affect kabylake, which is hw that isn't
  shipping yet.  So I'm mostly okay with it going in now.

  If we don't land this, it might be a good idea to disable kabylake
  support in 4.7 before we ship"

* tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux: (28 commits)
  drm/i915/kbl: Introduce the first official DMC for Kabylake.
  drm/i915: Introduce Kabypoint PCH for Kabylake H/DT.
  drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate
  drm/i915/gen9: Add WaFbcHighMemBwCorruptionAvoidance
  drm/i195/fbc: Add WaFbcNukeOnHostModify
  drm/i915/gen9: Add WaFbcWakeMemOn
  drm/i915/gen9: Add WaFbcTurnOffFbcWatermark
  drm/i915/kbl: Add WaClearSlmSpaceAtContextSwitch
  drm/i915/gen9: Add WaEnableChickenDCPR
  drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharing
  drm/i915/kbl: Add WaDisableGafsUnitClkGating
  drm/i915/kbl: Add WaForGAMHang
  drm/i915: Add WaInsertDummyPushConstP for bxt and kbl
  drm/i915/kbl: Add WaDisableDynamicCreditSharing
  drm/i915/kbl: Add WaDisableGamClockGating
  drm/i915/gen9: Enable must set chicken bits in config0 reg
  drm/i915/kbl: Add WaDisableLSQCROPERFforOCL
  drm/i915/kbl: Add WaDisableSDEUnitClockGating
  drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0
  drm/i915/kbl: Add WaEnableGapsTsvCreditFix
  ...

7 years agoMerge tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied...
Linus Torvalds [Sat, 23 Jul 2016 03:46:42 +0000 (12:46 +0900)]
Merge tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Two i915 regression fixes.

  Intel have submitted some Kabylake fixes I'll send separately, since
  this is the first kernel with kabylake support and they don't go much
  outside that area I think they should be fine"

* tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: add missing condition for committing planes on crtc
  drm/i915: Treat eDP as always connected, again

7 years agoMerge tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert...
Linus Torvalds [Sat, 23 Jul 2016 03:39:08 +0000 (12:39 +0900)]
Merge tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:
 - assorted spelling fixes
 - defconfig updates

* tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k/defconfig: Update defconfigs for v4.7-rc2
  m68k: Assorted spelling fixes

7 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Sat, 23 Jul 2016 03:32:50 +0000 (12:32 +0900)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A handful of fixes before final release:

  Marvell Armada:
   - One to fix a typo in the devicetree specifying memory ranges for
     the crypto engine
   - Two to deal with marking PCI and device-memory as strongly ordered
     to avoid hardware deadlocks, in particular when enabling above
     crypto driver.
   - Compile fix for PM

  Allwinner:
   - DT clock fixes to deal with u-boot-enabled framebuffer (simplefb).
   - Make R8 (C.H.I.P. SoC) inherit system compatibility from A13 to
     make clocks register proper.

  Tegra:
   - Fix SD card voltage setting on the Tegra3 Beaver dev board

  Misc:
   - Two maintainers updates for STM32 and STi platforms"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: tegra: beaver: Allow SD card voltage to be changed
  MAINTAINERS: update STi maintainer list
  MAINTAINERS: update STM32 maintainers list
  ARM: mvebu: compile pm code conditionally
  ARM: dts: sun7i: Fix pll3x2 and pll7x2 not having a parent clock
  ARM: dts: sunxi: Add pll3 to simplefb nodes clocks lists
  ARM: dts: armada-38x: fix MBUS_ID for crypto SRAM on Armada 385 Linksys
  ARM: mvebu: map PCI I/O regions strongly ordered
  ARM: mvebu: fix HW I/O coherency related deadlocks
  ARM: sunxi/dt: make the CHIP inherit from allwinner,sun5i-a13

7 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Sat, 23 Jul 2016 03:20:55 +0000 (12:20 +0900)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a sporadic build failure in the qat driver as well as a
  memory corruption bug in rsa-pkcs1pad"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
  crypto: qat - make qat_asym_algs.o depend on asn1 headers

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Sat, 23 Jul 2016 03:15:48 +0000 (12:15 +0900)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull key handling fixes from James Morris:
 "Quoting David Howells:

  Here are three miscellaneous fixes:

  (1) Fix a panic in some debugging code in PKCS#7.  This can only
      happen by explicitly inserting a #define DEBUG into the code.

  (2) Fix the calculation of the digest length in the PE file parser.
      This causes a failure where there should be a success.

  (3) Fix the case where an X.509 cert can be added as an asymmetric key
      to a trusted keyring with no trust restriction if no AKID is
      supplied.

  Bugs (1) and (2) aren't particularly problematic, but (3) allows a
  security check to be bypassed.  Happily, this is a recent regression
  and never made it into a released kernel"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  KEYS: Fix for erroneous trust of incorrectly signed X.509 certs
  pefile: Fix the failure of calculation for digest
  PKCS#7: Fix panic when referring to the empty AKID when DEBUG defined

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sat, 23 Jul 2016 03:10:48 +0000 (12:10 +0900)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
 "A few more fixes for the input subsystem:

   - restore naming for tsc2005 touchscreens as some userspace match on it
   - fix out of bound access in legacy keyboard driver
   - fixup in RMI4 driver

  Everything is tagged for stable as well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: tsc200x - report proper input_dev name
  tty/vt/keyboard: fix OOB access in do_compute_shiftstate()
  Input: synaptics-rmi4 - fix maximum size check for F12 control register 8

7 years agoMerge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdim...
Linus Torvalds [Sat, 23 Jul 2016 03:07:37 +0000 (12:07 +0900)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fix from Dan Williams:
 "This contains a regression fix for a problem that was introduced in
  v4.7-rc6.

  In 4.7-rc1 we introduced auto-probing for the ACPI DSM (device-
  specific-method) format that the platform firmware implements for
  nvdimm devices.  We initially fixed a regression in probing the QEMU
  DSM implementation by making acpi_check_dsm() tolerant of the way QEMU
  reports the "0 DSMs supported" condition.

  However, that broke HPE platforms since that tolerance caused the
  driver to mistakenly match the 1-zero-byte response those platforms
  give to "unknown" commands.  Instead, we simply make the driver
  tolerant of not finding any supported DSMs.  This has been tested to
  work with both QEMU and HPE platforms.

  This commit has appeared in a -next release with no reported issues"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nfit: make DIMM DSMs optional

7 years agoMerge tag 'gpio-v4.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux...
Linus Torvalds [Sat, 23 Jul 2016 03:03:21 +0000 (12:03 +0900)]
Merge tag 'gpio-v4.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fix from Linus Walleij:
 "Compile problem fix for Tegra,

  Sorry to send this in the last minute but Ingo says this build failure
  is very prominent so I'm not going to wait for v4.7 before sending it.

  It is a case of COMPILE_TEST causing more problems than it solves and
  I'm already swearing about me shooting myself in the foot with that
  gun :("

* tag 'gpio-v4.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: tegra: don't auto-enable for COMPILE_TEST

7 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 23 Jul 2016 02:55:20 +0000 (11:55 +0900)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Michael Turquette:
 "Fix a bug in the at91 clk driver, two compile time warnings in sunxi
  clk drivers, and one bug in a sunxi clk driver introduced in the 4.7
  merge window"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: at91: fix clk_programmable_set_parent()
  clk: sunxi: remove unused variable
  clk: sunxi: display: Add per-clock flags
  clk: sunxi: tcon-ch1: Do not return a negative error in get_parent

7 years agoMerge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Sat, 23 Jul 2016 02:46:59 +0000 (11:46 +0900)]
Merge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fix from Tejun Heo:
 "Another fallout from max_sectors bump a couple years ago.  The lite-on
  optical drive times out on large requests"

* 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: LITE-ON CX1-JB256-HP needs lower max_sectors

7 years agoMerge tag 'mmc-v4.7-rc7' of git://git.linaro.org/people/ulf.hansson/mmc
Linus Torvalds [Sat, 23 Jul 2016 02:43:17 +0000 (11:43 +0900)]
Merge tag 'mmc-v4.7-rc7' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC fixes from Ulf Hansson:
 "Here are a few late mmc fixes intended for v4.7 final.

  MMC core:
   - Fix eMMC packed command header endianness
   - Fix free of uninitialized buffer for mmc ioctl

  MMC host:
   - pxamci: Fix potential oops in ->probe()"

* tag 'mmc-v4.7-rc7' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: pxamci: fix potential oops
  mmc: block: fix packed command header endianness
  mmc: block: fix free of uninitialized 'idata->buf'

7 years agoMerge tag 'sound-4.7-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Sat, 23 Jul 2016 02:28:06 +0000 (11:28 +0900)]
Merge tag 'sound-4.7-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "No surprise, just a few small fixes: a couple of changes are seen in
  the core part, and both of them are rather for unusual error paths.

  The rest are the regular HD-audio fixes and one USB-audio regression
  fix"

* tag 'sound-4.7-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Fix quirks code is not called
  ALSA: hda: add AMD Stoney PCI ID with proper driver caps
  ALSA: hda - fix use-after-free after module unload
  ALSA: pcm: Free chmap at PCM free callback, too
  ALSA: ctl: Stop notification after disconnection
  ALSA: hda/realtek - add new pin definition in alc225 pin quirk table

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 23 Jul 2016 02:22:37 +0000 (11:22 +0900)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull NVMe fix from Jens Axboe:
 "Late addition here, it's basically a revert of a patch that was added
  in this merge window, but has proven to cause problems.

  This is swapping out the RCU based namespace protection with a good
  old mutex instead"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme: Remove RCU namespace protection

7 years agopps: do not crash when failed to register
Jiri Slaby [Wed, 20 Jul 2016 22:45:08 +0000 (15:45 -0700)]
pps: do not crash when failed to register

With this command sequence:

  modprobe plip
  modprobe pps_parport
  rmmod pps_parport

the pps_parport module causes this crash:

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: parport_detach+0x1d/0x60 [pps_parport]
  Oops: 0000 [#1] SMP
  ...
  Call Trace:
    parport_unregister_driver+0x65/0xc0 [parport]
    SyS_delete_module+0x187/0x210

The sequence that builds up to this is:

 1) plip is loaded and takes the parport device for exclusive use:

    plip0: Parallel port at 0x378, using IRQ 7.

 2) pps_parport then fails to grab the device:

    pps_parport: parallel port PPS client
    parport0: cannot grant exclusive access for device pps_parport
    pps_parport: couldn't register with parport0

 3) rmmod of pps_parport is then killed because it tries to access
    pardev->name, but pardev (taken from port->cad) is NULL.

So add a check for NULL in the test there too.
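
The shape of the added check can be sketched in userspace C. The structs
below are simplified, hypothetical stand-ins for the parport types, and
the function name is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ex_pardevice { const char *name; };
struct ex_parport { struct ex_pardevice *cad; };   /* port owner; NULL if none */

/* Returns 1 if this port is owned by the named client, 0 otherwise. */
int ex_port_is_ours(const struct ex_parport *port, const char *client)
{
	const struct ex_pardevice *pardev;

	assert(port != NULL && client != NULL);
	pardev = port->cad;
	/* Registration may have failed, leaving cad NULL: test before using name. */
	if (pardev == NULL || strcmp(pardev->name, client) != 0)
		return 0;
	return 1;
}
```

Short-circuit evaluation of || guarantees that pardev->name is read only
after the NULL test has passed.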

Link: http://lkml.kernel.org/r/20160714115245.12651-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotools/vm/slabinfo: fix an unintentional printf
Dan Carpenter [Wed, 20 Jul 2016 22:45:05 +0000 (15:45 -0700)]
tools/vm/slabinfo: fix an unintentional printf

The curly braces are missing here so we print stuff unintentionally.
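
The general shape of this kind of bug can be sketched as follows. The
function names and strings are hypothetical and are not taken from
slabinfo.c; the sketch only shows how a missing pair of braces lets the
second statement escape the condition.

```c
#include <assert.h>
#include <stdio.h>

/* Buggy shape: without braces only the first statement is guarded. */
int report_buggy(int enabled, char *buf, size_t len)
{
    int n = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (enabled)
        n += snprintf(buf + n, len - (size_t)n, "header\n");
    /* Meant to be conditional too, but runs unconditionally. */
    n += snprintf(buf + n, len - (size_t)n, "details\n");
    return n;
}

/* Fixed shape: braces put both statements under the condition. */
int report_fixed(int enabled, char *buf, size_t len)
{
    int n = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (enabled) {
        n += snprintf(buf + n, len - (size_t)n, "header\n");
        n += snprintf(buf + n, len - (size_t)n, "details\n");
    }
    return n;
}
```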

Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling')
Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotesting/radix-tree: fix a macro expansion bug
Dan Carpenter [Wed, 20 Jul 2016 22:45:03 +0000 (15:45 -0700)]
testing/radix-tree: fix a macro expansion bug

There are no parentheses around this macro and it causes a problem when
we do:

index = rand() % THRASH_SIZE;
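
To see why the missing parentheses matter, here is a small sketch with
hypothetical macro values (SIZE_BAD and SIZE_GOOD are illustrative and
are not the test suite's definitions). Because % and * have the same
precedence and associate left to right, the unparenthesized form
computes something other than the intended remainder.

```c
#include <assert.h>

/* Hypothetical values chosen to make the arithmetic easy to follow. */
#define SIZE_BAD  10 * 10      /* no parentheses */
#define SIZE_GOOD (10 * 10)    /* parenthesized */

/* Expands to (value % 10) * 10. */
int index_bad(int value)
{
    assert(value >= 0);
    return value % SIZE_BAD;
}

/* Expands to value % (10 * 10), the intended range [0, 100). */
int index_good(int value)
{
    assert(value >= 0);
    return value % SIZE_GOOD;
}
```

For example, with value = 57 the first form yields 70 and the second
yields 57.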

Link: http://lkml.kernel.org/r/20160715210953.GC19522@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: fix radix_tree_iter_retry() for tagged iterators.
Andrey Ryabinin [Wed, 20 Jul 2016 22:45:00 +0000 (15:45 -0700)]
radix-tree: fix radix_tree_iter_retry() for tagged iterators.

radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
A NULL slot and non-zero iter.tags are then passed to
radix_tree_next_slot(), leading to a crash:

  RIP: radix_tree_next_slot include/linux/radix-tree.h:473
    find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
  ....
  Call Trace:
    pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
    mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
    ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
    do_writepages+0x97/0x100 mm/page-writeback.c:2364
    __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
    filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
    ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
    vfs_fsync_range+0x10a/0x250 fs/sync.c:195
    vfs_fsync fs/sync.c:209
    do_fsync+0x42/0x70 fs/sync.c:219
    SYSC_fdatasync fs/sync.c:232
    SyS_fdatasync+0x19/0x20 fs/sync.c:230
    entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207

We must reset iterator's tags to bail out from radix_tree_next_slot()
and go to the slow-path in radix_tree_next_chunk().

Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: memcontrol: fix cgroup creation failure after many small jobs
Johannes Weiner [Wed, 20 Jul 2016 22:44:57 +0000 (15:44 -0700)]
mm: memcontrol: fix cgroup creation failure after many small jobs

The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears.  At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild.  Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limit on outstanding CSSs.  Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.

Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.

Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later.  They pose no hurdle.

Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages.  And those
references are under the user's control, so they are manageable.

This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that.  This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
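
The recycling idea can be sketched in userspace as below. This is only
an illustration under stated assumptions: it uses a tiny fixed array
where the patch uses an IDR, and ID_MAX, id_alloc() and id_free() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define ID_MAX 8   /* tiny stand-in for the 16-bit ID space (hypothetical) */

static bool id_used[ID_MAX + 1];   /* index 0 is reserved as "no ID" */

/* Return the lowest free ID, or -1 when the space is exhausted. */
int id_alloc(void)
{
    int id;

    for (id = 1; id <= ID_MAX; id++) {
        if (!id_used[id]) {
            id_used[id] = true;
            return id;
        }
    }
    return -1;
}

/* Release an ID at offline time so a later id_alloc() can reuse it. */
void id_free(int id)
{
    assert(id >= 1 && id <= ID_MAX && id_used[id]);
    id_used[id] = false;
}
```

Once an ID is released when its owner goes offline, the space stops
filling up even if the owning object itself stays pinned for longer.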

This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:

  set -e
  mkdir -p pages
  for x in `seq 128000`; do
    [ $((x % 1000)) -eq 0 ] && echo $x
    mkdir /cgroup/foo
    echo $$ >/cgroup/foo/cgroup.procs
    echo trex >pages/$x
    echo $$ >/cgroup/cgroup.procs
    rmdir /cgroup/foo
  done

When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:

  [root@ham ~]# ./cssidstress.sh
  [...]
  65000
  mkdir: cannot create directory '/cgroup/foo': No space left on device

After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.

[hannes@cmpxchg.org: init the IDR]
Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agogpio: tegra: don't auto-enable for COMPILE_TEST
Arnd Bergmann [Wed, 6 Jul 2016 12:54:03 +0000 (14:54 +0200)]
gpio: tegra: don't auto-enable for COMPILE_TEST

I stumbled over a build error with COMPILE_TEST and CONFIG_OF
disabled:

drivers/gpio/gpio-tegra.c: In function 'tegra_gpio_probe':
drivers/gpio/gpio-tegra.c:603:9: error: 'struct gpio_chip' has no member named 'of_node'

The problem is that the newly added GPIO_TEGRA Kconfig symbol
does not have a dependency on CONFIG_OF. However, there is another
problem here as the driver gets enabled unconditionally whenever
COMPILE_TEST is set.

This fixes both problems, by making the symbol user-visible
when COMPILE_TEST is set and default-enabled for ARCH_TEGRA=y.

As a side-effect, it is now possible to compile-test a Tegra
kernel with GPIO support disabled, which is harmless.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 4dd4dd1d2120 ("gpio: tegra: Allow compile test")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
7 years agolibceph: apply new_state before new_up_client on incrementals
Ilya Dryomov [Tue, 19 Jul 2016 01:50:28 +0000 (03:50 +0200)]
libceph: apply new_state before new_up_client on incrementals

Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

    for (i = 0; i < pg->pg_temp.len; i++) {
            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
                    if (ceph_can_shift_osds(pi))
                            continue;

                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed
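
The ordering problem can be reproduced with a small model. The flag
values and both functions below are a simplified sketch, not the
libceph code: new_up_client is modeled as setting EXISTS | UP and
new_state as an xor, as described above.

```c
#include <assert.h>

#define OSD_EXISTS 1u   /* modeled on the EXISTS flag */
#define OSD_UP     2u   /* modeled on the UP flag */

/* Encoding order: new_up_client first, then the new_state xor. */
unsigned apply_encoding_order(unsigned state, unsigned xorstate)
{
    assert((xorstate & ~(OSD_EXISTS | OSD_UP)) == 0);
    state |= OSD_EXISTS | OSD_UP;   /* new_up_client */
    state ^= xorstate;              /* new_state */
    return state;
}

/* Fixed order: the new_state xor first, then new_up_client. */
unsigned apply_fixed_order(unsigned state, unsigned xorstate)
{
    assert((xorstate & ~(OSD_EXISTS | OSD_UP)) == 0);
    state ^= xorstate;              /* new_state */
    state |= OSD_EXISTS | OSD_UP;   /* new_up_client */
    return state;
}
```

Starting from EXISTS with xorstate=EXISTS, the encoding order ends in
UP without EXISTS, while the fixed order ends in EXISTS | UP.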

Fixes: http://tracker.ceph.com/issues/14901

Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
7 years agocrypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
Herbert Xu [Fri, 22 Jul 2016 09:58:21 +0000 (17:58 +0800)]
crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct

To allow for child request context the struct akcipher_request child_req
needs to be at the end of the structure.
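
A rough sketch of why the position matters, assuming a simplified model
in which the child's private context is stored directly after the child
request in memory. All type and macro names below are hypothetical and
do not match the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the child request; in this model its
 * private context is stored directly after it in memory. */
struct child_request {
    int flags;
};

/* child_req is last: the trailing context lands past the other members. */
struct pad_request_last {
    unsigned int out_len;
    struct child_request child_req;
};

/* child_req is first: the trailing context lands on top of out_len. */
struct pad_request_first {
    struct child_request child_req;
    unsigned int out_len;
};

/* Offset, within the enclosing struct, where the child context begins. */
#define CHILD_CTX_START(type) \
    (offsetof(type, child_req) + sizeof(struct child_request))

/* Return 1 if a context starting at ctx_start runs into a member. */
int ctx_hits_member(size_t ctx_start, size_t member_off, size_t member_size)
{
    assert(member_size > 0);
    return ctx_start < member_off + member_size;
}
```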

Cc: stable@vger.kernel.org
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
7 years agoovl: verify upper dentry in ovl_remove_and_whiteout()
Maxim Patlasov [Fri, 22 Jul 2016 01:24:26 +0000 (18:24 -0700)]
ovl: verify upper dentry in ovl_remove_and_whiteout()

The upper dentry may become stale before we call ovl_lock_rename_workdir.
For example, someone could (mistakenly or maliciously) manually unlink(2)
it directly from upperdir.

To ensure it is not stale, let's look it up after ovl_lock_rename_workdir
and check if it matches the upper dentry.

Essentially, it is the same problem and similar solution as in
commit 11f3710417d0 ("ovl: verify upper dentry before unlink and rename").

Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
7 years agopacket: propagate sock_cmsg_send() error
Soheil Hassas Yeganeh [Wed, 20 Jul 2016 22:01:18 +0000 (18:01 -0400)]
packet: propagate sock_cmsg_send() error

sock_cmsg_send() can return different error codes and not only
-EINVAL, and we should properly propagate them.
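
The difference between masking and propagating an error can be sketched
like this. parse_cmsg() is a hypothetical stand-in for a helper with
several failure modes; it is not the kernel function, and the error
codes are chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper that can fail in more than one way. */
int parse_cmsg(int kind)
{
    switch (kind) {
    case 0:
        return 0;
    case 1:
        return -EINVAL;
    default:
        return -ENOMEM;
    }
}

/* Masks every failure as -EINVAL, losing the real cause. */
int send_masking(int kind)
{
    if (parse_cmsg(kind))
        return -EINVAL;
    return 0;
}

/* Hands the helper's own error code back to the caller. */
int send_propagating(int kind)
{
    int err = parse_cmsg(kind);

    assert(err <= 0);   /* helper contract: 0 or a negative errno */
    if (err)
        return err;
    return 0;
}
```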

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocrypto: qat - make qat_asym_algs.o depend on asn1 headers
Jan Stancek [Thu, 30 Jun 2016 10:23:51 +0000 (12:23 +0200)]
crypto: qat - make qat_asym_algs.o depend on asn1 headers

Parallel build can sporadically fail because asn1 headers may
not be built yet by the time qat_asym_algs.o is compiled:
  drivers/crypto/qat/qat_common/qat_asym_algs.c:55:32: fatal error: qat_rsapubkey-asn1.h: No such file or directory
   #include "qat_rsapubkey-asn1.h"

Cc: stable@vger.kernel.org
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
7 years agoInput: tsc200x - report proper input_dev name
Michael Welling [Wed, 20 Jul 2016 17:02:07 +0000 (10:02 -0700)]
Input: tsc200x - report proper input_dev name

Pass the input_id struct to the common probe function for the tsc200x
drivers instead of just the bustype.

This allows for the use of the product variable to set the input_dev->name
variable according to the type of touchscreen used. Note that when we
introduced support for TSC2004 we started calling everything TSC200X, so
let's keep this quirk.

Signed-off-by: Michael Welling <mwelling@ieee.org>
Cc: stable@vger.kernel.org
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agotty/vt/keyboard: fix OOB access in do_compute_shiftstate()
Dmitry Torokhov [Mon, 27 Jun 2016 21:12:34 +0000 (14:12 -0700)]
tty/vt/keyboard: fix OOB access in do_compute_shiftstate()

The size of an individual keymap in drivers/tty/vt/keyboard.c is NR_KEYS,
which is currently 256, whereas the number of keys/buttons in an input
device (and therefore in key_down) is much larger (KEY_CNT, 768), and that
can cause an out-of-bounds access when we do

sym = U(key_maps[0][k]);

with large 'k'.

To fix it we should not attempt iterating beyond the smaller of NR_KEYS
and KEY_CNT.

Also while at it let's switch to for_each_set_bit() instead of open-coding
it.
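
A simplified userspace sketch of the bounded loop follows. It uses one
byte per key instead of a bitmap and a plain for loop instead of
for_each_set_bit(), and count_mapped() is a hypothetical function; only
the two sizes come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define NR_KEYS 256   /* keymap size, per the description above */
#define KEY_CNT 768   /* input key count, per the description above */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Count pressed keys that have a nonzero keymap entry, never
 * indexing keymap past its NR_KEYS entries.
 */
int count_mapped(const unsigned char *key_down, const unsigned short *keymap)
{
    int k, n = 0;

    assert(key_down != NULL && keymap != NULL);
    for (k = 0; k < MIN(NR_KEYS, KEY_CNT); k++) {
        if (key_down[k] && keymap[k])
            n++;
    }
    return n;
}
```

A key pressed at an index of 256 or above is simply ignored rather than
used to index the smaller table.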

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agonet/mlx5e: Fix del vxlan port command buffer memset
Saeed Mahameed [Wed, 20 Jul 2016 21:39:53 +0000 (00:39 +0300)]
net/mlx5e: Fix del vxlan port command buffer memset

memset the command buffers rather than the pointers to them.
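
As a general illustration of this class of mistake (not the driver's
code; both functions are hypothetical), clearing the address of a
pointer variable leaves the buffer it points to untouched:

```c
#include <assert.h>
#include <string.h>

/* Wrong: clears the local pointer variable, not the buffer. */
void clear_wrong(unsigned int *buf, size_t words)
{
    (void)words;
    memset(&buf, 0, sizeof(buf));
}

/* Right: clears the memory the pointer refers to. */
void clear_right(unsigned int *buf, size_t words)
{
    assert(buf != NULL);
    memset(buf, 0, words * sizeof(*buf));
}
```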

Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agopacket: fix second argument of sock_tx_timestamp()
Yoshihiro Shimoda [Tue, 19 Jul 2016 05:40:51 +0000 (14:40 +0900)]
packet: fix second argument of sock_tx_timestamp()

This patch fixes an issue where a syscall (e.g. the sendto syscall) does
not work correctly. Since the sendto syscall has no msg_control buffer,
sock_tx_timestamp() in packet_snd() cannot work correctly because
sockc.tsflags is set to 0.
So, this patch sets sockc.tsflags to sk->sk_tsflags by default.
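
The defaulting can be modeled as below. This is a simplified sketch
under stated assumptions: the struct and function names are
hypothetical, and a control message is modeled as simply overriding the
flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket and the per-send cookie. */
struct sock_model { unsigned int tsflags; };
struct cookie_model { unsigned int tsflags; };

/*
 * Compute the flags used for one send: start from the socket's own
 * setting, and let a control message (if any) override it.
 */
unsigned int effective_tsflags(const struct sock_model *sk,
                               int has_cmsg, unsigned int cmsg_flags)
{
    struct cookie_model cookie;

    assert(sk != NULL);
    cookie.tsflags = sk->tsflags;   /* default from the socket */
    if (has_cmsg)
        cookie.tsflags = cmsg_flags;
    return cookie.tsflags;
}
```

Without the default, a send that carries no control message would see
zero flags even when timestamping was enabled on the socket.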

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Reported-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Reported-by: Keita Kobayashi <keita.kobayashi.ym@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>