git.cascardo.eti.br Git - cascardo/linux.git/log

staging: lustre: o2iblnd: Put back work queue check previously removed

The previous patch, http://review.whamcloud.com/21304/, removed
a check needed until LU-5718 is properly addressed. With
the check, LU-5718 results in an error message and a lost
RDMA operation. Without it, we have memory corruption and
a crash (much harder to debug).

Putting the check back in case LU-5718 is not fixed soon.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7650
Reviewed-on: http://review.whamcloud.com/22281
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lnet: Enable setting per NI peer_credits

The code to allow peer_credits to be set per NI was originally
"left inactive" because there were concerns about peer_credits
interfering with the ability for IB nodes to connect to each
other when peer_credits are not the same (peer_credits controls
the queue depth for IB). With LU-3322, the values do not have
to match so it is now safe to enable this code so peer_credits
can be set per NI.

This patch enables existing code for setting per NI peer_credits.

Second this patch fixes a long standing bug in that the conf data
was not being used to set variables in the lnet_ni structure until
after lnd_startup() was called which meant LND drivers were
ignoring struct lnet_ni tunable values being set. Now we change
struct lnet_ni data fields based on conf data before calling
lnd_startup().

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8507
Reviewed-on: http://review.whamcloud.com/21948
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lnet: Ensure routing is turned on first time

In lnet_rtrpools_enable(), a mistake was made and routing
was not being turned on when the rtrpools are being allocated
for the first time.

This patch fixes that routine so we remember to turn on
routing after allocating the rtrpools.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8501
Reviewed-on: http://review.whamcloud.com/21934
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lnet: check if ni is in current net namespace

Add new 'ni_net_ns' field to struct lnet_ni to hold a reference
to original net namespace in which ni is created.
In LNetDist(), check if ni was created in same net namespace as
current's one. If not, assign order above 0xffff0000, to make
this ni not a priority.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7845
Reviewed-on: http://review.whamcloud.com/21884
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lnet: potential deadlock in lnet

Fixes potential deadlock in LNetMDAttach

Signed-off-by: Quentin Bouget <quentin.bouget.ocre@cea.fr>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8249
Reviewed-on: http://review.whamcloud.com/20676
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lmv: fix parent FID for migration

If the migrating directory is under striped directory, it needs
to set right stripe FID for its parent.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6263
Reviewed-on: http://review.whamcloud.com/13817
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: mdc: cl_default_mds_easize not refreshed

The client_obd::cl_default_mds_easize field should track the largest
observed EA size advertised by the MDT, subject to a reasonable upper
bound. The MDC uses cl_default_mds_easize to calculate the initial
size of request buffers.  The default value should be small enough to
avoid wasted memory and excessive use of vmalloc(), yet large enough
to accommodate the common use case.

In the current code, the default value is only updated if
client_obd::cl_max_mds_easize is strictly less than
mdt_body::mbo_max_mdsize. This condition is almost never met, because
client_obd::cl_max_mds_easize is computed at client mount-time based
on the number of OSTs in the filesystem, so the MDT won't ever observe
and advertise an EA size larger than that.

As a result, client_obd::cl_default_mds_easize indefinitely retains
its initial value, which is computed at client mount-time based on
the filesystem's default stripe width. Any getattr() requests for
widely striped files will consequently allocate a request buffer
that is too small, forcing reallocations on both the client and
server side. To avoid this, update client_obd::cl_default_mds_easize
independently of the value of client_obd::cl_max_mds_easize.

In addition, this patch includes these changes:

- Add comments to the client_obd structure to clarify what the
   cl_{default,max}_mds_{cookie,ea}size values mean.

- Prevent mdc_get_info() from storing uninitialized data in
   client_obd::cl_max_mds_cookiesize.

- Use 4096 as an upper bound for the default values.  The former
   bound of PAGE_CACHE_SIZE is too large on 64k-page platforms
   (i.e. PPC), so it fails to prevent the vmalloc() spinlock
   contention described in LU-3338. The new value was chosen to
   be large enough to accommodate common use cases while staying
   well below the 16k threshold at which allocations start using
   vmalloc().

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Kyle Blatter <kyleblatter@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5549
Reviewed-on: http://review.whamcloud.com/11614
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: make default_easize writeable in /sysfs

Allow default_easize to be tuned via /sysfs. A system administrator
might want this if a rare access to widely striped files drives up the
value on a filesystem where narrowly striped files are the more common
case. In practice, however, this is wanted primarily to facilitate
a test case for LU-5549.

- Plumb the necessary interfaces through the LMV and MDC layers
to expose write access to this value by higher layers.

- Add block comments to modified functions.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5549
Reviewed-on: http://review.whamcloud.com/13112
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: mdt: add indexing option to default dir stripe

Add indexing option to default dirstripe EA. If MDT find
out the client send the create req to the wrong MDT because
of default stripeEA, it will return -EREMOTE, then client
will retrieve default stripeEA through xattr cache, and
re-create the object.

Also merged patch for LU-6341 to resolve the following problem.
Use ll_dir_getstripe to get default stripeEA in ll_new_node(),
Because ll_getxattr_common requires admin rights for retrieving
default LMVEA (because of trusted- prefix), which might cause
mkdir (from normal user) failure.

If parent does not have default stripeEA, then child should always
be in the same MDT for mkdir. Otherwise MDT should return -EREMOTE,
then client will refresh the default stripe index, and recreate
the object.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5523
Reviewed-on: http://review.whamcloud.com/13360
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6341
Reviewed-on: http://review.whamcloud.com/13990
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: prevent request timeout grow due to recovery

Patch fixes the issue seen on the client with growing request
timeout which occurred after the server side patch landed for
LU-5079. While commit itself is correct, it reveals another
issue. If request is being processed for a long time on server
then client adaptive timeouts will adapt to that after receiving
reply and new requests will have bigger timeout. Another problem
is that server AT history is corrupted by recovery request
processing time which not pure service time but includes
also waiting time for clients to recover.

Patch prevents the AT stats update from early replies on client and
from recovering requests processing time on server.
The ptlrpc_at_recv_early_reply() still updates the current request
timeout as asked by server, but don't include this into AT stats.
The real reply will bring that data from server after all.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6084
Reviewed-on: http://review.whamcloud.com/13520
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obd: use proper flags for call_usermodehelper

When a parameter is permanently changed on the MGS the
MGS send a changelog packet to the proper nodes that
are affected by the change. Once the nodes receive the
change they then call the userland utility lctl to
change its local value. When calling a userland
application from the kernel you specify a flag to
control the interaction with the application. Originally
by default the flag was set to 0 which is UMH_NO_WAIT
which meant lctl was being called asynchronously. In
older kernels this was fine since UHM_NO_WAIT and
UHM_WAIT_PROC had nearly the same logic. This changed
with newer kernels which broke updating our parameters.
Plus doing a UHM_NO_WAIT doesn't report back a error
if something goes wrong with lctl. The fix is to set
the flag to UHM_WAIT_PROC so kernel space waits until
lctl has finished and we get a proper error code if
something does go wrong with lctl.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6063
Reviewed-on: http://review.whamcloud.com/13677
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/12510
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: lock the inode to be migrated

Because the inode and its connected dentries will be cleared
out of the cache after migration, the inode needs to be locked
during the migration.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4712
Reviewed-on: http://review.whamcloud.com/9689
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obdclass: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/13323
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: misc: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/13321
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: grant: quiet message on grant waiting timeout

Use at_max in osc_enter_cache() to bound how long we wait for grant
space before switching to synchronous I/Os. Do not print a message
on the console when the timeout is hit since such long wait can
be legitimate with flaky network (i.e. BRW is resent multiple times).

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5521
Reviewed-on: http://review.whamcloud.com/12146
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lmv: Do not revalidate stripes with master lock

Do not revalidate slave stripes while holding master lock.
Otherwise if the revalidating slaves are blocked, then the
master lock can not be released in time.

Remove some unnecesary merging in ll_revalidate_slave(), and
the attributes will be stored in each stripe, only
merging them if required.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6088
Reviewed-on: http://review.whamcloud.com/13432
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: client: Fix mkdir -i 1 from DNE2 client to DNE1 server

After DNE phase 2 has been added to client it sends
create request to slave MDT. DNT1-only server doesn't
expect request to slave MDT from client. It expects
only cross-mdt request from master MDT. Thus if DNE2
client tries to "mkdir -i 1" on DNE1 server, then
LBUG happened.

This patch adds OBD_CONNECT_DIR_STRIPE connection
flag check on client side. If striped directories are not
supported by server, then create requrest is sent to
master MDT.

Signed-off-by: Artem Blagodarenko <artem_blagodarenko@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6071
Xyratex-bug-id: MRP-2319
Reviewed-on: http://review.whamcloud.com/13189
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: wang di <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: clio: pass fid for OST setattr

Store inode's fid in cl_setattr_ost() and OSC packs this info on the
wire (via lustre_set_wire_obdo) so that OST can use.

NOTE: currently lu_fid::f_ver and obdo::o_parent_ver are not used on
OFD device, and we use obdo::o_stripe_idx as
filter_fid::ff_parent::f_ver and save it to the device.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1154
Reviewed-on: http://review.whamcloud.com/12902
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: clio: rename coo_attr_set to coo_attr_update

coo_attr_set() is used to update object's attribute but its name
makes confusion that people intuitively think that it is used to
pass object's attribute down to server sides.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1154
Reviewed-on: http://review.whamcloud.com/12888
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: pack suppgid to MDS correctly

The ll_lookup_it() may trigger IT_OPEN RPC to open a file by name.
But at that time, the client does not know the target file's GID,
so it cannot pack the necessary supplementary group ID in the RPC.
Because of missing the supplementary group ID, the RPC maybe fail
for open permission check on the MDS. Under such case, MDS should
return the target file's GID, if the current thread on the client
in the right group (according to the file's GID), the client will
try the IT_OPEN RPC again with the right supplementary group ID.

This patch is also helpful if some other(s) changed the file's GID
after current RPC sent to the MDS with the suppgid as the original
GID by race.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5423
Reviewed-on: http://review.whamcloud.com/12476
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: remove lustre/include/linux/

Merge the contents of lustre/include/linux/lvfs.h into
lustre/include/lvfs.h. Merge lustre/include/linux/lustre_user.h into
lustre/include/lustre/lustre_user.h. Move lustre_compat25.h and
lustre_patchless_compat.h from lustre/include/linux/ to
lustre/include/ and rename lustre_compat25.h to lustre_compat.h.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/13271
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: libcfs: check mask returned by cpumask_of_node

cpumask_of_node can return NULL if NUMA node is unavailable,
in this case cfs_node_to_cpumask will try to copy from NULL
and cause kernel panic.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5751
Reviewed-on: http://review.whamcloud.com/13207
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obd: change type of cl_conn_count to size_t

Change type of cl_conn_count to size_t.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/13125
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: unlock inode size in ll_lov_setstripe_ea_info()

In ll_lov_setstripe_ea_info() release the inode size lock on all
appropriate exit paths.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6059
Reviewed-on: http://review.whamcloud.com/13167
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lprocfs: cleanup stats locking code

Add comment blocks on lprocfs_stats_lock() and lprocfs_stats_unlock().
Move common NOPERCPU code out of the switch() statements to reduce
code size and complexity, since it doesn't depend on the opc at all.

Replace switch() in lprocfs_stats_unlock() with a simple if/else,
since the lock opc was already checked in lprocfs_stats_lock().

Add an enum for the lprocfs_stats_lock() operations to make it clear
what the valid values are and allow compiler checking.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5946
Reviewed-on: http://review.whamcloud.com/12872
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: osc: change cl_extent_tax and *grants to unsigned

Change the type accordant usage and remove warnings.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12386
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: osc: osc_object_ast_clear() LBUG

An OSC object could be destroyed with AGL locks waiting for granted,
so we'd get rid of the osc_object_ast_clear() assertion that its
dlm locks all getting granted.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6042
Reviewed-on: http://review.whamcloud.com/13163
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: mgc: add nid iteration

mgc_apply_recover_logs use only first nid from entry,
this could be the problem for a cluster with several network
address for a one node.

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5950
Xyratex-bug-id: MRP-2255
Reviewed-on: http://review.whamcloud.com/12829
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: fix race between connect vs resend

Buggy code at ptlrpc_connect_interpret()
finish:
    rc = ptlrpc_import_recovery_state_machine(imp);
    ...
    Set import connection flags
When import has FULL state ptlrpc_import_recovery_state_machine()
wakeup all waiters on import and all delayed request, which was
resented. And it could happened that request was send without
updated flags and AT is disabled. If such request is in progress
on the server, server drop the new instance, and could do early reply
for it. But this early reply confuse client, cause it wait real
reply(no AT for this request). Client try to touch buffer outside
reply and got EPROTO error.
The same bug existed for initital connect too. Import became FULL
before import connection flags was set.

Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5528
Xyratex-bug-id: MRP-2034
Reviewed-on: http://review.whamcloud.com/11723
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Alexander Boyko <alexander.boyko@seagate.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lov: flatten struct lov_stripe_md

Flatten out the lsm_wire struct from the middle of struct
lov_stripe_md and remove the member name macros.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5814
Reviewed-on: http://review.whamcloud.com/12581
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: move LDLM_GID_ANY to lustre_dlm.h

lustre_idl.h only includes wire data; lustre_dlm.h is the
right place for LDLM_GID_ANY.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6028
Reviewed-on: http://review.whamcloud.com/13074
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: fix comparison between signed and unsigned

Change return type and size argiments of lustre_msg_hdr_size(),
lustre_msg_buf{len,count}() and req_capsule_*_size() to __u32.
Change type of req_format->rf_idx and req_format->rf_fields.nr
to size_t. Also return zero for incorrect message magic instead
of -EINVAL. This will be more robust because of few of them after
LASSERTF(0, "...") and will not be returned. In the rest places
it return zero size instead of huge number after implicit
unsigned conversion.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12475
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: clio: add coo_getstripe interface

Use cl_object_operations::coo_getstripe() to handle
LL_IOC_LOV_GETSTRIPE ops.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5823
Reviewed-on: http://review.whamcloud.com/12452
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obdclass: change cl_fault_io->ft_nob to size_t

Change the type accordant usage.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12380
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obd: change brw_page->count to unsigned

Pages count is unsigned. So, change the type accordant usage.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12378
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: Recalculate interval in ldlm_pool_recalc()

Instead of rechecking a static value, recalculate to see if pool stats
need to be updated.
Add newline so message will print instead of warning about missing
newline.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4536
Reviewed-on: http://review.whamcloud.com/12547
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: Suppress error message when imp_sec is freed

There is a race condition on client reconnect when the import
is being destroyed. Some outstanding client bound requests
are being processed when the imp_sec has alread been freed.
Ensure to suppress the error message in import_sec_validate_get()
in that case

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3353
Reviewed-on: http://review.whamcloud.com/10200
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obdclass: eliminate NULL error return

Always return an ERR_PTR() on errors, never return a NULL,
in lu_object_find_slice(). Also clean up callers who
no longer need special case handling of NULL returns.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5858
Reviewed-on: http://review.whamcloud.com/12554
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obdclass: change loop indexes to unsigned

Cleanup warnings about comparison between signed and unsigned.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12387
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: fiemap: set FIEMAP_EXTENT_LAST correctly

When we've collected enough extents as user requested, we'd check one
further to decide whether we've reached the last extent of the file.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5933
Reviewed-on: http://review.whamcloud.com/12781
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: evict clients returning errors on ASTs

To test proper behavior of clients returning errors on ASTs
we can induce a failure with setting OBD_FAIL_LDLM_BL_CALLBACK_NET.
Handle the new additonal case of cfs_fail_err being set as well
so that the cfs_fail_err can be sent back in a reply.

Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5581
Xyratex-bug-id: MRP-2041
Reviewed-on: http://review.whamcloud.com/11752
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: mdc: Proper accessing struct lov_user_md

In mdc_setattr_pack() access the members of struct lov_user_md by
little endian byte order.

Signed-off-by: Yoshifumi Uemura <kogexe@gmail.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5889
Reviewed-on: http://review.whamcloud.com/12683
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obdclass: lu_htable_order() return type to long

Change the type accordant usage.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12385
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: fix dup flags names

The name 'xattr' is used for two different ll_flags bits.
Change the names to be distinct and different, reflecting
the names of the bits used in LL_SBI_xbitnamex #defines.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5586
Reviewed-on: http://review.whamcloud.com/12892
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llog: prevent out-of-bound index

llog_process_thread() can be called from llog_cat_process_cb with an
index already out of bound, leading to the following crash:

LustreError: 3773:0:(llog.c:310:llog_process_thread())
ASSERTION(index <= last_index + 1 ) failed:
LustreError: 3773:0:(llog.c:310:llog_process_thread()) LBUG

#0 [ffff8801144bf900] machine_kexec at ffffffff81038f3b
#1 [ffff8801144bf960] crash_kexec at ffffffff810c5d82
#2 [ffff8801144bfa30] panic at ffffffff8152798a
#3 [ffff8801144bfab0] lbug_with_loc at ffffffffa02f8eeb [libcfs]
#4 [ffff8801144bfad0] llog_process_thread at ffffffffa0413fff [obdclass]
#5 [ffff8801144bfb80] llog_process_or_fork at ffffffffa041585f [obdclass]
#6 [ffff8801144bfbd0] llog_cat_process_cb at ffffffffa0418612 [obdclass]
#7 [ffff8801144bfc30] llog_process_thread at ffffffffa0413c22 [obdclass]
#8 [ffff8801144bfce0] llog_process_or_fork at ffffffffa041585f [obdclass]
#9 [ffff8801144bfd30] llog_cat_process_or_fork at ffffffffa0416b9d [obdclass]

If index is too big, simply return success.

Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5635
Reviewed-on: http://review.whamcloud.com/12161
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: quiet errors on initial connection

It may be that a client or MDS is trying to connect to a target (OST
or peer MDT) before that target is finished setup. Rather than
spamming the console logs during initial connection, only print a
console error message if there are repeated failures trying to
connect to the target, which may indicate an error on that node.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3456
Reviewed-on: http://review.whamcloud.com/10057
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: revert the changes for lock canceling policy

The changes for LRU lock policy was introduced by commit bfae5a4e,
where I was trying to revise the policy to pick locks for canceling.

However, this caused two problems as mentioned in LU-5727. The first
problem is that the lock can only be picked for canceling only if
the number of LRU locks is over preset LRU number AND it's aged; the
second problem is that mdc_cancel_weight() tends to not cancel OPEN
locks, therefore open locks can be kept forever and finally exhausts
memory on the MDT side.

The commit 7b2d26b0 ("revert changes to ldlm_cancel_aged_policy") fixed
the first problem. This patch will revert the rest of changes related
to LRU policy revise.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5727
Reviewed-on: http://review.whamcloud.com/12733
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: recovery: don't replay closed open

To avoid scanning the replay open list every time in the
ptlrpc_free_committed(), the fix of LU-2613 (4322e0f9) changed
the ptlrpc_free_committed() to skip the open list unless the
import generation is changed. That introduced a race which could
make a closed open being replayed:

1. Application calls ll_close_inode_openhandle()-> mdc_close(),
   to close file, rq_replay is cleared, but the open request is
   still on the imp_committed_list;

2. Before the md_clear_open_replay_data() is called for close,
   client start replay, and that closed open will be replayed
   mistakenly;

3. Open replay interpret callback (mdc_replay_open) could race
   with the mdc_clear_open_replay_data() at the end;

This patch fix the ptlrpc_free_committed() to make sure the
open list is scanned on recovery to prevent the closed open request
from being replayed.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5507
Reviewed-on: http://review.whamcloud.com/12667
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: changelog: Proper record remapping

Fixed changelog_remap_rec() to correctly remap records emitted
with jobid_var=disabled, i.e. delivered by new servers but with
no jobid field.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5862
Reviewed-on: http://review.whamcloud.com/12574
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Robert Read <robert.read@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: remove ll_objects_destroy()

Remove ll_objects_destroy(). This function is not needed for
interoperability with servers of version 2.4 or higher.

Remove the then unused function lov_destroy() and its supporting
functions. Remove the lsm_destroy method of struct lsm_operations.

Remove the unused struct lov_stripe_md, MD export, and capa parameters
from obd_destroy() and its implementations.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5814
Reviewed-on: http://review.whamcloud.com/12618
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: echo: replace lov_stripe_md with lov_oinfo

In echo_client replace uses of struct lov_stripe_md with struct
lov_oinfo (since the instances of the former really only contained a
single instance of the latter). Remove the then unneccessary functions
echo_alloc_memmd(), echo_free_memmd(), osc_unpackmd(), and
obd_alloc_memmd(). Remove the struct lov_stripe_md * parameter from
obd_create().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5418
Reviewed-on: http://review.whamcloud.com/12447
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obd: remove unused obd methods

Remove no longer used osc_packmd() and osc_getstripe().
Several ioctls cases that are no longer used are removed.
Remove no longer used adjust_kms() infrastructure.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2785
Reviewed-on: http://review.whamcloud.com/8547
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: statahead: small fixes and cleanup

small fixes:
* when 'unplug' is set for ll_statahead(), sa_put() shouldn't kill
   the entry found, because its inflight RPC may not finish yet.
* remove 'sai_generation', add 'lli_sa_generation' because the
   former one is not safe to access without lock.
* revalidate_statahead_dentry() may fail to wait for statahead
   entry to become ready, in this case it should not release this
   entry, because it may be used by inflight statahead RPC.

cleanups:
* rename ll_statahead_enter() to ll_statahead().
* move dentry 'lld_sa_generation' update to ll_statahead() to
   simplify code and logic.
* other small cleanups.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3270
Reviewed-on: http://review.whamcloud.com/9667
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6222
Reviewed-on: http://review.whamcloud.com/13708
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: fix messages with missing newlines

Restore the trailing newline in the definition of OSC_DUMP_GRANT().
Remove an unnecessary CDEBUG() from ldlm_pool_recalc().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5551
Reviewed-on: http://review.whamcloud.com/11996
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lov: remove unused {get, set}_info handlers

In LOV and OSC remove handlers for the obsolete get and set info keys:
KEY_CAPA_KEY, KEY_CONNECT_FLAG, KEY_EVICT_BY_NID, KEY_LAST_ID,
KEY_LOCK_TO_STRIPE, KEY_MDS_CONN, KEY_NEXT_ID.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5814
Reviewed-on: http://review.whamcloud.com/12445
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: changelog: fix comparison between signed and unsigned

Change type of changelog_*{namelen,size}() to size_t.
Fixed string specifier for unsigned types.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12474
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lov: remove LL_IOC_RECREATE_{FID, OBJ}

Remove the obsolete ioctls LL_IOC_RECREATE_FID and LL_IOC_RECREATE_OBJ
along with their handlers in llite. Remove the then unused OBD method
lov_create(). Remove OBD_FL_RECREATE_OBJS handling from osc_create().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5814
Reviewed-on: http://review.whamcloud.com/12442
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obdclass: change lu_site->ls_purge_start to unsigned

Change the type accordant usage.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12384
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lu_dirent_calc_size() return type to size_t

Change the type accordant usage.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12383
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: count of pools is unsigned long

Function ldlm_pools_count() return unsigned long but counter is int.
Use ldlm_pool_granted() everywhere.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12304
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: replace direct HZ access with kernel APIs

On some customer's systems, kernel was compiled with HZ defined to
100, instead of 1000. This improves performance for HPC applications.
However, to use these systems with Lustre, customers have to re-build
Lustre for the kernel because Lustre directly uses the defined
constant HZ.

Since kernel 2.6.21, some non-HZ dependent timing APIs become non-
inline functions, which can be used in Lustre codes to replace the
direct HZ access.

These kernel APIs include:
  jiffies_to_msecs()
  jiffies_to_usecs()
  jiffies_to_timespec()
  msecs_to_jiffies()
  usecs_to_jiffies()
  timespec_to_jiffies()

And here are some samples of the replacement:
  HZ            -> msecs_to_jiffies(MSEC_PER_SEC)
  n * HZ        -> msecs_to_jiffies(n * MSEC_PER_SEC)
  HZ / n        -> msecs_to_jiffies(MSEC_PER_SEC / n)
  n / HZ        -> jiffies_to_msecs(n) / MSEC_PER_SEC
  n / HZ * 1000 -> jiffies_to_msecs(n)

This patch replaces the direct HZ access in lustre modules.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5443
Reviewed-on: http://review.whamcloud.com/12052
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obd: cleanup struct md_op_data and uses

Make the following changes in or around struct md_op_data:

* rename enum op_cli_flags to enum md_cli_flags.

* Change to type of the op_flags member from __u32 to enum
   md_op_flags.

* Remove the used but never set member op_npages.

* Remove the set but never used member op_stripe_offset (an alias for
   op_ioepoch).

* Remove the op_max_pages alias for op_valid and add a op_max_pages
   member.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/11734
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: mdc: fix comparison between signed and unsigned

Change type of client_obd->*_mds_*size from int to __u32 and
argumanets of related create/rename/setattr functions.
Change type of op_data->op_namelen to size_t.
Change type of argument size for all mdc_*_pack() to size_t.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/11379
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: at: net AT after connect

Once connected, the previously gathered AT statistics is not valid
anymore because may reflect other routing, etc. The connect by itself
could take a long time due to different reasons (e.g. server was not
ready) and net latency got very high (see import_select_connection())
what does not reflect the current situation.

Take into account only the current (re-)CONNECT rpc latency.

Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5380
Xyratex-bug-id: MRP-1285
Reviewed-on: http://review.whamcloud.com/11155
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: statahead: race in start/stop statahead

When starting statahead thread, it should check whether current
lli_opendir_key was deauthorized in the mean time by another
process.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3270
Reviewed-on: http://review.whamcloud.com/9666
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: statahead: ll_intent_drop_lock() called in spinlock

ll_intent_drop_lock() may sleep, which should not be called inside
spinlock.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2272
Reviewed-on: http://review.whamcloud.com/9665
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: statahead: use dcache-like interface for sa entry

Rename ll_sa_entry to sa_entry, and manage sa_entry cache with
dcache-like interfaces.

sa_entry is not needed to be refcounted, because only scanner
can free it, so after it's put in stat list, statahead thread
shouldn't access it any longer.

ll_statahead_interpret() doesn't need to take sai refcount,
because statahead thread will wait for all inflight RPC to
finish.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3270
Reviewed-on: http://review.whamcloud.com/9664
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: allow setting stripes to specify OSTs

Extend the llite layer to support specifying individual target
OSTs. Support specifying OSTs for regular files only. Directory
support will be implemented later in a separate project. With
this a file could have for example a OST index layout of
2,4,5,9,11. In addition, duplicate indices will be eliminated
automatically. Calculate the max easize by ld_active_tgt_count
instead of ld_tgt_count. However this may introduce problems
when the OSTs are in recovery because non sufficient buffer
may be allocated to store EA.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4665
Reviewed-on: http://review.whamcloud.com/9383
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: Add ioctl to get parent fids from link EA.

Added LL_IOC_GETPARENT to retrieve the <parent_fid>/name(s) of a given
entry, based on its link EA. This saves multiple calls to
path2fid/fid2path.

Merged with second later patch that does various cleanups.
Avoid unneeded allocation. Get read-only attributes from the user
getparent structure and write the modified attributes only, instead
of populating a whole structure in kernel and copying it back.

Signed-off-by: Thomas Leibovici <thomas.leibovici@cea.fr>
Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3613
Reviewed-on: http://review.whamcloud.com/7069
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5837
Reviewed-on: http://review.whamcloud.com/12527
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obd: restore linkea support

Original linkea was only used for the lustre server code
so it was removed from the upstream client. Now it needs
to be restored for client work that uses this infrastructure.

Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lmv: add testing for bad name hash

Enable testing of the lfsck recovery feature in the
client code for the case when name hash for some
entry becomes corrupt.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5519
Reviewed-on: http://review.whamcloud.com/11846
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: ensure all data flush out when umount

Write out all extents when clear inode. Otherwise we
may lose data while umount.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5584
Reviewed-on: http://review.whamcloud.com/12103
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: per-export lock callback timeout

The lock callback timeout is calculated as an average per namespace.
This does not reflect individual client behavior.
Instead, we should calculate it on a per-export basis.

This is the client side changes for upstream client.

Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4942
Reviewed-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Reviewed-by: Alexey Lyashkov <Alexey_Lyashkov@xyratex.com>
Xyratex-bug-id: MRP-417
Reviewed-on: http://review.whamcloud.com/9336
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lmv: move some inline functions to lustre_lmv.h

Move some inline code out of lmv core into lustre_lmv.h.
This is to prepare for use outside of the lmv layer in
the future of these functions. Change from passing in
struct lmv_stripe_md to just int for lmv_is_known_hash_type.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5519
Reviewed-on: http://review.whamcloud.com/11845
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: Flexible changelog format.

Added jobid fields to Changelog records (and extended records). The
CLF_JOBID flags allows to check if the field is present or not (old
format) when reading an entry. Jobids are expressed as 32 chars long,
zero-terminated strings. Updated test_205 in sanity.sh.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1996
Reviewed-on: http://review.whamcloud.com/4060
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Aurelien Degremont <aurelien.degremont@cea.fr>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: enforce pool name length limit

The pool related codes have some inconsistency about the length
of pool name. Creating and setting a pool name of length 16
to a directory will succeed. However, creating a file under
that directory will fail.

This patch disables any pool name which is longer or equal to
16. And it changes LOV_MAXPOOLNAME from 16 to 15 which might
cause some invalid LLOG records of OST pools with 16 byte names.
It is not a problem since invalid LLOG records are just ignored.
And OST pools with 16 byte names won't work well anyway on the
old versions. There will be problem of inconsistency if part of
the servers have this patch and part of the servers don't. But
it would be safe to assume that this is not a normal
configuration.

Signed-off-by: Li Xi <lixi@ddn.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5054
Reviewed-on: http://review.whamcloud.com/10306
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: handle concurrent use of cob_transient_pages

With the lockless __generic_file_aio_write introduced in LU-1669,
ll_direct_IO_26 is no longer protected by the inode i_isem.

This renders obsoltete checks that all transient pages have been
handled before and after entry, and requires atomic access to their
counter.

Signed-off-by: Stephen Champion <schamp@sgi.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5700
Reviewed-on: http://review.whamcloud.com/12179
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lmv: remove dead code

The member lmv_obd->server_timeout and function lmv_set_timeouts()
are not used.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-991
Reviewed-on: http://review.whamcloud.com/11880
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: misc: Reduce exposure to overflow on page counters.

When the number of an object in use or circulation is tied to memory
size of the system, very large memory systems can overflow 32 bit
counters. This patch addresses overflow on page counters in the osc LRU
and obd accounting.

Signed-off-by: Stephen Champion <schamp@sgi.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4856
Reviewed-on: http://review.whamcloud.com/10537
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lmv: change type of lmv_obd->tgts_size to u32

tgts_size is used as unsigned.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/11881
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obd: change type of lmv_tgt_desc->ltd_idx to u32

ltd_idx is used as unsigned.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/11879
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: don't call make_bad_inode() on an old inode

In ll_iget() if ll_update_inode() fails then do not call
make_bad_inode() on the inode since it may still be in use.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5468
Reviewed-on: http://review.whamcloud.com/11609
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obd: rename LUSTRE_STRIPE_MAXBYTES

Rename LUSTRE_STRIPE_MAXBYTES to LUSTRE_EXT3_STRIPE_MAXBYTES and
correct the comment describing its use.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/11800
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: remove lustre_lite.h

Move several definition only used in lustre/llite/ to
lustre/llite/llite_internal.h.
Remove lustre/include/{,linux/}lustre_lite.h and fixup
the missing includes in other headers that this exposes.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/11501
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: osc: debug to match extent to brw RPC

Currently, it's difficult to match brw RPCs to objects and
extents from client logs. This patch adds a D_RPCTRACE
debug message giving the necessary information.

Signed-off-by: Patrick Farrell <paf@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5531
Reviewed-on: http://review.whamcloud.com/11548
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Reviewed-by: Ryan Haasken <haasken@cray.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: cleanup lustre_lib.h

Remove some unused declarations from lustre_lib.h and move
some others to more natural headers.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/11500
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: no need to check dentry is NULL

We are already touching dentry in CDEBUG macros so it
will crash long before these checks. Since this is the
case no need to do an additional check.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/10769
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: style cleanup for ll_mkdir

Style cleanup to make the code readable.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/10769
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: turn mode to umode_t for ll_new_inode()

Change int mode to umode_t.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/10769
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: remove mode from ll_create_it()

Remove the unused mode parameter from ll_create_it().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/10769
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: remove lookup_flags from ll_lookup_it()

Remove the effectively unused lookup_flags parameter from
ll_lookup_it().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/10769
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: vvp: Use lockless __generic_file_aio_write

Testing multi-threaded single shard file write performance has shown
the inode mutex to be a limiting factor when using the
generic_file_write_iter function. To work around this bottle neck, this
change replaces the locked version of that call with the lock less
version, specifically, __generic_file_write_iter.

In order to maintain posix consistency, Lustre must now employ it's
own locking mechanism in the higher layers. Currently writes are
protected using the lli_write_mutex in the ll_inode_info structure.
To protect against simultaneous write and truncate operations, since
we no longer take the inode mutex during writes, we must down the
lli_trunc_sem semaphore.

Unfortunately, this change by itself does not garner any performance
benefits. Using FIO on a single machine with 32 GB of RAM, write
performance tests were ran with and without this change applied; the
results are below:

    +---------+-----------+---------+--------+--------+
    |     fio v2.0.13     |   Write Bandwidth (KB/s)  |
    +---------+-----------+---------+--------+--------+
    | # Tasks | GB / Task | Test 1  | Test 2 | Test 3 |
    +---------+-----------+---------+--------+--------+
    |    1    |    64     |  452446 | 454623 | 457653 |
    |    2    |    32     |  850318 | 565373 | 602498 |
    |    4    |    16     | 1058900 | 463546 | 529107 |
    |    8    |     8     | 1026300 | 468190 | 576451 |
    |   16    |     4     | 1065500 | 503160 | 462902 |
    |   32    |     2     | 1068600 | 462228 | 466963 |
    |   64    |     1     |  991830 | 556618 | 557863 |
    +---------+-----------+---------+--------+--------+

* Test 1: Lustre client running 04ec54f. File per process write
           workload. This test was used as a baseline for what we
           _could_ achieve in the single shared file tests if the
           bottle necks were removed.

* Test 2: Lustre client running 04ec54f. Single shared file
           workload, each task writing to a unique region.

* Test 3: Lustre client running 04ec54f + this patch. Single shared
           file workload, each task writing to a unique region.

In order to garner any real performance benefits out of a single
shared file workload, the lli_write_mutex needs to be broken up into a
range lock. That would allow write operations to unique regions of a
file to be executed concurrently. This work is left to be done in a
follow up patch.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1669
Reviewed-on: http://review.whamcloud.com/6672
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: Replace write mutex with range lock

Testing has shown the ll_inode_inode's lli_write_mutex to be a
limiting factor with single shared file write performance, when using
many writing threads on a single machine. Even if each thread is
writing to a unique portion of the file, the lli_write_mutex will
prevent no more than a single thread to ever write to the file
simultaneously.

This change attempts to remove this bottle neck, by replacing this
mutex with a range lock. This should allow multiple threads to write
to a single file simultaneously iff the threads are writing to unique
regions of the file.

Performance testing shows this change to garner a significant
performance boost to write bandwidth. Using FIO on a single machine
with 32 GB of RAM, write performance tests were run with and without
this change applied; the results are below:

    +---------+-----------+---------+--------+--------+--------+
    |     fio v2.0.13     |        Write Bandwidth (KB/s)      |
    +---------+-----------+---------+--------+--------+--------+
    | # Tasks | GB / Task | Test 1  | Test 2 | Test 3 | Test 4 |
    +---------+-----------+---------+--------+--------+--------+
    |    1    |    64     |  452446 | 454623 | 457653 | 463737 |
    |    2    |    32     |  850318 | 565373 | 602498 | 733027 |
    |    4    |    16     | 1058900 | 463546 | 529107 | 976284 |
    |    8    |     8     | 1026300 | 468190 | 576451 | 963404 |
    |   16    |     4     | 1065500 | 503160 | 462902 | 830065 |
    |   32    |     2     | 1068600 | 462228 | 466963 | 749733 |
    |   64    |     1     |  991830 | 556618 | 557863 | 710912 |
    +---------+-----------+---------+--------+--------+--------+

* Test 1: Lustre client running 04ec54f. File per process write
           workload. This test was used as a baseline for what we
           _could_ achieve in the single shared file tests if the
           bottle necks were removed.

* Test 2: Lustre client running 04ec54f. Single shared file
           workload, each task writing to a unique region.

* Test 3: Lustre client running 04ec54f + I0023132b. Single shared
           file workload, each task writing to a unique region.

* Test 4: Lustre client running 04ec54f + this patch.
           Single shared file workload, each task writing to a unique
           region.

Direct IO does not use the page cache like normal IO, so
concurrent direct IO reads of the same pages are not safe.

As a result, direct IO reads must take the range lock
in ll_file_io_generic, otherwise they will attempt to work
on the same pages and hit assertions like:
(osc_request.c:1219:osc_brw_prep_request())
ASSERTION( i == 0 || pg->off > pg_prev->off ) failed:
i 3 p_c 10 pg ffffea00017a5208 [pri 0 ind 2771] off 16384
prev_pg ffffea00017a51d0 [pri 0 ind 2256] off 16384

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1669
Reviewed-on: http://review.whamcloud.com/6320
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6227
Reviewed-on: http://review.whamcloud.com/14385
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Alexander Boyko <alexander.boyko@seagate.com>
Reviewed-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: restore some of the interval functionality

Earlier a bunch of interval handling got removed since it wasn't
used by the upstream client. Now some of it is needed again for
the client code so this patch restores what is needed.

Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: resend AST callbacks

While clients will resend client->server RPCs, servers would not
resend server->client RPCs such as LDLM callbacks (blocking
or completion callbacks/ASTs). This could result in clients being
evicted from the server if blocking callbacks were dropped by the
network (a failed router or lossy network) and the client did not
cancel the requested lock in time.
In order to fix this problem, this patch adds the ability to resend
LDLM callbacks from the server and give the client a chance to
respond within the timeout period before it is evicted:

- resend BL AST within lock callback timeout period;
- still do not resend CANCEL_ON_BLOCK;
- regular resend for CP AST without BL AST embedded;
- prolong lock callback timeout on resend;

some fixes:
- recovery-small test_10 to actually evict the client
with dropped BL AST;
- ETIMEDOUT to be returned if send limit is expired;

Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5520
Reviewed-by: Alexey Lyashkov <Alexey_Lyashkov@xyratex.com>
Reviewed-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Xyratex-bug-id: MRP-417
Reviewed-on: http://review.whamcloud.com/9335
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: reconstruct proper flags on enqueue resend

otherwise, waiting lock may get granted as no BLOCKED_GRANTED
flag is returned

Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Xyratex-bug-id: MRP-1944
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5496
Reviewed-on: http://review.whamcloud.com/11644
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: statahead: statahead thread wait for RPCs to finish

Statahead thread should wait for inflight stat RPCs to finish in
case statahead RPC callback may access data allocated in statahead
thread context.

ll_sa_entry_fini() should keep old entry if stat RPC is not
finished yet.

Simplify sai refcounting:
* newly allocated sai will hold one refcount, and it will put it
  after starting statahead thread.
* statahead thread holds one refcount.
* agl thread holds one refcount.
* stat process calls do_statahead_enter() which will try to get
  sai, and if it's valid, it will revalidate from statahead cache,
  and put refcount after use.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3270
Reviewed-on: http://review.whamcloud.com/9663
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>