Sergey Senozhatsky [Tue, 26 Jul 2016 22:22:59 +0000 (15:22 -0700)]
zram: drop gfp_t from zcomp_strm_alloc()
We now allocate streams from CPU_UP hot-plug path, there are no
context-dependent stream allocations anymore and we can schedule from
zcomp_strm_alloc(). Use GFP_KERNEL directly and drop a gfp_t parameter.
Link: http://lkml.kernel.org/r/20160531122017.2878-9-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 26 Jul 2016 22:22:56 +0000 (15:22 -0700)]
zram: add more compression algorithms
Add "deflate", "lz4hc", "842" algorithms to the list of known
compression backends. The real availability of those algorithms,
however, depends on the corresponding CONFIG_CRYPTO_FOO config options.
[sergey.senozhatsky@gmail.com: zram-add-more-compression-algorithms-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-7-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-8-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 26 Jul 2016 22:22:54 +0000 (15:22 -0700)]
zram: delete custom lzo/lz4
Remove lzo/lz4 backends, we use crypto API now.
[sergey.senozhatsky@gmail.com: zram-delete-custom-lzo-lz4-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-6-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-7-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 26 Jul 2016 22:22:51 +0000 (15:22 -0700)]
zram: cosmetic: cleanup documentation
zram documentation is a mix of different styles: spaces, tabs, tabs +
spaces, etc. Clean it up.
Link: http://lkml.kernel.org/r/20160531122017.2878-6-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 26 Jul 2016 22:22:48 +0000 (15:22 -0700)]
zram: use crypto api to check alg availability
There is no way to get a string with all the crypto comp algorithms
supported by the crypto comp engine, so we need to maintain our own
backends list. At the same time we additionally need to use
crypto_has_comp() to make sure that the user has requested a compression
algorithm that is recognized by the crypto comp engine. Relying on
/proc/crypto is not an options here, because it does not show
not-yet-inserted compression modules.
Example:
modprobe zram
cat /proc/crypto | grep -i lz4
modprobe lz4
cat /proc/crypto | grep -i lz4
name : lz4
driver : lz4-generic
module : lz4
So the user can't tell exactly if the lz4 is really supported from
/proc/crypto output, unless someone or something has loaded it.
This patch also adds crypto_has_comp() to zcomp_available_show(). We
store all the compression algorithms names in zcomp's `backends' array,
regardless the CONFIG_CRYPTO_FOO configuration, but show only those that
are also supported by crypto engine. This helps user to know the exact
list of compression algorithms that can be used.
Example:
module lz4 is not loaded yet, but is supported by the crypto
engine. /proc/crypto has no information on this module, while
zram's `comp_algorithm' lists it:
cat /proc/crypto | grep -i lz4
cat /sys/block/zram0/comp_algorithm
[lzo] lz4 deflate lz4hc 842
We still use the `backends' array to determine if the requested
compression backend is known to crypto api. This array, however, may not
contain some entries, therefore as the last step we call crypto_has_comp()
function which attempts to insmod the requested compression algorithm to
determine if crypto api supports it. The advantage of this method is that
now we permit the usage of out-of-tree crypto compression modules
(implementing S/W or H/W compression).
[sergey.senozhatsky@gmail.com: zram-use-crypto-api-to-check-alg-availability-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-4-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-5-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 26 Jul 2016 22:22:45 +0000 (15:22 -0700)]
zram: switch to crypto compress API
We don't have an idle zstreams list anymore and our write path now works
absolutely differently, preventing preemption during compression. This
removes possibilities of read paths preempting writes at wrong places
(which could badly affect the performance of both paths) and at the same
time opens the door for a move from custom LZO/LZ4 compression backends
implementation to a more generic one, using crypto compress API.
Joonsoo Kim [1] attempted to do this a while ago, but faced with the
need of introducing a new crypto API interface. The root cause was the
fact that crypto API compression algorithms require a compression stream
structure (in zram terminology) for both compression and decompression
ops, while in reality only several of compression algorithms really need
it. This resulted in a concept of context-less crypto API compression
backends [2]. Both write and read paths, though, would have been
executed with the preemption enabled, which in the worst case could have
resulted in a decreased worst-case performance, e.g. consider the
following case:
CPU0
zram_write()
spin_lock()
take the last idle stream
spin_unlock()
<< preempted >>
zram_read()
spin_lock()
no idle streams
spin_unlock()
schedule()
resuming zram_write compression()
but it took me some time to realize that, and it took even longer to
evolve zram and to make it ready for crypto API. The key turned out to be
-- drop the idle streams list entirely. Without the idle streams list we
are free to use compression algorithms that require compression stream for
decompression (read), because streams are now placed in per-cpu data and
each write path has to disable preemption for compression op, almost
completely eliminating the aforementioned case (technically, we still have
a small chance, because write path has a fast and a slow paths and the
slow path is executed with the preemption enabled; but the frequency of
failed fast path is too low).
TEST
====
- 4 CPUs, x86_64 system
- 3G zram, lzo
- fio tests: read, randread, write, randwrite, rw, randrw
test script [3] command:
ZRAM_SIZE=3G LOG_SUFFIX=XXXX FIO_LOOPS=5 ./zram-fio-test.sh
BASE PATCHED
jobs1
READ: 2527.2MB/s 2482.7MB/s
READ: 2102.7MB/s 2045.0MB/s
WRITE: 1284.3MB/s 1324.3MB/s
WRITE: 1080.7MB/s 1101.9MB/s
READ: 430125KB/s 437498KB/s
WRITE: 430538KB/s 437919KB/s
READ: 399593KB/s 403987KB/s
WRITE: 399910KB/s 404308KB/s
jobs2
READ: 8133.5MB/s 7854.8MB/s
READ: 7086.6MB/s 6912.8MB/s
WRITE: 3177.2MB/s 3298.3MB/s
WRITE: 2810.2MB/s 2871.4MB/s
READ: 1017.6MB/s 1023.4MB/s
WRITE: 1018.2MB/s 1023.1MB/s
READ: 977836KB/s 984205KB/s
WRITE: 979435KB/s 985814KB/s
jobs3
READ: 13557MB/s 13391MB/s
READ: 11876MB/s 11752MB/s
WRITE: 4641.5MB/s 4682.1MB/s
WRITE: 4164.9MB/s 4179.3MB/s
READ: 1453.8MB/s 1455.1MB/s
WRITE: 1455.1MB/s 1458.2MB/s
READ: 1387.7MB/s 1395.7MB/s
WRITE: 1386.1MB/s 1394.9MB/s
jobs4
READ: 20271MB/s 20078MB/s
READ: 18033MB/s 17928MB/s
WRITE: 6176.8MB/s 6180.5MB/s
WRITE: 5686.3MB/s 5705.3MB/s
READ: 2009.4MB/s 2006.7MB/s
WRITE: 2007.5MB/s 2004.9MB/s
READ: 1929.7MB/s 1935.6MB/s
WRITE: 1926.8MB/s 1932.6MB/s
jobs5
READ: 18823MB/s 19024MB/s
READ: 18968MB/s 19071MB/s
WRITE: 6191.6MB/s 6372.1MB/s
WRITE: 5818.7MB/s 5787.1MB/s
READ: 2011.7MB/s 1981.3MB/s
WRITE: 2011.4MB/s 1980.1MB/s
READ: 1949.3MB/s 1935.7MB/s
WRITE: 1940.4MB/s 1926.1MB/s
jobs6
READ: 21870MB/s 21715MB/s
READ: 19957MB/s 19879MB/s
WRITE: 6528.4MB/s 6537.6MB/s
WRITE: 6098.9MB/s 6073.6MB/s
READ: 2048.6MB/s 2049.9MB/s
WRITE: 2041.7MB/s 2042.9MB/s
READ: 2013.4MB/s 1990.4MB/s
WRITE: 2009.4MB/s 1986.5MB/s
jobs7
READ: 21359MB/s 21124MB/s
READ: 19746MB/s 19293MB/s
WRITE: 6660.4MB/s 6518.8MB/s
WRITE: 6211.6MB/s 6193.1MB/s
READ: 2089.7MB/s 2080.6MB/s
WRITE: 2085.8MB/s 2076.5MB/s
READ: 2041.2MB/s 2052.5MB/s
WRITE: 2037.5MB/s 2048.8MB/s
jobs8
READ: 20477MB/s 19974MB/s
READ: 18922MB/s 18576MB/s
WRITE: 6851.9MB/s 6788.3MB/s
WRITE: 6407.7MB/s 6347.5MB/s
READ: 2134.8MB/s 2136.1MB/s
WRITE: 2132.8MB/s 2134.4MB/s
READ: 2074.2MB/s 2069.6MB/s
WRITE: 2087.3MB/s 2082.4MB/s
jobs9
READ: 19797MB/s 19994MB/s
READ: 18806MB/s 18581MB/s
WRITE: 6878.7MB/s 6822.7MB/s
WRITE: 6456.8MB/s 6447.2MB/s
READ: 2141.1MB/s 2154.7MB/s
WRITE: 2144.4MB/s 2157.3MB/s
READ: 2084.1MB/s 2085.1MB/s
WRITE: 2091.5MB/s 2092.5MB/s
jobs10
READ: 19794MB/s 19784MB/s
READ: 18794MB/s 18745MB/s
WRITE: 6984.4MB/s 6676.3MB/s
WRITE: 6532.3MB/s 6342.7MB/s
READ: 2150.6MB/s 2155.4MB/s
WRITE: 2156.8MB/s 2161.5MB/s
READ: 2106.4MB/s 2095.6MB/s
WRITE: 2109.7MB/s 2098.4MB/s
BASE PATCHED
jobs1 perfstat
stalled-cycles-frontend 102,480,595,419 ( 41.53%) 114,508,864,804 ( 46.92%)
stalled-cycles-backend 51,941,417,832 ( 21.05%) 46,836,112,388 ( 19.19%)
instructions 283,612,054,215 ( 1.15) 283,918,134,959 ( 1.16)
branches 56,372,560,385 ( 724.923) 56,449,814,753 ( 733.766)
branch-misses 374,826,000 ( 0.66%) 326,935,859 ( 0.58%)
jobs2 perfstat
stalled-cycles-frontend 155,142,745,777 ( 40.99%) 164,170,979,198 ( 43.82%)
stalled-cycles-backend 70,813,866,387 ( 18.71%) 66,456,858,165 ( 17.74%)
instructions 463,436,648,173 ( 1.22) 464,221,890,191 ( 1.24)
branches 91,088,733,902 ( 760.088) 91,278,144,546 ( 769.133)
branch-misses 504,460,363 ( 0.55%) 394,033,842 ( 0.43%)
jobs3 perfstat
stalled-cycles-frontend 201,300,397,212 ( 39.84%) 223,969,902,257 ( 44.44%)
stalled-cycles-backend 87,712,593,974 ( 17.36%) 81,618,888,712 ( 16.19%)
instructions 642,869,545,023 ( 1.27) 644,677,354,132 ( 1.28)
branches 125,724,560,594 ( 690.682) 126,133,159,521 ( 694.542)
branch-misses 527,941,798 ( 0.42%) 444,782,220 ( 0.35%)
jobs4 perfstat
stalled-cycles-frontend 246,701,197,429 ( 38.12%) 280,076,030,886 ( 43.29%)
stalled-cycles-backend 119,050,341,112 ( 18.40%) 110,955,641,671 ( 17.15%)
instructions 822,716,962,127 ( 1.27) 825,536,969,320 ( 1.28)
branches 160,590,028,545 ( 688.614) 161,152,996,915 ( 691.068)
branch-misses 650,295,287 ( 0.40%) 550,229,113 ( 0.34%)
jobs5 perfstat
stalled-cycles-frontend 298,958,462,516 ( 38.30%) 344,852,200,358 ( 44.16%)
stalled-cycles-backend 137,558,742,122 ( 17.62%) 129,465,067,102 ( 16.58%)
instructions 1,005,714,688,752 ( 1.29) 1,007,657,999,432 ( 1.29)
branches 195,988,773,962 ( 697.730) 196,446,873,984 ( 700.319)
branch-misses 695,818,940 ( 0.36%) 624,823,263 ( 0.32%)
jobs6 perfstat
stalled-cycles-frontend 334,497,602,856 ( 36.71%) 387,590,419,779 ( 42.38%)
stalled-cycles-backend 163,539,365,335 ( 17.95%) 152,640,193,639 ( 16.69%)
instructions 1,184,738,177,851 ( 1.30) 1,187,396,281,677 ( 1.30)
branches 230,592,915,640 ( 702.902) 231,253,802,882 ( 702.356)
branch-misses 747,934,786 ( 0.32%) 643,902,424 ( 0.28%)
jobs7 perfstat
stalled-cycles-frontend 396,724,684,187 ( 37.71%) 460,705,858,952 ( 43.84%)
stalled-cycles-backend 188,096,616,496 ( 17.88%) 175,785,787,036 ( 16.73%)
instructions 1,364,041,136,608 ( 1.30) 1,366,689,075,112 ( 1.30)
branches 265,253,096,936 ( 700.078) 265,890,524,883 ( 702.839)
branch-misses 784,991,589 ( 0.30%) 729,196,689 ( 0.27%)
jobs8 perfstat
stalled-cycles-frontend 440,248,299,870 ( 36.92%) 509,554,793,816 ( 42.46%)
stalled-cycles-backend 222,575,930,616 ( 18.67%) 213,401,248,432 ( 17.78%)
instructions 1,542,262,045,114 ( 1.29) 1,545,233,932,257 ( 1.29)
branches 299,775,178,439 ( 697.666) 300,528,458,505 ( 694.769)
branch-misses 847,496,084 ( 0.28%) 748,794,308 ( 0.25%)
jobs9 perfstat
stalled-cycles-frontend 506,269,882,480 ( 37.86%) 592,798,032,820 ( 44.43%)
stalled-cycles-backend 253,192,498,861 ( 18.93%) 233,727,666,185 ( 17.52%)
instructions 1,721,985,080,913 ( 1.29) 1,724,666,236,005 ( 1.29)
branches 334,517,360,255 ( 694.134) 335,199,758,164 ( 697.131)
branch-misses 873,496,730 ( 0.26%) 815,379,236 ( 0.24%)
jobs10 perfstat
stalled-cycles-frontend 549,063,363,749 ( 37.18%) 651,302,376,662 ( 43.61%)
stalled-cycles-backend 281,680,986,810 ( 19.07%) 277,005,235,582 ( 18.55%)
instructions 1,901,859,271,180 ( 1.29) 1,906,311,064,230 ( 1.28)
branches 369,398,536,153 ( 694.004) 370,527,696,358 ( 688.409)
branch-misses 967,929,335 ( 0.26%) 890,125,056 ( 0.24%)
BASE PATCHED
seconds elapsed 79.
421641008 78.
735285546
seconds elapsed 61.
471246133 60.
869085949
seconds elapsed 62.
317058173 62.
224188495
seconds elapsed 60.
030739363 60.
081102518
seconds elapsed 74.
070398362 74.
317582865
seconds elapsed 84.
985953007 85.
414364176
seconds elapsed 97.
724553255 98.
173311344
seconds elapsed 109.
488066758 110.
268399318
seconds elapsed 122.
768189405 122.
967164498
seconds elapsed 135.
130035105 136.
934770801
On my other system (8 x86_64 CPUs, short version of test results):
BASE PATCHED
seconds elapsed 19.
518065994 19.
806320662
seconds elapsed 15.
172772749 15.
594718291
seconds elapsed 13.
820925970 13.
821708564
seconds elapsed 13.
293097816 14.
585206405
seconds elapsed 16.
207284118 16.
064431606
seconds elapsed 17.
958376158 17.
771825767
seconds elapsed 19.
478009164 19.
602961508
seconds elapsed 21.
347152811 21.
352318709
seconds elapsed 24.
478121126 24.
171088735
seconds elapsed 26.
865057442 26.
767327618
So performance-wise the numbers are quite similar.
Also update zcomp interface to be more aligned with the crypto API.
[1] http://marc.info/?l=linux-kernel&m=
144480832108927&w=2
[2] http://marc.info/?l=linux-kernel&m=
145379613507518&w=2
[3] https://github.com/sergey-senozhatsky/zram-perf-test
Link: http://lkml.kernel.org/r/20160531122017.2878-3-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 26 Jul 2016 22:22:42 +0000 (15:22 -0700)]
zram: rename zstrm find-release functions
This has started as a 'add zlib support' work, but after some thinking I
saw no blockers for a bigger change -- a switch to crypto API.
We don't have an idle zstreams list anymore and our write path now works
absolutely differently, preventing preemption during compression. This
removes possibilities of read paths preempting writes at wrong places
and opens the door for a move from custom LZO/LZ4 compression backends
implementation to a more generic one, using crypto compress API.
This patch set also eliminates the need of a new context-less crypto API
interface, which was quite hard to sell, so we can move along faster.
benchmarks:
(x86_64, 4GB, zram-perf script)
perf reported run-time fio (max jobs=3). I performed fio test with the
increasing number of parallel jobs (max to 3) on a 3G zram device, using
`static' data and the following crypto comp algorithms:
842, deflate, lz4, lz4hc, lzo
the output was:
- test running time (which can tell us what algorithms performs faster)
and
- zram mm_stat (which tells the compressed memory size, max used memory, etc).
It's just for information. for example, LZ4HC has twice the running
time of LZO, but the compressed memory size is:
23592960 vs
34603008
bytes.
test-fio-zram-842
197.
907655282 seconds time elapsed
201.
623142884 seconds time elapsed
226.
854291345 seconds time elapsed
test-fio-zram-DEFLATE
253.
259516155 seconds time elapsed
258.
148563401 seconds time elapsed
290.
251909365 seconds time elapsed
test-fio-zram-LZ4
27.
022598717 seconds time elapsed
29.
580522717 seconds time elapsed
33.
293463430 seconds time elapsed
test-fio-zram-LZ4HC
56.
393954615 seconds time elapsed
74.
904659747 seconds time elapsed
101.
940998564 seconds time elapsed
test-fio-zram-LZO
28.
155948075 seconds time elapsed
30.
390036330 seconds time elapsed
34.
455773159 seconds time elapsed
zram mm_stat-s (max fio jobs=3)
test-fio-zram-842
mm_stat (jobs1):
3221225472 673185792 690266112 0
690266112 0 0
mm_stat (jobs2):
3221225472 673185792 690266112 0
690266112 0 0
mm_stat (jobs3):
3221225472 673185792 690266112 0
690266112 0 0
test-fio-zram-DEFLATE
mm_stat (jobs1):
3221225472 24379392 37761024 0
37761024 0 0
mm_stat (jobs2):
3221225472 24379392 37761024 0
37761024 0 0
mm_stat (jobs3):
3221225472 24379392 37761024 0
37761024 0 0
test-fio-zram-LZ4
mm_stat (jobs1):
3221225472 23592960 37761024 0
37761024 0 0
mm_stat (jobs2):
3221225472 23592960 37761024 0
37761024 0 0
mm_stat (jobs3):
3221225472 23592960 37761024 0
37761024 0 0
test-fio-zram-LZ4HC
mm_stat (jobs1):
3221225472 23592960 37761024 0
37761024 0 0
mm_stat (jobs2):
3221225472 23592960 37761024 0
37761024 0 0
mm_stat (jobs3):
3221225472 23592960 37761024 0
37761024 0 0
test-fio-zram-LZO
mm_stat (jobs1):
3221225472 34603008 50335744 0
50335744 0 0
mm_stat (jobs2):
3221225472 34603008 50335744 0
50335744 0 0
mm_stat (jobs3):
3221225472 34603008 50335744 0
50339840 0 0
This patch (of 8):
We don't perform any zstream idle list lookup anymore, so
zcomp_strm_find()/zcomp_strm_release() names are not representative.
Rename to zcomp_stream_get()/zcomp_stream_put().
Link: http://lkml.kernel.org/r/20160531122017.2878-2-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Tue, 26 Jul 2016 22:22:39 +0000 (15:22 -0700)]
powerpc/mm: check for irq disabled() only if DEBUG_VM is enabled
We don't need to check this always. The idea here is to capture the
wrong usage of find_linux_pte_or_hugepte and we can do that by
occasionally running with DEBUG_VM enabled.
Link: http://lkml.kernel.org/r/1464692688-6612-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Tue, 26 Jul 2016 22:22:36 +0000 (15:22 -0700)]
include/linux/mmdebug.h: add VM_WARN which maps to WARN()
This enables us to do VM_WARN(condition, "warn message");
Link: http://lkml.kernel.org/r/1464692688-6612-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Tue, 26 Jul 2016 22:22:33 +0000 (15:22 -0700)]
mm: oom: add memcg to oom_control
It's a part of oom context just like allocation order and nodemask, so
let's move it to oom_control instead of passing it in the argument list.
Link: http://lkml.kernel.org/r/40e03fd7aaf1f55c75d787128d6d17c5a71226c2.1464358556.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Tue, 26 Jul 2016 22:22:30 +0000 (15:22 -0700)]
mm: zap ZONE_OOM_LOCKED
Not used since oom_lock was instroduced.
Link: http://lkml.kernel.org/r/1464358093-22663-1-git-send-email-vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reza Arbab [Tue, 26 Jul 2016 22:22:27 +0000 (15:22 -0700)]
memory-hotplug: use zone_can_shift() for sysfs valid_zones attribute
Since zone_can_shift() is being used to validate the target zone during
onlining, it should also be used to determine the content of
valid_zones.
Link: http://lkml.kernel.org/r/1462816419-4479-4-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewd-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reza Arbab [Tue, 26 Jul 2016 22:22:23 +0000 (15:22 -0700)]
memory-hotplug: more general validation of zone during online
When memory is onlined, we are only able to rezone from ZONE_MOVABLE to
ZONE_KERNEL, or from (ZONE_MOVABLE - 1) to ZONE_MOVABLE.
To be more flexible, use the following criteria instead; to online
memory from zone X into zone Y,
* Any zones between X and Y must be unused.
* If X is lower than Y, the onlined memory must lie at the end of X.
* If X is higher than Y, the onlined memory must lie at the start of X.
Add zone_can_shift() to make this determination.
Link: http://lkml.kernel.org/r/1462816419-4479-3-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewd-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reza Arbab [Tue, 26 Jul 2016 22:22:20 +0000 (15:22 -0700)]
memory-hotplug: add move_pfn_range()
Add move_pfn_range(), a wrapper to call move_pfn_range_left() or
move_pfn_range_right().
No functional change. This will be utilized by a later patch.
Link: http://lkml.kernel.org/r/1462816419-4479-2-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oliver O'Halloran [Tue, 26 Jul 2016 22:22:17 +0000 (15:22 -0700)]
mm/init: fix zone boundary creation
As a part of memory initialisation the architecture passes an array to
free_area_init_nodes() which specifies the max PFN of each memory zone.
This array is not necessarily monotonic (due to unused zones) so this
array is parsed to build monotonic lists of the min and max PFN for each
zone. ZONE_MOVABLE is special cased here as its limits are managed by
the mm subsystem rather than the architecture. Unfortunately, this
special casing is broken when ZONE_MOVABLE is the not the last zone in
the zone list. The core of the issue is:
if (i == ZONE_MOVABLE)
continue;
arch_zone_lowest_possible_pfn[i] =
arch_zone_highest_possible_pfn[i-1];
As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone will
be set to zero. This patch fixes this bug by adding explicitly tracking
where the next zone should start rather than relying on the contents
arch_zone_highest_possible_pfn[].
Thie is low priority. To get bitten by this you need to enable a zone
that appears after ZONE_MOVABLE in the zone_type enum. As far as I can
tell this means running a kernel with ZONE_DEVICE or ZONE_CMA enabled,
so I can't see this affecting too many people.
I only noticed this because I've been fiddling with ZONE_DEVICE on
powerpc and 4.6 broke my test kernel. This bug, in conjunction with the
changes in Taku Izumi's kernelcore=mirror patch (
d91749c1dda71) and
powerpc being the odd architecture which initialises max_zone_pfn[] to
~0ul instead of 0 caused all of system memory to be placed into
ZONE_DEVICE at boot, followed a panic since device memory cannot be used
for kernel allocations. I've already submitted a patch to fix the
powerpc specific bits, but I figured this should be fixed too.
Link: http://lkml.kernel.org/r/1462435033-15601-1-git-send-email-oohall@gmail.com
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li RongQing [Tue, 26 Jul 2016 22:22:14 +0000 (15:22 -0700)]
mm/memcontrol.c: remove the useless parameter for mc_handle_swap_pte
It seems like this parameter has never been used since being introduced
by
90254a65833b ("memcg: clean up move charge"). Not a big deal because
I assume the function would get inlined into the caller anyway but why
not get rid of it.
[mhocko@suse.com: wrote changelog]
Link: http://lkml.kernel.org/r/20160525151831.GJ20132@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1464145026-26693-1-git-send-email-roy.qing.li@gmail.com
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yongjun [Tue, 26 Jul 2016 22:22:11 +0000 (15:22 -0700)]
mm/slab: use list_move instead of list_del/list_add
Using list_move() instead of list_del() + list_add() to avoid needlessly
poisoning the next and prev values.
Link: http://lkml.kernel.org/r/1468929772-9174-1-git-send-email-weiyj_lk@163.com
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 26 Jul 2016 22:22:08 +0000 (15:22 -0700)]
mm: faster kmalloc_array(), kcalloc()
When both arguments to kmalloc_array() or kcalloc() are known at compile
time then their product is known at compile time but search for kmalloc
cache happens at runtime not at compile time.
Link: http://lkml.kernel.org/r/20160627213454.GA2440@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 26 Jul 2016 22:22:05 +0000 (15:22 -0700)]
slab: do not panic on invalid gfp_mask
Both SLAB and SLUB BUG() when a caller provides an invalid gfp_mask.
This is a rather harsh way to announce a non-critical issue. Allocator
is free to ignore invalid flags. Let's simply replace BUG() by
dump_stack to tell the offender and fixup the mask to move on with the
allocation request.
This is an example for kmalloc(GFP_KERNEL|__GFP_HIGHMEM) from a test
module:
Unexpected gfp: 0x2 (__GFP_HIGHMEM). Fixing up to gfp: 0x24000c0 (GFP_KERNEL). Fix your code!
CPU: 0 PID: 2916 Comm: insmod Tainted: G O
4.6.0-slabgfp2-00002-g4cdfc2ef4892-dirty #936
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
Call Trace:
dump_stack+0x67/0x90
cache_alloc_refill+0x201/0x617
kmem_cache_alloc_trace+0xa7/0x24a
? 0xffffffffa0005000
mymodule_init+0x20/0x1000 [test_slab]
do_one_initcall+0xe7/0x16c
? rcu_read_lock_sched_held+0x61/0x69
? kmem_cache_alloc_trace+0x197/0x24a
do_init_module+0x5f/0x1d9
load_module+0x1a3d/0x1f21
? retint_kernel+0x2d/0x2d
SyS_init_module+0xe8/0x10e
? SyS_init_module+0xe8/0x10e
do_syscall_64+0x68/0x13f
entry_SYSCALL64_slow_path+0x25/0x25
Link: http://lkml.kernel.org/r/1465548200-11384-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 26 Jul 2016 22:22:02 +0000 (15:22 -0700)]
slab: make GFP_SLAB_BUG_MASK information more human readable
printk offers %pGg for quite some time so let's use it to get a human
readable list of invalid flags.
The original output would be
[ 429.191962] gfp: 2
after the change
[ 429.191962] Unexpected gfp: 0x2 (__GFP_HIGHMEM)
Link: http://lkml.kernel.org/r/1465548200-11384-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Garnier [Tue, 26 Jul 2016 22:21:59 +0000 (15:21 -0700)]
mm: SLUB freelist randomization
Implements freelist randomization for the SLUB allocator. It was
previous implemented for the SLAB allocator. Both use the same
configuration option (CONFIG_SLAB_FREELIST_RANDOM).
The list is randomized during initialization of a new set of pages. The
order on different freelist sizes is pre-computed at boot for
performance. Each kmem_cache has its own randomized freelist.
This security feature reduces the predictability of the kernel SLUB
allocator against heap overflows rendering attacks much less stable.
For example these attacks exploit the predictability of the heap:
- Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
- Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)
Performance results:
slab_test impact is between 3% to 4% on average for 100000 attempts
without smp. It is a very focused testing, kernbench show the overall
impact on the system is way lower.
Before:
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
2. Kmalloc: alloc/free test
100000 times kmalloc(8)/kfree -> 70 cycles
100000 times kmalloc(16)/kfree -> 70 cycles
100000 times kmalloc(32)/kfree -> 70 cycles
100000 times kmalloc(64)/kfree -> 70 cycles
100000 times kmalloc(128)/kfree -> 70 cycles
100000 times kmalloc(256)/kfree -> 69 cycles
100000 times kmalloc(512)/kfree -> 70 cycles
100000 times kmalloc(1024)/kfree -> 73 cycles
100000 times kmalloc(2048)/kfree -> 72 cycles
100000 times kmalloc(4096)/kfree -> 71 cycles
After:
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
2. Kmalloc: alloc/free test
100000 times kmalloc(8)/kfree -> 66 cycles
100000 times kmalloc(16)/kfree -> 66 cycles
100000 times kmalloc(32)/kfree -> 66 cycles
100000 times kmalloc(64)/kfree -> 66 cycles
100000 times kmalloc(128)/kfree -> 65 cycles
100000 times kmalloc(256)/kfree -> 67 cycles
100000 times kmalloc(512)/kfree -> 67 cycles
100000 times kmalloc(1024)/kfree -> 64 cycles
100000 times kmalloc(2048)/kfree -> 67 cycles
100000 times kmalloc(4096)/kfree -> 67 cycles
Kernbench, before:
Average Optimal load -j 12 Run (std deviation):
Elapsed Time 101.873 (1.16069)
User Time 1045.22 (1.60447)
System Time 88.969 (0.559195)
Percent CPU 1112.9 (13.8279)
Context Switches 189140 (2282.15)
Sleeps 99008.6 (768.091)
After:
Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.47 (0.562732)
User Time 1045.3 (1.34263)
System Time 88.311 (0.342554)
Percent CPU 1105.8 (6.49444)
Context Switches 189081 (2355.78)
Sleeps 99231.5 (800.358)
Link: http://lkml.kernel.org/r/1464295031-26375-3-git-send-email-thgarnie@google.com
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Garnier [Tue, 26 Jul 2016 22:21:56 +0000 (15:21 -0700)]
mm: reorganize SLAB freelist randomization
The kernel heap allocators are using a sequential freelist making their
allocation predictable. This predictability makes kernel heap overflow
easier to exploit. An attacker can careful prepare the kernel heap to
control the following chunk overflowed.
For example these attacks exploit the predictability of the heap:
- Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
- Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)
***Problems that needed solving:
- Randomize the Freelist (singled linked) used in the SLUB allocator.
- Ensure good performance to encourage usage.
- Get best entropy in early boot stage.
***Parts:
- 01/02 Reorganize the SLAB Freelist randomization to share elements
with the SLUB implementation.
- 02/02 The SLUB Freelist randomization implementation. Similar approach
than the SLAB but tailored to the singled freelist used in SLUB.
***Performance data:
slab_test impact is between 3% to 4% on average for 100000 attempts
without smp. It is a very focused testing, kernbench show the overall
impact on the system is way lower.
Before:
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
2. Kmalloc: alloc/free test
100000 times kmalloc(8)/kfree -> 70 cycles
100000 times kmalloc(16)/kfree -> 70 cycles
100000 times kmalloc(32)/kfree -> 70 cycles
100000 times kmalloc(64)/kfree -> 70 cycles
100000 times kmalloc(128)/kfree -> 70 cycles
100000 times kmalloc(256)/kfree -> 69 cycles
100000 times kmalloc(512)/kfree -> 70 cycles
100000 times kmalloc(1024)/kfree -> 73 cycles
100000 times kmalloc(2048)/kfree -> 72 cycles
100000 times kmalloc(4096)/kfree -> 71 cycles
After:
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
2. Kmalloc: alloc/free test
100000 times kmalloc(8)/kfree -> 66 cycles
100000 times kmalloc(16)/kfree -> 66 cycles
100000 times kmalloc(32)/kfree -> 66 cycles
100000 times kmalloc(64)/kfree -> 66 cycles
100000 times kmalloc(128)/kfree -> 65 cycles
100000 times kmalloc(256)/kfree -> 67 cycles
100000 times kmalloc(512)/kfree -> 67 cycles
100000 times kmalloc(1024)/kfree -> 64 cycles
100000 times kmalloc(2048)/kfree -> 67 cycles
100000 times kmalloc(4096)/kfree -> 67 cycles
Kernbench, before:
Average Optimal load -j 12 Run (std deviation):
Elapsed Time 101.873 (1.16069)
User Time 1045.22 (1.60447)
System Time 88.969 (0.559195)
Percent CPU 1112.9 (13.8279)
Context Switches 189140 (2282.15)
Sleeps 99008.6 (768.091)
After:
Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.47 (0.562732)
User Time 1045.3 (1.34263)
System Time 88.311 (0.342554)
Percent CPU 1105.8 (6.49444)
Context Switches 189081 (2355.78)
Sleeps 99231.5 (800.358)
This patch (of 2):
This commit reorganizes the previous SLAB freelist randomization to
prepare for the SLUB implementation. It moves functions that will be
shared to slab_common.
The entropy functions are changed to align with the SLUB implementation,
now using get_random_(int|long) functions. These functions were chosen
because they provide a bit more entropy early on boot and better
performance when specific arch instructions are not available.
[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/1464295031-26375-2-git-send-email-thgarnie@google.com
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Brian Foster [Tue, 26 Jul 2016 22:21:53 +0000 (15:21 -0700)]
fs/fs-writeback.c: inode writeback list tracking tracepoints
The per-sb inode writeback list tracks inodes currently under writeback
to facilitate efficient sync processing. In particular, it ensures that
sync only needs to walk through a list of inodes that were cleaned by
the sync.
Add a couple tracepoints to help identify when inodes are added/removed
to and from the writeback lists. Piggyback off of the writeback
lazytime tracepoint template as it already tracks the relevant inode
information.
Link: http://lkml.kernel.org/r/1466594593-6757-3-git-send-email-bfoster@redhat.com
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
cc: Josef Bacik <jbacik@fb.com>
Cc: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Chinner [Tue, 26 Jul 2016 22:21:50 +0000 (15:21 -0700)]
fs/fs-writeback.c: add a new writeback list for sync
wait_sb_inodes() currently does a walk of all inodes in the filesystem
to find dirty one to wait on during sync. This is highly inefficient
and wastes a lot of CPU when there are lots of clean cached inodes that
we don't need to wait on.
To avoid this "all inode" walk, we need to track inodes that are
currently under writeback that we need to wait for. We do this by
adding inodes to a writeback list on the sb when the mapping is first
tagged as having pages under writeback. wait_sb_inodes() can then walk
this list of "inodes under IO" and wait specifically just for the inodes
that the current sync(2) needs to wait for.
Define a couple helpers to add/remove an inode from the writeback list
and call them when the overall mapping is tagged for or cleared from
writeback. Update wait_sb_inodes() to walk only the inodes under
writeback due to the sync.
With this change, filesystem sync times are significantly reduced for
fs' with largely populated inode caches and otherwise no other work to
do. For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem
with a ~10m entry inode cache, sync times are reduced from ~7.3s to less
than 0.1s when the filesystem is fully clean.
Link: http://lkml.kernel.org/r/1466594593-6757-2-git-send-email-bfoster@redhat.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Tue, 26 Jul 2016 22:21:47 +0000 (15:21 -0700)]
ocfs2/cluster: clean up unnecessary assignment for 'ret'
Clean up unnecessary assignment for 'ret'.
Link: http://lkml.kernel.org/r/578C61F6.4080403@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joseph Qi [Tue, 26 Jul 2016 22:21:44 +0000 (15:21 -0700)]
ocfs2: remove obscure BUG_ON in dlmglue
These BUG_ON(!inode) are obscure because we have already used inode to
get osb. And actually we can guarantee here inode is valid in the
context. So we can safely remove them.
Link: http://lkml.kernel.org/r/5776336A.6030104@huawei.com
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Eric Ren <zren@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joseph Qi [Tue, 26 Jul 2016 22:21:40 +0000 (15:21 -0700)]
ocfs2: cleanup implemented prototypes
Several prototypes in inode.h are just defined but not actually
implemented and used, so remove them.
Link: http://lkml.kernel.org/r/57763787.4020706@huawei.com
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joseph Qi [Tue, 26 Jul 2016 22:21:38 +0000 (15:21 -0700)]
ocfs2/dlm: fix memory leak of dlm_debug_ctxt
dlm_debug_ctxt->debug_refcnt is initialized to 1 and then increased to 2
by dlm_debug_get in dlm_debug_init. But dlm_debug_put is called only
once in dlm_debug_shutdown during unregister dlm, which leads to
dlm_debug_ctxt leaked.
Link: http://lkml.kernel.org/r/577BB755.4030900@huawei.com
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joseph Qi [Tue, 26 Jul 2016 22:21:35 +0000 (15:21 -0700)]
ocfs2: cleanup unneeded goto in ocfs2_create_new_inode_locks
The last goto is unneeded, so remove it.
Link: http://lkml.kernel.org/r/576213D3.6080002@huawei.com
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Tue, 26 Jul 2016 22:21:32 +0000 (15:21 -0700)]
ocfs2: improve recovery performance
Journal replay will be run when performing recovery for a dead node. To
avoid the stale cache impact, all blocks of dead node's journal inode
were reloaded from disk. This hurts the performance. Check whether one
block is cached before reloading it can improve performance a lot. In
my test env, the time doing recovery was improved from 120s to 1s.
[akpm@linux-foundation.org: clean up the for loop p_blkno handling]
Link: http://lkml.kernel.org/r/1466155682-24656-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: "Gang He" <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Ren [Tue, 26 Jul 2016 22:21:29 +0000 (15:21 -0700)]
ocfs2: fix a redundant re-initialization
Obviously, memset() has zeroed the whole struct locking_max_version.
So, it's no need to zero its two fields individually.
Link: http://lkml.kernel.org/r/1463970605-18354-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 26 Jul 2016 22:21:26 +0000 (15:21 -0700)]
debugobjects.h: fix trivial kernel doc warning
Add ':' to fix trivial kernel-doc warning in <linux/debugobjects.h>:
..//include/linux/debugobjects.h:63: warning: No description found for parameter 'is_static_object'
Link: http://lkml.kernel.org/r/575B01B8.5060600@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee [Tue, 26 Jul 2016 22:21:23 +0000 (15:21 -0700)]
m32r: add __ucmpdi2 to fix build failure
We are having build failure with m32r and the error message being:
ERROR: "__ucmpdi2" [lib/842/842_decompress.ko] undefined!
ERROR: "__ucmpdi2" [fs/btrfs/btrfs.ko] undefined!
ERROR: "__ucmpdi2" [drivers/scsi/sd_mod.ko] undefined!
ERROR: "__ucmpdi2" [drivers/media/i2c/adv7842.ko] undefined!
ERROR: "__ucmpdi2" [drivers/md/bcache/bcache.ko] undefined!
ERROR: "__ucmpdi2" [drivers/iio/imu/inv_mpu6050/inv-mpu6050.ko] undefined!
__ucmpdi2 is introduced to m32r architecture taking example from other
architectures like h8300, microblaze, mips.
Link: http://lkml.kernel.org/r/1465509213-4280-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Riku Voipio [Tue, 26 Jul 2016 22:21:20 +0000 (15:21 -0700)]
scripts/bloat-o-meter: fix percent on <1% changes
Python divisions are integer divisions unless at least one parameter is
a float. The current bloat-o-meter fails to print sub-percentage
changes:
Total: Before=
10515408, After=
10604060, chg 0.000000%
Force float division by using one float and pretty the print to two
significant decimals:
Total: Before=
10515408, After=
10604060, chg +0.84%
Link: http://lkml.kernel.org/r/1465980311-23814-1-git-send-email-riku.voipio@linaro.org
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 26 Jul 2016 22:21:17 +0000 (15:21 -0700)]
kbuild: abort build on bad stack protector flag
Before, the stack protector flag was sanity checked before .config had
been reprocessed. This meant the build couldn't be aborted early, and
only a warning could be emitted followed later by the compiler blowing
up with an unknown flag. This has caused a lot of confusion over time,
so this splits the flag selection from sanity checking and performs the
sanity checking after the make has been restarted from a reprocessed
.config, so builds can be aborted as early as possible now.
Additionally moves the x86-specific sanity check to the same location,
since it suffered from the same warn-then-wait-for-compiler-failure
problem.
Link: http://lkml.kernel.org/r/20160712223043.GA11664@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 26 Jul 2016 22:21:11 +0000 (15:21 -0700)]
fbmon: remove unused function argument
When building with "make W=1", we get a warning about an empty stub
function that does nothing but reassign its one of its arguments:
drivers/video/fbdev/core/fbmon.c: In function 'fb_edid_to_monspecs':
drivers/video/fbdev/core/fbmon.c:1497:67: error: parameter 'specs' set but not used [-Werror=unused-but-set-parameter]
We can simply make that function completely empty to avoid the warning.
This prevents a warning which everyone will see after "CFLAGS: add
-Wunused-but-set-parameter" is merged.
Link: http://lkml.kernel.org/r/20160715203229.1771162-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Boyd [Tue, 26 Jul 2016 22:21:08 +0000 (15:21 -0700)]
dma-debug: track bucket lock state for static checkers
get_hash_bucket() and put_hash_bucket() acquire and release the same
spinlock, but this confuses static checkers such as sparse
lib/dma-debug.c:254:27: warning: context imbalance in 'get_hash_bucket' - wrong count at exit
lib/dma-debug.c:268:13: warning: context imbalance in 'put_hash_bucket' - unexpected unlock
Add the appropriate acquire and release statements so that checkers can
properly track the lock state.
Link: http://lkml.kernel.org/r/20160701191552.24295-1-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Tue, 26 Jul 2016 22:21:05 +0000 (15:21 -0700)]
dax: remote unused fault wrappers
Remove the unused wrappers dax_fault() and dax_pmd_fault(). After this
removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
dax_pmd_fault() respectively, and update all callers.
The dax_fault() and dax_pmd_fault() wrappers were initially intended to
capture some filesystem independent functionality around page faults
(calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
and ctime).
However, the following commits:
5726b27b09cc ("ext2: Add locking for DAX faults")
ea3d7209ca01 ("ext4: fix races between page faults and hole punching")
added locking to the ext2 and ext4 filesystems after these common
operations but before __dax_fault() and __dax_pmd_fault() were called.
This means that these wrappers are no longer used, and are unlikely to
be used in the future.
XFS has had locking analogous to what was recently added to ext2 and
ext4 since DAX support was initially introduced by:
6b698edeeef0 ("xfs: add DAX file operations support")
Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Tue, 26 Jul 2016 22:21:02 +0000 (15:21 -0700)]
dax: some small updates to dax.txt documentation
These are originally from Matthew Wilcox and were part of his huge
"mm,fs,dax: Change ->pmd_fault to ->huge_fault" patch that was part of
PUD support.
I'm breaking these small changes out as they stand on their own and add
useful information to Documentation/filesystems/dax.txt.
Link: http://lkml.kernel.org/r/20160714214049.20075-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 26 Jul 2016 22:20:59 +0000 (15:20 -0700)]
arm: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.
PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this
flag is for more than order-2. This means that this flag has never been
actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.
Link: http://lkml.kernel.org/r/1464599699-30131-5-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 26 Jul 2016 22:37:51 +0000 (15:37 -0700)]
Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"This branch also contains core changes. I've come to the conclusion
that from 4.9 and forward, I'll be doing just a single branch. We
often have dependencies between core and drivers, and it's hard to
always split them up appropriately without pulling core into drivers
when that happens.
That said, this contains:
- separate secure erase type for the core block layer, from
Christoph.
- set of discard fixes, from Christoph.
- bio shrinking fixes from Christoph, as a followup up to the
op/flags change in the core branch.
- map and append request fixes from Christoph.
- NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
exciting!
- nvme-loop fixes from Arnd.
- removal of ->driverfs_dev from Dan, after providing a
device_add_disk() helper.
- bcache fixes from Bhaktipriya and Yijing.
- cdrom subchannel read fix from Vchannaiah.
- set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.
- set of drbd updates and fixes from Fabian, Lars, and Philipp.
- mg_disk error path fix from Bart.
- user notification for failed device add for loop, from Minfei.
- NVMe in general:
+ NVMe delay quirk from Guilherme.
+ SR-IOV support and command retry limits from Keith.
+ fix for memory-less NUMA node from Masayoshi.
+ use UINT_MAX for discard sectors, from Minfei.
+ cancel IO fixes from Ming.
+ don't allocate unused major, from Neil.
+ error code fixup from Dan.
+ use constants for PSDT/FUSE from James.
+ variable init fix from Jay.
+ fabrics fixes from Ming, Sagi, and Wei.
+ various fixes"
* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
nvme/pci: Provide SR-IOV support
nvme: initialize variable before logical OR'ing it
block: unexport various bio mapping helpers
scsi/osd: open code blk_make_request
target: stop using blk_make_request
block: simplify and export blk_rq_append_bio
block: ensure bios return from blk_get_request are properly initialized
virtio_blk: use blk_rq_map_kern
memstick: don't allow REQ_TYPE_BLOCK_PC requests
block: shrink bio size again
block: simplify and cleanup bvec pool handling
block: get rid of bio_rw and READA
block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
NVMe: don't allocate unused nvme_major
nvme: avoid crashes when node 0 is memoryless node.
nvme: Limit command retries
loop: Make user notify for adding loop device failed
nvme-loop: fix nvme-loop Kconfig dependencies
nvmet: fix return value check in nvmet_subsys_alloc()
...
Linus Torvalds [Tue, 26 Jul 2016 22:03:07 +0000 (15:03 -0700)]
Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
- the big change is the cleanup from Mike Christie, cleaning up our
uses of command types and modified flags. This is what will throw
some merge conflicts
- regression fix for the above for btrfs, from Vincent
- following up to the above, better packing of struct request from
Christoph
- a 2038 fix for blktrace from Arnd
- a few trivial/spelling fixes from Bart Van Assche
- a front merge check fix from Damien, which could cause issues on
SMR drives
- Atari partition fix from Gabriel
- convert cfq to highres timers, since jiffies isn't granular enough
for some devices these days. From Jan and Jeff
- CFQ priority boost fix idle classes, from me
- cleanup series from Ming, improving our bio/bvec iteration
- a direct issue fix for blk-mq from Omar
- fix for plug merging not involving the IO scheduler, like we do for
other types of merges. From Tahsin
- expose DAX type internally and through sysfs. From Toshi and Yigal
* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
block: Fix front merge check
block: do not merge requests without consulting with io scheduler
block: Fix spelling in a source code comment
block: expose QUEUE_FLAG_DAX in sysfs
block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
Btrfs: fix comparison in __btrfs_map_block()
block: atari: Return early for unsupported sector size
Doc: block: Fix a typo in queue-sysfs.txt
cfq-iosched: Charge at least 1 jiffie instead of 1 ns
cfq-iosched: Fix regression in bonnie++ rewrite performance
cfq-iosched: Convert slice_resid from u64 to s64
block: Convert fifo_time from ulong to u64
blktrace: avoid using timespec
block/blk-cgroup.c: Declare local symbols static
block/bio-integrity.c: Add #include "blk.h"
block/partition-generic.c: Remove a set-but-not-used variable
block: bio: kill BIO_MAX_SIZE
cfq-iosched: temporarily boost queue priority for idle classes
block: drbd: avoid to use BIO_MAX_SIZE
block: bio: remove BIO_MAX_SECTORS
...
Linus Torvalds [Tue, 26 Jul 2016 21:39:40 +0000 (14:39 -0700)]
Merge branch 'for-4.8' of git://git./linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:
"libata saw quite a bit of activities in this cycle:
- SMR drive support still being worked on
- bug fixes and improvements to misc SCSI command emulation
- some low level driver updates"
* 'for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (39 commits)
libata-scsi: better style in ata_msense_*()
AHCI: Clear GHC.IS to prevent unexpectly asserting INTx
ata: sata_dwc_460ex: remove redundant dev_err call
ata: define ATA_PROT_* in terms of ATA_PROT_FLAG_*
libata: remove ATA_PROT_FLAG_DATA
libata: remove ata_is_nodata
ata: make lba_{28,48}_ok() use ATA_MAX_SECTORS{,_LBA48}
libata-scsi: minor cleanup for ata_scsi_zbc_out_xlat
libata-scsi: Fix ZBC management out command translation
libata-scsi: Fix translation of REPORT ZONES command
ata: Handle ATA NCQ NO-DATA commands correctly
libata-eh: decode all taskfile protocols
ata: fixup ATA_PROT_NODATA
libsas: use ata_is_ncq() and ata_has_dma() accessors
libata: use ata_is_ncq() accessors
libata: return boolean values from ata_is_*
libata-scsi: avoid repeated calculation of number of TRIM ranges
libata-scsi: reject WRITE SAME (16) with n_block that exceeds limit
libata-scsi: rename ata_msense_ctl_mode() to ata_msense_control()
libata-scsi: fix D_SENSE bit relection in control mode page
...
Linus Torvalds [Tue, 26 Jul 2016 21:34:17 +0000 (14:34 -0700)]
Merge branch 'for-4.8' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Nothing too exciting.
- updates to the pids controller so that pid limit breaches can be
noticed and monitored from userland.
- cleanups and non-critical bug fixes"
* 'for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: remove duplicated include from cgroup.c
cgroup: Use lld instead of ld when printing pids controller events_limit
cgroup: Add pids controller event when fork fails because of pid limit
cgroup: allow NULL return from ss->css_alloc()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
Linus Torvalds [Tue, 26 Jul 2016 20:40:17 +0000 (13:40 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.8:
API:
- first part of skcipher low-level conversions
- add KPP (Key-agreement Protocol Primitives) interface.
Algorithms:
- fix IPsec/cryptd reordering issues that affects aesni
- RSA no longer does explicit leading zero removal
- add SHA3
- add DH
- add ECDH
- improve DRBG performance by not doing CTR by hand
Drivers:
- add x86 AVX2 multibuffer SHA256/512
- add POWER8 optimised crc32c
- add xts support to vmx
- add DH support to qat
- add RSA support to caam
- add Layerscape support to caam
- add SEC1 AEAD support to talitos
- improve performance by chaining requests in marvell/cesa
- add support for Araneus Alea I USB RNG
- add support for Broadcom BCM5301 RNG
- add support for Amlogic Meson RNG
- add support Broadcom NSP SoC RNG"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits)
crypto: vmx - Fix aes_p8_xts_decrypt build failure
crypto: vmx - Ignore generated files
crypto: vmx - Adding support for XTS
crypto: vmx - Adding asm subroutines for XTS
crypto: skcipher - add comment for skcipher_alg->base
crypto: testmgr - Print akcipher algorithm name
crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op
crypto: nx - off by one bug in nx_of_update_msc()
crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
crypto: scatterwalk - Inline start/map/done
crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
crypto: scatterwalk - Fix test in scatterwalk_done
crypto: api - Optimise away crypto_yield when hard preemption is on
crypto: scatterwalk - add no-copy support to copychunks
crypto: scatterwalk - Remove scatterwalk_bytes_sglen
crypto: omap - Stop using crypto scatterwalk_bytes_sglen
crypto: skcipher - Remove top-level givcipher interface
crypto: user - Remove crypto_lookup_skcipher call
crypto: cts - Convert to skcipher
...
Linus Torvalds [Tue, 26 Jul 2016 20:05:11 +0000 (13:05 -0700)]
Merge tag 'docs-for-linus' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Some big changes this month, headlined by the addition of a new
formatted documentation mechanism based on the Sphinx system.
The objectives here are to make it easier to create better-integrated
(and more attractive) documents while (eventually) dumping our
one-of-a-kind, cobbled-together system for something that is widely
used and maintained by others. There's a fair amount of information
what's being done, why, and how to use it in:
https://lwn.net/Articles/692704/
https://lwn.net/Articles/692705/
Closer to home, Documentation/kernel-documentation.rst describes how
it works.
For now, the new system exists alongside the old one; you should soon
see the GPU documentation converted over in the DRM pull and some
significant media conversion work as well. Once all the docs have
been moved over and we're convinced that the rough edges (of which are
are a few) have been smoothed over, the DocBook-based stuff should go
away.
Primary credit is to Jani Nikula for doing the heavy lifting to make
this stuff actually work; there has also been notable effort from
Markus Heiser, Daniel Vetter, and Mauro Carvalho Chehab.
Expect a couple of conflicts on the new index.rst file over the course
of the merge window; they are trivially resolvable. That file may be
a bit of a conflict magnet in the short term, but I don't expect that
situation to last for any real length of time.
Beyond that, of course, we have the usual collection of tweaks,
updates, and typo fixes"
* tag 'docs-for-linus' of git://git.lwn.net/linux: (77 commits)
doc-rst: kernel-doc: fix handling of address_space tags
Revert "doc/sphinx: Enable keep_warnings"
doc-rst: kernel-doc directive, fix state machine reporter
docs: deprecate kernel-doc-nano-HOWTO.txt
doc/sphinx: Enable keep_warnings
Documentation: add watermark_scale_factor to the list of vm systcl file
kernel-doc: Fix up warning output
docs: Get rid of some kernel-documentation warnings
doc-rst: add an option to ignore DocBooks when generating docs
workqueue: Fix a typo in workqueue.txt
Doc: ocfs: Fix typo in filesystems/ocfs2-online-filecheck.txt
Documentation/sphinx: skip build if user requested specific DOCBOOKS
Documentation: add cleanmediadocs to the documentation targets
Add .pyc files to .gitignore
Doc: PM: Fix a typo in intel_powerclamp.txt
doc-rst: flat-table directive - initial implementation
Documentation: add meta-documentation for Sphinx and kernel-doc
Documentation: tiny typo fix in usb/gadget_multi.txt
Documentation: fix wrong value in md.txt
bcache: documentation formatting, edited for clarity, stripe alignment notes
...
Linus Torvalds [Tue, 26 Jul 2016 19:22:51 +0000 (12:22 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
"There are a couple of new things for s390 with this merge request:
- a new scheduling domain "drawer" is added to reflect the unusual
topology found on z13 machines. Performance tests showed up to 8
percent gain with the additional domain.
- the new crc-32 checksum crypto module uses the vector-galois-field
multiply and sum SIMD instruction to speed up crc-32 and crc-32c.
- proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in
the generic vmlinux.lds linker script definitions.
- kcov instrumentation support. A prerequisite for that is the
inline assembly basic block cleanup, which is the reason for the
net/iucv/iucv.c change.
- support for 2GB pages is added to the hugetlbfs backend.
Then there are two removals:
- the oprofile hardware sampling support is dead code and is removed.
The oprofile user space uses the perf interface nowadays.
- the ETR clock synchronization is removed, this has been superseeded
be the STP clock synchronization. And it always has been
"interesting" code..
And the usual bug fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits)
s390/pci: Delete an unnecessary check before the function call "pci_dev_put"
s390/smp: clean up a condition
s390/cio/chp : Remove deprecated create_singlethread_workqueue
s390/chsc: improve channel path descriptor determination
s390/chsc: sanitize fmt check for chp_desc determination
s390/cio: make fmt1 channel path descriptor optional
s390/chsc: fix ioctl CHSC_INFO_CU command
s390/cio/device_ops: fix kernel doc
s390/cio: allow to reset channel measurement block
s390/console: Make preferred console handling more consistent
s390/mm: fix gmap tlb flush issues
s390/mm: add support for 2GB hugepages
s390: have unique symbol for __switch_to address
s390/cpuinfo: show maximum thread id
s390/ptrace: clarify bits in the per_struct
s390: stack address vs thread_info
s390: remove pointless load within __switch_to
s390: enable kcov support
s390/cpumf: use basic block for ecctr inline assembly
s390/hypfs: use basic block for diag inline assembly
...
Linus Torvalds [Tue, 26 Jul 2016 18:50:42 +0000 (11:50 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM leftovers from Radim Krčmář:
"This is a combination of two pull requests for 4.7-rc8 that were not
merged due to looking hairy. I have changed the tag message to focus
on circumstances of contained reverts as they were likely the reason
behind rejection.
This merge introduces three patches that are later reverted,
- Switching of MSR_TSC_AUX in SVM was thought to cause a host
misbehavior, but it was later cleared of those doubts and the patch
moved code to a hot path, so we reverted it. That patch also
needed a fix for 32 bit builds and both were reverted in one go.
- Al Viro noticed that a fix for a leak in an error path was not
valid with the given API and provided a better fix, so the original
patch was reverted.
Then there are two VMX fixes that move code around because VMCS was
not accessed between vcpu_load() and vcpu_put(), a simple ARM VHE fix,
and two one-liners for PML and MTRR"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
arm64: KVM: VHE: Context switch MDSCR_EL1
KVM: VMX: handle PML full VMEXIT that occurs during event delivery
Revert "KVM: SVM: fix trashing of MSR_TSC_AUX"
KVM: SVM: do not set MSR_TSC_AUX on 32-bit builds
KVM: don't use anon_inode_getfd() before possible failures
Revert "KVM: release anon file in failure path of vm creation"
KVM: release anon file in failure path of vm creation
KVM: nVMX: Fix memory corruption when using VMCS shadowing
kvm: vmx: ensure VMCS is current while enabling PML
KVM: SVM: fix trashing of MSR_TSC_AUX
KVM: MTRR: fix kvm_mtrr_check_gfn_range_consistency page fault
Linus Torvalds [Tue, 26 Jul 2016 17:26:29 +0000 (10:26 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This tree contains tooling fixes plus some additions:
- fixes to the vdso2c build environment that Stephen Rothwell is
using for the linux-next build (Arnaldo Carvalho de Melo)
- AVX-512 instruction mappings (Adrian Hunter)
- misc fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "perf tools: event.h needs asm/perf_regs.h"
x86: Make the vdso2c compiler use the host architecture headers
tools build: Fix objtool build with ARCH=x86_64
objtool: Always use host headers
objtool: Use tools/scripts/Makefile.arch to get ARCH and HOSTARCH
tools build: Add HOSTARCH Makefile variable
perf tests kmod-path: Fix build on ubuntu:16.04-x-armhf
perf tools: Add AVX-512 instructions to the new instructions test
perf tools: Add AVX-512 support to the instruction decoder used by Intel PT
x86/insn: Add AVX-512 support to the instruction decoder
x86/insn: perf tools: Fix vcvtph2ps instruction decoding
Linus Torvalds [Tue, 26 Jul 2016 04:35:03 +0000 (21:35 -0700)]
Merge branch 'irq-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The irq department delivers:
- new core infrastructure to allow better management of multi-queue
devices (interrupt spreading, node aware descriptor allocation ...)
- a new interrupt flow handler to support the new fangled Intel VMD
devices.
- yet another new interrupt controller driver.
- a series of fixes which addresses sparse warnings, missing
includes, missing static declarations etc from Ben Dooks.
- a fix for the error handling in the hierarchical domain allocation
code.
- the usual pile of small updates to core and driver code"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
genirq: Fix missing irq allocation affinity hint
irqdomain: Fix irq_domain_alloc_irqs_recursive() error handling
irq/Documentation: Correct result of echnoing 5 to smp_affinity
MAINTAINERS: Remove Jiang Liu from irq domains
genirq/msi: Fix broken debug output
genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors
genirq/msi: Make use of affinity aware allocations
genirq: Use affinity hint in irqdesc allocation
genirq: Add affinity hint to irq allocation
genirq: Introduce IRQD_AFFINITY_MANAGED flag
genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
irqchip/s3c24xx: Fixup IO accessors for big endian
irqchip/exynos-combiner: Fix usage of __raw IO
irqdomain: Fix disposal of mappings for interrupt hierarchies
irqchip/aspeed-vic: Add irq controller for Aspeed
doc/devicetree: Add Aspeed VIC bindings
x86/PCI/VMD: Use untracked irq handler
genirq: Add untracked irq handler
irqchip/mips-gic: Populate irq_domain names
irqchip/gicv3-its: Implement two-level(indirect) device table support
...
Linus Torvalds [Tue, 26 Jul 2016 03:43:12 +0000 (20:43 -0700)]
Merge branch 'timers-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"This update provides the following changes:
- The rework of the timer wheel which addresses the shortcomings of
the current wheel (cascading, slow search for next expiring timer,
etc). That's the first major change of the wheel in almost 20
years since Finn implemted it.
- A large overhaul of the clocksource drivers init functions to
consolidate the Device Tree initialization
- Some more Y2038 updates
- A capability fix for timerfd
- Yet another clock chip driver
- The usual pile of updates, comment improvements all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
tick/nohz: Optimize nohz idle enter
clockevents: Make clockevents_subsys static
clocksource/drivers/time-armada-370-xp: Fix return value check
timers: Implement optimization for same expiry time in mod_timer()
timers: Split out index calculation
timers: Only wake softirq if necessary
timers: Forward the wheel clock whenever possible
timers/nohz: Remove pointless tick_nohz_kick_tick() function
timers: Optimize collect_expired_timers() for NOHZ
timers: Move __run_timers() function
timers: Remove set_timer_slack() leftovers
timers: Switch to a non-cascading wheel
timers: Reduce the CPU index space to 256k
timers: Give a few structs and members proper names
hlist: Add hlist_is_singular_node() helper
signals: Use hrtimer for sigtimedwait()
timers: Remove the deprecated mod_timer_pinned() API
timers, net/ipv4/inet: Initialize connection request timers as pinned
timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
timers, drivers/tty/metag_da: Initialize the poll timer as pinned
...
Linus Torvalds [Tue, 26 Jul 2016 03:06:38 +0000 (20:06 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Leftover fix from the v4.7 cycle: adds a reboot quirk"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/reboot: Add Dell Optiplex 7450 AIO reboot quirk
Linus Torvalds [Tue, 26 Jul 2016 02:41:35 +0000 (19:41 -0700)]
Merge branch 'x86-timers-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 timer updates from Ingo Molnar:
"The main change in this tree is the reworking, fixing and extension of
the TSC frequency enumeration code (by Len Brown)"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Remove the unused check_tsc_disabled()
x86/tsc: Enumerate BXT tsc_khz via CPUID
x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
x86/tsc_msr: Add Airmont reference clock values
x86/tsc_msr: Correct Silvermont reference clock values
x86/tsc_msr: Update comments, expand definitions
x86/tsc_msr: Remove debugging messages
x86/tsc_msr: Identify Intel-specific code
Revert "x86/tsc: Add missing Cherrytrail frequency to the table"
Linus Torvalds [Tue, 26 Jul 2016 02:15:35 +0000 (19:15 -0700)]
Merge branch 'x86-platform-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
"The main changes in this cycle were:
- Intel-SoC enhancements (Andy Shevchenko)
- Intel CPU symbolic model definition rework (Dave Hansen)
- ... other misc changes"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/sfi: Enable enumeration of SD devices
x86/pci: Use MRFLD abbreviation for Merrifield
x86/platform/intel-mid: Make vertical indentation consistent
x86/platform/intel-mid: Mark regulators explicitly defined
x86/platform/intel-mid: Rename mrfl.c to mrfld.c
x86/platform/intel-mid: Enable spidev on Intel Edison boards
x86/platform/intel-mid: Extend PWRMU to support Penwell
x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
x86/platform/intel-mid: Add pinctrl for Intel Merrifield
x86/platform/intel-mid: Enable GPIO expanders on Edison
x86/platform/intel-mid: Add Power Management Unit driver
x86/platform/atom/punit: Enable support for Merrifield
x86/platform/intel_mid_pci: Rework IRQ0 workaround
x86, thermal: Clean up and fix CPU model detection for intel_soc_dts_thermal
x86, mmc: Use Intel family name macros for mmc driver
x86/intel_telemetry: Use Intel family name macros for telemetry driver
x86/acpi/lss: Use Intel family name macros for the acpi_lpss driver
x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
x86/platform: Use new Intel model number macros
x86/intel_idle: Use Intel family macros for intel_idle
...
Linus Torvalds [Tue, 26 Jul 2016 01:48:27 +0000 (18:48 -0700)]
Merge branch 'x86-fpu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
"The main x86 FPU changes in this cycle were:
- a large series of cleanups, fixes and enhancements to re-enable the
XSAVES instruction on Intel CPUs - which is the most advanced
instruction to do FPU context switches (Yu-cheng Yu, Fenghua Yu)
- Add FPU tracepoints for the FPU state machine (Dave Hansen)"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Do not BUG_ON() in early FPU code
x86/fpu/xstate: Re-enable XSAVES
x86/fpu/xstate: Fix fpstate_init() for XRSTORS
x86/fpu/xstate: Return NULL for disabled xstate component address
x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES
x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates
x86/fpu/xstate: Fix XSTATE component offset print out
x86/fpu/xstate: Fix PTRACE frames for XSAVES
x86/fpu/xstate: Fix supervisor xstate component offset
x86/fpu/xstate: Align xstate components according to CPUID
x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use
x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
x86/fpu: Add tracepoints to dump FPU state at key points
Linus Torvalds [Tue, 26 Jul 2016 01:18:04 +0000 (18:18 -0700)]
Merge branch 'x86-debug-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 stackdump update from Ingo Molnar:
"A number of stackdump enhancements"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Add show_stack_regs() and use it
printk: Make the printk*once() variants return a value
x86/dumpstack: Honor supplied @regs arg
Linus Torvalds [Tue, 26 Jul 2016 01:06:00 +0000 (18:06 -0700)]
Merge branch 'x86-cleanups-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Three small cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lguest: Read offset of device_cap later
lguest: Read length of device_cap later
x86: Do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
Linus Torvalds [Tue, 26 Jul 2016 01:00:18 +0000 (18:00 -0700)]
Merge branch 'x86-build-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
"A build system fix and a cleanup"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kbuild: Remove stale asm-generic wrappers
kbuild, x86: Track generated headers with generated-y
Linus Torvalds [Tue, 26 Jul 2016 00:32:28 +0000 (17:32 -0700)]
Merge branch 'x86-boot-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
"The main changes:
- add initial commits to randomize kernel memory section virtual
addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
(Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)
- enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
Cook)
- EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)
- misc cleanups/fixes"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Simplify EBDA-vs-BIOS reservation logic
x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
x86/boot: Reorganize and clean up the BIOS area reservation code
x86/mm: Do not reference phys addr beyond kernel
x86/mm: Add memory hotplug support for KASLR memory randomization
x86/mm: Enable KASLR for vmalloc memory regions
x86/mm: Enable KASLR for physical mapping memory regions
x86/mm: Implement ASLR for kernel memory regions
x86/mm: Separate variable for trampoline PGD
x86/mm: Add PUD VA support for physical mapping
x86/mm: Update physical mapping variable names
x86/mm: Refactor KASLR entropy functions
x86/KASLR: Fix boot crash with certain memory configurations
x86/boot/64: Add forgotten end of function marker
x86/KASLR: Allow randomization below the load address
x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
x86/KASLR: Randomize virtual address separately
x86/KASLR: Clarify identity map interface
x86/boot: Refuse to build with data relocations
x86/KASLR, x86/power: Remove x86 hibernation restrictions
Linus Torvalds [Mon, 25 Jul 2016 22:34:18 +0000 (15:34 -0700)]
Merge branch 'x86-mm-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
"Various x86 low level modifications:
- preparatory work to support virtually mapped kernel stacks (Andy
Lutomirski)
- support for 64-bit __get_user() on 32-bit kernels (Benjamin
LaHaise)
- (involved) workaround for Knights Landing CPU erratum (Dave Hansen)
- MPX enhancements (Dave Hansen)
- mremap() extension to allow remapping of the special VDSO vma, for
purposes of user level context save/restore (Dmitry Safonov)
- hweight and entry code cleanups (Borislav Petkov)
- bitops code generation optimizations and cleanups with modern GCC
(H. Peter Anvin)
- syscall entry code optimizations (Paolo Bonzini)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
x86/mm/cpa: Add missing comment in populate_pdg()
x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
x86/smp: Remove unnecessary initialization of thread_info::cpu
x86/smp: Remove stack_smp_processor_id()
x86/uaccess: Move thread_info::addr_limit to thread_struct
x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
x86/dumpstack: When OOPSing, rewind the stack before do_exit()
x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
x86/dumpstack: Try harder to get a call trace on stack overflow
x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
x86/mm: Use pte_none() to test for empty PTE
x86/mm: Disallow running with 32-bit PTEs to work around erratum
x86/mm: Ignore A/D bits in pte/pmd/pud_none()
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/entry: Inline enter_from_user_mode()
...
Linus Torvalds [Mon, 25 Jul 2016 22:09:08 +0000 (15:09 -0700)]
Merge branch 'x86-apic-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86/apic updates from Ingo Molnar:
"Misc cleanups and a small fix"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Remove the unused struct apic::apic_id_mask field
x86/apic: Fix misspelled APIC
x86/ioapic: Simplify ioapic_setup_resources()
Linus Torvalds [Mon, 25 Jul 2016 21:43:00 +0000 (14:43 -0700)]
Merge branch 'timers-nohz-for-linus' of git://git./linux/kernel/git/tip/tip
Pull NOHZ updates from Ingo Molnar:
- fix system/idle cputime leaked on cputime accounting (all nohz
configs) (Rik van Riel)
- remove the messy, ad-hoc irqtime account on nohz-full and make it
compatible with CONFIG_IRQ_TIME_ACCOUNTING=y instead (Rik van Riel)
- cleanups (Frederic Weisbecker)
- remove unecessary irq disablement in the irqtime code (Rik van Riel)
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Drop local_irq_save/restore from irqtime_account_irq()
sched/cputime: Reorganize vtime native irqtime accounting headers
sched/cputime: Clean up the old vtime gen irqtime accounting completely
sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code
sched/cputime: Count actually elapsed irq & softirq time
Linus Torvalds [Mon, 25 Jul 2016 20:59:34 +0000 (13:59 -0700)]
Merge branch 'sched-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- introduce and use task_rcu_dereference()/try_get_task_struct() to fix
and generalize task_struct handling (Oleg Nesterov)
- do various per entity load tracking (PELT) fixes and optimizations
(Peter Zijlstra)
- cputime virt-steal time accounting enhancements/fixes (Wanpeng Li)
- introduce consolidated cputime output file cpuacct.usage_all and
related refactorings (Zhao Lei)
- ... plus misc fixes and enhancements
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Panic on scheduling while atomic bugs if kernel.panic_on_warn is set
sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together
sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()
sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums
sched/fair: Rework throttle_count sync
sched/core: Fix sched_getaffinity() return value kerneldoc comment
sched/fair: Reorder cgroup creation code
sched/fair: Apply more PELT fixes
sched/fair: Fix PELT integrity for new tasks
sched/cgroup: Fix cpu_cgroup_fork() handling
sched/fair: Fix PELT integrity for new groups
sched/fair: Fix and optimize the fork() path
sched/cputime: Add steal time support to full dynticks CPU time accounting
sched/cputime: Fix prev steal time accouting during CPU hotplug
KVM: Fix steal clock warp during guest CPU hotplug
sched/debug: Always show 'nr_migrations'
sched/fair: Use task_rcu_dereference()
sched/api: Introduce task_rcu_dereference() and try_get_task_struct()
sched/idle: Optimize the generic idle loop
sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()
Kees Cook [Mon, 25 Jul 2016 20:50:36 +0000 (13:50 -0700)]
Merge tag 'v4.7' into for-linus/pstore
Linux 4.7
Linus Torvalds [Mon, 25 Jul 2016 20:20:41 +0000 (13:20 -0700)]
Merge branch 'perf-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"With over 300 commits it's been a busy cycle - with most of the work
concentrated on the tooling side (as it should).
The main kernel side enhancements were:
- Add per event callchain limit: Recently we introduced a sysctl to
tune the max-stack for all events for which callchains were
requested:
$ sysctl kernel.perf_event_max_stack
kernel.perf_event_max_stack = 127
Now this patch introduces a way to configure this per event, i.e.
this becomes possible:
$ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a
allowing finer tuning of how much buffer space callchains use.
This uses an u16 from the reserved space at the end, leaving
another u16 for future use.
There has been interest in even finer tuning, namely to control the
max stack for kernel and userspace callchains separately. Further
discussion is needed, we may for instance use the remaining u16 for
that and when it is present, assume that the sample_max_stack
introduced in this patch applies for the kernel, and the u16 left
is used for limiting the userspace callchain (Arnaldo Carvalho de
Melo)
- Optimize AUX event (hardware assisted side-band event) delivery
(Kan Liang)
- Rework Intel family name macro usage (this is partially x86 arch
work) (Dave Hansen)
- Refine and fix Intel LBR support (David Carrillo-Cisneros)
- Add support for Intel 'TopDown' events (Andi Kleen)
- Intel uncore PMU driver fixes and enhancements (Kan Liang)
- ... other misc changes.
Here's an incomplete list of the tooling enhancements (but there's
much more, see the shortlog and the git log for details):
- Support cross unwinding, i.e. collecting '--call-graph dwarf'
perf.data files in one machine and then doing analysis in another
machine of a different hardware architecture. This enables, for
instance, to do:
$ perf record -a --call-graph dwarf
on a x86-32 or aarch64 system and then do 'perf report' on it on a
x86_64 workstation (He Kuang)
- Allow reading from a backward ring buffer (one setup via
sys_perf_event_open() with perf_event_attr.write_backward = 1)
(Wang Nan)
- Finish merging initial SDT (Statically Defined Traces) support, see
cset comments for details about how it all works (Masami Hiramatsu)
- Support attaching eBPF programs to tracepoints (Wang Nan)
- Add demangling of symbols in programs written in the Rust language
(David Tolnay)
- Add support for tracepoints in the python binding, including an
example, that sets up and parses sched:sched_switch events,
tools/perf/python/tracepoint.py (Jiri Olsa)
- Introduce --stdio-color to set up the color output mode selection
in 'annotate' and 'report', allowing emit color escape sequences
when redirecting the output of these tools (Arnaldo Carvalho de
Melo)
- Add 'callindent' option to 'perf script -F', to indent the Intel PT
call stack, making this output more ftrace-like (Adrian Hunter,
Andi Kleen)
- Allow dumping the object files generated by llvm when processing
eBPF scriptlet events (Wang Nan)
- Add stackcollapse.py script to help generating flame graphs (Paolo
Bonzini)
- Add --ldlat option to 'perf mem' to specify load latency for loads
event (e.g. cpu/mem-loads/ ) (Jiri Olsa)
- Tooling support for Intel TopDown counters, recently added to the
kernel (Andi Kleen)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
perf tests: Add is_printable_array test
perf tools: Make is_printable_array global
perf script python: Fix string vs byte array resolving
perf probe: Warn unmatched function filter correctly
perf cpu_map: Add more helpers
perf stat: Balance opening and reading events
tools: Copy linux/{hash,poison}.h and check for drift
perf tools: Remove include/linux/list.h from perf's MANIFEST
tools: Copy the bitops files accessed from the kernel and check for drift
Remove: kernel unistd*h files from perf's MANIFEST, not used
perf tools: Remove tools/perf/util/include/linux/const.h
perf tools: Remove tools/perf/util/include/asm/byteorder.h
perf tools: Add missing linux/compiler.h include to perf-sys.h
perf jit: Remove some no-op error handling
perf jit: Add missing curly braces
objtool: Initialize variable to silence old compiler
objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
perf record: Add --tail-synthesize option
perf session: Don't warn about out of order event if write_backward is used
perf tools: Enable overwrite settings
...
Linus Torvalds [Mon, 25 Jul 2016 20:13:19 +0000 (13:13 -0700)]
Merge branch 'ras-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
"The biggest change in this cycle was an enhancement by Yazen Ghannam
to reduce the number of MCE error injection related IPIs.
The rest are smaller fixes"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix mce_rdmsrl() warning message
x86/RAS/AMD: Reduce the number of IPIs when prepping error injection
x86/mce/AMD: Increase size of the bank_map type
x86/mce: Do not use bank 1 for APEI generated error logs
Linus Torvalds [Mon, 25 Jul 2016 19:41:29 +0000 (12:41 -0700)]
Merge branch 'locking-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The locking tree was busier in this cycle than the usual pattern - a
couple of major projects happened to coincide.
The main changes are:
- implement the atomic_fetch_{add,sub,and,or,xor}() API natively
across all SMP architectures (Peter Zijlstra)
- add atomic_fetch_{inc/dec}() as well, using the generic primitives
(Davidlohr Bueso)
- optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
Waiman Long)
- optimize smp_cond_load_acquire() on arm64 and implement LSE based
atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
on arm64 (Will Deacon)
- introduce smp_acquire__after_ctrl_dep() and fix various barrier
mis-uses and bugs (Peter Zijlstra)
- after discovering ancient spin_unlock_wait() barrier bugs in its
implementation and usage, strengthen its semantics and update/fix
usage sites (Peter Zijlstra)
- optimize mutex_trylock() fastpath (Peter Zijlstra)
- ... misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
locking/static_keys: Fix non static symbol Sparse warning
locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
locking/atomic, arch/tile: Fix tilepro build
locking/atomic, arch/m68k: Remove comment
locking/atomic, arch/arc: Fix build
locking/Documentation: Clarify limited control-dependency scope
locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
locking/atomic, arch/mips: Convert to _relaxed atomics
locking/atomic, arch/alpha: Convert to _relaxed atomics
locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
locking/atomic: Fix atomic64_relaxed() bits
locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
...
Linus Torvalds [Mon, 25 Jul 2016 19:30:01 +0000 (12:30 -0700)]
Merge branch 'efi-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"The biggest change in this cycle were SGI/UV related changes that
clean up and fix UV boot quirks and problems.
There's also various smaller cleanups and refinements"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Reorganize the GUID table to make it easier to read
x86/efi: Remove the unused efi_get_time() function
x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
efi: Convert efi_call_virt() to efi_call_virt_pointer()
x86/efi: Remove unused variable 'efi'
efi: Document #define FOO_PROTOCOL_GUID layout
efibc: Report more information in the error messages
Linus Torvalds [Mon, 25 Jul 2016 19:04:11 +0000 (12:04 -0700)]
Merge branch 'core-rcu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- documentation updates
- miscellaneous fixes
- minor reorganization of code
- torture-test updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
rcu: Correctly handle sparse possible cpus
rcu: sysctl: Panic on RCU Stall
rcu: Fix a typo in a comment
rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
rcu: Disable TASKS_RCU for usermode Linux
rcu: No ordering for rcu_assign_pointer() of NULL
rcutorture: Fix error return code in rcu_perf_init()
torture: Inflict default jitter
rcuperf: Don't treat gp_exp mis-setting as a WARN
rcutorture: Drop "-soundhw pcspkr" from x86 boot arguments
rcutorture: Don't specify the cpu type of QEMU on PPC
rcutorture: Make -soundhw a x86 specific option
rcutorture: Use vmlinux as the fallback kernel image
rcutorture/doc: Create initrd using dracut
torture: Stop onoff task if there is only one cpu
torture: Add starvation events to error summary
torture: Break online and offline functions out of torture_onoff()
torture: Forgive lengthy trace dumps and preemption
torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
...
Ingo Molnar [Mon, 25 Jul 2016 17:48:41 +0000 (19:48 +0200)]
Merge tag 'perf-core-for-mingo-
20160725' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Add AVX-512 support to the instruction decoder, used by Intel PT,
fix vcvtph2ps instruction decoding (Adrian Hunter)
- Make objtool and vdso2c use the right arch header search path
(Stephen Rothwell, Josh Poimboeuf, Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arnaldo Carvalho de Melo [Mon, 25 Jul 2016 14:57:37 +0000 (11:57 -0300)]
Revert "perf tools: event.h needs asm/perf_regs.h"
This reverts commit
e083a21fcac9311ca425e600a15332f4792c56cc.
Not needed at all, tools/perf/util/perf_regs.h, included via:
#include "perf_regs.h"
Should have a definition for PERF_REGS_MAX, and since this is dependent
on HAVE_PERF_REGS_SUPPORT, fixes the build on powerpc, noticed by trying
to cross compile this from ubuntu16.04 with a locally build libz &
elfutils pair, since those are not available in multilib packages.
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-0bv204s71t4wuw1l53b6fz79@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stephen Rothwell [Sat, 23 Jul 2016 04:35:40 +0000 (14:35 +1000)]
x86: Make the vdso2c compiler use the host architecture headers
To be clear: this is a ppc64le hosted, x86_64 target cross build.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160723150845.3af8e452@canb.auug.org.au
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rafael J. Wysocki [Mon, 25 Jul 2016 11:46:30 +0000 (13:46 +0200)]
Merge branch 'pm-cpu'
* pm-cpu:
x86: remove duplicate turbo ratio limit MSRs
tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT
cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
Rafael J. Wysocki [Mon, 25 Jul 2016 11:46:23 +0000 (13:46 +0200)]
Merge branch 'powercap'
* powercap:
powercap / RAPL: Add support for Ivy Bridge server
powercap / RAPL: add support for Denverton
powercap / RAPL: handle missing MSRs
powercap / RAPL: reduce message loglevel
Rafael J. Wysocki [Mon, 25 Jul 2016 11:46:15 +0000 (13:46 +0200)]
Merge branch 'pm-cpuidle'
* pm-cpuidle:
intel_idle: correct BXT support
intel_idle: re-work bxt_idle_state_table_update() and its helper
idle_intel: Add Denverton
drivers/idle: make intel_idle.c driver more explicitly non-modular
Rafael J. Wysocki [Mon, 25 Jul 2016 11:46:08 +0000 (13:46 +0200)]
Merge branch 'pm-cpufreq'
* pm-cpufreq: (41 commits)
Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
cpufreq: export cpufreq_driver_resolve_freq()
cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
cpufreq: acpi-cpufreq: use cached frequency mapping when possible
cpufreq: schedutil: map raw required frequency to driver frequency
cpufreq: add cpufreq_driver_resolve_freq()
cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
intel_pstate: Update cpu_frequency tracepoint every time
cpufreq: intel_pstate: clean remnant struct element
cpufreq: powernv: Replacing pstate_id with frequency table index
intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
cpufreq: Reuse new freq-table helpers
cpufreq: Handle sorted frequency tables more efficiently
cpufreq: Drop redundant check from cpufreq_update_current_freq()
intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly
intel_pstate: add __init/__initdata marker to some functions/variables
intel_pstate: Fix incorrect placement of __initdata
cpufreq: mvebu: fix integer to pointer cast
cpufreq: intel_pstate: Broxton support
cpufreq: conservative: Do not use transition notifications
...
Rafael J. Wysocki [Mon, 25 Jul 2016 11:45:39 +0000 (13:45 +0200)]
Merge branch 'x86/cpu' from tip
Rafael J. Wysocki [Mon, 25 Jul 2016 11:45:27 +0000 (13:45 +0200)]
Merge branches 'pm-core', 'pm-clk', 'pm-domains' and 'pm-pci'
* pm-core:
PM / runtime: Asynchronous "idle" in pm_runtime_allow()
PM / runtime: print error when activating a child to unactive parent
* pm-clk:
PM / clk: Add support for adding a specific clock from device-tree
PM / clk: export symbols for existing pm_clk_<...> API fcns
* pm-domains:
PM / Domains: Convert pm_genpd_init() to return an error code
PM / Domains: Stop/start devices during system PM suspend/resume in genpd
PM / Domains: Allow runtime PM during system PM phases
PM / Runtime: Avoid resuming devices again in pm_runtime_force_resume()
PM / Domains: Remove redundant pm_request_idle() call in genpd
PM / Domains: Remove redundant wrapper functions for system PM
PM / Domains: Allow genpd to power on during system PM phases
* pm-pci:
PCI / PM: check all fields in pci_set_platform_pm()
Rafael J. Wysocki [Mon, 25 Jul 2016 11:45:11 +0000 (13:45 +0200)]
Merge branch 'pm-devfreq'
* pm-devfreq:
PM / devfreq: exynos-bus: add missing of_node_put after calling of_parse_phandle
PM / devfreq: add missing of_node_put after calling of_parse_phandle
PM / devfreq: exynos-ppmu: fix error path in exynos_ppmu_probe()
PM / devfreq: exynos: fix error path in exynos_bus_probe()
PM / devfreq: make event/exynos-ppmu DEVFREQ_EVENT_EXYNOS_PPMU tristate
PM / devfreq: make event/exynos-nocp DEVFREQ_EVENT_EXYNOS_NOCP tristate
PM / devfreq: make exynos-bus ARM_EXYNOS_BUS_DEVFREQ tristate
PM / devfreq: make devfreq-event explicitly non-modular
PM / devfreq: make devfreq explicitly non-modular
Rafael J. Wysocki [Mon, 25 Jul 2016 11:44:32 +0000 (13:44 +0200)]
Merge branches 'pm-sleep' and 'pm-tools'
* pm-sleep:
PM / hibernate: Introduce test_resume mode for hibernation
x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
PM / hibernate: Image data protection during restoration
PM / hibernate: Add missing braces in __register_nosave_region()
PM / hibernate: Clean up comments in snapshot.c
PM / hibernate: Clean up function headers in snapshot.c
PM / hibernate: Add missing braces in hibernate_setup()
PM / hibernate: Recycle safe pages after image restoration
PM / hibernate: Simplify mark_unsafe_pages()
PM / hibernate: Do not free preallocated safe pages during image restore
PM / suspend: show workqueue state in suspend flow
PM / sleep: make PM notifiers called symmetrically
PM / sleep: Make pm_prepare_console() return void
PM / Hibernate: Don't let kasan instrument snapshot.c
* pm-tools:
PM / tools: scripts: AnalyzeSuspend v4.2
tools/turbostat: allow user to alter DESTDIR and PREFIX
Rafael J. Wysocki [Mon, 25 Jul 2016 11:43:03 +0000 (13:43 +0200)]
Merge branches 'acpi-drivers', 'acpi-misc' and 'acpi-tools'
* acpi-drivers:
ACPI / DPTF: move int340x_thermal.c to the DPTF folder
ACPI / DPTF: Add DPTF power participant driver
* acpi-misc:
ACPI / lpat: make it explicitly non-modular
ACPI / dock: make dock explicitly non-modular
* acpi-tools:
tools/acpi: use CROSS_COMPILE to define prefix
Rafael J. Wysocki [Mon, 25 Jul 2016 11:42:48 +0000 (13:42 +0200)]
Merge branch 'acpi-pmic'
* acpi-pmic:
ACPI / PMIC: remove modular references from non-modular code
ACPI / PMIC: intel: initialize result to 0
ACPI / PMIC: intel: add REGS operation region support
ACPI / PMIC: Add opregion driver for Intel BXT WhiskeyCove PMIC
ACPI / PMIC: modify the pen function signature to take bit field
Conflicts:
drivers/acpi/Makefile
Rafael J. Wysocki [Mon, 25 Jul 2016 11:42:25 +0000 (13:42 +0200)]
Merge branches 'acpi-processor', 'acpi-cppc', 'acpi-apei' and 'acpi-sleep'
* acpi-processor:
ACPI: enable ACPI_PROCESSOR_IDLE on ARM64
arm64: add support for ACPI Low Power Idle(LPI)
drivers: firmware: psci: initialise idle states using ACPI LPI
cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64}
arm64: cpuidle: drop __init section marker to arm_cpuidle_init
ACPI / processor_idle: Add support for Low Power Idle(LPI) states
ACPI / processor_idle: introduce ACPI_PROCESSOR_CSTATE
* acpi-cppc:
mailbox: pcc: Add PCC request and free channel declarations
ACPI / CPPC: Prevent cpc_desc_ptr points to the invalid data
ACPI: CPPC: Return error if _CPC is invalid on a CPU
* acpi-apei:
ACPI / APEI: Add Boot Error Record Table (BERT) support
ACPI / einj: Make error paths more talkative
ACPI / einj: Convert EINJ_PFX to proper pr_fmt
* acpi-sleep:
ACPI: Execute _PTS before system reboot
Rafael J. Wysocki [Mon, 25 Jul 2016 11:42:00 +0000 (13:42 +0200)]
Merge branches 'acpi-ec', 'acpi-video', 'acpi-button' and 'acpi-thermal'
* acpi-ec:
ACPI / EC: Remove wrong ECDT correction quirks
ACPI / EC: Cleanup boot EC code using acpi_ec_alloc()
* acpi-video:
ACPI / video: Dummy acpi_video_register should return error code
ACPI / video: skip evaluating _DOD when it does not exist
ACPI / video: Thinkpad X201 Tablet needs video_detect_force_video
* acpi-button:
ACPI / button: Add quirks for initial lid state notification
ACPI / button: Refactor functions to eliminate redundant code
ACPI / button: Remove initial lid state notification
* acpi-thermal:
ACPI / thermal: Remove create_workqueue()
Rafael J. Wysocki [Mon, 25 Jul 2016 11:41:25 +0000 (13:41 +0200)]
Merge branches 'acpi-bus', 'acpi-pci', 'acpica' and 'acpi-doc'
* acpi-bus:
ACPI / bus: Support for platform initiated graceful shutdown
ACPI / bus: Correct the comments about acpi_subsystem_init()
ACPI / bus: Use acpi_handle_debug() in acpi_print_osc_error()
* acpi-pci:
ACPI / PCI: make pci_slot explicitly non-modular
ACPI / PCI: pci_slot: Use generic pr_debug utility
ACPI / PCI: pci_slot: Use more common logging style
* acpica:
ACPICA: Linux: Enable ACPI_MUTEX_DEBUG for Linux kernel
* acpi-doc:
ACPI / debugger: Add AML debugger documentation
ACPI: Add documentation describing ACPICA release automation
Rafael J. Wysocki [Mon, 25 Jul 2016 11:41:01 +0000 (13:41 +0200)]
Merge branch 'acpi-tables'
* acpi-tables:
ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
ACPI: add support for loading SSDTs via configfs
ACPI: add support for configfs
efi / ACPI: load SSTDs from EFI variables
spi / ACPI: add support for ACPI reconfigure notifications
i2c / ACPI: add support for ACPI reconfigure notifications
ACPI: add support for ACPI reconfiguration notifiers
ACPI / scan: fix enumeration (visited) flags for bus rescans
ACPI / documentation: add SSDT overlays documentation
ACPI: ARM64: support for ACPI_TABLE_UPGRADE
ACPI / tables: introduce ARCH_HAS_ACPI_TABLE_UPGRADE
ACPI / tables: move arch-specific symbol to asm/acpi.h
ACPI / tables: table upgrade: refactor function definitions
ACPI / tables: table upgrade: use cacheable map for tables
Conflicts:
arch/arm64/include/asm/acpi.h
Rafael J. Wysocki [Mon, 25 Jul 2016 11:40:39 +0000 (13:40 +0200)]
Merge branch 'acpi-numa'
* acpi-numa:
ACPI / NUMA: Enable ACPI based NUMA on ARM64
arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT
ACPI / processor: Add acpi_map_madt_entry()
ACPI / NUMA: Improve SRAT error detection and add messages
ACPI / NUMA: Move acpi_numa_memory_affinity_init() to drivers/acpi/numa.c
ACPI / NUMA: remove unneeded acpi_numa=1
ACPI / NUMA: move bad_srat() and srat_disabled() to drivers/acpi/numa.c
x86 / ACPI / NUMA: cleanup acpi_numa_processor_affinity_init()
arm64, NUMA: Cleanup NUMA disabled messages
arm64, NUMA: rework numa_add_memblk()
ACPI / NUMA: move acpi_numa_slit_init() to drivers/acpi/numa.c
ACPI / NUMA: Move acpi_numa_arch_fixup() to ia64 only
ACPI / NUMA: remove duplicate NULL check
ACPI / NUMA: Replace ACPI_DEBUG_PRINT() with pr_debug()
ACPI / NUMA: Use pr_fmt() instead of printk
Linus Torvalds [Mon, 25 Jul 2016 04:10:30 +0000 (21:10 -0700)]
Merge tag 'hwmon-for-linus-v4.8' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
- New drivers for FTS BMC "Teutates", TI INA3221, and Sensirion SHT3x.
- Added support for Microchip MCP9808 and TI TMP461.
- Cleanup and minor fixes in various drivers.
* tag 'hwmon-for-linus-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (37 commits)
Documentation: dtb: xgene: Add hwmon dts binding documentation
hwmon: (ftsteutates) Remove unused including <linux/version.h>
hwmon: (adt7411) set bit 3 in CFG1 register
hwmon: Add driver for FTS BMC chip "Teutates"
hwmon: (sht3x) add humidity heater element control
hwmon: (jc42) Add support for generic JC-42.4 devicetree binding
dt/bindings: Add bindings for JC-42.4 compatible temperature sensors
hwmon: (tmp102) Convert to use regmap, and drop local cache
hwmon: (tmp102) Rework chip configuration
hwmon: (tmp102) Improve handling of initial read delay
hwmon: (lm90) Drop unnecessary else statements
hwmon: (lm90) Use bool for valid flag
hwmon: (lm90) Read limit registers only once
hwmon: (lm90) Simplify read functions
hwmon: (lm90) Use devm_hwmon_device_register_with_groups
hwmon: (lm90) Use devm_add_action for cleanup
hwmon: (lm75) Convert to use regmap
hwmon: (lm75) Add update_interval attribute
hwmon: (lm75) Drop lm75_read_value and lm75_write_value
hwmon: (lm75) Handle cleanup with devm_add_action
...
Linus Torvalds [Mon, 25 Jul 2016 03:56:29 +0000 (20:56 -0700)]
Merge tag 'renesas-sh-drivers-for-v4.8' of git://git./linux/kernel/git/horms/renesas
Pull SH drivers updates from Simon Horman:
"Drop use of SH Drivers on Renesas ARM Based SoCs.
I expect this to be my last pull request for these drivers as it
removes usage of them from Renesas ARM Based SoCs and my
co-maintenance of them.
The drivers are still used by some SH SoCs and listed under SUPERH in
the MAINTAINERS file"
* tag 'renesas-sh-drivers-for-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
MAINTAINERS: Drop drivers/sh/ for Renesas ARM
drivers: sh: Stop using the legacy clock domain on ARM
Linus Torvalds [Mon, 25 Jul 2016 00:22:18 +0000 (17:22 -0700)]
Merge tag 'usb-4.8-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB updates from Greg KH:
"Here's the big USB driver update for 4.8-rc1. Lots of the normal
stuff in here, musb, gadget, xhci, and other updates and fixes. All
of the details are in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (169 commits)
cdc-acm: beautify probe()
cdc-wdm: use the common CDC parser
cdc-acm: cleanup error handling
cdc-acm: use the common parser
usbnet: move the CDC parser into USB core
usb: musb: sunxi: Simplify dr_mode handling
usb: musb: sunxi: make unexported symbols static
usb: musb: cppi41: add dma channel tracepoints
usb: musb: cppi41: move struct cppi41_dma_channel to header
usb: musb: cleanup cppi_dma header
usb: musb: gadget: add usb-request tracepoints
usb: musb: host: add urb tracepoints
usb: musb: add tracepoints to dump interrupt events
usb: musb: add tracepoints for register access
usb: musb: dsps: use musb register read/write wrappers instead
usb: musb: switch dev_dbg to tracepoints
usb: musb: add tracepoints support for debugging
usb: quirks: Add no-lpm quirk for Elan
phy: rcar-gen3-usb2: fix mutex_lock calling in interrupt
phy: rockhip-usb: use devm_add_action_or_reset()
...
Linus Torvalds [Mon, 25 Jul 2016 00:14:37 +0000 (17:14 -0700)]
Merge tag 'tty-4.8-rc1' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big tty and serial driver update for 4.8-rc1.
Lots of good cleanups from Jiri on a number of vt and other tty
related things, and the normal driver updates. Full details are in
the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (90 commits)
tty/serial: atmel: enforce tasklet init and termination sequences
serial: sh-sci: Stop transfers in sci_shutdown()
serial: 8250_ingenic: drop #if conditional surrounding earlycon code
serial: 8250_mtk: drop !defined(MODULE) conditional
serial: 8250_uniphier: drop !defined(MODULE) conditional
earlycon: mark earlycon code as __used iif the caller is built-in
tty/serial/8250: use mctrl_gpio helpers
serial: mctrl_gpio: enable API usage only for initialized mctrl_gpios struct
serial: mctrl_gpio: add modem control read routine
tty/serial/8250: make UART_MCR register access consistent
serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV
serial: 8250_dma: Export serial8250_rx_dma_flush()
dmaengine: hsu: Export hsu_dma_get_status()
tty: serial: 8250: add CON_CONSDEV to flags
tty: serial: samsung: add byte-order aware bit functions
tty: serial: samsung: fixup accessors for endian
serial: sirf: make fifo functions static
serial: mps2-uart: make driver explicitly non-modular
serial: mvebu-uart: free the IRQ in ->shutdown()
serial/bcm63xx_uart: use correct alias naming
...
Linus Torvalds [Sun, 24 Jul 2016 23:55:23 +0000 (16:55 -0700)]
Merge tag 'staging-4.8-rc1' of git://git./linux/kernel/git/gregkh/staging
Pull staging and IIO driver updates from Greg KH:
"Here is the big Staging and IIO driver update for 4.8-rc1.
We ended up adding more code than removing, again, but it's not all
that bad. Lots of cleanups all over the staging tree, and new IIO
drivers, full details in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (417 commits)
drivers:iio:accel:mma8452: removed unwanted return statements
drivers:iio:accel:mma8452: added cleanup provision in case of failure.
iio: Add iio.git tree to MAINTAINERS
iio:st_pressure: clean useless static channel initializers
iio:st_pressure:lps22hb: temperature support
iio:st_pressure:lps22hb: open drain support
iio:st_pressure: temperature triggered buffering
iio:st_pressure: document sampling gains
iio:st_pressure: align storagebits on power of 2
iio:st_sensors: align on storagebits boundaries
staging:iio:lis3l02dq drop separate driver
iio: accel: st_accel: Add lis3l02dq support
iio: adc: add missing of_node references to iio_dev
iio: adc: ti-ads1015: add indio_dev->dev.of_node reference
iio: potentiometer: Fix typo in Kconfig
iio: potentiometer: mcp4531: Add device tree binding
iio: potentiometer: mcp4531: Add device tree binding documentation
iio: potentiometer: mcp4531: Add support for MCP454x, MCP456x, MCP464x and MCP466x
iio:imu:mpu6050: icm20608 initial support
iio: adc: max1363: Add device tree binding
...
Linus Torvalds [Sun, 24 Jul 2016 23:26:26 +0000 (16:26 -0700)]
Merge tag 'char-misc-4.8-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver update for 4.8-rc1.
Not a lot of stuff, but it's all over the place, full details are in
the shortlog. All of these have been in linux-next with no reported
issues for a while"
* tag 'char-misc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (49 commits)
lkdtm: silence warnings about function declarations
lkdtm: hide unused functions
intel_th: pci: Add Kaby Lake PCH-H support
intel_th: Fix a deadlock in modprobing
dsp56k: prevent a harmless underflow
chardev: add missing line break in pr_warn
lkdtm: use struct arrays instead of enums
lkdtm: move jprobe entry points to start of source
lkdtm: reorganize module paramaters
lkdtm: rename globals for clarity
lkdtm: rename "count" to "crash_count"
lkdtm: remove intentional off-by-one array access
lkdtm: split remaining logic bug tests to separate file
lkdtm: split heap corruption tests to separate file
lkdtm: split memory permissions tests to separate file
lkdtm: split usercopy tests to separate file
lkdtm: drop "alloc_size" parameter
lkdtm: add usercopy test for blocking kernel text
extcon: adc-jack: add suspend/resume support
extcon: add missing of_node_put after calling of_parse_phandle
...
Linus Torvalds [Sun, 24 Jul 2016 23:07:52 +0000 (16:07 -0700)]
Merge tag 'gfs2-4.7.fixes' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
"We've got ten patches this time, half of which are related to a
plethora of nasty outcomes when inodes are transitioned from the
unlinked state to the free state. Small file systems are particularly
vulnerable to these problems, and it can manifest as mainly hangs, but
also file system corruption. The patches have been tested for
literally many weeks, with a very gruelling test, so I have a high
level of confidence.
- Andreas Gruenbacher wrote a series of five patches for various
lockups during the transition of inodes from unlinked to free.
The main patch is titled "Fix gfs2_lookup_by_inum lock inversion"
and the other four are support and cleanup patches related to that.
- Ben Marzinski contributed two patches with regard to a recreatable
problem when gfs2 tries to write a page to a file that is being
truncated, resulting in a BUG() in gfs2_remove_from_journal.
Note that Ben had to export vfs function __block_write_full_page to
get this to work properly. It's been posted a long time and he
talked to various VFS people about it, and nobody seemed to mind.
- I contributed 3 patches:
o The first one fixes a memory corruptor: a race in which one
process can overwrite the gl_object pointer set by another
process, causing kernel panic and other symptoms.
o The second patch fixes another race that resulted in a
false-positive BUG_ON. This occurred when resource group
reservations were freed by one process while another process
was trying to grab a new reservation in the same resource
group.
o The third patch fixes a problem with doing journal replay when
the journals are not all the same size"
* tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes
GFS2: Check rs_free with rd_rsspin protection
gfs2: writeout truncated pages
fs: export __block_write_full_page
gfs2: Lock holder cleanup
gfs2: Large-filesystem fix for 32-bit systems
gfs2: Get rid of gfs2_ilookup
gfs2: Fix gfs2_lookup_by_inum lock inversion
gfs2: Initialize iopen glock holder for new inodes
GFS2: don't set rgrp gl_object until it's inserted into rgrp tree
Linus Torvalds [Sun, 24 Jul 2016 19:23:50 +0000 (12:23 -0700)]
Linux 4.7
Linus Torvalds [Sun, 24 Jul 2016 01:00:31 +0000 (10:00 +0900)]
Merge tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a long-standing bug in the incremental osdmap handling code
that caused misdirected requests, tagged for stable"
The tag is signed with a brand new key - Sage is on vacation and I
didn't anticipate this"
* tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client:
libceph: apply new_state before new_up_client on incrementals
Andy Lutomirski [Sat, 23 Jul 2016 16:59:28 +0000 (09:59 -0700)]
x86/mm/cpa: Add missing comment in populate_pdg()
In commit:
21cbc2822aa1 ("x86/mm/cpa: Unbreak populate_pgd(): stop trying to deallocate failed PUDs")
I intended to add this comment, but I failed at using git.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/242baf8612394f4e31216f96d13c4d2e9b90d1b7.1469293159.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Sat, 23 Jul 2016 04:58:08 +0000 (21:58 -0700)]
x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
Valdis Kletnieks bisected a boot failure back to this recent commit:
360cb4d15567 ("x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated")
I broke the case where a PUD table got allocated -- populate_pud()
would wander off a pgd_none entry and get lost. I'm not sure how
this survived my testing.
Fix the original issue in a much simpler way. The problem
was that, if we allocated a PUD table, failed to populate it, and
freed it, another CPU could potentially keep using the PGD entry we
installed (either by copying it via vmalloc_fault or by speculatively
caching it). There's a straightforward fix: simply leave the
top-level entry in place if this happens. This can't waste any
significant amount of memory -- there are at most 256 entries like
this systemwide and, as a practical matter, if we hit this failure
path repeatedly, we're likely to reuse the same page anyway.
For context, this is a reversion with this hunk added in:
if (ret < 0) {
+ /*
+ * Leave the PUD page in place in case some other CPU or thread
+ * already found it, but remove any useless entries we just
+ * added to it.
+ */
- unmap_pgd_range(cpa->pgd, addr,
+ unmap_pud_range(pgd_entry, addr,
addr + (cpa->numpages << PAGE_SHIFT));
return ret;
}
This effectively open-codes what the now-deleted unmap_pgd_range()
function used to do except that unmap_pgd_range() used to try to
free the page as well.
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/21cbc2822aa18aa812c0215f4231dbf5f65afa7f.1469249789.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Marc Zyngier [Tue, 19 Jul 2016 12:56:54 +0000 (13:56 +0100)]
arm64: KVM: VHE: Context switch MDSCR_EL1
The kprobe enablement work has uncovered that changes made by
a guest to MDSCR_EL1 were propagated to the host when VHE was
enabled, leading to unexpected exception being delivered.
Moving this register to the list of registers that are always
context-switched fixes the issue.
Fixes:
9c6c35683286 ("arm64: KVM: VHE: Split save/restore of registers shared between guest and host")
Cc: stable@vger.kernel.org #4.6
Reported-by: Tirumalesh Chalamarla <Tirumalesh.Chalamarla@cavium.com>
Tested-by: Tirumalesh Chalamarla <Tirumalesh.Chalamarla@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Linus Torvalds [Sat, 23 Jul 2016 06:44:31 +0000 (15:44 +0900)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix memory leak in nftables, from Liping Zhang.
2) Need to check result of vlan_insert_tag() in batman-adv otherwise we
risk NULL skb derefs, from Sven Eckelmann.
3) Check for dev_alloc_skb() failures in cfg80211, from Gregory
Greenman.
4) Handle properly when we have ppp_unregister_channel() happening in
parallel with ppp_connect_channel(), from WANG Cong.
5) Fix DCCP deadlock, from Eric Dumazet.
6) Bail out properly in UDP if sk_filter() truncates the packet to be
smaller than even the space that the protocol headers need. From
Michal Kubecek.
7) Similarly for rose, dccp, and sctp, from Willem de Bruijn.
8) Make TCP challenge ACKs less predictable, from Eric Dumazet.
9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
packet: propagate sock_cmsg_send() error
net/mlx5e: Fix del vxlan port command buffer memset
packet: fix second argument of sock_tx_timestamp()
net: switchdev: change ageing_time type to clock_t
Update maintainer for EHEA driver.
net/mlx4_en: Add resilience in low memory systems
net/mlx4_en: Move filters cleanup to a proper location
sctp: load transport header after sk_filter
net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
net: nb8800: Fix SKB leak in nb8800_receive()
et131x: Fix logical vs bitwise check in et131x_tx_timeout()
vlan: use a valid default mtu value for vlan over macsec
net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
mlxsw: spectrum: Prevent invalid ingress buffer mapping
mlxsw: spectrum: Prevent overwrite of DCB capability fields
mlxsw: spectrum: Don't emit errors when PFC is disabled
mlxsw: spectrum: Indicate support for autonegotiation
mlxsw: spectrum: Force link training according to admin state
r8152: add MODULE_VERSION
...