CHROMIUM: avoid deadlock in OOM killer
author Luigi Semenzato <semenzato@chromium.org>
Tue, 30 Oct 2012 20:35:18 +0000 (13:35 -0700)
committer Gerrit <chrome-bot@google.com>
Wed, 31 Oct 2012 01:23:51 +0000 (18:23 -0700)
commit 11b92b16b822a8f3dae1a0c68d9f0187580a3328
tree c9704dadaf4eb8aaa3570f94e192fa4971022d4e
parent e54a4b9c1a94248d26061c992ad61da95561cbb3

This removes code that can cause memory starvation in low-memory situations.

select_bad_process may fail to find a victim for the OOM-kill by returning
ERR_PTR(-1).  In theory this should happen only when there is a guarantee
that memory will be freed shortly.  But in some cases this is not true.
If any process tries to allocate memory between setting the PF_EXITING
bit of p->flags and setting p->exit_state to non-zero, it prevents
the OOM-killer from making any progress, and nobody is able to
allocate memory.
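
The window described above can be illustrated with a small userspace
model.  This is only a sketch of the description in this message, not
the code in mm/oom_kill.c: struct model_task, MODEL_ABORT,
model_select_victim and the defer_to_exiting switch are all
hypothetical names, and the real scan performs more checks than shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's definitions. */
struct model_task {
	bool exiting;    /* stands in for (p->flags & PF_EXITING) */
	int  exit_state; /* stands in for p->exit_state, non-zero once exited */
	int  badness;    /* stands in for the OOM badness score */
};

/* Sentinel standing in for ERR_PTR(-1): abort the scan, kill nobody. */
static const struct model_task model_abort_sentinel;
#define MODEL_ABORT (&model_abort_sentinel)

/*
 * Pick the live task with the highest badness.  When defer_to_exiting
 * is true, a task that has PF_EXITING set but exit_state still zero
 * aborts the whole scan, on the assumption that it will free memory
 * soon.  If that task is itself blocked waiting for memory, every call
 * returns MODEL_ABORT and no victim is ever chosen.
 */
static const struct model_task *
model_select_victim(const struct model_task *tasks, size_t n,
		    bool defer_to_exiting)
{
	const struct model_task *chosen = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct model_task *p = &tasks[i];

		if (p->exit_state != 0)
			continue;	/* already exited, nothing to kill */
		if (defer_to_exiting && p->exiting)
			return MODEL_ABORT;
		if (chosen == NULL || p->badness > chosen->badness)
			chosen = p;
	}
	return chosen;
}
```

In this model, a task list containing one entry with exiting set and
exit_state zero makes every scan with defer_to_exiting enabled return
MODEL_ABORT, however many times it is retried.  With the switch
disabled the scan returns the highest-scoring live task instead, which
corresponds to accepting a possibly spurious kill in exchange for
forward progress.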

I have found a process that does exactly that:

[ 4687.418818]  [<8104512d>] __cond_resched+0x1b/0x2b
[ 4687.418828]  [<813b67a7>] _cond_resched+0x18/0x21
[ 4687.418840]  [<81093940>] shrink_slab+0x224/0x22f
[ 4687.418856]  [<81095a96>] try_to_free_pages+0x1b7/0x2e6
[ 4687.418868]  [<8108df2a>] __alloc_pages_nodemask+0x40a/0x61f
[ 4687.418882]  [<810a9dbe>] read_swap_cache_async+0x4a/0xcf
[ 4687.418894]  [<810a9ea4>] swapin_readahead+0x61/0x8d
[ 4687.418906]  [<8109fff4>] handle_pte_fault+0x310/0x5fb
[ 4687.418918]  [<810a0420>] handle_mm_fault+0xae/0xbd
[ 4687.418932]  [<8101d0f9>] do_page_fault+0x265/0x284
[ 4687.419002]  [<813b7887>] error_code+0x67/0x6c
[ 4687.419060]  [<8102351d>] mm_release+0x1d/0xc3
[ 4687.419070]  [<81026ce9>] exit_mm+0x1d/0xe9
[ 4687.419090]  [<81028082>] do_exit+0x19b/0x640

mm_release takes its page fault in the vicinity of this
futex-related code:

	if (unlikely(tsk->robust_list)) {
		exit_robust_list(tsk);
		tsk->robust_list = NULL;
	}

Since robust_list is a userspace structure, the page
fault looks legitimate.  This is likely a design bug
(see also the comment about deadlocks earlier in select_bad_process)
that is difficult to fix completely.

In any case we're happy to trade spurious OOM kills for no hangs.

BUG=chromium-os:32321
TEST=tested with a load that reliably causes a hang before and none after

Change-Id: I7037e68cc3eef3a36ca355b9535af0f559b3a148
Signed-off-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-on: https://gerrit.chromium.org/gerrit/36953
Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
mm/oom_kill.c