CHROMIUM: avoid deadlock in OOM killer
author Luigi Semenzato <semenzato@chromium.org>
Tue, 30 Oct 2012 20:35:18 +0000 (13:35 -0700)
committer Gerrit <chrome-bot@google.com>
Wed, 31 Oct 2012 01:23:51 +0000 (18:23 -0700)
This removes code that can cause memory starvation, and a resulting hang,
in low-memory situations.

select_bad_process may decline to pick a victim for the OOM kill and
return ERR_PTR(-1UL) instead.  In theory this should happen only when
memory is guaranteed to be freed shortly, but in some cases that
guarantee does not hold.  If an exiting process tries to allocate memory
between setting the PF_EXITING bit of p->flags and setting p->exit_state
to non-zero, it prevents the OOM killer from making any progress, and
nobody is able to allocate memory.
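
The stall described above can be illustrated with a small user-space model.
This is only a sketch under stated assumptions, not kernel code: every
name in it (model_task, model_select, MODEL_WAIT, blocked_on_alloc,
model_rounds_until_kill) is hypothetical, the badness score is a plain
integer, and the special case for p == current is left out.  The
wait_for_exiting argument stands in for the branch this change removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PF_EXITING flag. */
#define MODEL_PF_EXITING 0x1u

struct model_task {
    unsigned int flags;     /* MODEL_PF_EXITING once exit has started */
    int exit_state;         /* non-zero once the task has finished exiting */
    bool blocked_on_alloc;  /* assumption: task is itself waiting for memory */
    unsigned int points;    /* simplified badness score */
};

/* Sentinel playing the role of ERR_PTR(-1UL): "a task is exiting, wait". */
static struct model_task model_wait;
#define MODEL_WAIT (&model_wait)

/*
 * Pick the task with the highest score.  When wait_for_exiting is true,
 * the model bails out as soon as it sees a task that has started exiting
 * but not finished, which mirrors the removed branch.
 */
static struct model_task *model_select(struct model_task *tasks, size_t n,
                                       bool wait_for_exiting)
{
    struct model_task *chosen = NULL;
    unsigned int best = 0;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct model_task *p = &tasks[i];

        if (p->exit_state)
            continue;               /* already gone, nothing to reclaim */
        if ((p->flags & MODEL_PF_EXITING) && wait_for_exiting)
            return MODEL_WAIT;      /* refuse to kill anything this round */
        if (p->points > best) {
            best = p->points;
            chosen = p;
        }
    }
    return chosen;
}

/*
 * Retry selection up to max_rounds times.  Returns the round in which a
 * victim was picked, or -1 if none was picked (the modelled hang).
 */
static int model_rounds_until_kill(struct model_task *tasks, size_t n,
                                   bool wait_for_exiting, int max_rounds)
{
    for (int round = 1; round <= max_rounds; round++) {
        struct model_task *v = model_select(tasks, n, wait_for_exiting);

        if (v != NULL && v != MODEL_WAIT)
            return round;
        /* An exiting task completes only if it is not waiting for memory. */
        for (size_t i = 0; i < n; i++)
            if ((tasks[i].flags & MODEL_PF_EXITING) &&
                !tasks[i].blocked_on_alloc)
                tasks[i].exit_state = 1;
    }
    return -1;
}
```

With one exiting task that is itself blocked on an allocation and one
ordinary task, the model never selects a victim while wait_for_exiting is
true, and selects the ordinary task in the first round once it is false.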

I have found a process that does exactly that:

[ 4687.418818]  [<8104512d>] __cond_resched+0x1b/0x2b
[ 4687.418828]  [<813b67a7>] _cond_resched+0x18/0x21
[ 4687.418840]  [<81093940>] shrink_slab+0x224/0x22f
[ 4687.418856]  [<81095a96>] try_to_free_pages+0x1b7/0x2e6
[ 4687.418868]  [<8108df2a>] __alloc_pages_nodemask+0x40a/0x61f
[ 4687.418882]  [<810a9dbe>] read_swap_cache_async+0x4a/0xcf
[ 4687.418894]  [<810a9ea4>] swapin_readahead+0x61/0x8d
[ 4687.418906]  [<8109fff4>] handle_pte_fault+0x310/0x5fb
[ 4687.418918]  [<810a0420>] handle_mm_fault+0xae/0xbd
[ 4687.418932]  [<8101d0f9>] do_page_fault+0x265/0x284
[ 4687.419002]  [<813b7887>] error_code+0x67/0x6c
[ 4687.419060]  [<8102351d>] mm_release+0x1d/0xc3
[ 4687.419070]  [<81026ce9>] exit_mm+0x1d/0xe9
[ 4687.419090]  [<81028082>] do_exit+0x19b/0x640

mm_release takes its page fault near this futex-related code:

    if (unlikely(tsk->robust_list)) {
        exit_robust_list(tsk);
        tsk->robust_list = NULL;
    }

Since robust_list is a userspace structure, the page fault looks
legitimate.  This is likely a design bug (see also the comment about
deadlocks earlier in select_bad_process) and is difficult to fix
completely.

In any case we're happy to trade spurious OOM kills for no hangs.

BUG=chromium-os:32321
TEST=tested with a load that reliably causes a hang before and none after

Change-Id: I7037e68cc3eef3a36ca355b9535af0f559b3a148
Signed-off-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-on: https://gerrit.chromium.org/gerrit/36953
Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 46bf2ed..cd9f076 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -355,14 +355,6 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
                        if (p == current) {
                                chosen = p;
                                *ppoints = 1000;
-                       } else if (!force_kill) {
-                               /*
-                                * If this task is not being ptraced on exit,
-                                * then wait for it to finish before killing
-                                * some other task unnecessarily.
-                                */
-                               if (!(p->group_leader->ptrace & PT_TRACE_EXIT))
-                                       return ERR_PTR(-1UL);
                        }
                }