OOM is not recoverable, but should be

Project:JNode Core
Component:Code
Category:bug report
Priority:critical
Assigned:Unassigned
Status:active
Description

Not 100% reproducible, but I hit this in 2 out of 3 runs.

Reproduction:
1. Run JNode (in KVM)
2. GRUB "tests" option
3. Run Mauve tests: "cd /device/hda0/mauve/; mauve tests.txt"

Actual result:
It works for a while, then hangs with a lot of grey output on the console. That output also appears in the kernel log file (captured via the serial console), which is attached here.

Scrolling the console is still possible. If you do that, the grey output goes away and the white, normal output appears again.
In both cases, the last line on the console was:
gnu.testlet.java.util.zip.Deflater.PR27435

Hitting return or anything on the console has no effect. Scrolling works.
The host CPU is 98% idle, so it's not that JNode is busy. This state seems to continue infinitely.

In other words, JNode hangs without using any CPU when running this test, but not 100% reproducibly.

Attachment: kernel.txt (14.95 KB)

#1

FYI: the grey output comes from calls to Unsafe.debug and friends. It "goes away" because it is painted directly to the memory-mapped VGA(?) screen buffer without going through the normal console output system. When you scroll the console, the console subsystem simply paints over the top of the debug output. That's ok ... sort of ... because the debug output is also written to the serial port.
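
The overpainting described above can be illustrated with a small toy model. This is only a sketch, not JNode code: the class ScreenOverpaintSketch and all of its methods are hypothetical, and a plain char array stands in for the screen buffer. The point it shows is that text written straight to the buffer is not in the console's own record, so the next console repaint wipes it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a text screen shared by a console and a raw debug writer. All names are hypothetical. */
final class ScreenOverpaintSketch {
    static final int ROWS = 3;
    static final int COLS = 20;

    // Stand-in for the memory-mapped screen buffer.
    private final char[][] screen = new char[ROWS][COLS];
    // The console's own record of what it has printed.
    private final List<String> consoleLines = new ArrayList<>();

    ScreenOverpaintSketch() {
        for (int r = 0; r < ROWS; r++) {
            paintRow(r, "");
        }
    }

    /** Normal output: recorded by the console, then painted. */
    void consolePrint(String line) {
        consoleLines.add(line);
        repaint();
    }

    /** Debug output: written straight to the screen, bypassing the console's record. */
    void debugPaint(int row, String text) {
        paintRow(row, text);
    }

    /** Scrolling makes the console repaint from its own record only. */
    void scroll() {
        repaint();
    }

    /** Returns the visible text of one screen row, without padding. */
    String row(int r) {
        return new String(screen[r]).trim();
    }

    private void repaint() {
        int first = Math.max(0, consoleLines.size() - ROWS);
        for (int r = 0; r < ROWS; r++) {
            int idx = first + r;
            paintRow(r, idx < consoleLines.size() ? consoleLines.get(idx) : "");
        }
    }

    private void paintRow(int row, String text) {
        for (int c = 0; c < COLS; c++) {
            screen[row][c] = c < text.length() ? text.charAt(c) : ' ';
        }
    }

    public static void main(String[] args) {
        ScreenOverpaintSketch s = new ScreenOverpaintSketch();
        s.consolePrint("mauve> running");
        s.debugPaint(1, "<oom/><mark/>");
        System.out.println("before scroll: [" + s.row(1) + "]"); // debug text is visible
        s.scroll();
        System.out.println("after scroll:  [" + s.row(1) + "]"); // debug text is gone
    }
}
```

In this model the debug text survives only until the console next repaints, which matches the "goes away when scrolling" behaviour reported in the description.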

#2

I now hang in "Generating java.util" every time. I use ext2 as the FS now; I don't know if that's related. It certainly gets past the Deflater test.

The hang happens exactly when the GC kicks in: before that, the (host) CPU is at 100% usage, then I see the three markers <mark/><sweep/><cleanup/>, then the host CPU drops to 98% idle and stays there. Waiting for 1-2 hours (as I did in the previous run) doesn't change anything. The test results are far from complete.

As said, these may be entirely separate issues, as they appeared at different points (each point repeatedly), but both are hangs with similar symptoms.

#3

This can be an out of memory problem too.

#4

Well, it has 1 GB RAM now. That should really suffice.
The hang from my last comment still happened.

I just had one successful run, after several more hangs like the ones above.
I'll try switching back to FAT to see whether that makes any difference.

#5

Actually, yes: only now do I notice
...<oom/><mark/><sweep/><cleanup/>
as the last line in the attached kernel log and in the console output.
"OOM" = Out of memory.

Why is that? 1 GB should suffice. And the testsuite ran through before with only 512 MB (albeit on FAT, not ext2).

#6

FYI, only mauve hangs. While Enter has no effect, Ctrl-C does abort mauve and put me back on the console.

#7

Title:Lockup when running ZIP test gnu.testlet.java.util.zip.Deflater.PR27435» ext2 leaks lots of RAM
Project:JNode Core» JNode FS

lsantha explained to me that these <oom/> markers on the kernel console mean that the system can't find or free any more RAM, and is so short of RAM that it can't even throw an Exception object (because that object also needs RAM, etc.); hence the kernel console message.
The GC still runs afterwards, so in some cases it may succeed in freeing some memory, which would explain why I could continue after an OOM once, but only once.
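
One common way around the "no RAM left to even create the exception" problem is to allocate the error object up front, so the failure path needs no new memory and the caller gets a chance to free something and retry. The following is a minimal sketch of that idea against a toy heap with a fixed budget. It is not how JNode's memory manager works; OomRecoverySketch, ToyHeap and recoverAfterOom are hypothetical names invented for illustration.

```java
/** Toy model of recoverable out-of-memory handling. Every name here is hypothetical, not JNode code. */
final class OomRecoverySketch {

    /** Stand-in for a VM heap with a hard byte budget. */
    static final class ToyHeap {
        private final long capacity;
        private long used;
        // Allocated once at startup, so reporting exhaustion needs no new memory.
        private final OutOfMemoryError preallocatedOom =
                new OutOfMemoryError("toy heap exhausted");

        ToyHeap(long capacity) {
            this.capacity = capacity;
        }

        void allocate(long bytes) {
            if (used + bytes > capacity) {
                throw preallocatedOom; // no 'new' on the failure path
            }
            used += bytes;
        }

        void free(long bytes) {
            used -= bytes;
        }

        long used() {
            return used;
        }
    }

    /**
     * Fills part of the heap with droppable data, then tries a request.
     * On failure it frees the droppable data and retries once.
     * Returns true if the request was eventually satisfied.
     */
    static boolean recoverAfterOom(ToyHeap heap, long cacheBytes, long requestBytes) {
        heap.allocate(cacheBytes); // droppable data, e.g. a cache
        try {
            heap.allocate(requestBytes);
            return true; // fit without any recovery
        } catch (OutOfMemoryError e) {
            heap.free(cacheBytes); // release what we can afford to lose
        }
        try {
            heap.allocate(requestBytes); // retry once
            return true;
        } catch (OutOfMemoryError e) {
            return false; // still too large; the caller must give up
        }
    }

    public static void main(String[] args) {
        // The cache is dropped and the retry fits.
        System.out.println(recoverAfterOom(new ToyHeap(100), 80, 60));
        // The request exceeds the whole heap, so recovery cannot help.
        System.out.println(recoverAfterOom(new ToyHeap(100), 80, 200));
    }
}
```

The design choice worth noting is that recovery only helps if something above the allocator is able to give memory back; if nothing can be released, the retry fails again, which resembles the "could continue once, but only once" behaviour described above.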

I had already increased the RAM to 1 GB at about the time I switched to ext2. Obviously, that was not enough, although it should have been. Since I switched back to FAT (still with 1 GB RAM), it seems to work reliably (at least in 2-3 test runs). At the beginning, I had only 512 MB.

So, there are two problems/bugs here:
1. Why can't we continue after OOM?

2. Why does ext2 never work with 1 GB RAM, although FAT works reliably and sometimes even works with 512 MB RAM? There seems to be some serious abuse of RAM in ext2, probably lots of leaks.

I'll file a new bug 2856 about 2., so this bug here can focus on 1., which it was originally filed as.

#8

Title:ext2 leaks lots of RAM» OOM is not recoverable, but should be
Project:JNode FS» JNode Core