cvm termination problem

5 replies [Last post]
lamsap
Offline
Joined: 2004-11-09
Points: 0

Hi,

I'm running phoneME Advanced MR1 on an IXP465 (ARM XScale). My program runs for roughly 20-25 minutes before cvm terminates without any message. I've recompiled with:

CVM_DEBUG=true
CVM_JIT=false

I still don't get anything on stdout when it dies. Any ideas on how to find the cause of the termination? There is only one point in my code that calls System.exit(0), and it is immediately preceded by a System.out.println (and it isn't called in regular operation). Also of note: this same program runs without dying on an x86 machine (running a very old version of cvm, 2004-ish I imagine).

lamsap
Offline
Joined: 2004-11-09
Points: 0

> CVM won't typically exit without a message saying
> why. If you
> are running Linux, then the kernel might be killing
> the cvm process
> if the system is out of memory. You could also try
> running with
> "strace" or "ltrace" to see if exit() is getting
> called. Running inside
> gdb might reveal something too.

Thanks for the response. When it dies, there is still ~60MB of RAM free out of 128MB total. Also note that changing the mx from 16MB to 32MB didn't make a difference.

I've run it under strace with the -p flag. I'm not sure how to interpret the results; this is what I'm seeing:

sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
NULL) = 0
rt_sigsuspend([]kill(1872, SIGRTMIN
--- SIGRTMIN (Unknown signal 32) ---
) = 0
<... rt_sigsuspend resumed> ) = 32
rt_sigprocmask(SIG_SETMASK, NULL, sigreturn([RTMIN], 8) = 0
) = ?
rt_sigsuspend([]rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
rt_sigsuspend([]0) = -1 EINTR (Interrupted system call)
--- SIGRT_1 (Unknown signal 33) ---
_exit(1) = ?

--- SIGRT_1 (Unknown signal 33) ---

--- SIGRT_1 (Unknown signal 33) ---
<... rt_sigsuspend resumed> ) = 33
<... rt_sigsuspend resumed> ) = 33
_exit(1) = ?
_exit(1) = ?

On another thread for the same process I saw:

getppid() = 42
poll([{fd=3, events=POLLIN}], 1, 2000) = 0
getppid() = 42
poll([{fd=3, events=POLLIN}], 1, 2000) = 0
getppid() = 42
poll([{fd=3, events=POLLIN}], 1, 2000) = 0
getppid() = 42
poll([{fd=3, events=POLLIN}], 1, 2000) = 0
getppid() = 42
poll([{fd=3, events=POLLIN}], 1, 2000) = 0
getppid() = 42
poll([{fd=3, events=POLLIN}], 1, 2000) = 0
getppid() = 42
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 2000) = 1
getppid() = 42
read(3, "\276\177\376 \0\0\0\2\0\0\0\1\0\0\0 \0\0\0\0\0\0\0\1\0"..., 148) = 148
kill(61, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(60, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(50, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(49, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(48, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(46, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(45, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(44, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(42, SIGRT_1) = 0
kill(855, SIGRT_1) = 0
kill(97, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
kill(74, SIGRT_1) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
wait4(61, NULL, __WCLONE, NULL) = 61
wait4(60, NULL, __WCLONE, NULL) = 60
wait4(50, NULL, __WCLONE, NULL) = 50
wait4(49, NULL, __WCLONE, NULL) = 49
wait4(48, NULL, __WCLONE, NULL) = 48
wait4(46, NULL, __WCLONE, NULL) = 46
wait4(45, NULL, __WCLONE, NULL) = 45
wait4(44, NULL, __WCLONE, NULL) = 44
wait4(42, NULL, __WCLONE, NULL) = -1 ECHILD (No child processes)
wait4(855, NULL, __WCLONE, NULL) = 855
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
wait4(97, NULL, __WCLONE, NULL) = 97
wait4(74, NULL, __WCLONE, NULL) = 74
kill(62, SIGRTMIN) = 0
--- SIGRT_1 (Unknown signal 33) ---
sigreturn() = ?
_exit(0) = ?

I've looked up the SIGRT_1 signal; it seems to be program-specific?

xyzzy
Offline
Joined: 2006-08-30
Points: 0

I think the signals you are seeing are from the pthreads implementation.
From the trace, it looks like the threads exited voluntarily. So as a next
step, these are my suggestions:

1) run inside gdb if possible
2) trace function calls with ltrace
3) use the -Xtrace:0x4002 option to turn on method and
exception tracing (debug build only)
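
For reading a long strace log, here is a minimal sketch that pulls out every exit call and its status, which is how the voluntary exits above can be spotted. The helper name `exit_calls` is made up for illustration, and the sample text is copied from lines of the trace earlier in this thread. It assumes Python is available on whatever machine you analyse the log on.

```python
import re

# Matches strace lines such as "_exit(1) = ?" or "exit_group(0) = ?".
EXIT_CALL = re.compile(r"\b(_exit|exit_group)\((\d+)\)")


def exit_calls(trace_text):
    """Return (syscall, status) pairs for every exit call in strace output."""
    return [(m.group(1), int(m.group(2))) for m in EXIT_CALL.finditer(trace_text)]


# Lines taken from the trace posted above.
sample = """\
--- SIGRT_1 (Unknown signal 33) ---
_exit(1) = ?
kill(62, SIGRTMIN) = 0
_exit(0) = ?
"""

print(exit_calls(sample))
```

An exit call in the trace means the thread left through the normal exit path; a process killed from outside would typically end on a signal instead.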

Dean

lamsap
Offline
Joined: 2004-11-09
Points: 0

Xtrace did it... It turned out to be a memory leak in one of my JNI libraries; the error message for it only showed up under trace. Cheers!

xyzzy
Offline
Joined: 2006-08-30
Points: 0

> Hi,
>
> I'm running phoneme advanced mr1 on an IXP465 (ARM
> XScale). My program runs for roughly 20-25 minutes
> before cvm terminates without any message. I've
> recompiled with:
>
> CVM_DEBUG=true
> CVM_JIT=false
>
> Still don't get anything to stdout when it dies. Any
> ideas on how to find the cause of the termination?
> There is only one point in my code that calls
> System.exit(0), and it is immediately preceded by a
> System.out.println (and it isn't called in regular
> operation). Also of note that this same program runs
> without dying on an x86 machine (running a very old
> version of cvm, 2004-ish I imagine)

CVM won't typically exit without a message saying why. If you
are running Linux, then the kernel might be killing the cvm process
if the system is out of memory. You could also try running with
"strace" or "ltrace" to see if exit() is getting called. Running inside
gdb might reveal something too.
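
To check the out-of-memory theory, one option is to save the kernel log (for example the output of `dmesg`) right after cvm dies and search it. Below is a minimal sketch; the function name `find_oom_lines` and the sample log lines are hypothetical, and the patterns are an assumption, since the exact wording of OOM-killer messages varies between kernel versions.

```python
import re

# Assumed patterns: the kernel's OOM message wording differs across versions,
# so this matches loosely and may need adjusting for your kernel.
OOM_PATTERN = re.compile(r"out of memory|oom[-_ ]kill", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return the kernel log lines that look like OOM-killer activity."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


# Hypothetical log excerpt, for illustration only.
sample_log = """\
eth0: link up
Out of Memory: Killed process 62 (cvm).
"""

for line in find_oom_lines(sample_log):
    print(line)
```

If nothing matches, the kernel probably did not kill the process, and the exit is more likely coming from inside the VM or native code.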

Dean

cjplummer
Offline
Joined: 2006-10-16
Points: 0

You should try running top from another shell while your application is running to see if it is slowly using up more and more memory until eventually there is no free memory. If this is the case, then the problem is almost certainly the kernel killing off the VM.
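
If watching top for 20+ minutes is impractical, the same check can be scripted. This is a rough sketch, assuming a Linux /proc filesystem and that Python is available on the board (or that you adapt the idea to a shell loop). The function names are made up for illustration. It samples the `VmRSS` line of `/proc/<pid>/status`, which covers native allocations as well as the Java heap.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Extract the resident set size in kB from /proc/<pid>/status text."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def sample_rss(pid, count, interval_s):
    """Read VmRSS for `pid` `count` times, sleeping `interval_s` between reads."""
    samples = []
    for _ in range(count):
        with open("/proc/%d/status" % pid) as f:
            samples.append(parse_vmrss_kb(f.read()))
        time.sleep(interval_s)
    return samples


def grows_steadily(samples):
    """True if the samples never decrease and end higher than they started."""
    values = [s for s in samples if s is not None]
    if len(values) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(values, values[1:]))
    return never_drops and values[-1] > values[0]
```

For example, `grows_steadily(sample_rss(pid, 60, 10))` watches the process for about ten minutes, where `pid` is the cvm process ID. A steadily growing RSS with a fixed Java heap size would point toward native memory rather than Java objects.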

Chris