
CDC phoneME Advanced foundation b34,b135,b167,b168 - crash/errors with signal 11

2 replies [Last post]
fcorades
Offline
Joined: 2010-08-27
Points: 0

I occasionally get a SIGSEGV or SIGILL; gdb reports that these signals originate in /lib/libc.so.0 or /lib/libpthread.so.0.
They usually occur during network communication or disk operations (NAND flash with JFFS2).
The problem is difficult to reproduce:
in software that runs 24 hours a day, it happens about once a week.

The target is an ARM926 rev. 5 and the libc is uClibc-0.9.30.2.
The target distribution and toolchain were built with Buildroot 2010.02.

The behavior is one of the following:
1) Very often: some threads inside cvm become suspended (as seen in top) and stop working.
2) Rarely: cvm stops working but all the threads appear to be running (as seen in top).
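The thread states shown by top can also be read directly from /proc, which makes it easier to log them from a watchdog. Below is a minimal sketch in C, assuming a Linux /proc filesystem; parse_stat_state and read_task_state are hypothetical helper names, not part of cvm. With LinuxThreads each thread has its own PID, so the per-thread PID (the LWP number gdb prints) can be passed directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the state letter from one line of /proc/<pid>/stat.
 * The format is "pid (comm) state ...". The comm field may itself
 * contain ')' or spaces, so the last ')' is used as the delimiter.
 * Returns '?' if the line does not look like a stat line. */
char parse_stat_state(const char *line)
{
    const char *p;

    assert(line != NULL);
    p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Read the state of one task: 'R' running, 'S' sleeping, 'T' stopped, ...
 * Returns '?' if the stat file cannot be read. */
char read_task_state(int pid)
{
    char path[64];
    char buf[512];
    char state = '?';
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    if (fgets(buf, sizeof buf, f) != NULL)
        state = parse_stat_state(buf);
    fclose(f);
    return state;
}
```

A watchdog could call read_task_state() periodically for each cvm thread PID and log the time at which a thread first reports 'T'.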

Where do you suggest starting the investigation:
the kernel? the toolchain? the arm folder in the cvm source tree?

The GNUmakefile used to build this cvm is attached; it comes from:
/phoneme_advanced-mr2-dev-b135/cdc/build/linux-arm-generic

////////////////////////////////////////////////////////////////////
Problem example:
////////////////////////////////////////////////////////////////////

2012-10-26 06:59:15 Server - Incoming connection.
Process #15486 received signal 11
Process #15486 being suspended

# cd /usr/lib/jvm
# gdb bin/cvm 15486

dlopen failed on 'libthread_db.so.1' - File not found
GDB will not be able to debug pthreads.

GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-uclibcgnueabi"...
(no debugging symbols found)
Attaching to program: /usr/lib/jvm/bin/cvm, process 15486

warning: process 15486 is a cloned process
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.0
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.0
Reading symbols from /lib/ld-uClibc.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-uClibc.so.0

0x40052920 in kill () from /lib/libc.so.0
(gdb) fg
Continuing.

Program received signal SIGSTOP, Stopped (signal).
0x40052920 in kill () from /lib/libc.so.0
(gdb) continue
Continuing.

Program received signal SIGSTOP, Stopped (signal).
0x40052920 in kill () from /lib/libc.so.0
(gdb) backtrace
#0 0x40052920 in kill () from /lib/libc.so.0
#1 0x40015f9c in ?? () from /lib/libpthread.so.0
(gdb)

////////////////////////////////////////////////////////////////////
Another problem example:
////////////////////////////////////////////////////////////////////

dlopen failed on 'libthread_db.so.1' - File not found
GDB will not be able to debug pthreads.

GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-uclibcgnueabi"...
(no debugging symbols found)
Attaching to program: /usr/lib/jvm/bin/cvm, process 25018

warning: process 25018 is a cloned process
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.0
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.0
Reading symbols from /lib/ld-uClibc.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-uClibc.so.0

0x40052920 in kill () from /lib/libc.so.0
(gdb) fg
Continuing.

Program received signal SIGSTOP, Stopped (signal).
0x40052920 in kill () from /lib/libc.so.0
(gdb) continue
Continuing.

Program received signal SIGSTOP, Stopped (signal).
0x40052920 in kill () from /lib/libc.so.0
(gdb) backtrace
#0 0x40052920 in kill () from /lib/libc.so.0
#1 0x40015f9c in ?? () from /lib/libpthread.so.0
(gdb)

(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x40016a2c in __pthread_alt_unlock () from /lib/libpthread.so.0
(gdb) continue
Continuing.

Attachment               Size
lib_folder_content.txt   2.23 KB
GNUmakefile.txt          5.98 KB

fcorades
Offline
Joined: 2010-08-27
Points: 0

Some additional information from a debug session with symbols.

Backtrace of the suspended thread:

#0 0x40061448 in __syscall_kill (pid=1701, sig=19) at libc/sysdeps/linux/common/kill.c:16
#1 0x40061400 in *__GI_kill (pid=1701, sig=19) at libc/sysdeps/linux/common/kill.c:20
#2 0x4001f2ac in pthread_sighandler_rt (signo=11, si=0xbd1fed88, uc=0xbd1fee08) at libpthread/linuxthreads.old/signals.c:127
#3 <signal handler called>
#4 0x40022e34 in __pthread_wait_for_restart_signal (self=0x10) at libpthread/linuxthreads.old/pthread.c:967
#5 0x40020564 in suspend (self=0x10) at libpthread/linuxthreads.old/restart.h:35
#6 0x40020434 in __pthread_alt_lock (lock=0x800, self=0x10) at libpthread/linuxthreads.old/spinlock.c:394
#7 0x4001b9b8 in __pthread_mutex_lock (mutex=0x2606f8) at libpthread/linuxthreads.old/mutex.c:119
#8 0x000e5d38 in LINUXioDequeue ()
#9 0x000e5ecc in CVMioRead ()
#10 0x000e8b2c in Java_java_net_SocketInputStream_socketRead0 ()
#11 0x000fa390 in args_done ()

Extract from pthread.c, which contains line 967:

/* Primitives for controlling thread execution */

void __pthread_wait_for_restart_signal(pthread_descr self)
{
    sigset_t mask;

    sigprocmask(SIG_SETMASK, NULL, &mask); /* Get current signal mask */
    sigdelset(&mask, __pthread_sig_restart); /* Unblock the restart signal */

    /* Poster's note: SUPPOSED PROBLEM START POINT? (next line) */
    THREAD_SETMEM(self, p_signal, 0);

    do {
        sigsuspend(&mask); /* Wait for signal */
    } while (THREAD_GETMEM(self, p_signal) != __pthread_sig_restart);

    READ_MEMORY_BARRIER(); /* See comment in __pthread_restart_new */
}

info threads:
15 Thread 0x440e (LWP 1704) 0x400659d8 in __rt_sigsuspend (mask=0xbcbff100, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
14 Thread 0x400d (LWP 1703) 0x400659d8 in __rt_sigsuspend (mask=0xbcdff980, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
13 Thread 0x3c0c (LWP 1702) 0x400659d8 in __rt_sigsuspend (mask=0xbcfff918, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
* 12 Thread 0x380b (LWP 1701) 0x40061448 in __syscall_kill (pid=1701, sig=19) at libc/sysdeps/linux/common/kill.c:16
11 Thread 0x4c0a (LWP 1706) 0x400659d8 in __rt_sigsuspend (mask=0xbd3ff910, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
10 Thread 0x3009 (LWP 1698) 0x40062598 in __libc_nanosleep (req=0xbd5ff658, rem=0x0) at libc/sysdeps/linux/common/nanosleep.c:18
9 Thread 0x2c08 (LWP 1697) 0x400659d8 in __rt_sigsuspend (mask=0xbd7ff100, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
8 Thread 0x2807 (LWP 1696) 0x400659d8 in __rt_sigsuspend (mask=0xbd9ff0f8, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
7 Thread 0x2406 (LWP 1695) 0x4006365c in __libc_read (fd=19, buf=0xbdbff218, count=1) at libc/sysdeps/linux/common/read.c:15
6 Thread 0x1405 (LWP 1683) 0x0005553c in isInlinable ()
5 Thread 0x1004 (LWP 1682) 0x400659d8 in __rt_sigsuspend (mask=0xbdfff8f8, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
4 Thread 0x803 (LWP 1675) 0x400659d8 in __rt_sigsuspend (mask=0xbe1ff6b0, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
3 Thread 0x402 (LWP 1674) 0x400659d8 in __rt_sigsuspend (mask=0xbe3ff8b0, size=8) at libc/sysdeps/linux/common/sigsuspend.c:19
2 Thread 0x801 (LWP 1673) 0x40062be4 in __libc_poll (fds=0x2c9204, nfds=1, timeout=2000) at libc/sysdeps/linux/common/poll.c:29
1 Thread 0x400 (LWP 1672) 0x400c4b04 in __libc_accept (call=26, addr=0xbebe09d4, addrlen=0xbebe09f0) at libc/inet/socketcalls.c:41

The files involved and the extract from gdb are attached.

fcorades
Offline
Joined: 2010-08-27
Points: 0

I tried replacing the SIGUSR1 signal with SIGUSR2 in the following files:

/src/linux/javavm/runtime/io_md.c
/src/linux/javavm/runtime/sync_md.c

But without success: a cvm thread still goes into the T state and cvm stops working, in the same way as shown in the previous debug trace.
The issue seems to start during a network socket I/O read.

I also tried setting #undef CVMJIT_TRAP_BASED_NULL_CHECKS in the source (line 40):
/src/linux-arm/javavm/include/jit/jit_arch.h

Still without success: a cvm thread still goes into the T state and stops working.