
signal 11 on Linux MIPS 32

22 replies [Last post]
dainson
Offline
Joined: 2007-01-29
Points: 0

Hi,

CVM runs many applications without problems, including the tests that came with it, but with the Volano benchmark it gets suspended with signal 11. What can be the problem?

The platform is MontaVista Linux 3.1 (kernel 2.4.32) running on AMD Au1550 (MIPS 32) LE.

We've tried MR1, MR2 dev 05, and debug versions of both, and we also built it once with CVM_OPTIMIZED turned off. Each build, after several hours of running the Volano benchmark (v. 2.5.0.9, www.volano.com/benchmarks.html), eventually gets suspended because of signal 11 or signal 10. Most of the time it is signal 11; I saw signal 10 only once. In all cases the CVM process stays suspended; it does not terminate with a core dump.

What can be the problem? Any recommendations?

Thank you,
Boris Dainson

cjplummer
Offline
Joined: 2006-10-16
Points: 0

Sorry Boris, but no progress has been made. I can't give you a schedule for this fix. As I mentioned before, it's not a high priority at the moment. Possibly as MR2 gets closer to release, it will be, but that won't be for a while. I'll be sure to update this thread if any progress is made.

thanks,

Chris

bobvandette
Offline
Joined: 2006-04-27
Points: 0

I just spent several weeks isolating a similar problem on an ARM platform.

The problem turned out to be a race condition in the assembly CVMCCMruntimeLookupInterfaceMBGlue function in this file:

src/mips/javavm/runtime/jit/ccmglue_cpu.S

The problem is that we load the GUESS value twice in this routine.
Another thread can modify the GUESS value after the first load,
so the second load may not match the first.

The fix for the ARM platform was to store the result of the calculation on the stack and avoid loading the GUESS value twice.
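
As a rough illustration of the race described above, here is a minimal C sketch. It is not the CVM glue code: the ItableEntry type, the cached_guess variable, and both function names are hypothetical stand-ins, and the real routine is written in assembly. The point is only the shape of the bug (check one load, use another) and the shape of the fix (load once, reuse the copy).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one interface-table entry. */
typedef struct {
    int interface_id;
    int method_index;
} ItableEntry;

/* Hypothetical stand-in for the cached guess that other threads may update. */
static _Atomic size_t cached_guess;

/* Racy shape: the guess is loaded once for the check and again for the use. */
int lookup_racy(const ItableEntry *itable, int wanted_id)
{
    assert(itable != NULL);
    if (itable[atomic_load(&cached_guess)].interface_id == wanted_id) {
        /* Another thread may store a different guess right here. */
        return itable[atomic_load(&cached_guess)].method_index;
    }
    return -1; /* caller would fall back to a full search */
}

/* Fixed shape: load the guess once and reuse the local copy. */
int lookup_once(const ItableEntry *itable, int wanted_id)
{
    assert(itable != NULL);
    size_t guess = atomic_load(&cached_guess);
    if (itable[guess].interface_id == wanted_id) {
        return itable[guess].method_index;
    }
    return -1; /* caller would fall back to a full search */
}
```

In lookup_racy the entry that passed the check and the entry that is used can differ if the guess changes in between; lookup_once cannot have that mismatch because both steps use the same local value.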

I'll post a MIPS implementation once it gets tested.

Here is the change for ARM platforms:

598c598,601
< ldr r3, [r3, GUESS] /* target ICB = &ointerfaces.itable[guess] */
---
> ldr r3, [r3, GUESS] /* target ICB = &ointerfaces.itable[guess] */
>
> /* Free up the GUESS reg (r0) for use below. Save its content first: */
> str GUESS, [sp, #OFFSET_CVMCCExecEnv_ccmStorage+0]
635a639,641
> /* Reload the GUESS reg (r0) content from earlier: */
> ldr GUESS, [sp, #OFFSET_CVMCCExecEnv_ccmStorage+0]
>
637,639c643,644
< ocb->interfacesX.itable[guess].methodTableIndicesX;
< */
< ldr GUESS, [lr, #-4] /* load guess value */
---
> * ocb->interfacesX.itable[guess].methodTableIndicesX;
> */
641d645
< mov GUESS, GUESS, asl #CONSTANT_LOG2_CVMInterfaceTable_SIZE

bobvandette
Offline
Joined: 2006-04-27
Points: 0

The fix for the bug I mentioned on MIPS platforms is to use v1 as the
register which holds the GUESS value and to load the guess only
once.

Thanks to Mark Lam and Chris Plummer for providing this fix.

------- ccmglue_cpu.S -------
438c438
< #define GUESS v0
---
> #define GUESS v1
449d448
< #undef GUESS
488d486
< #define GUESS v0
492,494d489
< lw GUESS, -4(ra) # load guess value
< sll GUESS, GUESS, CONSTANT_LOG2_CVMInterfaceTable_SIZE
< addu GUESS, GUESS, OINTERFACES

cjplummer
Offline
Joined: 2006-10-16
Points: 0

Hi Boris,

We do test phoneME Advanced with Volano. I believe the last time it was used to test the Linux/MIPS port was with CDC-HI 1.1.1_01 (also called CDC AMS 1.0), which was released about a year ago, and went on to become phoneME Advanced MR1 (with no code changes). I'm not sure which version of Volano was used for the testing. The one I've played with on occasion is 2.1.2, and the only problem I had with it is not crash related. See the following:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4957535

Maybe someone from our QA group can give more details on Volano testing and which version was last used (and when it was used).

The signal 10 you got is a SIGUSR1. These are common during normal execution, but are supposed to be handled and not cause thread suspension. Maybe Dean or Mark can provide more insight.

The signal 11 is a SIGSEGV, which is a true crash. The VM suspends the thread in this case rather than allow a core dump. This allows you to attach to the program in GDB and do some live debugging. If you choose to do this, type "continue" after attaching, which will cause the crash to happen again. This is needed to move the thread out of the thread suspension code, and back to the crash point.

If you would prefer debugging a core dump, disable the kill(pid, SIGSTOP) in src/linux-mips/javavm/runtime/segvhandler_arch.c and src/linux/javavm/runtime/sync_md.c.
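
To make the two behaviours concrete, here is a minimal C sketch of a crash handler that either stops the process for a debugger or lets the default action produce a core dump. It is an illustration under assumptions, not the code in segvhandler_arch.c or sync_md.c: the names stop_for_debugger, crash_handler and install_crash_handler are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical switch: 1 means "stop for a debugger", 0 means "dump core". */
static volatile sig_atomic_t stop_for_debugger = 1;

static void crash_handler(int signo)
{
    if (stop_for_debugger) {
        /* Suspend the whole process so a debugger can attach to it. */
        kill(getpid(), SIGSTOP);
        /* After a continue, returning re-runs the faulting instruction. */
    } else {
        /* Restore the default action and re-raise to get a core dump. */
        signal(signo, SIG_DFL);
        raise(signo);
    }
}

/* Installs the handler for SIGSEGV and SIGBUS; returns 0 on success. */
int install_crash_handler(int stop_mode)
{
    struct sigaction sa;

    assert(stop_mode == 0 || stop_mode == 1);
    stop_for_debugger = stop_mode;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        return -1;
    }
    return sigaction(SIGBUS, &sa, NULL);
}
```

With stop_mode set to 1 the process sits in the stopped state until something resumes it, which matches the "suspended, no core dump" symptom; with 0 the kernel's default action applies and a core file is written if the core size limit allows it.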

BTW, are you running with or without the JIT? If it is with the JIT, I would advise trying a CVM_JIT=false build to see if you get the same results.

good luck,

Chris Plummer

cjplummer
Offline
Joined: 2006-10-16
Points: 0

I've just been told that we have not tested with Volano since CDC-HI 1.0.1, so it has been a while.

dainson
Offline
Joined: 2007-01-29
Points: 0

Hi Chris,

All previous versions we built were with the JIT. Now one box runs without the JIT (compiled with CVM_JIT=false), and it has run Volano nonstop for 3 days without signals 10/11. It often takes more than a day to get signal 10/11 on my setup, so it’s a bit early to say whether the non-JIT version is free of this problem. On MIPS, signal 10 actually means SIGBUS, a bus error (bad memory access). So it looks like the case where the client side of the Volano benchmark got suspended with signal 10 was a real crash too.

I ran a JIT version that has kill(pid, SIGSTOP) replaced with abort(). It finally dumped core. GDB can’t back-trace deep enough for me to see what caused signal 11 (see below). Is there any way to find it out? I started a test with a debug build with JIT. Will its core dump give more?

A little more detail about the Volano benchmark: I run the ‘loop’ benchmark (version 2.5.0.9), where the client and server run in different CVMs on the same machine. Both run with a 24 MB heap (-Xmx24m). The client loop sometimes hangs in Java code (just a hang, without signals 10/11 or a crash), and I have to kill it (with Ctrl+C) and restart. The Java part of the server side of the loop benchmark behaves better; it does not hang in this setup.

Thank you,
Boris Dainson
Xanboo Inc

(gdb) bt
#0 0x2acd0d74 in kill () at :2
#1 0x2ab0f420 in pthread_kill (thread=21897, signo=6) at signals.c:65
#2 0x2ab0f9f0 in __pthread_raise (sig=6) at signals.c:187
#3 0x2acd0ab0 in *__GI_raise (sig=6) at ../linuxthreads/sysdeps/unix/sysv/linux/raise.c:34
#4 0x2acd2acc in *__GI_abort () at ../sysdeps/generic/abort.c:88
#5 0x0055d964 in handleSegv ()
#6 0x2ab14c0c in __pthread_sighandler_rt (signo=11, si=0x6c9ff488, uc=0x6c9ff508) at sighandler.c:70
#7 <signal handler called>
warning: Warning: GDB can't find the start of the function at 0x2af1bfd4.

GDB is unable to find the start of the function at 0x2af1bfd4 and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or the frames below it.
This problem is most likely caused by an invalid program counter or stack pointer.
However, if you think GDB should simply search farther back from 0x2af1bfd4 for code which looks like the beginning of a function, you can increase the range of the search using the `set heuristic-fence-post' command.
Current language: auto; currently asm

cjplummer
Offline
Joined: 2006-10-16
Points: 0

Regarding signal 10, glad to hear it is SIGBUS and not SIGUSR1. That makes more sense. I was looking at a linux/x86 signals.h when I found it to be SIGUSR1. I didn't realize these standard signals were different on various platforms.
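
Since the numbers differ by architecture (as far as I know, Linux/x86 uses 10 for SIGUSR1 and 7 for SIGBUS, while Linux/MIPS uses 10 for SIGBUS and 16 for SIGUSR1), the safest check is to compile a few lines against the target's own headers. A minimal sketch, with hypothetical function names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Maps a signal number to a name using the target's own <signal.h> values. */
const char *crash_signal_name(int signo)
{
    assert(signo > 0);
    switch (signo) {
    case SIGSEGV: return "SIGSEGV";
    case SIGBUS:  return "SIGBUS";
    case SIGUSR1: return "SIGUSR1";
    default:      return "other";
    }
}

/* Prints the numeric values so they can be compared across platforms. */
void print_signal_numbers(void)
{
    printf("SIGSEGV=%d SIGBUS=%d SIGUSR1=%d\n", SIGSEGV, SIGBUS, SIGUSR1);
}
```

Building this with the cross toolchain and running it on the box (or just reading the preprocessed output) shows what "signal 10" means there, without relying on a desktop header.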

Regarding the backtrace, I think my advice wasn't that good. If you just remove or replace the kill(pid, SIGSTOP), you still end up in the signal handler when you do the backtrace, and gdb doesn't know how to backtrace past the signal handler. This is a bit tricky to work around, because by default the JIT'd code relies on SIGSEGV for normal operation. It handles NullPointerExceptions by letting the code crash and then having handleSegv convert the crash into a NullPointerException. So what you need to do is disable this mode of operation, and also disable the segv handler. I'll try to step you through that.

In src/linux-mips/javavm/include/jit/jit_arch.h, look for #define CVMJIT_TRAP_BASED_NULL_CHECKS and change it to a #undef. This will make it so generated JIT code does explicit NPE checks rather than rely on SIGSEGV and handleSegv.
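
For anyone unfamiliar with the two styles, here is a minimal C sketch of the difference. It is only an analogy, not what the JIT emits: the function names are hypothetical, the trap-based variant uses sigsetjmp/siglongjmp where the VM would throw a NullPointerException, and it assumes a single thread and that reading address 0 raises SIGSEGV on the platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Recovery point for the trap-based variant (process-wide, not thread safe). */
static sigjmp_buf recover_point;

static void on_segv(int signo)
{
    (void)signo;
    siglongjmp(recover_point, 1);
}

/* Trap-based: no check on the fast path; a fault is turned into an error code. */
int load_trap_based(const volatile int *p, int *out)
{
    struct sigaction sa, old;
    int rc;

    assert(out != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        return -2;
    }
    if (sigsetjmp(recover_point, 1) == 0) {
        *out = *p;   /* faults here if p is null */
        rc = 0;
    } else {
        rc = -1;     /* reached via the signal handler */
    }
    sigaction(SIGSEGV, &old, NULL);
    return rc;
}

/* Explicit: one extra compare per access, and SIGSEGV is never needed. */
int load_explicit(const int *p, int *out)
{
    assert(out != NULL);
    if (p == NULL) {
        return -1;
    }
    *out = *p;
    return 0;
}
```

The explicit form costs a compare on every access, which is why the trap-based form is attractive for generated code, but it also means any SIGSEGV seen in the explicit build is a genuine crash rather than an expected null check.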

Next, in src/linux-mips/javavm/runtime/segvhandler_arch.c, disable linuxSegvHandlerInit(). You can make it just return CVM_TRUE right at the start.

I think by doing the above, you should just end up with a core dump rather than in the handleSegv function. Note that if you do not use the JIT, you also need to change linuxSyncInit() in src/linux/javavm/runtime/sync_md.c so that it no longer handles SIGSEGV. When the JIT is enabled, it does not handle SIGSEGV by default. However, it does handle SIGBUS and a few others, so it may be best to disable the signal handling in sync_md.c as well.

good luck,

Chris

dainson
Offline
Joined: 2007-01-29
Points: 0

Hi Chris,

I got core dump from debug version of MR1 when kill(pid, SIGSTOP) was replaced with abort() in src/linux-mips/javavm/runtime/segvhandler_arch.c and src/linux/javavm/runtime/sync_md.c. GDB can back-trace it ok.

Below is a relevant snippet that points to garbage collection code. Is it good enough to see the problem?

Complete output of “bt full” is several pages long. I can put it on our ftp site together with the core itself.

Should I still change handling of NullPointerExceptions as you described?

Thanks a lot for your help
Boris

………………….
#7 <signal handler called>
No symbol table info available.
#8 0x0044f8e4 in CVMgenSemispaceForwardOrPromoteObject (thisGen=0x10138de0, ref=0x2af70380, classWord=0)
at ../../src/share/javavm/runtime/gc/generational/gen_semispace.c:541
objCb = (CVMClassBlock *) 0x0
objSize = 1438642656
copyTop = (CVMUint32 *) 0x1011f740
ret = (CVMObject *) 0x2af70380
copyIntoToSpace = 269208768
#9 0x0044ffcc in CVMgenSemispaceGrayObject (thisGen=0x10138de0, refPtr=0x2c1ae378, ref=0x2af70380)
at ../../src/share/javavm/runtime/gc/generational/gen_semispace.c:648
classWord = 0
#10 0x004501e4 in CVMgenSemispaceFilteredGrayObject (thisGen=0x10138de0, refPtr=0x2c1ae378, ref=0x2af70380)
at ../../src/share/javavm/runtime/gc/generational/gen_semispace.c:670
#11 0x004503dc in CVMgenSemispaceFilteredHandleRoot (refPtr=0x2c1ae378, data=0x10138de0)
at ../../src/share/javavm/runtime/gc/generational/gen_semispace.c:707
thisGen = (CVMGenSemispaceGeneration *) 0x10138de0
ref = (CVMObject *) 0x2af70380
#12 0x00489ba4 in scanAndSummarizeObjectsOnCard (ee=0x1042ec48, gcOpts=0x55bff4c8, objStart=0x2c1ae1e0, summEntry=0x2aaeb7c8,
regionStart=0x2c1ae200, regionEnd=0x2c1ae400, oldGenLower=0x2b070200, callback=0x450288 ,
callbackData=0x10138de0) at ../../src/share/javavm/runtime/gc/generational/gc_impl.c:288
…………..

cjplummer
Offline
Joined: 2006-10-16
Points: 0

Hi Boris,

Your back trace is good enough. Or I should say it is good enough that I can pretty much tell that we'll not be able to fix it without reproducing it on site. It's crashed in the garbage collector, which probably means having a VM engineer do a marathon debug session on it. You'll probably need to file a bug with full details on how you reproduced it, and then be patient. I can't make any promises on when it might get fixed. I'm seeing about getting Volano added back to our test suite. If we do, and the bug reproduces on one of our ARM platforms, then it should get fixed before our next major release (MR2) is complete, but that won't be for quite a while.

You should also give MR2 a try. I'm pretty sure it has some gc fixes since MR1. If the problem goes away with MR2, but you want to stick with MR1, we might even be able to track down the change that fixed it and produce a patch for MR1.

Another thing you can try is building with CVM_DEBUG_ASSERTS=true. Sometimes with a gc crash like this, it will detect a problem soon enough that we don't need to do much debugging to figure out what went wrong. This will slow down your run somewhat, but probably by much less than 2x.

Chris

dainson
Offline
Joined: 2007-01-29
Points: 0

Hi Chris,

I'll build CVM with assertions as you described. If it runs slower than regular builds it could take a while to see the problem.

MR2 (phoneme_advanced-mr2-dev-b05) was getting suspended with signal 11 too. I'll get a core dump of it to see if it's same problem as on MR1 or not.

I'll post here what happens with these builds.

Thanks again
Boris

cjplummer
Offline
Joined: 2006-10-16
Points: 0

>
> I'll build CVM with assertions as you described. If
> it runs slower than regular builds it could take a
> while to see the problem.
>

It will be a bit slower, but not that bad. I'm guessing maybe 10% to 20% if you are using the JIT. It shouldn't be anything like 2X slower.

Chris

dainson
Offline
Joined: 2007-01-29
Points: 0

Hi Chris,

Debug builds already have asserts:
CVM_DEBUG_ASSERTS default: $(CVM_DEBUG)

I got a full back-trace of the MR2 b05 crash; the relevant part is below.

Does it have enough information for your VM engineers to look at? I can easily make whole core dumps of MR1 and MR2 available over ftp.

Thanks,
Boris

#7 <signal handler called>
No symbol table info available.
#8 0x0048f0c8 in scanAndSummarizeObjectsOnCard (ee=0x2ccd47f0, gcOpts=0x563ff4a0, objStart=0x2c25db8c, summEntry=0x2c7b1db8,
regionStart=0x2c25dc00, regionEnd=0x2c25de00, oldGenLower=0x2b070000, callback=0x4517e8 ,
callbackData=0x563ff458) at ../../src/share/javavm/runtime/gc/generational/gc_impl.c:325
currObj = (CVMObject *) 0x2c25dc9c
currCb = (CVMClassBlock *) 0x0
objSize = 32
scanStatus = 0
numRefs = 0
curr = (CVMJavaVal32 *) 0x2c25dc9c
top = (CVMJavaVal32 *) 0x2c25de00
cardBoundary = (CVMJavaVal32 *) 0x2c25dc00
youngGen = (CVMGeneration *) 0x10122ce8
youngGenStart = (CVMJavaVal32 *) 0x2ae70000
youngGenEnd = (CVMJavaVal32 *) 0x2b070000
#9 0x0049047c in callbackIfNeeded (ee=0x2ccd47f0, gcOpts=0x563ff4a0, card=0x2c779f6e "\001", lowerLimit=0x2c25dc00, higherLimit=0x2c25de00,
genLower=0x2b070000, genHigher=0x2c268834, callback=0x4517e8 , callbackData=0x563ff458)
at ../../src/share/javavm/runtime/gc/generational/gc_impl.c:562
objStart = (CVMJavaVal32 *) 0x2c25db8c
summEntry = (CVMGenSummaryTableEntry *) 0x2c7b1db8
#10 0x00490ca4 in CVMgenBarrierPointersTraverse (gen=0x10122e68, ee=0x2ccd47f0, gcOpts=0x563ff4a0,
callback=0x4517e8 , callbackData=0x563ff458)
at ../../src/share/javavm/runtime/gc/generational/gc_impl.c:748
hptr = (CVMJavaVal32 *) 0x2c25dc00
cptr = (CVMUint8 *) 0x2c779f6e "\001"
cptr_end = (CVMUint8 *) 0x2c779f70 ""
word = 65536
start = (CVMUint32 *) 0x2c779f58
lowerCardLimit = (CVMUint8 *) 0x2c771004 ""
higherCardLimit = (CVMUint8 *) 0x2c779fc4 "\001"
cardPtrWord = (CVMUint32 *) 0x2c779f6c
heapPtr = (CVMJavaVal32 *) 0x2c25d800
remainder = 0
genLower = (CVMJavaVal32 *) 0x2b070000
genHigher = (CVMJavaVal32 *) 0x2c268834
#11 0x00455dd0 in CVMgenMarkCompactScanOlderToYoungerPointers (gen=0x10122e68, ee=0x2ccd47f0, gcOpts=0x563ff4a0,
callback=0x4517e8 , callbackData=0x563ff458)
at ../../src/share/javavm/runtime/gc/generational/gen_markcompact.c:488
No locals.
#12 0x00492c34 in CVMgenScanAllRoots (thisGen=0x10122ce8, ee=0x2ccd47f0, gcOpts=0x563ff4a0,
callback=0x4517e8 , data=0x563ff458) at ../../src/share/javavm/runtime/gc/generational/gc_impl.c:1420
oldGen = (CVMGeneration *) 0x10122e68

cjplummer
Offline
Joined: 2006-10-16
Points: 0

> Hi Chris,
>
> Debug builds already have asserts:
> CVM_DEBUG_ASSERTS default: $(CVM_DEBUG)
>
> I got a full back-trace of MR2 b05 crash, see
> relevant part is below.
>
> Does it have enough information for your VM engineers
> to look at? I can easily make whole core dumps or MR1
> and MR2 available over ftp.
>
>
> Thanks,
> Boris
>
>

Hi Boris,

All I can say for sure about the crash is what is provided by the following:

currObj = (CVMObject *) 0x2c25dc9c
currCb = (CVMClassBlock *) 0x0

If you display the first word of currObj, there is a 0 there, which is why currCb is 0. Most likely the GC has improperly scanned for live objects at some point, probably due to a stackmap bug in the JIT, thus currObj points to an invalid or corrupted object.
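
To show why a zero first word is fatal for a scan like this, here is a minimal C sketch of a linear heap walk. The layout is hypothetical and much simpler than CVM's (here word 0 is the class word and word 1 is the object size in words, whereas a real VM derives the size from the class block), and find_corrupt_object is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: word 0 = class word, word 1 = object size in words. */
enum { HEADER_WORDS = 2 };

/*
 * Walks objects laid end to end in [start, start + num_words).
 * Returns the word offset of the first object whose header is unusable,
 * or -1 if every object in the range looks sane.
 */
ptrdiff_t find_corrupt_object(const uintptr_t *start, size_t num_words)
{
    size_t pos = 0;

    assert(start != NULL);
    while (pos + HEADER_WORDS <= num_words) {
        uintptr_t class_word = start[pos];
        uintptr_t size_words = start[pos + 1];

        if (class_word == 0) {
            return (ptrdiff_t)pos;   /* no class: cannot size or scan it */
        }
        if (size_words < HEADER_WORDS || size_words > num_words - pos) {
            return (ptrdiff_t)pos;   /* size would run outside the range */
        }
        pos += size_words;           /* step to the next object */
    }
    return -1;
}
```

A scanner without the class_word check would go on to dereference the null class pointer, which is roughly the kind of SIGSEGV seen in the backtrace; the damage itself usually happened earlier, when the object was missed or overwritten.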

These are among the hardest of bugs to track down and fix. It's not something that can be done interactively by stepping you through a debug session. So I think at this point your best recourse is to file a bug with details on how to reproduce it.

regards,

Chris Plummer

dainson
Offline
Joined: 2007-01-29
Points: 0

Thank you Chris. I filed the bug: https://phoneme.dev.java.net/issues/show_bug.cgi?id=8

Regards,
Boris

cjplummer
Offline
Joined: 2006-10-16
Points: 0

Thanks,

Do you know if Volano 2.5.0.9 works without a GUI? It used to, but when I tried to run it with a Foundation Profile build, I get the following:

java.lang.NoClassDefFoundError: java.beans.PropertyChangeSupport
at org.apache.catalina.logger.LoggerBase.()V(LoggerBase.java:110)
at org.apache.catalina.logger.FileLogger.()V(FileLogger.java:90)
at COM.volano.b.(Ljava/io/File;)V(DashoA5383)
at COM.volano.b.a([Ljava/lang/String;)V(DashoA5383)
at COM.volano.Main.main([Ljava/lang/String;)V(DashoA5383)
at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Method.java:321)
at sun.misc.CVM.runMain()V(CVM.java:514)

If it requires a PP or PBP build, then this will greatly limit my ability to reproduce and debug the problem, especially on mips.

thanks,

Chris

dainson
Offline
Joined: 2007-01-29
Points: 0

Chris,

Sorry for the delay in replying. Right, Volano uses something from java.beans.* and also from java.sql.* (probably java.sql.Timestamp). No GUI is necessary, though. So I simply added the java.beans and java.sql packages from the JDK (I forget which one I used, 1.3 or 1.4). I'm not suggesting distributing CVM with these additions, but adding jar files with these packages to the lib directory of Volano allows the tests to run. Will it save you time if I attach these jars to the bug report I submitted?

Thanks,
Boris

cjplummer
Offline
Joined: 2006-10-16
Points: 0

Hi Boris,

I tried the rt.jar from JavaSE 1.4.2, which seems to have fixed the beans reference, but now I have a new problem. In startup.sh, I set the cvm options as follows:

options="-Xmx24m -Xss96k -Xbootclasspath/a:../lib/rt.jar"

And I get the following failure when I run netserver.sh:

CVMjniExceptionDescribe failed: couldn't print stack trace.
Using brute force method to print stack trace.
java.lang.NoSuchMethodError: java.lang.String: method getBytes(II[BI)V not found
at org.apache.catalina.util.RequestUtil.URLDecode(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;(RequestUtil.java:376)
at org.apache.catalina.util.RequestUtil.URLDecode(Ljava/lang/String;)Ljava/lang/String;(RequestUtil.java:356)
at org.apache.catalina.core.StandardContext.setPath(Ljava/lang/String;)V(StandardContext.java:895)
at org.apache.catalina.startup.Embedded.createContext(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/catalina/Context;(Embedded.java:557)
at COM.volano.ac.(LCOM/volano/g;)V(DashoA5383)
at COM.volano.b.(Ljava/io/File;)V(DashoA5383)
at COM.volano.b.a([Ljava/lang/String;)V(DashoA5383)
at COM.volano.Main.main([Ljava/lang/String;)V(DashoA5383)
at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Method.java:316)
at sun.misc.CVM.runMain()V(CVM.java:478)

The String.getBytes(int, int, byte[], int) method being called is deprecated in JavaSE, and the CDC spec does not include any deprecated methods, thus the NoSuchMethodError. This method is commented out in the CDC source. I could just uncomment it, but I want to make sure this is what you also did, and that there aren't any other special mods I need to know about to get Volano running with CDC.

regards,

Chris

dainson
Offline
Joined: 2007-01-29
Points: 0

Chris,

Right, CVM will not execute deprecated methods and I patched the packages a little to fix that. I should have said that from the beginning. Please pick them up from here: https://phoneme.dev.java.net/issues/show_bug.cgi?id=8 (attachments https://phoneme.dev.java.net/nonav/issues/showattachment.cgi/1/java_sql_... and
https://phoneme.dev.java.net/nonav/issues/showattachment.cgi/2/java_bean...)

You should not need rt.jar in bootclasspath when using these jars.

Thank you,
Boris

dainson
Offline
Joined: 2007-01-29
Points: 0

Hi Chris,

I wanted to ask if there is any progress with the signal 11 problem. Do you know if the latest build has a fix for that?

Thank you,
Boris

Hinkmond Wong

Hi Boris,

Chris is out of the office today, but may be checking e-mail so might
still post a reply. Otherwise, you should see a reply by Mon.

Hinkmond


---------------------------------------------------------------------
To unsubscribe, e-mail: advanced-unsubscribe@phoneme.dev.java.net
For additional commands, e-mail: advanced-help@phoneme.dev.java.net

Hinkmond Wong

Hinkmond Wong wrote:
> Hi Boris,
>
> Chris is out of the office today, but may be checking e-mail so might
> still post a reply. Otherwise, you should see a reply by Mon.

Sorry, Boris. I've been corrected. You should see a reply from Chris
by Mon. 16Apr.

Hinkmond


dainson
Offline
Joined: 2007-01-29
Points: 0

Thank you Hinkmond.

Boris