Building CDC for linux-x86 with FPU fails

4 replies [Last post]
yyoss
Offline
Joined: 2009-03-18

Hi

I'm trying to build the CDC for the linux-x86 platform with CVM_JIT_USE_FP_HARDWARE=true, but I get an error.

Has anybody else encountered this problem?

The build information is as follows:
===================================================================
Build version:pMEA b111
OS:Ubuntu-server-8.10
gcc:gcc-4.3.2
jdk:j2sdk-1.4.2-19
Build options:JDK_HOME=/usr/java/jdk1.4.2 J2ME_CLASSLIB=foundation CVM_JIT=true CVM_JIT_USE_FP_HARDWARE=true

=========================================================
touch /home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic/./generated/empty.mk
MAKEFLAGS = CVM_JIT_USE_FP_HARDWARE=true CVM_JIT=true J2ME_CLASSLIB=foundation JDK_HOME=/usr/java/jdk1.4.2
CVM_HOST = i686-Ubuntu-linux
CVM_TARGET = linux-x86-generic
SHELL = sh -e
HOST_CC = /usr/bin/cc
HOST_CCC = /usr/bin/g++
ZIP = /usr/bin/zip
FLEX = /usr/bin/flex
BISON = /usr/bin/bison
CVM_JAVA = /usr/java/jdk1.4.2/bin/java
CVM_JAVAC = /usr/java/jdk1.4.2/bin/javac
CVM_JAVAH = /usr/java/jdk1.4.2/bin/javah
CVM_JAR = /usr/java/jdk1.4.2/bin/jar
TARGET_CC = /usr/bin/cc
TARGET_CCC = /usr/bin/g++
TARGET_AS = /usr/bin/cc
TARGET_LD = /usr/bin/cc
TARGET_AR = /usr/bin/ar
TARGET_RANLIB = /usr/bin/ranlib
LINKFLAGS = -g -Wl,-export-dynamic
LINKLIBS = -lpthread -ldl -lm
ASM_FLAGS = -c -fno-common -march=i686 -traditional
CCCFLAGS = -fno-rtti
CCFLAGS_SPEED = -O4 -c -fno-common -Wall -W -Wno-unused-parameter -Wno-sign-compare -fno-strict-aliasing -fwrapv -march=i686
CCFLAGS_SPACE = -O2 -c -fno-common -Wall -W -Wno-unused-parameter -Wno-sign-compare -fno-strict-aliasing -fwrapv -march=i686
CCFLAGS_LOOP = -O4 -c -fno-common -Wall -W -Wno-unused-parameter -Wno-sign-compare -fno-strict-aliasing -fwrapv -march=i686 -fno-inline
CCFLAGS_FDLIB = -O4 -c -fno-common -Wall -W -Wno-unused-parameter -Wno-sign-compare -fno-strict-aliasing -fwrapv -march=i686 -ffloat-store
JAVAC_OPTIONS = -g:none -J-Xms32m -J-Xmx128m -encoding iso8859-1 -source 1.4 -target 1.4
CVM_DEFINES = -DCVM_OPTIMIZED -DCVM_DEBUG_STACKTRACES -DNDEBUG -DCVM_CLASSLOADING -DCVM_SERIALIZATION -DCVM_REFLECT -DCVM_DYNAMIC_LINKING -DCVM_JIT -DCVM_JIT_USE_FP_HARDWARE -DCVM_SPLIT_VERIFY -DCVM_TIMESTAMPING -DJ2ME_CLASSLIB=foundation -DTARGET_CPU_FAMILY=x86 -D_GNU_SOURCE
TARGET_CC version = 4.3.2 i486-linux-gnu
HOST_CC version = 4.3.2 i486-linux-gnu
CVM_JAVA version = java version 1.4.2_19
TOOLS_DIR = /home/yoshida/WORK/phoneme/mr2-b111/tools
updating /home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic/./generated/build_defs.mk ...
Checking for build-time classes to compile ...
make[1]: Entering directory `/home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic'
make[1]: Leaving directory `/home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic'
make[1]: Entering directory `/home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic'
make[1]: Leaving directory `/home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic'
Checking for phoneME Advanced classes to compile ...
make[1]: Entering directory `/home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic'
make[1]: Leaving directory `/home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic'
Checking for test classes to compile ...
Checking for demo classes to compile ...
cc /home/yoshida/WORK/phoneme/mr2-b111/cdc/build/linux-x86-generic/./generated/javavm/runtime/jit/jitcodegen.c
../../src/x86/javavm/runtime/jit/jitgrammardefs.jcs: In function 'computeTargetRegs':
../../src/x86/javavm/runtime/jit/jitgrammardefs.jcs:268: error: 'cnt' undeclared (first use in this function)
../../src/x86/javavm/runtime/jit/jitgrammardefs.jcs:268: error: (Each undeclared identifier is reported only once
../../src/x86/javavm/runtime/jit/jitgrammardefs.jcs:268: error: for each function it appears in.)
make: *** [obj/jitcodegen.o] Error

Regards,
Yoshida

cjplummer
Offline
Joined: 2006-10-16

The linux/x86 JIT FPU support has never really worked. You're pretty much on your own if you want to try to get it working.

The linux/x86 JIT (w/o FPU support) has never been tested. Sun engineers do occasionally build with it, and it seems fairly stable, but there is at least one known serious bug. There is a bug internal to Sun filed for this. I'm not sure if these filed bugs are exported, but if you have access to a pmea bug database, look for: "x86 JIT patched based gc-rendezvous points can lead to crash". I can post some more detail of the problem if you are interested in fixing it.

yyoss
Offline
Joined: 2009-03-18

Thank you for your reply.

I would like to try to fix it.
Please give me more information.

I also tried to access the bug database, but I could not find it.
Do you mean the "Issue tracker", or Sun's own bug database?
Could you tell me the URL?

Best regards,

cjplummer
Offline
Joined: 2006-10-16

Sorry, but I don't know how or if internally filed bugs are exposed outside of Sun.

Here is the description of the crash I was referring to. You'll probably need to write a multi-threaded stress test that causes GCs in order to reproduce the problem. I wouldn't trust any fix you make until you can first prove you have a reliable way of reproducing the bug.

[pre]
x86 JIT patched based gc-rendezvous points can lead to crash

I ran into the following after running a stress test on a single core
linux/x86 host for a few hours:

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 1376168880 (LWP 11488)]
0x4078ce80 in ?? ()
(gdb) x /4i $pc
0x4078ce80: movl $0xe84d8be4,0x758b9090(%eax)
0x4078ce8a: cmp %ecx,%esi
0x4078ce8c: jge 0x4078cf34
0x4078ce92: mov 0xffffffd0(%ebp),%ebx
(gdb) x /8i $pc-4
0x4078ce7c: call 0x81ee434
0x4078ce81: nop
0x4078ce82: nop
0x4078ce83: nop
0x4078ce84: mov 0xffffffe4(%ebp),%esi
0x4078ce87: mov 0xffffffe8(%ebp),%ecx
0x4078ce8a: cmp %ecx,%esi
0x4078ce8c: jge 0x4078cf34

GC rendezvous points in the x86 JIT are handled with 8 NOPs. The patched in
CVMCCMruntimeGCRendezvousGlue call takes 5 bytes. I think 8 bytes are
used so the patch can be done by writing two words. I'm pretty sure the crash
is because we were about to execute the nop at 0x4078ce80 when the patching
happened. After patching, execution resumed at 0x4078ce80, which is now an
illegal instruction because it is in the middle of the call instruction.

The generated x86 code for a gc rendezvous point looks like this. Note the trace
starts with the instructions at the end of the previous block.

0x4078b0d3 67: cmpl %esi, $0
0x4078b0d6 70: jge $0x4078b090 @ branch to block L1
0x4078b0dc 76: negl %esi
0x4078b0de 78: movl -24(%ebp), %esi @ Java local cell # 0
0x4078b0e1 81: movl -4(%ebp), $45 @ Java local cell # 5
0x4078b0e1 81: movl -4(%ebp), $45 @ Java local cell # 5
88: L1:
:::::Fixed instruction at 70 to reference 88
@ Initial Temp REF set is
L1: 88: @ Patchable GC Check point
0x4078b0e8 88: call $0x81ee434 @ call CVMCCMruntimeGCRendezvousGlue
0x4078b0ed 93: nop @ padding nops
0x4078b0ee 94: nop
0x4078b0ef 95: nop
0x4078b0e8 88: nop
0x4078b0e9 89: nop
0x4078b0ea 90: nop
0x4078b0eb 91: nop
0x4078b0ec 92: nop
0x4078b0ed 93: nop
0x4078b0ee 94: nop
0x4078b0ef 95: nop
@ Captured a stackmap here.
L1: 96: @ entry point for branches
@ MAP_PC idepth=0 javaPc=18 compiledPc=96
0x4078b0f0 96: movl %esi, -24(%ebp) @ Java local cell # 0

And a backwards branch to the target looks like this:

@ Patchable backwards branch:
0x4078b1f2 354: jmp $0x4078b0e8 @ branch to block L1 branch to GC rendezvous instruction
0x4078b1f7 359: nop @ padding nops
0x4078b1f8 360: nop
0x4078b1f9 361: nop
0x4078b1f2 354: jmp $0x4078b0f0 @ overwriting with branch to loop start
0x4078b1f7 359: nop @ padding nops
0x4078b1f8 360: nop
0x4078b1f9 361: nop

So normally the backwards branch is to the point just after the nops,
avoiding execution of the nops. When a GC is needed, first the call to
CVMCCMruntimeGCRendezvousGlue is patched in, and then the backwards
branch is patched to branch to the rendezvous point where the call to
CVMCCMruntimeGCRendezvousGlue is. This all seems fine. However, note
that the code that falls through to the block, and the conditional branch to the
block, both always end up at the first nop, leading to the possibility of the
patching happening while executing in the middle of nops.

On PowerPC we handle things a bit differently. Normally it only takes one
4 byte instruction to handle the call to CVMCCMruntimeGCRendezvousGlue,
so we just patch this over whatever instruction is normally at the start of the
block. However, if you build the powerpc port with
CVM_JIT_COPY_CCMCODE_TO_CODECACHE=false, then
CVMCCMruntimeGCRendezvousGlue can't be reached with just one instruction,
so patching is not so simple. In this case we need to branch around the code
that calls CVMCCMruntimeGCRendezvousGlue. Here's an example:

0x306380f8 128: bge- PC=(128) @ branch to block L1
0x306380fc 132: neg r22, r23
0x30638100 136: stw r22, -24(rJFP) @ Java local cell # 0
0x30638104 140: li r23, 45 @ const 45
0x30638108 144: stw r23, -4(rJFP) @ Java local cell # 5
@ Initial Temp REF set is
L1: 148: @ Patchable GC Check point
0x3063810c 148: lis r0, hi16(0x102141f0)
0x30638110 152: ori r0, r0, lo16(0x102141f0)
0x30638114 156: mtlr r0
0x30638118 160: blrl @ CVMCCMruntimeGCRendezvousGlue
0x30638118 160: nop
@ Captured a stackmap here.
L1: 164: @ entry point for branches
:::::Fixed instruction at 128 to reference 164
@ MAP_PC idepth=0 javaPc=18 compiledPc=164
0x3063811c 164: lwz r25, -24(rJFP) @ Java local cell # 0

and then the backwards branch:

@ Patchable backwards branch:
0x30638224 428: b PC=(148) @ branch to block L1
0x30638224 428: b PC=(164) @ branch to block L1, skip GC

Now this is a bit better. Notice that the forward branch to L1 is fixed up
to branch to 164 instead of 148, which is after all the code that calls
CVMCCMruntimeGCRendezvousGlue. However, the fallthrough code still
falls through to the code that calls CVMCCMruntimeGCRendezvousGlue,
thus it has some unneeded overhead. There's no bug here since we patch
4 byte instructions, and usually there is a nop where the call is located.
Thus, on fallthrough we just waste cycles executing the 3 instructions that
set up the call. On x86, this is not good enough since not all instructions
are 4 bytes long.

So, x86 needs two fixes. The first is to make code that falls through to
a backwards branch target instead branch to just after the 8 nops. The second
is to fix all forward branches to backwards branch targets so they also
branch to after the 8 nops. Note that backwards branches are already doing
this, so some of the logic is already in place.

BTW, for fixing the fallthrough code to do a branch, we actually already
do something like this in the RISC ports when register locals are enabled
(I disabled them for the above powerpc code to make things easier to read).
The powerpc code with register locals enabled looks like this:

0x306380fc 132: bge- PC=(132) @ branch to block L1
0x30638100 136: lwz r25, -24(rJFP) @ Java local cell # 0
0x30638104 140: neg r14, r25
0x30638108 144: stw r14, -24(rJFP) @ Java local cell # 0
0x3063810c 148: li r17, 45 @ const 45
0x30638110 152: stw r17, -4(rJFP) @ Java local cell # 5
0x30638114 156: b PC=(156) @ fallthrough to block L1, which is backward branch target
@ Initial Temp REF set is
L1: 160: @ Patchable GC Check point
0x30638118 160: lis r0, hi16(0x10219790)
0x3063811c 164: ori r0, r0, lo16(0x10219790)
0x30638120 168: mtlr r0
0x30638124 172: blrl @ CVMCCMruntimeGCRendezvousGlue
0x30638124 172: nop
@ Captured a stackmap here.
L1: 176: @ entry point when locals need to be loaded
@ Preloading incoming local(0) reg(14)
0x30638128 176: lwz r14, -24(rJFP) @ Java local cell # 0
@ Preloading incoming local(1) reg(15)
0x3063812c 180: lwz r15, -20(rJFP) @ Java local cell # 1
@ Preloading incoming local(4) reg(16)
0x30638130 184: lwz r16, -8(rJFP) @ Java local cell # 4
@ Preloading incoming local(5) reg(17)
0x30638134 188: lwz r17, -4(rJFP) @ Java local cell # 5
L1: 192: @ entry point for branches
:::::Fixed instruction at 156 to reference 192
:::::Fixed instruction at 132 to reference 192

192 is where we normally want to begin execution of the block, since normally
the incoming locals are already in the proper registers. Thus all branches to
the block branch here, and fallthrough code skips over all the code to load
incoming locals. The code to load incoming locals is needed after a GC, so
it gets executed after CVMCCMruntimeGCRendezvousGlue returns. I only
point this out because it serves as a useful example of how to make fallthrough
code branch to the start of the next block. You can look for it in the RISC port.
[/pre]
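
The patching hazard in the bug description above can be modelled with a few lines of Java. This is only a sketch, not code from the phoneME tree: the class `PatchHazard`, its methods, and the placeholder displacement are all hypothetical names chosen for illustration. It fills a pad with one-byte NOPs (0x90), overwrites the first five bytes with a `call rel32` (opcode 0xE8 plus a 4-byte displacement), and reports which offsets were instruction starts before the patch but are not afterwards. A thread that was about to execute at one of those offsets resumes in the middle of the call instruction.

```java
import java.util.Arrays;

public class PatchHazard {
    static final int NOP = 0x90;        // one-byte x86 NOP
    static final int CALL_REL32 = 0xE8; // x86 CALL rel32: opcode plus 4 displacement bytes

    /** Marks each offset at which a linear decode from offset 0 starts an instruction. */
    static boolean[] instructionStarts(byte[] code) {
        boolean[] starts = new boolean[code.length];
        int pc = 0;
        while (pc < code.length) {
            starts[pc] = true;
            int opcode = code[pc] & 0xFF;
            if (opcode == CALL_REL32) {
                pc += 5;
            } else if (opcode == NOP) {
                pc += 1;
            } else {
                // This toy decoder only models the two instructions in the pad.
                throw new IllegalStateException("unmodelled opcode at offset " + pc);
            }
        }
        return starts;
    }

    /**
     * Returns the offsets that were valid resume points in a pad of padLength
     * NOPs (padLength >= 5) but fall inside the call once it is patched in.
     */
    static int[] strandedOffsets(int padLength) {
        byte[] code = new byte[padLength];
        Arrays.fill(code, (byte) NOP);
        boolean[] before = instructionStarts(code);

        int displacement = 0x12345678; // placeholder value, not a real glue address
        code[0] = (byte) CALL_REL32;
        for (int i = 0; i < 4; i++) {
            code[1 + i] = (byte) (displacement >>> (8 * i)); // little-endian rel32
        }
        boolean[] after = instructionStarts(code);

        int[] stranded = new int[padLength];
        int count = 0;
        for (int off = 0; off < padLength; off++) {
            if (before[off] && !after[off]) {
                stranded[count++] = off;
            }
        }
        return Arrays.copyOf(stranded, count);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(strandedOffsets(8)));
    }
}
```

For the 8-byte pad, the stranded offsets are 1 through 4. That matches the gdb session above, where the call sits at 0x4078ce7c and the faulting pc is 0x4078ce80, four bytes into it. Entering the block just after the pad, as the proposed fixes do for fallthrough code and forward branches, means no thread is ever inside the pad when it is patched.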

yyoss
Offline
Joined: 2009-03-18

I wrote a test program and I could reproduce the bug.
I am testing the fixed code now.
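
The test program itself was not posted. As a rough illustration of the kind of multi-threaded, GC-heavy stress test suggested earlier in the thread, here is a minimal sketch. Everything in it (the class `GcRendezvousStress`, the loop body, the thread counts and allocation sizes) is an assumption made for illustration, and it has not been shown to reproduce this particular crash on CVM. Worker threads repeatedly enter a short method whose loop header is reached by fallthrough, while another thread allocates garbage and requests collections. The code avoids generics and other post-1.4 language features.

```java
public class GcRendezvousStress {
    private static volatile boolean stop;
    private static volatile Object sink; // keeps the garbage allocations observable

    /** Short loop entered by fallthrough from a conditional block; the result is deterministic. */
    static int hotLoop(int n, int seed) {
        int x = seed;
        if (x < 0) {
            x = -x; // conditional code immediately before the loop header
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += x & 0xFF;
            x = x * 31 + i;
        }
        return sum;
    }

    /** Runs the stress for roughly millis ms and returns how many wrong results were seen. */
    static int run(int workers, long millis) {
        stop = false;
        final int[] mismatches = new int[workers];
        Thread[] threads = new Thread[workers + 1];
        for (int t = 0; t < workers; t++) {
            final int id = t;
            final int expected = hotLoop(64, id + 1);
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    while (!stop) {
                        // Re-enter the method often so the loop entry is hit repeatedly.
                        if (hotLoop(64, id + 1) != expected) {
                            mismatches[id]++;
                        }
                    }
                }
            });
        }
        threads[workers] = new Thread(new Runnable() {
            public void run() {
                int n = 0;
                while (!stop) {
                    sink = new byte[64 * 1024]; // create garbage
                    if (++n % 256 == 0) {
                        System.gc(); // also request collections explicitly
                    }
                }
            }
        });
        for (int t = 0; t < threads.length; t++) {
            threads[t].start();
        }
        try {
            Thread.sleep(millis);
            stop = true;
            for (int t = 0; t < threads.length; t++) {
                threads[t].join();
            }
        } catch (InterruptedException e) {
            stop = true;
            Thread.currentThread().interrupt();
        }
        int total = 0;
        for (int t = 0; t < workers; t++) {
            total += mismatches[t];
        }
        return total;
    }

    public static void main(String[] args) {
        long millis = args.length > 0 ? Long.parseLong(args[0]) : 10000L;
        System.out.println("mismatches: " + run(4, millis));
    }
}
```

A crash of the kind described above would show up as a signal in the VM rather than as a mismatch count. The count only catches the milder case of a loop that resumes with corrupted state. The original report took a few hours on a single-core host, so a run of a few seconds is unlikely to be conclusive.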

I also have another question.
In the powerpc example, "register locals" are enabled, but they are disabled on x86.
I could not find any explanation of this flag.
What does this flag do?
Why is it disabled on x86?