
Identifying crash with hs_err_pid*.log and gdb

noy_aristotle
Joined: 2005-01-30

Our application keeps crashing on an HP-UX machine.
We only have the hs_err_pid*.log, the library (*.so) file that caused the crash, and the library's source file. There was no core file.

Here's the fatal error summary from the hs_err_pid*.log file.

# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (10) at pc=c00000000f675671, pid=26914, tid=44
#
# JRE version: 6.0
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.1-b03-jre1.6.0.09-rc1 mixed mode hp-ux-ia64 )
# Problematic frame:
# C [libtracejni.so+0x5060a2d1] _NZ10TFM07PrintTraceEPi+0x5065eff9

I have read in the "Troubleshooting and Diagnostic Guide" that it is possible to
inspect the instructions that caused the crash without a debugger or core file by
using a disassembler to dump instructions near the offset, in our case [0x5065eff9].

C [libtracejni.so+0x5060a2d1] _NZ10TFM07PrintTraceEPi+0x5065eff9
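
As a small aid for reading that line, here is a minimal Python sketch that splits the "Problematic frame" entry into its library, library offset, symbol, and symbol offset. The regular expression and the helper name parse_problematic_frame are assumptions made for illustration; they are not part of any JDK or gdb tooling, and they only cover native ("C") frames in the format shown above.

```python
import re

# The "Problematic frame" line copied from the hs_err_pid*.log summary.
frame_line = "# C [libtracejni.so+0x5060a2d1] _NZ10TFM07PrintTraceEPi+0x5065eff9"

# Assumed layout for native frames: C [<library>+0x<offset>] <symbol>+0x<offset>
FRAME_RE = re.compile(
    r"C\s+\[(?P<lib>[^\]]+?)\+0x(?P<lib_off>[0-9a-fA-F]+)\]\s+"
    r"(?P<sym>\S+)\+0x(?P<sym_off>[0-9a-fA-F]+)"
)


def parse_problematic_frame(line):
    """Hypothetical helper: return the four fields of a native frame line."""
    match = FRAME_RE.search(line)
    if match is None:
        raise ValueError("not a native frame line: %r" % line)
    return {
        "library": match.group("lib"),
        "library_offset": int(match.group("lib_off"), 16),
        "symbol": match.group("sym"),
        "symbol_offset": int(match.group("sym_off"), 16),
    }


frame = parse_problematic_frame(frame_line)
print(frame["library"], hex(frame["library_offset"]))
print(frame["symbol"], hex(frame["symbol_offset"]))
```

The two offsets are relative to different bases (the library and the symbol), so they should not be compared with each other directly.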

So I tried disassembling the library and dumping the instructions with
GDB, but I am at a loss as to how to determine the instruction/offset that caused the crash.

Here's what I did.

$ gdb libtracejni.so

(gdb) info line _NZ10TFM07PrintTraceEPi
Line 111 of "tfmtrace.cpp"
starts at address 0x4000000000099590:0 <TFMTrace::PrintTrace(char*,unsigned int,int*)>
and ends at 0x40000000000995c0:2 <TFMTrace::PrintTrace(char*,unsigned int,int*)+0x32>.

(gdb) disas 0x4000000000099590 0x40000000000995c0
Dump of assembler code from 0x4000000000099590:0 to 0x40000000000995c0:0:
;;; DOC Line Information: [Line, Column Start, Column End] [Line, Column] [Line]
;;; File: tfmtrace.cpp
0x4000000000099590:0 <TFMTrace::PrintTrace(...)>: alloc r37=ar.pfs,0,30,4,0 MMI
0x4000000000099590:1 <TFMTrace::PrintTrace(...)+0x1>: mov r46=r35
0x4000000000099590:2 <TFMTrace::PrintTrace(...)+0x2>: mov r39=pr
0x40000000000995a0:0 <TFMTrace::PrintTrace(...)+0x10>: adds sp=0xffffffffffffefa0,sp MMI,
0x40000000000995a0:1 <TFMTrace::PrintTrace(...)+0x11>: adds r45=0x498,r32
0x40000000000995a0:2 <TFMTrace::PrintTrace(...)+0x12>: mov ret3=-1;;
0x40000000000995b0:0 <TFMTrace::PrintTrace(...)+0x20>: ld8.a r62=[r45] MMI
0x40000000000995b0:1 <TFMTrace::PrintTrace(...)+0x21>: cmp.ne.unc p6=r0,r46
0x40000000000995b0:2 <TFMTrace::PrintTrace(...)+0x22>: mov r38=rp
End of assembler dump.

Can I use the output of "disas" to find the instruction nearest to offset 0x5065eff9?
The addresses in the dump (0x4000000000099590~0x40000000000995b0) do not
seem to correspond to 0x5065eff9.

Looking forward to any input.

Thanks and best regards.

sassan
Joined: 2011-08-03

Hello: Please run 'disass _NZ10TFM07PrintTraceEPi' at the gdb prompt to get the disassembly of the entire '_NZ10TFM07PrintTraceEPi' routine. Then scroll to the address obtained by adding the start address of '_NZ10TFM07PrintTraceEPi' to 0x5065eff9 to find the instruction you are interested in.

For further debugging, please load the core file under the debugger. For Java core file debugging with WDB, please go through section "14.25.1.3. Java corefile debugging support" in the Debugging with GDB manual (available from http://h21007.www2.hp.com/portal/download/files/unprot/wdb/wdb_6.2/GDB.pdf or from the 'Documentation' link at http://www.hp.com/go/wdb).
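
The address arithmetic described in this reply can be sketched in a few lines of Python. The start address is the one "info line" reported for the unloaded library in the first post and the offset is the one from the log; the 16-byte bundle size is inferred from the 0x10 step between bundles in the disassembly listings, and the four-bundle window mirrors the "disas $pc-16*4 $pc+16*4" style of range. The variable names are made up for this sketch.

```python
# Start of TFMTrace::PrintTrace as reported by "info line" on the unloaded library.
FUNC_START = 0x4000000000099590
# Symbol-relative offset from the "Problematic frame" line in hs_err_pid*.log.
SYMBOL_OFFSET = 0x5065EFF9
# Assumed bundle size: addresses in the listings advance by 0x10 per bundle.
BUNDLE_SIZE = 16

# Address to look for in the disassembly, per the reply above.
target = FUNC_START + SYMBOL_OFFSET

# Round down to the containing bundle and take four bundles on either side.
bundle = target & ~(BUNDLE_SIZE - 1)
lo = bundle - 4 * BUNDLE_SIZE
hi = bundle + 4 * BUNDLE_SIZE

# A gdb range command in the same form used elsewhere in this thread.
command = "disas {:#x} {:#x}".format(lo, hi)

print("target address:", hex(target))
print(command)
```

Whether gdb can actually disassemble at the computed address depends on where the library's code is mapped, so the result is only a starting point for the lookup.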

For HP-UX WDB debugger questions, feel free to write to <wdb-help@cup.hp.com>. More broadly, you may also want to post your question to the HP-UX S/W development tools portal, <hpux-developmen-public>.

noy_aristotle
Joined: 2005-01-30

Hello. Thank you for your reply. As instructed, I was able to get the
instruction by adding the start address of '_NZ10TFM07PrintTraceEPi'
to 0x5065eff9. I was also given a copy of the core dump file. The
GDB manual was a great help in coming up with the following result.

Using the "backtrace" command, I was able to identify the source of the
SIGBUS (BUS_ADRALN) signal, which is at frame 8.

(gdb) bt
#0 0xc0000000001e5350:0 in _lwp_kill+0x30 ()
from /usr/lib/hpux64/libpthread.so.1
#1 0xc00000000014c7b0:0 in pthread_kill+0x9d0 ()
from /usr/lib/hpux64/libpthread.so.1
#2 0xc0000000002e4080:0 in raise+0xe0 () from /usr/lib/hpux64/libc.so.1
#3 0xc0000000003f47f0:0 in abort+0x170 () from /usr/lib/hpux64/libc.so.1
#4 0xc00000000e65e0d0:0 in os::abort ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/os/hp-ux/vm/os_hp-ux.cpp:2033
#5 0xc00000000eb473e0:0 in VMError::report_and_die ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/share/vm/utilities/vmError.cpp:1008
#6 0xc00000000e66fc90:0 in os::Hpux::JVM_handle_hpux_signal ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/os_cpu/hp-ux_ia64/vm/os_hp-ux_ia64.cpp:1051
#7 <signal handler called>
#8 0xc00000000c0c5670:1 in TFMTrace::PrintTrace () at tfmtrace.cpp:1065
#9 0xc00000000c0c4f70:0 in FMLogger::WriteLog () at fmlogger.cpp:90
...

-------------------------------------------------------------------------------
Looking at frame 8, it seems that the abort happened in the if statement
at tfmtrace.cpp line 1065. What I don't understand is why a BUS_ADRALN
signal would be generated in an if statement. Both dwTrcLen
and dwMaxTrcLen are local variables.

(gdb) frame 8
#8 0xc00000000c0c5670:1 in TFMTrace::PrintTrace () at tfmtrace.cpp:1065
1065 if( dwTrcLen <= dwMaxTrcLen ) {
Current language: auto; currently c++

-------------------------------------------------------------------------------
I tried to disassemble frame 8 to see which instructions were being executed.
Unfortunately, I could not understand the assembly code and have no
idea what caused the abort. =(

(gdb) disas $pc-16*4 $pc+16*4
...
0xc00000000c0c5660:0 <TFMTrace::PrintTrace(...)+0xd0> : ld4.a r27=[r48] MII,
;;; 1065 if( dwTrcLen <= dwMaxTrcLen ) {
0xc00000000c0c5660:1 <TFMTrace::PrintTrace(...)+0xd1> : adds r28=28,r42
0xc00000000c0c5660:2 <TFMTrace::PrintTrace(...)+0xd2> : cmp.ne.unc p6=r0,r46;;
0xc00000000c0c5670:0 <TFMTrace::PrintTrace(...)+0xe0> : ld4.sa r26=[r28] MMI,
0xc00000000c0c5670:1 <TFMTrace::PrintTrace(...)+0xe1> : (p6) ld4 r31=[r28]
0xc00000000c0c5670:2 <TFMTrace::PrintTrace(...)+0xe2> : adds r46=24,r42;;
0xc00000000c0c5680:0 <TFMTrace::PrintTrace(...)+0xf0> : (p6) st4 [r35]=r31 MI,I
0xc00000000c0c5680:1 <TFMTrace::PrintTrace(...)+0xf1> : adds r59=36,r42;;
0xc00000000c0c5680:2 <TFMTrace::PrintTrace(...)+0xf2> : nop.i 0x0
0xc00000000c0c5690:0 <TFMTrace::PrintTrace(...)+0x100>: ld4.c.clr r27=[r48] MIB,
;;; 1067 dwLen = dwTrcLen ;
0xc00000000c0c5690:1 <TFMTrace::PrintTrace(...)+0x101>: cmp4.eq.unc p6,p8=99,r27
0xc00000000c0c5690:2 <TFMTrace::PrintTrace(...)+0x102>: nop.b 0x0;;
0xc00000000c0c56a0:0 <TFMTrace::PrintTrace(...)+0x110>: (p8) ld4.c.clr r26=[r28] MMI
;;; 1068 }
0xc00000000c0c56a0:1 <TFMTrace::PrintTrace(...)+0x111>: (p6) st4 [r48]=r47
0xc00000000c0c56a0:2 <TFMTrace::PrintTrace(...)+0x112>: cmp4.geu.unc p7=r26,r27
End of assembler dump.
(gdb)
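
One way to narrow this down is to match the pc of frame 8 (0xc00000000c0c5670:1, i.e. bundle address plus slot number) against the listing. The Python sketch below does that for a few lines copied from the dump above; the regular expression and the helper name instruction_at are assumptions for illustration, written only against the line format shown in this thread.

```python
import re

# A few lines copied from the "disas" output above (function name abbreviated
# as in the post).
disassembly = """\
0xc00000000c0c5660:1 <TFMTrace::PrintTrace(...)+0xd1> : adds r28=28,r42
0xc00000000c0c5660:2 <TFMTrace::PrintTrace(...)+0xd2> : cmp.ne.unc p6=r0,r46;;
0xc00000000c0c5670:0 <TFMTrace::PrintTrace(...)+0xe0> : ld4.sa r26=[r28] MMI,
0xc00000000c0c5670:1 <TFMTrace::PrintTrace(...)+0xe1> : (p6) ld4 r31=[r28]
0xc00000000c0c5670:2 <TFMTrace::PrintTrace(...)+0xe2> : adds r46=24,r42;;
"""

# Assumed line layout: <bundle address>:<slot> <symbol+offset> : <instruction>
LINE_RE = re.compile(r"^(0x[0-9a-f]+):([0-2])\s+<[^>]*>\s*:\s*(.*)$")


def instruction_at(listing, pc):
    """Hypothetical helper: return the instruction text at 'bundle:slot'."""
    bundle_text, slot_text = pc.split(":")
    bundle, slot = int(bundle_text, 16), int(slot_text)
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match and int(match.group(1), 16) == bundle and int(match.group(2)) == slot:
            return match.group(3).strip()
    return None


# pc of frame 8 as printed by "bt" and "frame 8".
faulting = instruction_at(disassembly, "0xc00000000c0c5670:1")
print(faulting)
```

For the lines above this selects the predicated 4-byte load from the address in r28, and the listing shows r28 being computed one bundle earlier as r42 plus 28.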

Here are my follow-up questions:
1) Is there anything else (e.g. another gdb command) I could check to
determine the cause of the BUS_ADRALN?

2) Is it possible for a BUS_ADRALN to occur during the execution of the
if statement above?

3) Do you have any idea what the disassembled instructions mean?
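
On question 2, a possible line of reasoning: the pc of frame 8 (0xc00000000c0c5670:1) lines up with "(p6) ld4 r31=[r28]" in the listing, a 4-byte load, and BUS_ADRALN generally indicates an access to an address that is not aligned for the access size. The Python sketch below only illustrates that alignment check. The register values are hypothetical, since the thread does not show r42 or r28; in a standard gdb session the real values could be read at frame 8 with "info registers" or "p/x $r28", assuming WDB exposes the registers under those names.

```python
def is_aligned(address, size):
    """Return True if address is a multiple of size (size must be a power of two)."""
    return address & (size - 1) == 0


LD4_SIZE = 4  # ld4 reads 4 bytes, so the address is expected to be 4-byte aligned

# Hypothetical base pointers standing in for r42; NOT taken from the core file.
aligned_base = 0x9FFFFFFF7F4E1000
misaligned_base = 0x9FFFFFFF7F4E1001

for base in (aligned_base, misaligned_base):
    r28 = base + 28  # mirrors "adds r28=28,r42" in the listing
    verdict = "aligned" if is_aligned(r28, LD4_SIZE) else "misaligned"
    print("{:#x} + 28 = {:#x} -> {}".format(base, r28, verdict))
```

If the real r28 turns out to be misaligned, the next thing to examine would be where the pointer held in r42 comes from, for example a buffer cast to a structure pointer.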

Thanks, and looking forward to any input.