Wednesday, March 3, 2010

Analyzing Core dump file

Problem Description

A binary core file is produced when the WebLogic Server process terminates because of a fault in native code (machine-specific code). A server crash, JVM crash, machine crash, or HotSpot error may also be associated with this occurrence. This pattern describes the steps needed to gather information from a core file on various platforms.


Problem Troubleshooting

Not all of the following items need to be done. Some issues can be solved by following only a few of them.




Why does the problem occur?

To determine the cause of such an error, identify all potential sources of native code used by the WebLogic Server process. The places to focus on are:

  1. The WebLogic Server performance pack. The performance pack is native code and, when enabled, could potentially produce such an error. Disable it to determine whether it is the cause. You can do this through the console or on the command line. In the console, set NativeIOEnabled to false under the Server tab. See the section Enabling Performance Packs for the exact sequence of steps in the console (for WLS 8.1: http://e-docs.bea.com/wls/docs81/perform/WLSTuning.html#1142800). The steps are:
    1. Start the Administration Server if it is not already running.
    2. Access the Administration Console for the domain.
    3. Expand the Servers node in the left pane to display the servers configured in your domain.
    4. Click the name of the server instance that you want to configure.
    5. Select the Configuration —> Tuning tab.
    6. If the Enable Native IO check box is selected, clear it so that native IO is *not* enabled.
    7. Click Apply.
    8. Restart the server.

    You can also disable it through the Java options passed to the WebLogic Server start command: set -Dweblogic.NativeIOEnabled=false on the command line and then start the server. The command-line setting takes precedence over the value set in the console.


  2. Any Type 2 JDBC driver uses native DBMS libraries, which could also produce this type of error. Switch to a pure Java (Type 4) JDBC driver to determine whether that is the cause.
  3. Any native libraries accessed through JNI calls can also cause this type of error. If the application uses such libraries, examine them carefully. They may be difficult to rule out, because their functionality may not be easily removed from the application. Extensive logging may be needed to determine whether a pattern of use correlates with the core dump or Dr. Watson error.
  4. The JVM itself is a native program and can cause such errors. When in doubt, try another certified JVM and/or a later release to determine whether a JVM bug is at fault. Many JVM bugs involve the JIT compiler, and disabling it will often resolve this type of problem. Usually this can be done by supplying the -Djava.compiler=none command-line option.
  5. Sometimes the JVM produces a small log file that may contain useful information about which library the core came from, although this is not always the case. The file is written to the directory where WebLogic Server was started and is named hs_err_pid<#>.log, where <#> is the process ID of the WebLogic Server process.
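Items 1 and 4 above both come down to adding a system property to the server's java command line. Below is a minimal bash sketch, assuming your start script passes a JAVA_OPTIONS variable to java (the stock startWebLogic.sh scripts typically do); the helper name add_java_option is hypothetical. Change one thing per test run so you can tell which change made the crash go away.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one JVM option to JAVA_OPTIONS unless it is already present.
# Assumes the WebLogic start script passes $JAVA_OPTIONS on the java command line.
add_java_option() {
  local opt=$1
  case " ${JAVA_OPTIONS:-} " in
    *" $opt "*) ;;  # already present: do nothing
    *) JAVA_OPTIONS="${JAVA_OPTIONS:+$JAVA_OPTIONS }$opt" ;;
  esac
  export JAVA_OPTIONS
}

# Item 1: rule out the performance pack (native IO).
add_java_option -Dweblogic.NativeIOEnabled=false
# Item 4, in a separate test run: rule out the JIT compiler.
# add_java_option -Djava.compiler=none
```

Source the file in the shell that runs the start script so the exported variable reaches it.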

If after doing these things you cannot determine the cause of the error, then you can examine the core file that is produced in the directory where WebLogic Server was started. You must obtain the exact stacktrace from the binary core file to pinpoint the reason for the core. To do this, you can run a debugger, such as dbx or gdb, as outlined in this Diagnostic Pattern depending on your operating system.
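Before starting a debugger, it helps to confirm which crash artifacts actually exist. The bash sketch below lists core files and hs_err_pid logs in the server's start directory. The function name is hypothetical, and it assumes the default core or core.<suffix> naming, which your system's core file settings may change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list crash artifacts in the directory WebLogic Server was started from.
# Assumes cores are named core or core.<suffix>; your core file settings may differ.
find_crash_artifacts() {
  local dir=${1:-.} f
  for f in "$dir"/core "$dir"/core.* "$dir"/hs_err_pid*.log; do
    # Unmatched glob patterns stay literal, so keep only real files.
    [ -f "$f" ] && printf '%s\n' "$f"
  done
  return 0
}
```

For example, run find_crash_artifacts with the server's start directory as its argument.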




Gathering Core information from: SOLARIS

  1. Run file /core to verify that the core file came from the Java VM.
  2. Get a stack trace using dbx or gdb as follows. gdb may give you more useful information, but Sun Support recommends dbx(1) for core file analysis. If you do not have the licensed dbx product, a 30-day trial version that bundles dbx is available for download from Sun at http://wwws.sun.com/software/sundev/buy.html.

If a core file is *not* produced, the cause may be file permission problems or limits on the core file itself. The size of the core dump file may be affected by the following factors:

  • Check ulimit -a to see whether your environment allows core files to be produced.
  • ulimit -c is the size limit for core files. Raise it with ulimit -c unlimited.
  • The kernel may impose a limitation (the hard limit for ulimit -c).
  • Available disk space for the user (for example, is there a disk quota?).
  • See also: Operating System Values that should be checked for core file generation
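The ulimit items above can be checked and adjusted from the shell that starts the server. In this bash sketch (hypothetical function name), the helper prints the soft and hard core size limits and raises the soft limit up to the hard limit, which an ordinary user is allowed to do; raising the hard limit itself needs root or a system configuration change. Source it in the same shell that launches WebLogic Server, because limits only apply to that shell and its children.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report core size limits and raise the soft limit to the hard limit.
# Must run in the shell that starts the server; limits apply to that shell and its children.
raise_core_limit() {
  local soft hard
  soft=$(ulimit -S -c)
  hard=$(ulimit -H -c)
  echo "core file size limit: soft=$soft hard=$hard"
  if [ "$soft" != "$hard" ]; then
    # Allowed for any user; raising the hard limit itself needs root.
    ulimit -S -c "$hard"
  fi
}
```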

If a core file is produced, run dbx or gdb on it. The following shows the commands for dbx and gdb, with an example of the output produced by gdb. (NOTE: DEBUG_PROG is an environment variable that lets you specify a debugger or profiler to launch when working with java.)


dbx


$ java -version (must be the same JDK version that produced the core)
$ which dbx (or ls /opt/bin/dbx; you need to know where dbx is located)
$ export DEBUG_PROG=/opt/bin/dbx (or wherever dbx is located)

For JDK 1.3.X do the following:
$ /java corefile
For JDK 1.4.X do the following:
$ dbx /java corefile

Now you will be in the debugger. Execute the following commands:
(dbx) where ("shows a summary of the stack")
(dbx) threads ("shows the state of the existing threads")
(dbx) quit

These commands get a stacktrace of the last thread that executed (where command) and show the state of all the threads (threads command) in the core file.


gdb


$ java -version (must be the same JDK version that produced the core)
$ which gdb (or ls /usr/local/bin/gdb; you need to know where gdb is located)
$ export DEBUG_PROG=/usr/local/bin/gdb (or wherever gdb is located)

For JDK 1.3.X do the following:
$ /java corefile
For JDK 1.4.X do the following:
$ gdb /java corefile

Now you will be in the debugger. Execute the following commands:
(gdb) where ("shows a summary of the stack")
(gdb) thr ("switch among threads or show the current thread")
(gdb) info thr ("inquire about existing threads")
(gdb) thread apply 1 bt ("apply a command to a list of threads, specifically the backtrace to thread #1")
(gdb) quit

Using these commands will produce a stack trace of the last thread that executed (where command), show the current thread (thr command), show the state of all the threads (info thr command), and provide another way to get a stack trace of thread 1 in the core file (thread apply 1 bt command). The last form (thread apply # bt) gets the stack trace of any individual thread when you replace # with an actual thread number; replace # with "all" to get the stack traces of all the threads.
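The same gdb session can be run non-interactively, which is handy for attaching the output to a support case. Here is a bash sketch using gdb's -batch and -ex options. The function name is hypothetical, and it reuses DEBUG_PROG (falling back to gdb) to choose the debugger binary. Pass the java launcher that produced the core, then the core file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run where / info threads / thread apply all bt in one batch session.
# Uses $DEBUG_PROG if set (as in the steps above), otherwise plain gdb.
# Arguments: the java launcher that produced the core, then the core file.
gdb_core_report() {
  local java_bin=$1 core=$2
  "${DEBUG_PROG:-gdb}" -batch \
    -ex "where" \
    -ex "info threads" \
    -ex "thread apply all bt" \
    "$java_bin" "$core"
}
```

For example: gdb_core_report <java launcher> core > core-report.txt 2>&1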


The following is an example of running those commands against a core file in the gdb debugger. This example core was caused by an error in user native code in the application. On this stack trace, look at the frame that was executing when the signal handler was called (frame #11, Java_HelloWorld_displayHelloWorld). This leads you to the displayHelloWorld function in the native library libhello.so.


$ export DEBUG_PROG=/usr/local/bin/gdb
$ java core

GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...
(no debugging symbols found)...
Core was generated by `/wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/bin/../bin/sparc/native_threads'.
Program terminated with signal 9, Killed.
Reading symbols from /usr/lib/libthread.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libthread.so.1
Reading symbols from /usr/lib/libdl.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libc.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1...
(no debugging symbols found)...done.
Loaded symbols for /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so...(no debugging symbols found)... done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
Reading symbols from /usr/lib/libCrun.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libCrun.so.1
Reading symbols from /usr/lib/libsocket.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libm.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libm.so.1
Reading symbols from /usr/lib/libw.so.1...
warning: Lowest section in /usr/lib/libw.so.1 is .hash at 00000074
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libw.so.1
Reading symbols from /usr/lib/libmp.so.2...(no debugging symbols found)...
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/native_threads/libhpi.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/native_threads/libhpi.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libverify.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libverify.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libjava.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libjava.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libzip.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libzip.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libnet.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libnet.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libfilelock.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libfilelock.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libioser12.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libioser12.so
Reading symbols from /usr/lib/nss_nis.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/nss_nis.so.1
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libstackdump.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libstackdump.so
Reading symbols from /usr/lib/libmd5.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libmd5.so.1
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libmuxer.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libmuxer.so
Reading symbols from /usr/ucblib/libucb.so.1...(no debugging symbols found)...
Loaded symbols for /usr/ucblib/libucb.so.1
Reading symbols from /usr/lib/libresolv.so.2...(no debugging symbols found)...
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libelf.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libelf.so.1
Reading symbols from /home/usera/wls70/solaris/projectWork/lib/libhello.so...
(no debugging symbols found)...done.
Loaded symbols for /home/usera/wls70/solaris/projectWork/lib/libhello.so

(gdb) where

#0 0xff369764 in __sigprocmask () from /usr/lib/libthread.so.1
#1 0xff35e978 in _resetsig () from /usr/lib/libthread.so.1
#2 0xff35e118 in _sigon () from /usr/lib/libthread.so.1
#3 0xff361158 in _thrp_kill () from /usr/lib/libthread.so.1
#4 0xff24b908 in raise () from /usr/lib/libc.so.1
#5 0xff2358f4 in abort () from /usr/lib/libc.so.1
#6 0xfe3c6904 in __1cCosFabort6Fl_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#7 0xfe3c59f8 in __1cCosbBhandle_unexpected_exception6FpnGThread_ipCpv_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#8 0xfe20a8bc in JVM_handle_solaris_signal ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#9 0xff36b82c in __sighndlr () from /usr/lib/libthread.so.1
#10 <signal handler called>
#11 0xe9f90420 in Java_HelloWorld_displayHelloWorld ()
from /home/usera/wls70/solaris/projectWork/lib/libhello.so
#12 0x90aec in ?? ()
#13 0x8dc54 in ?? ()
#14 0x8dc54 in ?? ()
#15 0x8dc54 in ?? ()
#16 0x8ddbc in ?? ()
#17 0x8dde0 in ?? ()
#18 0x8dc54 in ?? ()
#19 0x8dc54 in ?? ()
#20 0x8dde0 in ?? ()
#21 0x8dc78 in ?? ()
#22 0x8dc54 in ?? ()
#23 0x8ddbc in ?? ()
#24 0x8dc54 in ?? ()
#25 0xfe5324f0 in __1cMStubRoutinesG_code1_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#26 0xfe0cbe9c in __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#27 0xfe1f6dc4 in __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#28 0xfe1fcd94 in __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#29 0xfe21b708 in __1cMthread_entry6FpnKJavaThread_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#30 0xfe216208 in __1cKJavaThreadDrun6M_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#31 0xfe213ed0 in _start ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so

(gdb) thr

[Current thread is 1 (LWP 14 )]

(gdb) info thr
16 LWP 13 0xff29d194 in _poll () from /usr/lib/libc.so.1
15 LWP 12 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
14 LWP 11 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
13 LWP 10 0xff29bc2c in _so_accept () from /usr/lib/libc.so.1
12 LWP 9 0xff29bc2c in _so_accept () from /usr/lib/libc.so.1
11 LWP 8 0xff29d194 in _poll () from /usr/lib/libc.so.1
10 LWP 7 0xff29d194 in _poll () from /usr/lib/libc.so.1
9 LWP 6 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
8 LWP 5 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
7 LWP 4 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
6 LWP 3 0xff29d194 in _poll () from /usr/lib/libc.so.1
5 LWP 2 0xff29e958 in _signotifywait () from /usr/lib/libc.so.1
4 LWP 1 0xff29d194 in _poll () from /usr/lib/libc.so.1
3 LWP 16 0xff29c4fc in door_restart () from /usr/lib/libc.so.1
2 LWP 15 0xff369774 in private___lwp_cond_wait ()
from /usr/lib/libthread.so.1
* 1 LWP 14 0xff369764 in __sigprocmask ()
from /usr/lib/libthread.so.1

(gdb) thread apply 1 bt

Thread 1 (LWP 14 ):
#0 0xff369764 in __sigprocmask () from /usr/lib/libthread.so.1
#1 0xff35e978 in _resetsig () from /usr/lib/libthread.so.1
#2 0xff35e118 in _sigon () from /usr/lib/libthread.so.1
#3 0xff361158 in _thrp_kill () from /usr/lib/libthread.so.1
#4 0xff24b908 in raise () from /usr/lib/libc.so.1
#5 0xff2358f4 in abort () from /usr/lib/libc.so.1
#6 0xfe3c6904 in __1cCosFabort6Fl_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#7 0xfe3c59f8 in __1cCosbBhandle_unexpected_exception6FpnGThread_ipCpv_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#8 0xfe20a8bc in JVM_handle_solaris_signal ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#9 0xff36b82c in __sighndlr () from /usr/lib/libthread.so.1
#10 <signal handler called>
#11 0xe9f90420 in Java_HelloWorld_displayHelloWorld ()
from /home/usera/wls70/solaris/projectWork/lib/libhello.so
#12 0x90aec in ?? ()
#13 0x8dc54 in ?? ()
#14 0x8dc54 in ?? ()
#15 0x8dc54 in ?? ()
#16 0x8ddbc in ?? ()
#17 0x8dde0 in ?? ()
#18 0x8dc54 in ?? ()
#19 0x8dc54 in ?? ()
#20 0x8dde0 in ?? ()
#21 0x8dc78 in ?? ()
#22 0x8dc54 in ?? ()
#23 0x8ddbc in ?? ()
#24 0x8dc54 in ?? ()
#25 0xfe5324f0 in __1cMStubRoutinesG_code1_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#26 0xfe0cbe9c in __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#27 0xfe1f6dc4 in __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#28 0xfe1fcd94 in __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#29 0xfe21b708 in __1cMthread_entry6FpnKJavaThread_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#30 0xfe216208 in __1cKJavaThreadDrun6M_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#31 0xfe213ed0 in _start ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so

(gdb) quit



Gathering Core information from: LINUX

GDB is the standard Linux debugger, and it is powerful and stable. Various visual debuggers are also available, but a simple command-line debugger is all you need to get the stack trace from the core.

  1. Run file /core to verify that the core file came from the Java VM.
  2. Make sure you are using the latest GDB version from GNU to avoid known bugs. See: http://ftp.gnu.org/gnu/gdb/
  3. Make sure that the ulimit for core files is set (for example, ulimit -c unlimited).
  4. On Linux, core dumps are typically turned off by default. On Red Hat Advanced Server 2.1, the setting is in /etc/security/limits.conf. The file is self-explanatory; look for the word "core". If its value is 0, core dumps are disabled.
  5. See also: Operating System Values that should be checked for core file generation
  6. Get a stack trace using gdb as follows (the same steps as for Solaris):

$ java -version (must be the same JDK version that produced the core)
$ which gdb (or ls /usr/local/bin/gdb; you need to know where gdb is located)
$ export DEBUG_PROG=/usr/local/bin/gdb (or wherever gdb is located)

For JDK 1.3.X do the following:
$ /java corefile
For JDK 1.4.X do the following:
$ gdb /java corefile

Now you will be in the debugger. Execute the following commands:
(gdb) where ("shows a summary of the stack")
(gdb) thr ("switch among threads or show the current thread")
(gdb) info thr ("inquire about existing threads")
(gdb) thread apply 1 bt ("apply a command to a list of threads, specifically the backtrace to thread #1")
(gdb) quit

Using these commands will produce a stack trace of the last thread that executed (where command), show the current thread (thr command), show the state of all the threads (info thr command), and provide another way to get a stack trace of thread 1 in the core file (thread apply 1 bt command). The last form (thread apply # bt) gets the stack trace of any individual thread when you replace # with an actual thread number; replace # with "all" to get the stack traces of all the threads.
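The limits.conf check described above can be scripted. In this bash sketch (hypothetical function name), the helper prints the uncommented entries whose item field is core (the file's fields are domain, type, item, value) and flags a value of 0. Pass a path to check a copy instead of /etc/security/limits.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show active "core" entries in a pam_limits-style limits.conf.
# Fields are: <domain> <type> <item> <value>; a core value of 0 disables core dumps.
show_core_limits() {
  local conf=${1:-/etc/security/limits.conf}
  awk '$1 !~ /^#/ && $3 == "core" { print; if ($4 == "0") disabled = 1 }
       END { if (disabled) print "=> core dumps are disabled (value 0) for a domain above" }' "$conf"
}
```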




Gathering Core information from: HPUX

The usual command-line debuggers are GDB and ADB.


GDB

If gdb is available, follow the same steps as for Solaris, repeated here:


$ java -version (must be the same JDK version that produced the core)
$ which gdb (or ls /usr/local/bin/gdb; you need to know where gdb is located)
$ export DEBUG_PROG=/usr/local/bin/gdb (or wherever gdb is located)

For JDK 1.3.X do the following:
$ /java corefile
For JDK 1.4.X do the following:
$ gdb /java corefile

Now you will be in the debugger. Execute the following commands:
(gdb) where ("shows a summary of the stack")
(gdb) thr ("switch among threads or show the current thread")
(gdb) info thr ("inquire about existing threads")
(gdb) thread apply 1 bt ("apply a command to a list of threads, specifically the backtrace to thread #1")
(gdb) quit

ADB

You should be able to get a stacktrace by doing the following:


$ java -version (must be the same JDK version that produced the core)
$ which adb (or ls /usr/local/bin/adb; you need to know where adb is located)
$ export DEBUG_PROG=/usr/local/bin/adb (or wherever adb is located)
$ /java corefile

Now you will be in the debugger. Execute the following commands:
adb> $C ("shows a summary of the stack and you may get an error at this point, see below")
adb> $r ("shows the state of the registers")
adb> $q ("the command to quit adb")

If you get a message such as "can't unwind -- no_entry" when running the $C command in adb, the most likely cause is that adb does not understand shared libraries. In that case, use gdb or WDB instead. WDB is available at: http://h21007.www2.hp.com/dspp/tech/tech_TechSoftwareDetailPage_IDX/1,1703,1665,00.html




Gathering Core information from: AIX

  1. If gdb is available and a binary core file was actually produced, follow the same gdb steps as before:
    $ java -version (must be the same JDK version that produced the core)
    $ which gdb (or ls /usr/local/bin/gdb; you need to know where gdb is located)
    $ export DEBUG_PROG=/usr/local/bin/gdb (or wherever gdb is located)

    For JDK 1.3.X do the following:
    $ /java corefile
    For JDK 1.4.X do the following:
    $ gdb /java corefile

    Now you will be in the debugger. Execute the following commands:
    (gdb) where ("shows a summary of the stack")
    (gdb) thr ("switch among threads or show the current thread")
    (gdb) info thr ("inquire about existing threads")
    (gdb) thread apply 1 bt ("apply a command to a list of threads, specifically the backtrace to thread #1")
    (gdb) quit

  2. However, the JVM on AIX usually writes a javacore..txt file that helps you debug your application. It contains very useful information, including the thread that was executing when the core happened. For example, the following excerpt tells you that the problem happened in the displayHelloWorld() native method you created; examine that native code to determine why the core occurred.

    Sample information from javacore..txt file:
    Current Thread Details:
    "ExecuteThread: '10' for queue: 'default'" (TID:0x31c70ad0, sys_thread_t:0x3e52df68, state:R, native ID:0xf10) prio=5
    at HelloWorld.displayHelloWorld(Native Method)
    at servlets.NativeServlet.doGet(NativeServlet.java:85)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run(ServletStubImpl.java:1058)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:401)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:306)
    at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:5445)
    at weblogic.security.service.SecurityServiceManager.runAs(SecurityServiceManager.java:780)
    at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:3105)
    at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2588)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:213)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:189)

  3. Look for the Current Thread Details section, which gives you an idea of the problem. For example, the following shows that the core came from ExecuteThread 24, but that at the time the JVM was JIT-compiling a method. This points to a problem with the JIT compiler in this IBM JVM version.
    Current Thread Details
    "ExecuteThread: '24' for queue: 'default'" sys_thread_t:0x781
    Native Stack
    at 0xD0F15924 in get_invoke_op
    at 0xD0F1535C in resolve_a_method
    at 0xD0F1E610 in resolve_method_call_graph
    at 0xD0F29C40 in jit_compiler_entry
    at 0xD0F2A404 in _jit_fast_compile
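When a javacore file is large, the Current Thread Details block can be pulled out directly. This is a bash/awk sketch (hypothetical function name) that assumes the block ends at the first blank line, as in the excerpts above. javacore layouts differ between IBM JVM releases, so check the result against the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "Current Thread Details" block from a javacore file.
# Assumption: the block ends at the first blank line after the heading.
current_thread_details() {
  awk '/Current Thread Details/ { show = 1 }
       show && /^[ \t]*$/ { exit }
       show { print }' "$1"
}
```

For example: for f in javacore*.txt; do current_thread_details "$f"; done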



Gathering Core information from: WINDOWS

  1. The drwtsn32.log files are similar to core files on UNIX. On Windows 2000, these files are in the following directory: C:\Documents and Settings\All Users\Documents\DrWatson. Entering drwtsn32 ? opens the Dr. Watson for Windows 2000 dialog box; its DrWatson log file overview option displays a screen that explains the format of the drwtsn32.log files.
  2. The JVM itself may also produce an hs_err_pid<#>.log file, which may contain useful information.

Enabling/Disabling Dr. Watson

By default, Dr. Watson is enabled when Windows NT is installed.

  1. Check the following registry key to make sure that Dr. Watson is enabled:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
  2. This key has a value called "Auto" that controls how the debugger starts: when Auto is 1, Windows automatically launches the debugger or application named in the Debugger registry value; when it is 0, Windows asks the user first.
  3. For Dr. Watson, the Debugger value should contain:

    drwtsn32 -p %ld -e %ld -g



What if I don't have a Debugger?

  1. If you do not have access to a debugger, check whether the pstack and pmap utilities are available on your operating system.
  2. If you have these utilities (on some operating systems you must download them separately), run them on the core file to gather information for Support.

    The commands look something like this:
    $ /usr/proc/bin/pstack core
    $ /usr/proc/bin/pmap core

The following are the commands by operating system:


Solaris

pstack command = pstack

pmap command = pmap


AIX 5.2

AIX 5.2 or greater with an add-on from IBM (these are not available in earlier versions)


pstack command = procstack

pmap command = procmap

See: http://www-106.ibm.com/developerworks/eserver/articles/AIX5.2PerfTools.html


Linux

pstack command = lsstack

pmap command = pmap


NOTE: You can get lsstack from http://sourceforge.net/projects/lsstack/ and build it on your Linux platform. It is the equivalent of pstack on Solaris.


You can get the pmap source from http://web.hexapodia.org/~adi/pmap.c and build it on your Linux platform.


HPUX

(none found)
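The list above maps naturally onto uname -s output. This bash sketch (hypothetical function name) prints the pstack and pmap equivalents for a given OS name and fails for anything else, such as HP-UX. Some of these tools take a process ID rather than a core file (the procps pmap on Linux, for example), so check each tool's manual before relying on it for a core.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pstack and pmap equivalents for an OS name from uname -s.
stack_tools_for() {
  case "$1" in
    SunOS) printf '%s\n' "pstack pmap" ;;
    AIX)   printf '%s\n' "procstack procmap" ;;  # AIX 5.2 or later with IBM's add-on
    Linux) printf '%s\n' "lsstack pmap" ;;       # lsstack must be built separately
    *)     return 1 ;;                           # e.g. HP-UX: none found
  esac
}
```

For example: read -r stack map <<< "$(stack_tools_for "$(uname -s)")"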


The following are snippets of pstack and pmap output for the same core file examined with gdb and dbx above. You can use them to narrow down the library where the error occurs. In this example, the error clearly comes from libhello.so:


pstack output:


core 'core' of 20956: /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/bin/../bin/sparc/nativ
----------------- lwp# 14 / thread# 25 --------------------
ff369764 __sigprocmask (ff36bf60, 0, 0, e6181d70, ff37e000, 0) + 8
ff35e110 _sigon (e6181d70, ff385930, 6, e6180114, e6181d70, 6) + d0
ff361150 _thrp_kill (0, 19, 6, ff37e000, 19, ff2c0450) + f8
ff24b900 raise (6, 0, 0, ffffffff, ff2c03bc, 4) + 40
ff2358ec abort (ff2bc000, e6180268, 0, fffffff8, 4, e6180289) + 100
fe3c68fc __1cCosFabort6Fl_v_ (1, fe4c8000, 1, e61802e8, 0, e9f90420) + b8
fe3c59f0 __1cCosbBhandle_unexpected_exception6FpnGThread_ipCpv_v_ (ff2c02ac, fe53895c, fe4dc164, fe470ab4, fe4c8000, e6180308) + 254
fe20a8b4 JVM_handle_solaris_signal (0, 25d5b8, e6180d90, fe4c8000, b, e6181048) + 8ec
ff36b824 __sighndlr (b, e6181048, e6180d90, fe20a8cc, e6181e14, e6181e04) + c
ff3684d8 sigacthandler (b, e6181d70, 0, 0, 0, ff37e000) + 708
--- called from signal handler with signal 11 (SIGSEGV) ---
e9f90420 Java_HelloWorld_displayHelloWorld (25d644, e6181224, e61819b8, 0, 2, 0) + 30

00090ae4 ???????? (e6181224, e61819b8, 25d5b8, fe4c8000, 0, 109a0)
0008dc4c ???????? (e61812c4, ffffffff, ffffffff, 97400, 4, e61811b8)
0008dc4c ???????? (e618135c, e61819b8, fe4c8000, 99600, c, e6181250)
0008dc4c ???????? (e61813ec, f76a2f90, e618147c, 99600, c, e61812f8)
0008ddb4 ???????? (e618147c, f68578b8, 0, 99974, c, e6181388)
0008ddd8 ???????? (e618154c, e61815c8, e61815cc, 99974, 4, e6181410)
......

pmap output snippet:


........
E9500000 1184K read
E9680000 1392K read
E9800000 4608K read
E9F60000 136K read/write/exec
E9F90000 8K read/exec /home/usera/wls70/solaris/projectWork/lib/libhello.so
E9FA0000 8K read/write/exec /home/usera/wls70/solaris/projectWork/lib/libhello.so

E9FB4000 8K read/write/exec
E9FC0000 120K read/exec /usr/lib/libelf.so.1
E9FEE000 8K read/write/exec /usr/lib/libelf.so.1
.......

Notice from the pstack output that the fault happened at address e9f90420. The pmap output shows that e9f90420 falls between E9F90000 and E9FA0000, inside the 8K read/exec mapping of libhello.so, so the error is happening somewhere within the libhello.so shared object.
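The address comparison above can be done mechanically. This bash sketch (hypothetical function name) reads Solaris-style pmap lines (start address in hex, size in K, permissions, object) from standard input and prints the mapping that contains a given address, treating each mapping as covering start up to start plus size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pmap line whose mapping contains a given hex address.
# Assumes Solaris-style pmap lines on stdin: <hex start> <size>K <permissions> [object].
# A mapping covers [start, start + size * 1024).
find_mapping() {
  local addr=$((16#$1)) line start size kb lo hi
  while IFS= read -r line; do
    read -r start size _ <<< "$line"
    [[ $start =~ ^[0-9A-Fa-f]+$ ]] || continue   # skip "......" and blank lines
    [[ $size =~ ^([0-9]+)K$ ]] || continue
    kb=$((10#${BASH_REMATCH[1]}))
    lo=$((16#$start))
    hi=$((lo + kb * 1024))
    if (( addr >= lo && addr < hi )); then
      printf '%s\n' "$line"
      return 0
    fi
  done
  return 1
}
```

For example: find_mapping e9f90420 < pmap.txt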




Operating System Values that should be checked for core file generation

  1. Check ulimit -c (the configured maximum core file size) at both the system and user level.
  2. Check the disk space available to the user (for example: is there a disk quota?). You can verify the disk quota with the quota -v command:
    $ quota -v

    Disk quotas for weblogic (uid 12908):
    Filesystem usage quota limit timeleft files quota limit timeleft
    /home 896792 2048000 2048000 1121 204800 204800

    Bear in mind that a core file is roughly as large as the memory used by the application, so you need at least that much free disk space.
  3. On Linux, core dumps are typically turned off by default. On Red Hat Advanced Server 2.1, the setting is in /etc/security/limits.conf. The file is self-explanatory; look for the word "core". If its value is 0, core dumps are disabled.
  4. Also on Linux, core files may not be generated in the following scenario: if you start Apache (with or without the plug-in) as the root user on a low port, as is normal, core files are not generated. The workaround is to start Apache as a non-root user on a high port; core files will then be generated.
  5. On HP-UX, increase the kernel parameter maxdsiz (max_per_proc_data_size, which sets the maximum User Process Data Segment Size) from its old value, for example 64M, to something higher such as 134M.
  6. On Solaris, you can also make sure core files are enabled with the coreadm command.

    Check whether per-process core files are disabled on the system with the coreadm command:
    $ coreadm

    global core file pattern:
    init core file pattern: core
    global core dumps: disabled
    per-process core dumps: disabled
    global setid core dumps: disabled
    per-process setid core dumps: disabled
    global core dump logging: disabled

    Enable per-process core file creation (this must be run as root):


    $ coreadm -e process

    Verify that per-process core files are enabled by running coreadm again:


    $ coreadm

    global core file pattern:
    init core file pattern: core
    global core dumps: disabled
    per-process core dumps: enabled
    global setid core dumps: disabled
    per-process setid core dumps: disabled
    global core dump logging: disabled
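For the disk space item, a quick check compares the free space in the core directory with the size you expect the core to be. Here is a bash sketch (hypothetical function name), assuming a df that supports -P and -k. df does not see per-user quotas, so still run quota -v. One rough estimate for the expected size is the process's virtual size in KB, for example from ps -o vsz= -p <pid> where your ps supports it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a directory have room for a core file of the given size (KB)?
# Assumes df supports -P (POSIX output) and -k (1K blocks); per-user quotas are not seen.
has_room_for_core() {
  local dir=$1 need_kb=$2 avail_kb
  # With -P, line 2 describes the file system; field 4 is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo "available: ${avail_kb} KB, needed: ${need_kb} KB"
  [ "$avail_kb" -ge "$need_kb" ]
}
```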



Core Files due to JIT Compiler

Sometimes a core from the JVM can be due to the JIT Compiler.


To determine whether this is the case, check the pstack output from the core file.

  1. If it looks similar to the following:


    fe16d550 __1cMURShiftINodeFValue6kMpnOPhaseTransform__pknEType__ (9983c8, 88d00d4c, 1, 0, fe570000, 3b7480) + f8
    fe0d2180 __1cMPhaseIterGVNNtransform_old6MpnENode__2_ (88d00d4c, 3b795c, 9c, 88d00e9c, 11, e4c4d8) + 1d4
    fe19b1e8 __1cMPhaseIterGVNIoptimize6M_v_ (88d00d4c, 0, fe5b89f8, 0, 0, 0) + a0
    fe202008 __1cHCompileIOptimize6M_v_ (88d01298, fe5335c4, 88d011ac, fe570000, 0, 0) + 168
    fe2008b4 __1cHCompile2t6MpnFciEnv_pnHciScope_pnIciMethod_iii_v_ (fe5333f9, 371584, 2f1d24, d30664, ffffffff, 1) + bac
    fe1fd08c __1cKC2CompilerOcompile_method6MpnFciEnv_pnHciScope_pnIciMethod_ii_v_ (2bb80, 88d01ab4, 0, 372918, ffffffff, 0) + 64
    fe1fc850 __1cNCompileBrokerZinvoke_compiler_on_method6FpnLCompileTask__v_ (720, 0, ffffffff, fe5aee50, fe5bbbe4, eaff8) + 61c
    fe2ac1f8 __1cNCompileBrokerUcompiler_thread_loop6F_v_ (fe533c01, fe5af218, eaff8, eb5a8, 306d10, fe269254) + 428
    fe26927c __1cKJavaThreadDrun6M_v_ (eaff8, b, 40, 0, a, ff37c000) + 284
    fe26575c _start (eaff8, ff37d658, 1, 1, ff37c000, 0) + 134
    ff36b01c _thread_start (eaff8, 0, 0, 0, 0, 0) + 40

    Then the JIT may be at fault.


  2. To determine which method caused this, add the following flags to the server's java command line and run your test again until the server dumps core:

    -XX:+PrintCompilation -XX:+PrintOpto


    The -XX:+PrintCompilation flag output looks something like this:


    1 sb java.lang.ClassLoader::loadClassInternal (6 bytes)
    2 b java.lang.String::lastIndexOf (12 bytes)
    3 s!b java.lang.ClassLoader::loadClass (58 bytes)

  3. Once you find the offending method, you can tell the JVM to skip compiling it by creating a .hotspot_compiler file in the server's current working directory with an exclude statement for that method. For example:

    exclude java/lang/String indexOf


    This would stop the java.lang.String.indexOf() method from being compiled by the JVM.
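The JIT triage above can be partly automated. This is a bash/awk sketch with hypothetical function names. The first function matches compiler frames in pstack output; the strings CompileBroker, C2Compiler and PhaseIterGVN are taken from the sample above, so this is a heuristic. The second turns the last method in -XX:+PrintCompilation output into an exclude line in the format shown above. The last method printed is often, but not always, the one being compiled when the JVM died, and other log lines containing :: can confuse it.

```shell
#!/usr/bin/env bash
# Hypothetical helper 1: heuristic check for C2 compiler frames in a pstack listing.
looks_like_jit_crash() {
  grep -Eq 'CompileBroker|C2Compiler|PhaseIterGVN' "$1"
}

# Hypothetical helper 2: turn the last "Class::method" token printed by
# -XX:+PrintCompilation into a .hotspot_compiler exclude line.
exclude_line_for_last_compiled() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /::/) m = $i }
       END {
         if (m == "") exit 1                  # no compiled method found
         split(m, parts, "::")                # parts[1] = class, parts[2] = method
         gsub(/\./, "/", parts[1])            # java.lang.String -> java/lang/String
         print "exclude " parts[1] " " parts[2]
       }' "$1"
}
```

For example, with the server's standard output saved to a file (server.out is a hypothetical name): exclude_line_for_last_compiled server.out >> .hotspot_compiler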




Stop the JVM to get Thread Dumps

You can set the following flags to make the JVM pause when it crashes, so that you can take a thread dump of the server right before the core is written and capture the state of the threads at that moment:


Sun JVM

On the Sun JVM, the option is -XX:+ShowMessageBoxOnError (it is not officially documented on the Sun website). When the JVM crashes, it prompts: "Do you want to debug the problem?" You can then take a thread dump of the JVM.


JRockit JVM

The corresponding JRockit option is -Djrockit.waitonerror. It will be available in JRockit 8.1 SP2 when that service pack is released.
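This bash sketch (hypothetical function name) picks between the two options above from the text printed by java -version. Matching on "JRockit" and "HotSpot" in the version string is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the pause-on-crash option from java -version output ($1).
# Matching on the vendor strings "JRockit" and "HotSpot" is an assumption.
pause_on_crash_option() {
  case "$1" in
    *JRockit*) printf '%s\n' "-Djrockit.waitonerror" ;;
    *HotSpot*) printf '%s\n' "-XX:+ShowMessageBoxOnError" ;;
    *) return 1 ;;
  esac
}
```

For example: opt=$(pause_on_crash_option "$(java -version 2>&1)") && JAVA_OPTIONS="$JAVA_OPTIONS $opt"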




On-line Debugger Manuals

You may obtain manuals for the debuggers used for core file analysis as follows:


gdb: http://www.gnu.org/software/gdb/documentation/


dbx: Sun: http://docs.sun.com/db/doc/805-4948?q=DBX


dbx: IBM: http://publib16.boulder.ibm.com/pseries/en_US/cmds/aixcmds2/dbx.htm


adb: HP: http://docs.hp.com/hpux/onlinedocs/B2355-90680/00/00/8-con.html
