Thursday, March 11, 2010

Message Bridge and Foreign Server, When to use what.

Foreign Server:

A foreign JMS server definition is an administratively configured symbolic link between a JNDI object in a remote JNDI directory, such as a JMS connection factory or destination object, and a JNDI name in the JNDI name space for a stand-alone WebLogic Server or a WebLogic cluster. Definitions can be configured using the Administration console, standard JMX MBean APIs, or programmatically using scripting. See Simplified Access to Foreign JMS Providers.

When is it best to use a Foreign JMS Server Definition?

For this release, a Foreign JMS Server definition conveniently moves JMS JNDI parameters into one central place. You can share one definition between EJBs, servlets, and messaging bridges. You can change a definition without recompiling or changing deployment descriptors. They are especially useful for:

  • Any message driven EJB (MDB) where it is desirable to administer standard JMS communication properties via configuration rather than hard code them into the application's EJB deployment descriptors. This applies even if the MDB's source destination isn't remote.

  • Any MDB that has a destination remote to the cluster. This simplifies deployment descriptor configuration and enhances administrative control.

  • Any EJB or servlet that sends or receives from a remote destination.

  • Enabling resource references to refer to remote JMS providers. See Using EJB/Servlet JMS Resource References.
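The "symbolic link" idea can be pictured with a small toy model. This is only a sketch under stated assumptions: the class, the method names, the JNDI names, and the URLs below are all hypothetical and are not WebLogic APIs. It shows the application using one local name while an administrator repoints that name at a different remote object.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a foreign JMS server definition; every name here is hypothetical. */
final class ForeignServerSketch {

    /** The remote object that a local JNDI name points at. */
    static final class RemoteTarget {
        final String connectionUrl;
        final String remoteJndiName;

        RemoteTarget(String connectionUrl, String remoteJndiName) {
            this.connectionUrl = connectionUrl;
            this.remoteJndiName = remoteJndiName;
        }
    }

    // Local JNDI name -> remote object: the "symbolic link" kept in configuration.
    private final Map<String, RemoteTarget> links = new HashMap<>();

    /** Administrative step: create or change a link without touching application code. */
    void link(String localJndiName, RemoteTarget target) {
        links.put(localJndiName, target);
    }

    /** What an application lookup of the local name conceptually resolves to. */
    RemoteTarget resolve(String localJndiName) {
        RemoteTarget target = links.get(localJndiName);
        if (target == null) {
            throw new IllegalArgumentException("No foreign mapping for " + localJndiName);
        }
        return target;
    }

    public static void main(String[] args) {
        ForeignServerSketch definition = new ForeignServerSketch();
        definition.link("jms/OrdersQueue", new RemoteTarget("tcp://host-a:7222", "queue.orders"));

        // The application only ever uses the local name.
        System.out.println(definition.resolve("jms/OrdersQueue").connectionUrl);

        // An administrator repoints the link; the application code is unchanged.
        definition.link("jms/OrdersQueue", new RemoteTarget("tcp://host-b:7222", "queue.orders"));
        System.out.println(definition.resolve("jms/OrdersQueue").connectionUrl);
    }
}
```

The point of the model is that only the configuration changes; nothing that uses the local name has to be recompiled or redeployed.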


    Message Bridge:


    Messaging bridges are administratively configured services that run on a WebLogic server. They automatically forward messages from a configured source JMS destination to a configured target JMS destination. These destinations can be on different servers than the bridge and can even be foreign (non-WebLogic) destinations. Each bridge destination is configured using the four common properties of a remote provider:

    • The initial context factory.

    • The connection URL.

    • The connection factory JNDI name.

    • The destination JNDI name.

    Messaging bridges can be configured to use transactions to ensure exactly-once message forwarding from any XA capable (global transaction capable) JMS provider to another.

    When should I use a messaging bridge?

    Typically, messaging bridges are used to meet store-and-forward high availability design requirements. A messaging bridge is configured to consume messages from a sender's local destination and forward them to the sender's actual target remote destination. This provides high availability because the sender is still able to send messages to its local destination even when the target remote destination is unreachable. When the remote destination is not reachable, the local destination automatically stores messages until the target becomes available again and the bridge is able to forward them.
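The store-and-forward behaviour described above can be sketched with an in-memory model. This is an illustration only: the class and method names are hypothetical, no JMS API is involved, and a real bridge would add persistence and transactions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of store-and-forward; all names are hypothetical, not WebLogic APIs. */
final class StoreAndForwardSketch {
    private final Queue<String> localDestination = new ArrayDeque<>();
    private final List<String> remoteDestination = new ArrayList<>();
    private boolean remoteReachable = true;

    void setRemoteReachable(boolean reachable) {
        remoteReachable = reachable;
    }

    /** The sender always writes to the local destination, so sending never depends on the remote side. */
    void send(String message) {
        localDestination.add(message);
        forwardPending();
    }

    /** The bridge role: drain the local destination while the remote one is reachable. */
    void forwardPending() {
        while (remoteReachable && !localDestination.isEmpty()) {
            remoteDestination.add(localDestination.remove());
        }
    }

    int storedLocally() {
        return localDestination.size();
    }

    List<String> delivered() {
        return remoteDestination;
    }

    public static void main(String[] args) {
        StoreAndForwardSketch bridge = new StoreAndForwardSketch();
        bridge.send("m1");
        bridge.setRemoteReachable(false);
        bridge.send("m2");
        bridge.send("m3");
        System.out.println("stored while remote is down: " + bridge.storedLocally());
        bridge.setRemoteReachable(true);
        bridge.forwardPending();
        System.out.println("delivered in order: " + bridge.delivered());
    }
}
```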

Wednesday, March 10, 2010

How threads work in WebLogic.

1. A client contacts the ListenThread, the entry point into WebLogic Server, which accepts the connection. It then registers the socket with a WLS component known as the SocketMuxer for further processing.

2. The SocketMuxer is responsible for reading and dispatching all client requests to the proper WLS container. It then adds this socket to an internal data structure for processing and makes a request of an ExecuteThreadManager to create a new SocketReaderRequest. This request is then dispatched by the manager to an ExecuteThread.

3. As a result, the ExecuteThread becomes a SocketReader thread: it continually runs the SocketMuxer’s processSockets() method and checks the muxer’s queue to determine if there is work to be done. If an entry exists, it pulls it off the queue and processes it.

4. The SocketReader thread reads the client request and determines the protocol type of this client request, to create a new protocol specific MuxableSocket.

5. The MuxableSocketDiscriminator stores a new MuxableSocket with the implementation matching the protocol of the client. It also returns true to the SocketReader to notify it that the message is complete and it can be dispatched.

6. MuxableSocketDiscriminator re-registers the protocol specific version of the MuxableSocket that was created earlier. The net result is that “Step 2” is repeated, and a new protocol specific MuxableSocket is placed in the SocketMuxer’s queue for processing.

7. A socket reader will get the new protocol specific MuxableSocket off the queue and read it. It will then check whether the message is complete, based on the protocol. If it is, it will invoke the protocol specific MuxableSocketDiscriminator.

8. Before the work requested by the client can be performed, there may be many iterations of “step 7”. This is determined by the protocol; for example, t3 reads a portion of the message and dispatches it so that it can act upon the portion of the protocol read thus far.

9. The subsystem will create an ExecuteRequest and send it to an ExecuteThreadManager for processing. The request is dispatched to an ExecuteThread, and the result is returned to the client.


At a high level, the SocketMuxer can be explained as follows. Each and every socket connection that comes into WebLogic Server is “registered” with the SocketMuxer, which then maintains a list of these connections, each represented by a form of the MuxableSocket interface. It then becomes the responsibility of the SocketMuxer to read and dispatch each client request to the appropriate subsystem. This is a fairly elaborate process, which is illustrated by steps 2 through 8 above.

There are only a few key things to know about the SocketMuxer:

First, it has a data structure in which it stores a socket entry for each client connected to WebLogic Server.
Second, a “socket reader” is the main component of the SocketMuxer - which really is just an execute thread that is running the SocketMuxer’s processSockets() method.

Third, the SocketMuxer does most of its work through the only interface it knows how to operate on – the MuxableSocket interface.

Socket Reader:

A SocketReaderRequest is merely an implementation of an ExecuteRequest, which is sent to the ExecuteThreadManager when registerSocket() is invoked. When the ExecuteThread invokes the execute() method of the SocketReaderRequest, the SocketMuxer’s processSockets() method is invoked.
So, a socket reader thread is simply a normal execute thread which runs the main processing method of the SocketMuxer, processSockets().
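The relationship between registering a socket, the request handed to an execute thread, and processSockets() can be sketched as follows. This is a simplified model and not WebLogic source: the MuxableSocket interface here has hypothetical methods, a standard ExecutorService stands in for the ExecuteThreadManager, and "dispatching" just records a string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Simplified model of the muxer; names mirror the text but the code is hypothetical. */
final class SocketMuxerSketch {

    /** Stand-in for the MuxableSocket interface (methods invented for this sketch). */
    interface MuxableSocket {
        String protocol();
        String payload();
    }

    static MuxableSocket socket(final String protocol, final String payload) {
        return new MuxableSocket() {
            public String protocol() { return protocol; }
            public String payload() { return payload; }
        };
    }

    private final Queue<MuxableSocket> sockets = new ConcurrentLinkedQueue<>();
    private final List<String> dispatched = Collections.synchronizedList(new ArrayList<String>());
    // Plays the role of the ExecuteThreadManager and its execute threads.
    private final ExecutorService executeThreads = Executors.newFixedThreadPool(2);

    /** Registering queues the socket and hands a "socket reader request" to an execute thread. */
    void registerSocket(MuxableSocket socket) {
        sockets.add(socket);
        executeThreads.execute(this::processSockets);
    }

    /** An execute thread running this method is, in effect, a socket reader. */
    void processSockets() {
        MuxableSocket socket;
        while ((socket = sockets.poll()) != null) {
            dispatched.add(socket.protocol() + ":" + socket.payload());
        }
    }

    List<String> shutdownAndGetDispatched() throws InterruptedException {
        executeThreads.shutdown();
        executeThreads.awaitTermination(5, TimeUnit.SECONDS);
        return new ArrayList<>(dispatched);
    }

    public static void main(String[] args) throws InterruptedException {
        SocketMuxerSketch muxer = new SocketMuxerSketch();
        muxer.registerSocket(socket("http", "GET /index.html"));
        muxer.registerSocket(socket("t3", "remote call"));
        System.out.println(muxer.shutdownAndGetDispatched());
    }
}
```

Because two execute threads may run processSockets() concurrently, the order of the dispatched entries is not guaranteed.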


The acceptBacklog parameter of WebLogic Server is passed to ServerSocket. The value of acceptBacklog means "The maximum queue length for incoming connection indications (a request to connect) is set to the backlog parameter. If a connection indication arrives when the queue is full, the connection is refused."

Thus, if too many connections arrive at the server at the same time, the server queues these connections and processes them one at a time. The value does not mean that only that many clients can connect to the server.

It does not limit the number of connections made; it limits the number of pending connections that can wait in the backlog queue. For example, suppose AcceptBacklog is 2, hundreds of clients connect to the server, and the server has one thread to accept new connections.

This thread accepts a new connection, dispatches it to a new thread, and then goes back to listening for new connections. Sample code:

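while (true) {

    Socket sock = serversocket.accept(); // Line 1: blocks until a connection can be taken off the backlog queue
    new MyThread(sock).start(); // Line 2: start() runs the handler on a new thread; calling run() would block this loop

}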

Here the thread accepts a new connection at line 1, dispatches it to a new thread at line 2, evaluates the while expression, and goes back to line 1. In the time it takes to get back to line 1 (say T1), clients may make many new connection requests. These new connections wait in the accept backlog queue, and the length of this queue is controlled by the accept backlog parameter.

If the queue length is 2 and several hundred connections are made to the server during T1, only 2 are queued for acceptance and the rest are rejected. For connections to be rejected there must be many simultaneous requests to the server; if the requests are not simultaneous, the queue is less likely to fill up.
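A minimal sketch of the backlog in action, using the standard java.net.ServerSocket constructor that takes a backlog argument. The class and method names are made up for illustration. Two clients connect before the server calls accept(), so both wait in the backlog queue and are then accepted one at a time. What happens once the queue is full depends on the operating system, so the sketch stays within the limit.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Illustrative only: class and method names are hypothetical. */
final class AcceptBacklogSketch {

    /** Opens a listener on an ephemeral loopback port with the requested backlog. */
    static ServerSocket open(int backlog) throws IOException {
        return new ServerSocket(0, backlog, InetAddress.getLoopbackAddress());
    }

    /** Takes the expected number of pending connections off the backlog queue, one at a time. */
    static int acceptAll(ServerSocket server, int expected) throws IOException {
        int accepted = 0;
        while (accepted < expected) {
            Socket socket = server.accept(); // a real server would hand this to a worker thread
            accepted++;
            socket.close();
        }
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = open(2)) {
            // Both clients connect before accept() is called, so they sit in the backlog.
            Socket first = new Socket(server.getInetAddress(), server.getLocalPort());
            Socket second = new Socket(server.getInetAddress(), server.getLocalPort());
            System.out.println("accepted from backlog: " + acceptAll(server, 2));
            first.close();
            second.close();
        }
    }
}
```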

Wednesday, March 3, 2010

Analyzing Core dump file

Problem Description

A binary core file is produced when the WebLogic Server process terminates due to some invalid native code (machine specific code). A server crash, JVM crash, machine crash, or HotSpot error may also be associated with this occurrence. This pattern describes the steps needed to gather information from a core file on various platforms.


Problem Troubleshooting

Please note that not all of the following items would need to be done. Some issues can be solved by only following a few of the items.



Why does the problem occur?

In order to determine the cause of such an error you need to determine all potential sources of native code used by the WebLogic Server process. The places to focus on are:

  1. The WebLogic Server performance pack. The performance pack is native code and, when enabled, could potentially produce such an error. Disable this feature to determine if that is the cause. You can do this via the console or via the command line. In the console, set NativeIOEnabled to false under the Server tab. See the section Enabling Performance Packs for the exact sequence of steps; for WLS 8.1 this is at: http://e-docs.bea.com/wls/docs81/perform/WLSTuning.html#1142800 The steps are:
    1. Start the Administration Server if it is not already running.
    2. Access the Administration Console for the domain.
    3. Expand the Servers node in the left pane to display the servers configured in your domain.
    4. Click the name of the server instance that you want to configure.
    5. Select the Configuration —> Tuning tab.
    6. If the Enable Native IO check box is selected, clear it so that native IO is *not* enabled.
    7. Click Apply.
    8. Restart the server.

    You can also do this via the Java options to the start command for WebLogic Server. Set -Dweblogic.NativeIOEnabled=false on the command line and then start the server. The command line takes precedence over what is set via the console.


  2. Any Type 2 JDBC driver makes use of native DBMS libraries, which could also produce this type of error. Switch to a pure java (Type 4) JDBC driver in order to determine if that is the cause.
  3. Any native libraries accessed with JNI calls can also cause this type of error. If the application uses such libraries, they should be carefully examined. It may be difficult to rule out these libraries, as their functionality may not be easily removed from the application. Extensive logging may be needed to determine if a pattern of use can be correlated with the core dump / Dr Watson error.
  4. The JVM itself is a native program and can cause such errors. When in doubt, try another certified JVM and/or later release to determine if a JVM bug is at fault. Many JVM bugs involve the use of the JIT compiler, and disabling this feature will often resolve this type of problem. Usually this can be done by supplying the -Djava.compiler=none command option.
  5. Sometimes the JVM will produce a small log file that may contain useful information as to which library the core may have come from, but this is not always the case. The file is produced in the directory where WebLogic Server was started and its name is of the form hs_err_pid<pid>.log, where <pid> is the process ID of the WebLogic Server process.
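As a convenience, the JVM error logs mentioned in the last item can be located with a few lines of standard java.nio code. This is a sketch: the class name is hypothetical, and it assumes the logs follow the hs_err_pid naming pattern and sit directly in the directory you pass in (for example, the directory where WebLogic Server was started).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative helper; the class name is hypothetical. */
final class HsErrLogFinder {

    /** Returns the hs_err_pid*.log files found directly in startDir, sorted by name. */
    static List<Path> find(Path startDir) throws IOException {
        List<Path> logs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(startDir, "hs_err_pid*.log")) {
            for (Path log : stream) {
                logs.add(log);
            }
        }
        Collections.sort(logs);
        return logs;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path startDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path log : find(startDir)) {
            System.out.println(log);
        }
    }
}
```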

If after doing these things you cannot determine the cause of the error, then you can examine the core file that is produced in the directory where WebLogic Server was started. You must obtain the exact stacktrace from the binary core file to pinpoint the reason for the core. To do this, you can run a debugger, such as dbx or gdb, as outlined in this Diagnostic Pattern depending on your operating system.



Gathering Core information from: SOLARIS

  1. Do file /core to verify if the core file is from the Java VM.
  2. Get a stack trace using dbx or gdb as follows. With gdb you may get more useful information. Sun Support recommends the use of dbx(1) for core file analysis. If you do not have the dbx licensed product, a 30 day trial version is available as a download from Sun where dbx is bundled at http://wwws.sun.com/software/sundev/buy.html.

If a core file is *not* produced this may be because of file permissions problems or actual limits on the core file itself. The core dump file's size may be affected by the following factors:

  • Check ulimit -a to see whether your environment allows core files to be produced.
  • ulimit -c (This is the size limit of the core file. Fix it with ulimit -c unlimited).
  • Kernel limitation (hard limit for ulimit -c).
  • Available disk space for the user (e.g., is there disk quota?).
  • See also: Operating System Values that should be checked for core file generation

If there is a core file produced, run dbx or gdb on the core file. The following shows the commands for dbx and gdb with an example of the output produced by gdb. (NOTE: DEBUG_PROG is an environment variable that allows you to specify a debugger or profiler to launch for working with java.)


dbx


$ java -version (need to use right version of jdk)
$ ls /opt/bin/dbx (need to know dbx location) or "which dbx"
$ export DEBUG_PROG=/opt/bin/dbx (or wherever "dbx" is located)

For JDK 1.3.X do the following:
$ /java corefile
For JDK 1.4.X do the following:
$ dbx /java corefile

Now you will be in the debugger. Execute the following commands:
(dbx) where ("shows a summary of the stack")
(dbx) threads ("shows the state of the existing threads")
(dbx) quit

These commands get a stacktrace of the last thread that executed (where command) and show the state of all the threads (threads command) in the core file.


gdb


$ java -version (need to use right version of jdk)
$ ls /usr/local/bin/gdb (need to know gdb location) or "which gdb"
$ export DEBUG_PROG=/usr/local/bin/gdb (or wherever "gdb" is located)

For JDK 1.3.X do the following:
$ /java corefile
For JDK 1.4.X do the following:
$ gdb /java corefile

Now you will be in the debugger. Execute the following commands:
(gdb) where ("shows a summary of the stack")
(gdb) thr ("switch among threads or show the current thread")
(gdb) info thr ("inquire about existing threads")
(gdb) thread apply 1 bt ("apply a command to a list of threads, specifically the backtrace to thread #1")
(gdb) quit

Using these commands will produce a stacktrace of the last thread that executed (where command), show the current thread (thr command), show the state of all the threads (info thr command), and provide another way to get a stack trace of thread 1 in the core file (thread apply 1 bt command). The last command (thread apply # bt) is a way to get the stack trace of an individual thread by replacing # with an actual thread number, or you can replace # with "all" to get the stack trace for all the threads.


The following is an example of the corefile with those commands in the gdb debugger. This example core was caused by an error with user native code in the application. On this stack trace, look at the frame listed immediately after the signal handler frame (frame #11); it is the last call made before the signal handler was invoked. This will lead you to look at the displayHelloWorld function in the native library libhello.so.


$ export DEBUG_PROG=/usr/local/bin/gdb
$ java core

GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...
(no debugging symbols found)...
Core was generated by `/wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/bin/../bin/sparc/native_threads'.
Program terminated with signal 9, Killed.
Reading symbols from /usr/lib/libthread.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libthread.so.1
Reading symbols from /usr/lib/libdl.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libc.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1...
(no debugging symbols found)...done.
Loaded symbols for /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so...(no debugging symbols found)... done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
Reading symbols from /usr/lib/libCrun.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libCrun.so.1
Reading symbols from /usr/lib/libsocket.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libm.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libm.so.1
Reading symbols from /usr/lib/libw.so.1...
warning: Lowest section in /usr/lib/libw.so.1 is .hash at 00000074
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libw.so.1
Reading symbols from /usr/lib/libmp.so.2...(no debugging symbols found)...
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/native_threads/libhpi.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/native_threads/libhpi.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libverify.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libverify.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libjava.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libjava.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libzip.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libzip.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libnet.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libnet.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libfilelock.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libfilelock.so
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libioser12.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/libioser12.so
Reading symbols from /usr/lib/nss_nis.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/nss_nis.so.1
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libstackdump.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libstackdump.so
Reading symbols from /usr/lib/libmd5.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libmd5.so.1
Reading symbols from /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libmuxer.so...(no debugging symbols found)...done.
Loaded symbols for /wwsl/sharedInstalls/solaris/wls70sp2/server/lib/solaris/libmuxer.so
Reading symbols from /usr/ucblib/libucb.so.1...(no debugging symbols found)...
Loaded symbols for /usr/ucblib/libucb.so.1
Reading symbols from /usr/lib/libresolv.so.2...(no debugging symbols found)...
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libelf.so.1...(no debugging symbols found)...
Loaded symbols for /usr/lib/libelf.so.1
Reading symbols from /home/usera/wls70/solaris/projectWork/lib/libhello.so...
(no debugging symbols found)...done.
Loaded symbols for /home/usera/wls70/solaris/projectWork/lib/libhello.so

(gdb) where

#0 0xff369764 in __sigprocmask () from /usr/lib/libthread.so.1
#1 0xff35e978 in _resetsig () from /usr/lib/libthread.so.1
#2 0xff35e118 in _sigon () from /usr/lib/libthread.so.1
#3 0xff361158 in _thrp_kill () from /usr/lib/libthread.so.1
#4 0xff24b908 in raise () from /usr/lib/libc.so.1
#5 0xff2358f4 in abort () from /usr/lib/libc.so.1
#6 0xfe3c6904 in __1cCosFabort6Fl_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#7 0xfe3c59f8 in __1cCosbBhandle_unexpected_exception6FpnGThread_ipCpv_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#8 0xfe20a8bc in JVM_handle_solaris_signal ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#9 0xff36b82c in __sighndlr () from /usr/lib/libthread.so.1
#10 <signal handler called>
#11 0xe9f90420 in Java_HelloWorld_displayHelloWorld ()
from /home/usera/wls70/solaris/projectWork/lib/libhello.so

#12 0x90aec in ?? ()
#13 0x8dc54 in ?? ()
#14 0x8dc54 in ?? ()
#15 0x8dc54 in ?? ()
#16 0x8ddbc in ?? ()
#17 0x8dde0 in ?? ()
#18 0x8dc54 in ?? ()
#19 0x8dc54 in ?? ()
#20 0x8dde0 in ?? ()
#21 0x8dc78 in ?? ()
#22 0x8dc54 in ?? ()
#23 0x8ddbc in ?? ()
#24 0x8dc54 in ?? ()
#25 0xfe5324f0 in __1cMStubRoutinesG_code1_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#26 0xfe0cbe9c in __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#27 0xfe1f6dc4 in __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#28 0xfe1fcd94 in __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#29 0xfe21b708 in __1cMthread_entry6FpnKJavaThread_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#30 0xfe216208 in __1cKJavaThreadDrun6M_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#31 0xfe213ed0 in _start ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so

(gdb) thr

[Current thread is 1 (LWP 14 )]

(gdb) info thr
16 LWP 13 0xff29d194 in _poll () from /usr/lib/libc.so.1
15 LWP 12 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
14 LWP 11 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
13 LWP 10 0xff29bc2c in _so_accept () from /usr/lib/libc.so.1
12 LWP 9 0xff29bc2c in _so_accept () from /usr/lib/libc.so.1
11 LWP 8 0xff29d194 in _poll () from /usr/lib/libc.so.1
10 LWP 7 0xff29d194 in _poll () from /usr/lib/libc.so.1
9 LWP 6 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
8 LWP 5 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
7 LWP 4 0xff29f008 in _lwp_sema_wait () from /usr/lib/libc.so.1
6 LWP 3 0xff29d194 in _poll () from /usr/lib/libc.so.1
5 LWP 2 0xff29e958 in _signotifywait () from /usr/lib/libc.so.1
4 LWP 1 0xff29d194 in _poll () from /usr/lib/libc.so.1
3 LWP 16 0xff29c4fc in door_restart () from /usr/lib/libc.so.1
2 LWP 15 0xff369774 in private___lwp_cond_wait ()
from /usr/lib/libthread.so.1
* 1 LWP 14 0xff369764 in __sigprocmask ()
from /usr/lib/libthread.so.1

(gdb) thread apply 1 bt

Thread 1 (LWP 14 ):
#0 0xff369764 in __sigprocmask () from /usr/lib/libthread.so.1
#1 0xff35e978 in _resetsig () from /usr/lib/libthread.so.1
#2 0xff35e118 in _sigon () from /usr/lib/libthread.so.1
#3 0xff361158 in _thrp_kill () from /usr/lib/libthread.so.1
#4 0xff24b908 in raise () from /usr/lib/libc.so.1
#5 0xff2358f4 in abort () from /usr/lib/libc.so.1
#6 0xfe3c6904 in __1cCosFabort6Fl_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#7 0xfe3c59f8 in __1cCosbBhandle_unexpected_exception6FpnGThread_ipCpv_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#8 0xfe20a8bc in JVM_handle_solaris_signal ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#9 0xff36b82c in __sighndlr () from /usr/lib/libthread.so.1
#10 <signal handler called>
#11 0xe9f90420 in Java_HelloWorld_displayHelloWorld ()
from /home/usera/wls70/solaris/projectWork/lib/libhello.so
#12 0x90aec in ?? ()
#13 0x8dc54 in ?? ()
#14 0x8dc54 in ?? ()
#15 0x8dc54 in ?? ()
#16 0x8ddbc in ?? ()
#17 0x8dde0 in ?? ()
#18 0x8dc54 in ?? ()
#19 0x8dc54 in ?? ()
#20 0x8dde0 in ?? ()
#21 0x8dc78 in ?? ()
#22 0x8dc54 in ?? ()
#23 0x8ddbc in ?? ()
#24 0x8dc54 in ?? ()
#25 0xfe5324f0 in __1cMStubRoutinesG_code1_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#26 0xfe0cbe9c in __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#27 0xfe1f6dc4 in __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#28 0xfe1fcd94 in __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#29 0xfe21b708 in __1cMthread_entry6FpnKJavaThread_pnGThread__v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#30 0xfe216208 in __1cKJavaThreadDrun6M_v_ ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so
#31 0xfe213ed0 in _start ()
from /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/jre/lib/sparc/server/libjvm.so

(gdb) quit
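Picking out the interesting frame from a long backtrace can be automated. The sketch below is a hypothetical helper, not part of gdb or WebLogic: it assumes the saved backtrace text contains the marker gdb normally prints for a signal handler frame, `<signal handler called>`, and returns the frame line that follows it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Illustrative helper; the class and method names are hypothetical. */
final class CrashFrameFinder {

    /** Returns the first frame line that follows gdb's "<signal handler called>" frame, if any. */
    static Optional<String> frameBeforeSignal(List<String> backtrace) {
        boolean seenHandler = false;
        for (String line : backtrace) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("#")) {
                continue; // skip "from /path/to/lib.so" continuation lines
            }
            if (seenHandler) {
                return Optional.of(trimmed);
            }
            if (trimmed.contains("<signal handler called>")) {
                seenHandler = true;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> backtrace = Arrays.asList(
            "#9 0xff36b82c in __sighndlr () from /usr/lib/libthread.so.1",
            "#10 <signal handler called>",
            "#11 0xe9f90420 in Java_HelloWorld_displayHelloWorld ()",
            "from /home/usera/wls70/solaris/projectWork/lib/libhello.so",
            "#12 0x90aec in ?? ()");
        System.out.println(frameBeforeSignal(backtrace).orElse("no signal handler frame found"));
    }
}
```

With the excerpt above, the helper reports frame #11, the Java_HelloWorld_displayHelloWorld call in libhello.so.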


Gathering Core information from: LINUX

GDB is the default preferred Linux debugger and it is powerful and stable. There are also various visual debuggers available, but only a simple command-line debugger is really needed to get the stacktrace from the core.

  1. Do file /core to verify if the core file is from the Java VM.
  2. Make sure you are using the latest GDB version from GNU on Linux to avoid any known bugs. See: http://ftp.gnu.org/gnu/gdb/
  3. Also make sure on Linux that ulimit for a core file is set (e.g., ulimit -c unlimited).
  4. On Linux, the coredump is turned off by default on all systems. In RedHat Advanced Server 2.1, the setting is in a file called limits.conf under /etc/security. The file is self-explanatory; look for the word “core”. If it is set to 0, then coredump is disabled.
  5. See also: Operating System Values that should be checked for core file generation
  6. Get a stack trace using gdb as follows (same as done previously):

$ java -version (need to use right version of jdk)
$ ls /usr/local/bin/gdb (need to know gdb location) or "which gdb"
$ export DEBUG_PROG=/usr/local/bin/gdb (or wherever "gdb" is located)

For JDK 1.3.X do the following:
$ /java corefile
For JDK 1.4.X do the following:
$ gdb /java corefile

Now you will be in the debugger. Execute the following commands:
(gdb) where ("shows a summary of the stack")
(gdb) thr ("switch among threads or show the current thread")
(gdb) info thr ("inquire about existing threads")
(gdb) thread apply 1 bt ("apply a command to a list of threads, specifically the backtrace to thread #1")
(gdb) quit

Using these commands will produce a stacktrace of the last thread that executed (where command), show the current thread (thr command), show the state of all the threads (info thr command), and provide another way to get a stack trace of thread 1 in the core file (thread apply 1 bt command). The last command (thread apply # bt) is a way to get the stack trace of an individual thread by replacing # with an actual thread number, or you can replace # with "all" to get the stack trace for all the threads.



Gathering Core information from: HPUX

The usual command-line debuggers are GDB and ADB.


GDB

If gdb is available, follow the same gdb steps described previously, repeated below:


$ java -version (need to use right version of jdk)
$ ls /usr/local/bin/gdb (need to know gdb location) or "which gdb"
$ export DEBUG_PROG=/usr/local/bin/gdb (or wherever "gdb" is located)

For JDK 1.3.X do the following:
$ /java corefile
For JDK 1.4.X do the following:
$ gdb /java corefile

Now you will be in the debugger. Execute the following commands:
(gdb) where ("shows a summary of the stack")
(gdb) thr ("switch among threads or show the current thread")
(gdb) info thr ("inquire about existing threads")
(gdb) thread apply 1 bt ("apply a command to a list of threads, specifically the backtrace to thread #1")
(gdb) quit

ADB

You should be able to get a stacktrace by doing the following:


$ java -version (need to use right version of jdk)
$ ls /usr/local/bin/adb (need to know adb location) or "which adb"
$ export DEBUG_PROG=/usr/local/bin/adb (or wherever "adb" is located)
$ /java corefile

Now you will be in the debugger. Execute the following commands:
adb> $C ("shows a summary of the stack and you may get an error at this point, see below")
adb> $r ("shows the state of the registers")
adb> $q ("the command to quit adb")

If you get a message such as "can't unwind -- no_entry" when doing the $C command in adb, then it could most likely be that adb doesn't understand shared libraries. For this case, the use of gdb or wdb is recommended. WDB is available at: http://h21007.www2.hp.com/dspp/tech/tech_TechSoftwareDetailPage_IDX/1,1703,1665,00.html



Gathering Core information from: AIX

  1. If gdb is available and an actual binary core file was produced, follow the same gdb steps described previously:
    $ java -version (need to use right version of jdk)
    $ ls /usr/local/bin/gdb (need to know gdb location) or "which gdb"
    $ export DEBUG_PROG=/usr/local/bin/gdb (or wherever "gdb" is located)

    For JDK 1.3.X do the following:
    $ /java corefile
    For JDK 1.4.X do the following:
    $ gdb /java corefile

    Now you will be in the debugger. Execute the following commands:
    (gdb) where ("shows a summary of the stack")
    (gdb) thr ("switch among threads or show the current thread")
    (gdb) info thr ("inquire about existing threads")
    (gdb) thread apply 1 bt ("apply a command to a list of threads, specifically the backtrace to thread #1")
    (gdb) quit

  2. However, the JVM on AIX will usually print out a javacore..txt file to debug your application. It has some very useful information and it will show the current thread that was executing when the core happened. For example, the following would tell you the problem happened in your displayHelloWorld() native method you created. Look at that native code to determine why the core occurred.

    Sample information from javacore..txt file:
    Current Thread Details:
    "ExecuteThread: '10' for queue: 'default'" (TID:0x31c70ad0, sys_thread_t:0x3e52df68, state:R, native ID:0xf10) prio=5
    at HelloWorld.displayHelloWorld(Native Method)
    at servlets.NativeServlet.doGet(NativeServlet.java:85)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run(ServletStubImpl.java:1058)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:401)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:306)
    at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:5445)
    at weblogic.security.service.SecurityServiceManager.runAs(SecurityServiceManager.java:780)
    at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:3105)
    at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2588)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:213)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:189)

  3. Look for the Current Thread Details which will give you an idea of the problem. For example, the following shows that the core came from ExecuteThread 24, but that at the time the JVM was doing some JIT. Therefore, it looks like a problem with the JIT compiler on this JVM version from IBM.
    Current Thread Details
    "ExecuteThread: '24' for queue: 'default'" sys_thread_t:0x781
    Native Stack
    at 0xD0F15924 in get_invoke_op
    at 0xD0F1535C in resolve_a_method
    at 0xD0F1E610 in resolve_method_call_graph
    at 0xD0F29C40 in jit_compiler_entry
    at 0xD0F2A404 in _jit_fast_compile
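
The check described in step 3 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the frame format is assumed to match the "at 0x... in name" lines shown above, and treating any frame whose name contains "jit" as a JIT frame is a rough heuristic, not an IBM-documented rule.

```python
import re

# Matches native frames such as "at 0xD0F29C40 in jit_compiler_entry".
NATIVE_FRAME = re.compile(r"^\s*at\s+0x[0-9A-Fa-f]+\s+in\s+(\S+)")

def native_frames(section_text):
    """Return the function names of all native frames found in the text."""
    names = []
    for line in section_text.splitlines():
        match = NATIVE_FRAME.match(line)
        if match:
            names.append(match.group(1))
    return names

def jit_frames(section_text):
    """Return the native frames whose names mention the JIT (heuristic)."""
    return [name for name in native_frames(section_text) if "jit" in name.lower()]

# Native Stack section copied from the javacore sample above.
SAMPLE = """\
Native Stack
at 0xD0F15924 in get_invoke_op
at 0xD0F1535C in resolve_a_method
at 0xD0F1E610 in resolve_method_call_graph
at 0xD0F29C40 in jit_compiler_entry
at 0xD0F2A404 in _jit_fast_compile
"""

print(jit_frames(SAMPLE))
```

If the list is non-empty, the JIT compiler was active on the crashing thread, which is the situation described in step 3.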



Gathering Core information from: WINDOWS

  1. The drwtsn32.log files are similar to core files on Unix. On Windows 2000, these files are found in the following directory: C:\Documents and Settings\All Users\Documents\DrWatson. After entering drwtsn32 ?, the Dr. Watson for Windows 2000 box appears. The DrWatson log file overview option will display a screen which explains the format of the drwtsn32.log files.
  2. A hs_err_pid<#>.log may also be produced from the JVM itself which may contain some useful information.

Enabling/Disabling Dr. Watson

By default, Dr. Watson will be enabled when Windows NT is installed.

  1. Check under the following registry key to make sure that Dr. Watson is enabled (0 is enabled and 1 is disabled):

    \HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
  2. There is an entry called "Auto" that determines how Dr. Watson will start up. This will then launch whatever debugger, or application, is under the Debugger registry value.
  3. For Dr. Watson, the Debugger value should contain:

    drwtsn32 -p %ld -e %ld -g



What if I don't have a Debugger?

  1. If you do not have access to a debugger, check to see if you have the pstack and pmap
    utilities on your operating system.
  2. If you do have these utilities (on some operating systems you have to download these utilities separately), you can run those commands on the system core file to gather information for Support.

    The syntax of the command would be something like this:
    $ /usr/proc/bin/pstack core
    $ /usr/proc/bin/pmap core

The following are the commands by operating system:


Solaris

pstack command = pstack

pmap command = pmap


AIX 5.2

AIX 5.2 or greater with an add-on from IBM (these are not available in earlier versions)


pstack command = procstack

pmap command = procmap

See: http://www-106.ibm.com/developerworks/eserver/articles/AIX5.2PerfTools.html


Linux

pstack command = lsstack

pmap command = pmap


NOTE: You can get lsstack from: http://sourceforge.net/projects/lsstack/ and build on your Linux platform. It is the equivalent of pstack on Solaris.


You can get pmap source from: http://web.hexapodia.org/~adi/pmap.c and build on your Linux platform.


HPUX

(none found)


The following is a snippet of the pstack and pmap data on the same core file from the gdb/dbx output. You can use this to narrow down the library where this is happening. In this example, it is evident that the error is coming from the libhello.so:


pstack output:


core 'core' of 20956: /wwsl/sharedInstalls/solaris/wls70sp2/jdk131_06/bin/../bin/sparc/nativ
----------------- lwp# 14 / thread# 25 --------------------
ff369764 __sigprocmask (ff36bf60, 0, 0, e6181d70, ff37e000, 0) + 8
ff35e110 _sigon (e6181d70, ff385930, 6, e6180114, e6181d70, 6) + d0
ff361150 _thrp_kill (0, 19, 6, ff37e000, 19, ff2c0450) + f8
ff24b900 raise (6, 0, 0, ffffffff, ff2c03bc, 4) + 40
ff2358ec abort (ff2bc000, e6180268, 0, fffffff8, 4, e6180289) + 100
fe3c68fc __1cCosFabort6Fl_v_ (1, fe4c8000, 1, e61802e8, 0, e9f90420) + b8
fe3c59f0 __1cCosbBhandle_unexpected_exception6FpnGThread_ipCpv_v_ (ff2c02ac, fe53895c, fe4dc164, fe470ab4, fe4c8000, e6180308) + 254
fe20a8b4 JVM_handle_solaris_signal (0, 25d5b8, e6180d90, fe4c8000, b, e6181048) + 8ec
ff36b824 __sighndlr (b, e6181048, e6180d90, fe20a8cc, e6181e14, e6181e04) + c
ff3684d8 sigacthandler (b, e6181d70, 0, 0, 0, ff37e000) + 708
--- called from signal handler with signal 11 (SIGSEGV) ---
e9f90420 Java_HelloWorld_displayHelloWorld (25d644, e6181224, e61819b8, 0, 2, 0) + 30

00090ae4 ???????? (e6181224, e61819b8, 25d5b8, fe4c8000, 0, 109a0)
0008dc4c ???????? (e61812c4, ffffffff, ffffffff, 97400, 4, e61811b8)
0008dc4c ???????? (e618135c, e61819b8, fe4c8000, 99600, c, e6181250)
0008dc4c ???????? (e61813ec, f76a2f90, e618147c, 99600, c, e61812f8)
0008ddb4 ???????? (e618147c, f68578b8, 0, 99974, c, e6181388)
0008ddd8 ???????? (e618154c, e61815c8, e61815cc, 99974, 4, e6181410)
......

pmap output snippet:


........
E9500000 1184K read
E9680000 1392K read
E9800000 4608K read
E9F60000 136K read/write/exec
E9F90000 8K read/exec /home/usera/wls70/solaris/projectWork/lib/libhello.so
E9FA0000 8K read/write/exec /home/usera/wls70/solaris/projectWork/lib/libhello.so

E9FB4000 8K read/write/exec
E9FC0000 120K read/exec /usr/lib/libelf.so.1
E9FEE000 8K read/write/exec /usr/lib/libelf.so.1
.......

Notice from the pstack output that the address where this happened is at e9f90420. The pmap output snippet shows that e9f90420 falls between E9F90000 and E9FA0000, so the error is happening somewhere within the libhello.so shared object.
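
The same lookup can be automated. The following is a minimal Python sketch, assuming pmap lines have the "address sizeK permissions [path]" layout shown in the snippet above; the function names parse_pmap and find_mapping are hypothetical helpers, not part of any tool.

```python
import re

# Matches lines such as "E9F90000 8K read/exec /path/to/lib.so".
PMAP_LINE = re.compile(r"^([0-9A-Fa-f]+)\s+(\d+)K\s+(\S+)(?:\s+(\S.*))?$")

def parse_pmap(text):
    """Return a list of (start, end, permissions, path) tuples; path may be None."""
    mappings = []
    for line in text.splitlines():
        match = PMAP_LINE.match(line.strip())
        if match:
            start = int(match.group(1), 16)
            size = int(match.group(2)) * 1024
            mappings.append((start, start + size, match.group(3), match.group(4)))
    return mappings

def find_mapping(mappings, address):
    """Return the mapping whose [start, end) range contains address, or None."""
    for mapping in mappings:
        if mapping[0] <= address < mapping[1]:
            return mapping
    return None

# Lines copied from the pmap output snippet above.
PMAP_SNIPPET = """\
E9F60000 136K read/write/exec
E9F90000 8K read/exec /home/usera/wls70/solaris/projectWork/lib/libhello.so
E9FA0000 8K read/write/exec /home/usera/wls70/solaris/projectWork/lib/libhello.so
E9FB4000 8K read/write/exec
"""

# Address of the Java_HelloWorld_displayHelloWorld frame in the pstack output.
hit = find_mapping(parse_pmap(PMAP_SNIPPET), 0xE9F90420)
print(hit[3] if hit else "address not found in any mapping")
```

Feed it the full pmap output and the address of the frame just below the signal handler line in the pstack output.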




Operating System Values that should be checked for core file generation

  1. Check the ulimit -c (configured size of the core file) at a system and user level.
  2. Check the available disk space for the user (For example: Is there a disk quota?). You can verify the disk quota by using the quota -v command:
    $ quota -v

    Disk quotas for weblogic (uid 12908):
    Filesystem   usage    quota    limit  timeleft  files   quota   limit  timeleft
    /home       896792  2048000  2048000             1121  204800  204800

    Please bear in mind that a core file will be the same size as the memory used for the application, so you will need at least that amount of disk space available.
  3. On Linux, core dumps are turned off by default on all systems. On RedHat Advanced Server 2.1, the setting is under /etc/security: open the file limits.conf and look for the word "core". If it is set to "0", core dumps are disabled.
  4. Also on Linux, core files may not be generated for the following scenario. If you start Apache (with or without the plugin) as root user on a low port (as normal), core files will not be generated. The workaround is to start Apache as a non-root user on a high port and core files will be generated.
  5. On HP-UX, change the kernel parameter maxdsiz (max_per_proc_data_size, which sets the User Process Data Segment Size) from the old value of, say, 64M to something higher, like 134M.
  6. For Solaris, you can also make sure core files are enabled with the coreadm command.

    Check to see if per process core files are disabled on the system with the coreadm
    command:
    $ coreadm

    global core file pattern:
    init core file pattern: core
    global core dumps disabled
    per-process core dumps: disabled
    global setid core dumps: disabled
    per-process setid core dumps: disabled
    global core dump logging: disabled

    Enable per process core file creation (this must be run as root):


    $ coreadm -e process

    Verify per process core files are enabled with the coreadm command again:


    $ coreadm

    global core file pattern:
    init core file pattern: core
    global core dumps disabled
    per-process core dumps: enabled
    global setid core dumps: disabled
    per-process setid core dumps: disabled
    global core dump logging: disabled
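
The first two checks in this list (core size limit and disk space) can be sketched as a small Python script. This is a rough illustration under stated assumptions: the function names are hypothetical, the script must run as the same user and from the same working directory as the server, and shutil.disk_usage reports free space on the filesystem, so it does not account for per-user disk quotas (use quota -v for that).

```python
import resource  # Unix only
import shutil

def core_dump_blockers(core_limit_bytes, free_disk_bytes, process_bytes):
    """Return reasons why a full core file might not be written.

    core_limit_bytes is None when the core size limit is unlimited.
    """
    reasons = []
    if core_limit_bytes is not None and core_limit_bytes < process_bytes:
        reasons.append("core size limit is smaller than the process")
    if free_disk_bytes < process_bytes:
        reasons.append("not enough free disk space")
    return reasons

def check_current_environment(process_bytes, directory="."):
    """Apply the checks to the current user's soft limit and the given directory."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_CORE)
    limit = None if soft_limit == resource.RLIM_INFINITY else soft_limit
    free = shutil.disk_usage(directory).free
    return core_dump_blockers(limit, free, process_bytes)

if __name__ == "__main__":
    # Assumed example: a JVM process of roughly 512 MB.
    for reason in check_current_environment(512 * 1024 * 1024):
        print(reason)
```

An empty result only means these two checks passed; the platform-specific settings in the list above still need to be verified separately.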



Core Files due to JIT Compiler

Sometimes a core from the JVM can be due to the JIT Compiler.


In order to determine this, check out the pstack information from the core file.

  1. If it looks somewhat similar to the following:



    fe16d550 __1cMURShiftINodeFValue6kMpnOPhaseTransform__pknEType__ (9983c8, 88d00d4c, 1, 0, fe570000, 3b7480) + f8
    fe0d2180 __1cMPhaseIterGVNNtransform_old6MpnENode__2_ (88d00d4c, 3b795c, 9c, 88d00e9c, 11, e4c4d8) + 1d4
    fe19b1e8 __1cMPhaseIterGVNIoptimize6M_v_ (88d00d4c, 0, fe5b89f8, 0, 0, 0) + a0
    fe202008 __1cHCompileIOptimize6M_v_ (88d01298, fe5335c4, 88d011ac, fe570000, 0, 0) + 168
    fe2008b4 __1cHCompile2t6MpnFciEnv_pnHciScope_pnIciMethod_iii_v_ (fe5333f9, 371584, 2f1d24, d30664, ffffffff, 1) + bac
    fe1fd08c __1cKC2CompilerOcompile_method6MpnFciEnv_pnHciScope_pnIciMethod_ii_v_ (2bb80, 88d01ab4, 0, 372918, ffffffff, 0) + 64
    fe1fc850 __1cNCompileBrokerZinvoke_compiler_on_method6FpnLCompileTask__v_ (720, 0, ffffffff, fe5aee50, fe5bbbe4, eaff8) + 61c
    fe2ac1f8 __1cNCompileBrokerUcompiler_thread_loop6F_v_ (fe533c01, fe5af218, eaff8, eb5a8, 306d10, fe269254) + 428
    fe26927c __1cKJavaThreadDrun6M_v_ (eaff8, b, 40, 0, a, ff37c000) + 284
    fe26575c _start (eaff8, ff37d658, 1, 1, ff37c000, 0) + 134
    ff36b01c _thread_start (eaff8, 0, 0, 0, 0, 0) + 40

    Then the JIT may be at fault.


  2. To determine which method caused this, add the following flags to the Java command line that starts the server, then rerun your test so that the server dumps core again and prints the compilation information:

    -XX:+PrintCompilation -XX:+PrintOpto


    The -XX:+PrintCompilation flag output looks something like this:


    1 sb java.lang.ClassLoader::loadClassInternal (6 bytes)
    2 b java.lang.String::lastIndexOf (12 bytes)
    3 s!b java.lang.ClassLoader::loadClass (58 bytes)

  3. Once you find the offending method, you can tell the JVM to bypass this by creating a .hotspot_compiler file in your current working directory with an exclude statement of the offending method. For example:

    exclude java/lang/String indexOf


    This would stop the java.lang.String.indexOf() method from being compiled by the JVM.
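
Turning a -XX:+PrintCompilation line into an exclude statement is mechanical, so it can be scripted. The following is a minimal Python sketch, assuming the output format shown in step 2 (compile id, optional flags, class::method, size in bytes); the function name exclude_statement is hypothetical. Which line names the offending method is still your call: typically it is the last method printed before the crash, but that is an assumption to confirm by testing.

```python
import re

# Matches e.g. "3 s!b java.lang.ClassLoader::loadClass (58 bytes)".
# The flags column ("sb", "b", "s!b") is treated as optional.
COMPILATION_LINE = re.compile(
    r"^\s*\d+\s+(?:\S+\s+)?([\w.$]+)::(\S+)\s+\(\d+ bytes\)"
)

def exclude_statement(compilation_line):
    """Return the .hotspot_compiler exclude line for one PrintCompilation line,
    or None if the line does not look like compilation output."""
    match = COMPILATION_LINE.match(compilation_line)
    if match is None:
        return None
    class_name, method = match.groups()
    return "exclude %s %s" % (class_name.replace(".", "/"), method)

print(exclude_statement("2 b java.lang.String::lastIndexOf (12 bytes)"))
```

Write the resulting line to the .hotspot_compiler file in the server's working directory, as described in step 3.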




Stop the JVM to get Thread Dumps

You can set the following flags to enable taking a thread dump of the server right before a core happens to get the state of the threads at that moment:


Sun JVM

The option is -XX:+ShowMessageBoxOnError on the SUN JVM (which is not officially documented on the SUN website). When the JVM crashes, the program will prompt: Do you want to debug the problem? You can then take a thread dump of the JVM.


JRockit JVM

The corresponding option will be available on the 8.1 SP2 version of JRockit when this service pack is released. The option in JRockit is -Djrockit.waitonerror.




On-line Debugger Manuals

You may obtain manuals for the debuggers used for core file analysis as follows:


gdb: http://www.gnu.org/software/gdb/documentation/


dbx: Sun: http://docs.sun.com/db/doc/805-4948?q=DBX


dbx: IBM: http://publib16.boulder.ibm.com/pseries/en_US/cmds/aixcmds2/dbx.htm


adb: HP: http://docs.hp.com/hpux/onlinedocs/B2355-90680/00/00/8-con.html

Taking GC logs

If you want to collect GC logs and place the log file in a specific directory, use the following JVM options.
Solution:

-verbose:gc -XX:+PrintGCDetails -Xloggc:/gc.log

WLST script to take thread dump after 5 seconds.

In certain scenarios, it is not possible to get thread dumps using manual input such as "kill -3" or "CTRL+BREAK". This document provides a short WLST script that will take thread dumps at regular intervals.

Solution

import java

# Replace the username, password, and URL with your own details.
connect('weblogic', 'weblogic', 't3://localhost:7001')

serverName = 'AdminServer'
sleepTime = 5000  # milliseconds between thread dumps

for counter in range(10):
    java.lang.Thread.sleep(sleepTime)
    # Timestamp in the file name keeps each dump in its own file.
    fileName = 'dump' + serverName + '_' + str(java.util.Calendar.getInstance().getTimeInMillis()) + '.dmp'
    threadDump('true', fileName, serverName)

This script will take 10 thread dumps 5000 ms (that is, 5 seconds) apart. Replace serverName, username, password, and URL with your particular details. Change sleepTime (in ms) if you want to change the frequency of the thread dumps. Change the range (currently 10) if you want to have fewer or more thread dumps taken.

Save the code in a file, e.g. "ThreadDumps.py."

Call it from WLST using the following command:
java weblogic.WLST ThreadDumps.py

Thursday, February 25, 2010

Standalone client to view the JNDI objects

As part of a WebLogic admin's job, developers often ask us to show the JNDI tree to check whether their objects are bound to the server. Generally we use "server --> View JNDI Tree" in the console, but this page renders very slowly or sometimes doesn't open at all, as happened to me today on one particular server. So I wrote this basic standalone client to list the bound objects.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NameClassPair;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;

public class ListJNDIObjects
{
    // Connection settings; replace the URL and credentials with your own.
    private static Hashtable<String, String> buildEnvironment()
    {
        Hashtable<String, String> ht = new Hashtable<String, String>();
        ht.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        ht.put(Context.PROVIDER_URL, "t3://vasserver:9080");
        ht.put(Context.SECURITY_PRINCIPAL, "weblogic");
        ht.put(Context.SECURITY_CREDENTIALS, "weblogic");
        return ht;
    }

    public static void main(String[] args)
    {
        // Optional argument: the JNDI context to list (defaults to the root context).
        String jndiContext = (args.length != 0) ? args[0] : "";

        try
        {
            Context context = new InitialContext(buildEnvironment());
            NamingEnumeration<NameClassPair> jndiList = context.list(jndiContext);

            // Print each binding as "name: class".
            while (jndiList.hasMore())
            {
                NameClassPair ncp = jndiList.next();
                System.out.println(ncp);
            }

            context.close();
        }
        catch (NamingException ne)
        {
            System.out.println("JNDI List failed : " + ne);
        }
    }
}

Monday, February 8, 2010

Troubleshooting OutofMemory

Most java.lang.OutOfMemoryErrors are the result of a program simply creating and using more objects than can fit in the maximum allowable heap space. The most common resolution to this type of error is:

1. Increasing the maximum heap size using the appropriate JVM command line option (e.g. -mx512m).

Beyond this, you will need to take steps to learn more about your JVM's heap usage. The easiest and best way to do this is with the verbose gc option (Usually specified -verbosegc but sometimes as -verbose:gc or -Xverbose:gc). This setting will usually output a single line for each major and minor garbage collection that takes place. The format is specific to each JVM, but generally each line shows "heap in use", "amount freed", and "time spent in garbage collection".

Some SUN JVM users have received the OutOfMemoryError as the result of permanent generation limitations. The java heap is comprised of several segments; the permanent generation being one of those. Therefore, if the java.lang.OutOfMemoryError is issued when the java heap has not been completely used (as shown with the verbose gc option), the most common resolution will be:

2. Increasing the size of the permanent generation space using the appropriate JVM command option.

e.g. BEA Solaris platform recommendation:
If you have problems with OutOfMemory errors and the JVM crashing with
JDK 1.3, try setting: -XX:MaxPermSize=128m.

There is currently an open bug on Sun's bug parade that describes this problem. See,
http://developer.java.sun.com/developer/bugParade/bugs/4390238.html

When the above conditions and remedies do not help, the problem is often thought to be a memory leak. Real leaks are actually rare because Java Applications are not responsible for freeing memory; the JVM is. Still, when an application allocates java objects but never releases (de-references) them, this condition is very similar to a traditional memory leak seen commonly in C and C++ applications. This type of memory issue generally appears in the verbose GC output as a slow and steady loss of free heap space. Eventually the JVM's Full or Major Garbage Collection task runs more and more frequently trying to reclaim heap space. Eventually, it will not keep up with demand and the java.lang.OutOfMemoryError message is output. At this point the JVM is unable to execute java code and any subsequent results are unpredictable.

To diagnose and resolve this type of problem, you will likely need to obtain a Java Heap Profiling tool. The following procedure should help:

3a. First make sure that you have conducted your tests without JVM JIT optimization. This can be done by adding "-Djava.compiler=none" to your JVM startup command. This test should be done to avoid any JVM bugs which may exist with the optimization of your java code. This step is also required for use of the available JVM debugging tools and it will be helpful to establish that the problem you are trying to locate is not a JVM bug but is instead created by java code.
3b. Use the JProbe (http://www.jprobe.com or http://www.sitraka.com/software/jprobe) utility to inspect your JVM heap in order to determine which object class instances are being accumulated.

3c. Another similar product is OptimizeIt (http://www.optimizeit.com or http://www.borland.com/optimizeit)

3d. The JVM itself has the ability to dump its heap contents upon process termination. Therefore, you may find it helpful to supply the following JVM options:

-Xrunhprof:heap=dump,format=a (Use java -Xrunhprof:help for details)

and invoke the following code within your JVM when you wish to inspect the current heap contents:

System.gc(); // Request a Full Garbage Collection
Thread.sleep(5000); // Wait for completion
System.exit(0); // Terminate the JVM process

The resulting java.hprof.txt file can be inspected to determine if any application changes can or should be made to
reduce the number of active objects. More information on this Java API can be found at:

http://java.sun.com/j2se/1.3/docs/guide/jvmpi/jvmpi.html#hprof-heap

There is still one more possibility or concept to explore. The amount of heap being used is often driven by the multi-threaded nature of a server JVM (such as WebLogic). If you allocate 100 threads for handling server requests, then it is possible for all 100 to be running in parallel. Under load, this configuration can use approximately 10 times the amount of Java Heap Space as one with only 10 threads. Therefore, if your verbose gc output shows a rapid or sudden climb in heap usage until none is available (free), you may simply have too many simultaneous activities for the amount of available heap. In this case, the resolution will be to:

4. Make your java heap as large as is possible (for the physical machine configuration) and then reduce server thread counts until your server application can stay within its limits.

The above suggestions should adequately address most Java Heap related memory problems. However, it is still possible to encounter system limitations with memory and/or JVM memory leaks.

Most UNIX operating systems allow limits to be placed on various process resources. Such limits may prevent creating Java Threads and other objects which need to allocate native process components such as stacks which are part of the total process size.

5. Carefully inspect the reported error message to make sure that an operating system process limitation is not at play.

JVM memory leaks are very rare but still possible. Therefore:

6. If your process size continues to grow until system resources are exhausted or limits exceeded, you may wish to use native O/S tools to determine which process segments are responsible. Continuous growth in the JVM's native components should be reported to the JVM vendor. Remember, when the JVM heap size reaches its maximum (-mx), the process size will not grow as a result of Java Heap allocation. Therefore, escalating process size is generally a result of native code (e.g. the JVM or JNI libraries).