Thursday, November 12, 2009

Load Test

Over the past few days I ran into a serious production issue: the web service timed out before the backend process sent its message back to the SOAP response node in the WMB flow. The high load in production was delaying the backend process.

The whole process is as follows: the WMB flow extracts the SOAP request message, turns it into an MQ header message and puts it on the Java adaptor's request queue. The Java adaptor processes the content and puts a response message on the response queue. The WMB flow then reads the message from the response queue and turns it into a SOAP response.

During the investigation we decided to move one database query out of the Java adaptor program and into the WMB flow. The reason was that the DB connection sometimes just drops, and the Java adaptor then has to spend some time re-establishing it. In fact, according to the times recorded in the adaptor's log, the processing was still quite fast, within milliseconds even. Well, we gave it a try anyway.

After making all the necessary changes we deployed the work to the testing environment, and I then ran some load tests against the web service. The result was negative: the throughput was around 3.5 TPS, but the average response time was > 10 seconds and we had to achieve < 10 seconds! So there was still plenty of work to do :(

We went through a few cycles of investigation, trial and error, and testing until we found the solution: increasing the number of instances of the WMB flow. This is a setting in the WMB Toolkit. We increased the number of instances to 3, and this time we achieved an average time of 3 seconds! We were so happy and excited!

What about the cause? See below:

Many messages were piling up on the response queue, which is supposed to be read by the WMB flow. The Java adaptor put these messages there after it finished processing the data. Apparently the WMB flow was not picking them up in time, and that caused the timeout errors. With more flow instances, 3 identical copies of the WMB flow read from this queue concurrently, and the problem was resolved!
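To see why extra readers help so much, here is a rough back-of-the-envelope model in Java. It is only a sketch under assumptions of my own: all messages are already sitting on the queue, every read takes the same time, and the figures (30 messages, 1000 ms per message) are made up for illustration, not measured. The class and method names are hypothetical too.

```java
// Toy model of a backlog on the response queue being drained by N identical readers.
// All figures are illustrative assumptions, not measurements from the real system.
final class ResponseQueueModel {

    /**
     * Average time in milliseconds until a message has been read, assuming all
     * messages are on the queue at time zero, each read takes serviceMillis,
     * and the given number of identical flow instances read in parallel.
     */
    static double averageCompletionMillis(int messages, int instances, long serviceMillis) {
        if (messages <= 0 || instances <= 0 || serviceMillis < 0) {
            throw new IllegalArgumentException("messages and instances must be positive");
        }
        long[] freeAt = new long[instances]; // time at which each instance is idle again
        long total = 0;
        for (int m = 0; m < messages; m++) {
            // The next message goes to whichever instance becomes idle first.
            int next = 0;
            for (int k = 1; k < instances; k++) {
                if (freeAt[k] < freeAt[next]) {
                    next = k;
                }
            }
            freeAt[next] += serviceMillis;
            total += freeAt[next];
        }
        return (double) total / messages;
    }

    public static void main(String[] args) {
        int backlog = 30;          // assumed number of queued responses
        long serviceMillis = 1000; // assumed time to handle one response
        for (int instances = 1; instances <= 3; instances += 2) {
            System.out.println(instances + " instance(s): average "
                    + averageCompletionMillis(backlog, instances, serviceMillis) + " ms");
        }
    }
}
```

With these assumed numbers the average drops to roughly a third when going from 1 reader to 3, which is in line with what we saw after raising the instance count.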

I was really relieved!

Thursday, September 10, 2009

Classloading in Websphere Administrative Console


I have a web service server application that had been running fine on a WebSphere Process Server. One day, my application's log output started going to the log of another application, B. That was really weird. I checked and found that B had only recently been deployed to the server. My application was writing to B's log because B's log4j jar file had been put in the server's common directory! If we do not change the class loading policy, which defaults to PARENT_FIRST, then WebSphere's classloader loads from that common directory first, and only then loads the jars in our application's WEB-INF/lib directory.

Hence I logged on to the WebSphere Administrative Console, went to my application's war file and changed the classloader policy to PARENT_LAST. With this setting WebSphere finds and loads the necessary classes from your web module's WEB-INF/lib (the web module classloader) first. You must also make sure that the jars under your application's WEB-INF/lib folder do not clash with the ones in the server's common directory, or you will get a java.lang.LinkageError because the classloader is trying to load a duplicate class.
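When chasing this kind of problem it helps to ask the JVM where a class was actually loaded from. The helper below is a small sketch using only standard Java calls; the class name ClassOriginCheck is my own. Inside the application you would pass in the log4j class you care about (for example by name through Class.forName) instead of the default used here.

```java
import java.security.CodeSource;

// Prints which jar or directory a class came from and which classloader loaded it.
final class ClassOriginCheck {

    /** Location (jar or directory) the class was loaded from, if the JVM exposes it. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "bootstrap or unknown"; // JDK core classes have no code source
        }
        return source.getLocation().toString();
    }

    /** Name of the classloader implementation that defined the class. */
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument to inspect it.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> type = Class.forName(name);
        System.out.println(name + " loaded from " + originOf(type) + " by " + loaderOf(type));
    }
}
```

If the location printed for the logging class points at the server's common directory rather than your own WEB-INF/lib, the parent classloader won.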

Friday, July 24, 2009

NullPointerException in SecureRandom.nextBytes

 

I was facing an issue where a web service client that had always been running fine suddenly stopped working, failing with a NullPointerException in SecureRandom.nextBytes:

AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
faultSubcode:
faultString: java.lang.NullPointerException
faultActor:
faultNode:
faultDetail:
{http://xml.apache.org/axis/}stackTrace:java.lang.NullPointerException
at java.security.SecureRandom.nextBytes(SecureRandom.java:406)
at org.apache.axis.utils.SessionUtils.generateSessionId(SessionUtils.java:62)
As you can see, the exception was thrown when a class in Axis called the nextBytes method of the SecureRandom class. I spent quite some time on this and I think it is a bug in the IBM version of JRE 1.4.2. If you want to know more, read here.
Therefore I changed the script that starts my web service client to use the java command from Java 5 instead of Java 1.4.2, and finally it worked!
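A quick way to check whether a given JRE can serve SecureRandom.nextBytes at all is a tiny standalone program like the sketch below, run with the same java command as the client's start script. It does not reproduce the Axis failure itself; it only exercises the same JDK call and prints which JRE is in use. The class name is my own.

```java
import java.security.SecureRandom;

// Exercises SecureRandom.nextBytes on the JRE that runs this program.
final class SecureRandomSmokeTest {

    /** Fills a buffer with random bytes and returns them as lowercase hex. */
    static String randomHex(int byteCount) {
        byte[] buffer = new byte[byteCount];
        new SecureRandom().nextBytes(buffer); // the call that failed in the stack trace
        StringBuilder hex = new StringBuilder(byteCount * 2);
        for (byte b : buffer) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("random bytes = " + randomHex(16));
    }
}
```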

Saturday, May 30, 2009

Busy with Web Services on WPS

For the past two weeks I have been very busy building web services using Java 1.4. The bad thing was that I had to deploy them on WebSphere Process Server 6.0.

I had no problem building the web services using Rational Software Architect (RSA). I tested my work by deploying it to the WebSphere Application Server 6.1 runtime that comes with RSA. Everything was working fine.

However, I started to have a lot of headaches when I tried to deploy these web services through the WebSphere Admin Console to WebSphere Process Server 6.0.2, which runs on a remote server. My web service application just would not start!

After trying out several solutions, in the end I migrated my source code to WebSphere Integration Developer 6.0 (WID). I regenerated all the web service related deployment configuration files and proxy classes using WID.

This time my web service application started successfully! The next problem was that it could not get a database connection. It took me quite some time to figure out what was wrong, and finally I realized that I had to get the JDBC connection from WPS itself instead of creating my own OracleDataSource cache.

I created all the JDBC connections that the web service applications need using the WebSphere Admin Console. I also changed my code to get the JDBC connection via JNDI.
The method is simple: just create a default InitialContext, then look up your JDBC data source in that context by name, for example "JDBC/MyDB", and get the connection from it.
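The lookup can be sketched as below. This is a minimal example, not my production code: the class and method names are hypothetical, "JDBC/MyDB" is just the example name from above, and the data source must already be defined on the server under that JNDI name. Run outside an application server there is no naming provider, so the lookup fails with a NamingException.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Gets a container-managed JDBC connection through JNDI.
final class JndiConnectionLookup {

    // Example name only; use whatever name was configured in the admin console.
    static final String JNDI_NAME = "JDBC/MyDB";

    /** Looks up the data source that the server has bound under jndiName. */
    static DataSource lookupDataSource(String jndiName) throws NamingException {
        Context context = new InitialContext(); // default context supplied by the server
        try {
            return (DataSource) context.lookup(jndiName);
        } finally {
            context.close();
        }
    }

    /** Borrows a connection from the server's pool; the caller must close it. */
    static Connection openConnection(String jndiName) throws NamingException, SQLException {
        return lookupDataSource(jndiName).getConnection();
    }
}
```

Closing the returned connection hands it back to the server's pool, so there is no need to cache connections in the application.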

Saturday, March 21, 2009

Run java program at backend on Windows

Normally, if you want to start a Java program as a background process on Unix, you just type the command followed by a '&'. But how do you do the same on Windows?

Yesterday my colleague was facing this problem, and I remembered going through the same thing at my previous company. We went through the Windows command list and I found the 'start' command. Then I remembered that I had probably used this command the last time I faced the problem.

So we tried the 'start' command and it worked.

Friday, March 20, 2009

Why IT Project Fails!

I came across an article about why IT projects fail!

Take a look at it!

My experience tells me that business users often do not want to spend much time with the developers while a system is being developed. They do not realize how critical and useful it is for the final result to spend time reviewing the developers' work early in the project, so that anything missing is found in time. I do agree that an agile approach is very good, provided the business users are willing to allocate resources to examine the system.

Friday, March 6, 2009

Working on wrong direction

One day the client raised an issue with me. That issue would not happen if the WebSphere Message Broker (WMB) flow that I deployed on the server was running fine.

I checked the queue and channel status using a tool called MQJ explorer and found that both the sender and receiver channels were 'inactive'. I set them to 'running' again, but after some time their status went back to 'inactive'. This drove me mad and I spent a whole day trying to find out what went wrong.


In the end I realized I had gone in the wrong direction. It is normal for the channels to go to 'inactive' after some time. As long as the transmission trigger is on there is no problem: when the next message comes in, the sender and receiver channels are activated again.

So what actually went wrong? Argh, the portal team did not set the correct queue name in the JMS header of the XML message, and thus there was no response from the whole flow!

Saturday, February 28, 2009

Complex investigation, easy solution

Yesterday I was facing a production issue where, from time to time, the connection to a database located on another server gets reset. I was getting java.sql.SQLException (Io exception: Connection reset). I have reused a method that reconnects to the database. I hope it works in the next deployment, and I believe it should. The code snippet is below:



public Connection getDBConnection(String dbName) throws OracleConnException, Exception {
    if (logger.isInfoEnabled()) {
        logger.info("getDBConnection(String) - String dbName=" + dbName); //$NON-NLS-1$
    }

    // Lazily create the cache of data sources, keyed by database name.
    if (ocDataSource == null) {
        ocDataSource = new HashMap();
        this.printSQL("new HashMap");
    }

    OracleDataSource ds = (OracleDataSource) ocDataSource.get(dbName);

    if (ds == null) {
        this.printSQL("creatingNewOracleDS");
        ds = createNewOracleDS(dbName);
        this.printSQL("createdNewOracleDS");
        // Store under the same key that is used for the lookup above.
        ocDataSource.put(dbName, ds);
    }

    Connection cachedConnection = null;
    int retry = KenanAdaptorProperties.getORACLE_RETRY_NUM();
    int i;

    for (i = 0; i < retry; i++) {
        try {
            this.printSQL("ds.getConnection() " + ds.getPortNumber());

            cachedConnection = ds.getConnection();
            this.printSQL("got connection, checking that it is alive");

            if (isConnectionAlive(cachedConnection)) {
                break; // usable connection, stop retrying
            }

            // Stale connection: release it so it does not leak a database session.
            this.printSQL("connection is not alive???");
            try { cachedConnection.close(); } catch (SQLException ignored) { }
            cachedConnection = null;

            // Throw away the old data source and build a fresh one for the next attempt.
            try { ds.close(); } catch (Exception e) {
                //log.debug("Can't close DataSource object!");
            }
            ocDataSource.remove(dbName);
            ds = createNewOracleDS(dbName);
            ocDataSource.put(dbName, ds);

        } catch (SQLException se) {
            logger.error("getDBConnection(String)", se); //$NON-NLS-1$
        } catch (Exception e) {
            this.printSQL("Exception:\r\n" + Util.getStackTraceForLog(e));
        }

        // Wait a configurable amount of time before the next attempt.
        try {
            Thread.sleep(KenanAdaptorProperties.getORACLE_RETRY_SLEEP_TIME());
        } catch (InterruptedException e) {
            // Keep the interrupt flag set so callers can still see the interruption.
            Thread.currentThread().interrupt();
        }
    }

    // The loop ran out of attempts without getting a live connection.
    if (i == retry) {
        throw new OracleConnException("Connection is not established. Failed after " + retry + " retries.");
    }

    return cachedConnection;
}
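
The snippet relies on an isConnectionAlive helper that is not shown. One common way to write such a check is to run a cheap query against the connection; the version below is only a sketch of that idea and not necessarily what my adaptor does. The class name and the probe query constant are assumptions ("SELECT 1 FROM DUAL" is the usual cheap query on Oracle).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Liveness probe for a JDBC connection, based on running a trivial query.
final class ConnectionCheck {

    // Assumed probe query; DUAL is Oracle specific.
    static final String PROBE_SQL = "SELECT 1 FROM DUAL";

    /** Returns true only if the connection is open and answers the probe query. */
    static boolean isConnectionAlive(Connection connection) {
        if (connection == null) {
            return false;
        }
        try {
            if (connection.isClosed()) {
                return false;
            }
            try (Statement statement = connection.createStatement();
                 ResultSet rows = statement.executeQuery(PROBE_SQL)) {
                return rows.next();
            }
        } catch (SQLException e) {
            // A reset or dropped connection usually surfaces here.
            return false;
        }
    }
}
```

On Java 6 and later, Connection.isValid(timeoutSeconds) does a similar job without a hand-written query, provided the JDBC driver supports it.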

Wednesday, February 25, 2009

production rollout for Kenan Adapter running on Websphere Business Integration Adapters is successful!

Finally the production rollout for Kenan Adapter is successful!

This is the third deployment, after the first two were unsuccessful. It went well in spite of some minor issues. Most of the scenarios are catered for this time. The client's testing team did a good job this time, testing thoroughly on all kinds of possible scenarios! Yeah, they should have done that during UAT! If they had allocated some time for more scenario testing, the deployment would have gone well even before 2009! Now it's approaching March and the client has only just managed to start using it. Well, this project does not bring any income to my team anyway. My project manager took this project on to cover up some other unit's shit, now that the long-term outsourcing deal has been closed.

I am studying WebSphere Process Server now. It looks far more complicated, and I think I need some time to digest the new terms. I should get myself involved in at least one project related to this product so that I can gain more expertise in it.

All in all, good news this week. I am especially glad about the successful rollout of the Kenan Adapter. I have put a lot of effort into making it a success. I would like to thank my project manager for the support! That is important for keeping my morale up.

Friday, January 16, 2009

Something funny has happened!

I am not sure why, but somehow the XML message that was passed to my request queue contains a response tag which is not supposed to be attached to it.

What annoyed me was that my WBI adapter for Kenan kept processing the same XML message over and over again! It was only thanks to this that I found out one database connection was left open, causing more and more database sessions to be created!!! This could have been a serious issue, and luckily I tackled it before the system went to production. However, I still need to find out why the XML message is sometimes redirected to the request queue even though the adapter has processed it without encountering any Tuxedo connection error! The only clue I have now is that this kind of 'problematic' XML message contains a response tag. Could it be an error on the portal side? Could it be that they took the processed XML message from the response queue and then sent it to the request queue again? Well, it is doubtful.

Scratching head... continue tomorrow....