Case Study: Client Experiences WebSphere Business Integrator Outage

Project Description

A regional grocery chain (CLIENT) experienced an outage in its WebSphere Business Integrator (WBI) application. WBI was no longer supported, and the application's developer was no longer available.

The Situation

The CLIENT was using an older WebSphere® product, WebSphere Business Integrator (WBI), for which IBM Global had developed an application called Item Sync. Item Sync was not working properly, and the CLIENT needed to take steps to correct it.
In the application flow, vendors initiate work, and that work is visible in the Vendor Transaction List (VTL) screen. The WBI application is responsible for routing the work to the next step in the approval process. This routing was not occurring, which was one part of the overall problem.
The other part of the problem was that the sync application's MQ archive queue had filled up and, once full, stopped accepting new messages. The CLIENT's initial corrective step was to purge the archive queue. They then rebooted the WBI server and verified that the MQ collectors were active and in their correct state and sequence. New work then appeared in the VTL screen, but it did not appear in the next step.

The Response

One week earlier, the CLIENT had experienced problems when its archive queue became full and the workflow process stopped working. They purged the queue, which removed all of its messages. Because these messages were not persistent, they had not been logged and were therefore lost.
As part of the initial response, the CLIENT rebooted the WBI server and verified that the MQ collectors were running in their correct state and sequence.
TxMQ and the CLIENT then held a conference call. After hearing the issue, TxMQ’s consultant recommended restoring the backed-up configuration to determine whether connectivity for the workflow processing would resume. The consultant also recommended using the native MQ alerts, which provide an early warning in the event of a problem such as a queue filling up.
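The early-warning idea behind that recommendation can be illustrated with a small sketch. It does not call MQ; it only shows the kind of threshold check a depth alert performs, where a warning is raised once the current depth reaches a set percentage of the queue's maximum depth. The function name, the 80 percent default, and the 5000-message limit are assumptions made for illustration and are not taken from the CLIENT's configuration.

```python
def depth_alert(current_depth: int, max_depth: int, high_pct: int = 80) -> bool:
    """Return True when the queue is at or above high_pct percent of capacity.

    All values are hypothetical; a real deployment would read the depth
    and limits from the queue manager rather than pass them in by hand.
    """
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")
    if not 0 <= high_pct <= 100:
        raise ValueError("high_pct must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return current_depth * 100 >= max_depth * high_pct


# Hypothetical figures: an archive queue capped at 5000 messages.
for depth in (1000, 4000, 5000):
    state = "ALERT" if depth_alert(depth, 5000) else "ok"
    print(f"depth={depth} {state}")
```

The point of alerting well below 100 percent is to leave time to act (for example, by offloading messages) before the queue stops accepting new work.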
That same evening, after the conference call concluded, the CLIENT restored the configuration, which included MQ, the connectors and WBI. The CLIENT then brought the environment up and tested it. The environment came up and was operational.
The maximum depth of the archive queue had also been increased so that the archive queue would not fill up again.
The next morning, the CLIENT and TxMQ reconvened. The current situation was noted, and the remaining step was to recreate the messages deleted from the archive queue. The CLIENT was looking for TxMQ to help recreate the lost application messages. Unfortunately, because TxMQ was unfamiliar with the application schema and process, restoring the application messages was better left to the business owners.

The Results

In this scenario, the queue should not have been purged in an overflow condition. The correct action would have been to copy the messages to a backup queue or file system to be replayed later.
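The copy-then-replay approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this engagement: a Python deque stands in for the MQ archive queue, the messages are simple dictionaries with a made-up "id" field, and the backup is a JSON-lines file with a hypothetical name. A real implementation would read and write the messages through MQ itself.

```python
import json
import tempfile
from collections import deque
from pathlib import Path


def drain_to_file(queue: deque, backup_path: Path) -> int:
    """Move every message off the queue into a JSON-lines backup file."""
    saved = 0
    with backup_path.open("a", encoding="utf-8") as backup:
        while queue:
            message = queue[0]  # peek, so nothing is lost if the write fails
            backup.write(json.dumps(message) + "\n")
            queue.popleft()  # remove only after the message has been written
            saved += 1
    return saved


def replay_from_file(backup_path: Path, queue: deque) -> int:
    """Put the backed-up messages onto a queue in their original order."""
    restored = 0
    with backup_path.open(encoding="utf-8") as backup:
        for line in backup:
            queue.append(json.loads(line))
            restored += 1
    return restored


# Hypothetical usage: offload a full archive queue, then replay it later.
archive = deque({"id": n} for n in range(3))
backup_file = Path(tempfile.mkdtemp()) / "archive_backup.jsonl"
print(drain_to_file(archive, backup_file), "messages saved")
print(replay_from_file(backup_file, archive), "messages replayed")
```

Removing each message only after it has been written is the design choice that matters here: the queue is emptied, as a purge would do, but every message remains recoverable.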
Before any steps were taken toward a partial fix, care should have been taken to confirm that the fix was complete and would work.