Case Study: WebSphere Commerce 6.0 To 7.0 Migration

Project Description

A large retailer using WebSphere Commerce 6 had failed in a previous migration to WebSphere Commerce 6.1, and now needed new features found in WebSphere Commerce 7. TxMQ, an IBM Premier Business Partner, added two brand-new clustered WebSphere 7 environments. The result: TxMQ implemented WebSphere 7 and delivered nearly twice the processing capacity using half the number of clustered servers, with little to no downtime to operations.

The Situation

A large retailer (the “client”) was using WebSphere Commerce 6 running clustered WebSphere Application Server (WAS) 6.0 and backend clustered services running WAS 6.0. Both WebSphere environments were on AIX 6. The frontend Commerce servers maintained the website, while the backend services performed shipping, email notifications, billing and inventory. The backend services also hosted a homegrown call-center application. The development environment staged 16 separate JVMs to manage disparate projects. The QA environment housed four clustered application servers in a three-tiered architecture that exactly mirrored production.

The Challenge

A previous migration to WebSphere 6.1 had failed and was subsequently abandoned. With the release of WebSphere Commerce 7, the client’s development team needed to use new features found in that version. Downtime was not tolerated in either development or production, and the client requested automated deployment scripts for each environment.

The Response

The key to project success was to analyze what had gone wrong in the previous attempt. Based on that analysis, TxMQ split the project by environment: the first was the client’s frontend WebSphere Commerce 6 environment, and the second was the backend WAS services.
TxMQ attempted two separate approaches to the frontend environment. The first was to migrate the Commerce 6.0 application to version 7 using the IBM-supplied migration tools. When that failed, TxMQ switched to the backup approach and installed WebSphere 7 on new, recently purchased hardware; the existing WebSphere 6 installation ran on old hardware that was about to be retired. The original plan had been to move to the new hardware once WebSphere 7 was installed, but TxMQ improvised and installed version 7 only on the new hardware. Because the team could build alongside the existing architecture, meeting the no-downtime requirement was straightforward: there was little to no downtime, and testing proceeded without interrupting the legacy environment. Cutover was as simple as shutting off the legacy LPARs.
In the second environment, the application was somewhat simpler than the Commerce application. The IBM-supplied tools migrated it successfully, and TxMQ completed all tasks using the first approach, an in-place migration.
In hindsight, TxMQ would recommend that the client perform a full reinstall of WebSphere 7 because of the significant changes in SSL certificate handling between WebSphere 6 and 7. To re-enable security, TxMQ had to completely rebuild the client’s legacy certificates, which the IBM migration tools had carried over. In response to a PMR, IBM acknowledged that a flaw in the migration process had corrupted the security keys.

The Result

In summary, TxMQ added two brand-new clustered WebSphere 7 environments with little to no downtime. The client’s development environment was mirrored exactly from the legacy environment and was switched over with little to no impact on the existing schedule. Performance testing indicated that the client could run nearly twice the seasonal load on half of the usual hardware: whereas the client could previously process 4,000 orders/hour with 8 clustered servers, it could now process nearly 8,000 orders/hour with just 4 clustered servers.
TxMQ provided extensive documentation of the installation process for both the backend and frontend environments. TxMQ also documented the deployment scripts and delivered updated architecture diagrams for both the deployment scripts and the new WebSphere 7 environments.
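The headline numbers imply a larger per-server gain than "twice the load" suggests. A short Python calculation, assuming load is spread evenly across cluster members and treating "nearly 8,000" as 8,000:

```python
# Performance-test figures reported above; "nearly 8,000" is rounded up to 8,000.
LEGACY_ORDERS_PER_HOUR = 4_000
LEGACY_SERVERS = 8
WAS7_ORDERS_PER_HOUR = 8_000
WAS7_SERVERS = 4


def per_server_rate(orders_per_hour: float, servers: int) -> float:
    """Average orders/hour per cluster member, assuming an even load spread."""
    return orders_per_hour / servers


legacy_rate = per_server_rate(LEGACY_ORDERS_PER_HOUR, LEGACY_SERVERS)
was7_rate = per_server_rate(WAS7_ORDERS_PER_HOUR, WAS7_SERVERS)
improvement = was7_rate / legacy_rate

print(f"Legacy:      {legacy_rate:,.0f} orders/hour per server")
print(f"WebSphere 7: {was7_rate:,.0f} orders/hour per server")
print(f"Per-server improvement: {improvement:.1f}x")
```

Total throughput doubled while the server count halved, so each server handles roughly four times the legacy volume; because "nearly 8,000" was rounded up, the true factor is slightly under 4x.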

Lessons Learned

Initially, TxMQ experienced problems with the 32-bit application: it would run out of memory because of excessive class loaders. The solution was a JVM parameter, buried deep in the IBM Information Center, that limits the number of class loaders.
WebSphere Commerce is caching-intensive, and TxMQ performed extensive performance tuning of the dynacache settings. Some new distributed-object caching methods introduced in WebSphere 7 helped achieve greater order throughput; however, using these features required rewriting cachespec.xml.
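For readers unfamiliar with cachespec.xml, the Python sketch below generates a minimal servlet cache entry in the general shape the WebSphere dynamic cache uses (cache-entry, class, name, cache-id components keyed on request parameters, and a timeout). The URI, parameter names and timeout are hypothetical; this is not the client's configuration, and real WebSphere Commerce entries (and the distributed-object caching features mentioned above) are considerably more involved.

```python
import xml.etree.ElementTree as ET


def servlet_cache_entry(uri: str, key_params: list, timeout_s: int) -> ET.Element:
    """Build one <cache-entry> that caches a servlet's output per parameter combination."""
    entry = ET.Element("cache-entry")
    ET.SubElement(entry, "class").text = "servlet"
    ET.SubElement(entry, "name").text = uri
    cache_id = ET.SubElement(entry, "cache-id")
    for param in key_params:
        # Each request parameter listed here becomes part of the cache key.
        component = ET.SubElement(cache_id, "component", id=param, type="parameter")
        ET.SubElement(component, "required").text = "true"
    # Cached entries expire after timeout_s seconds.
    ET.SubElement(cache_id, "timeout").text = str(timeout_s)
    return entry


# Hypothetical URI and parameter names, for illustration only.
root = ET.Element("cache")
root.append(servlet_cache_entry("/ProductDisplay", ["storeId", "catalogId", "productId"], 3600))
ET.indent(root)  # pretty-printing; requires Python 3.9+
xml_body = ET.tostring(root, encoding="unicode")
print('<?xml version="1.0" ?>\n<!DOCTYPE cache SYSTEM "cachespec.dtd">\n' + xml_body)
```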
While migrations were usually successful, in the end everybody agreed that a clean install was faster.
Photo courtesy of Flickr contributor David Precious

Case Study: CICS HealthCheck

Project Description

A large regional bank commissioned TxMQ to conduct a CICS architecture and implementation review to maximize its use of resources and technology.

The Situation

TxMQ’s client, a large regional bank, requested a CICS HealthCheck. Under the engagement, TxMQ would assess the current infrastructure, verify that the environments were running optimally (in terms of response times and availability), and assess how the client used them in comparison with other institutions of similar size and structure.
In addition, TxMQ was asked to identify opportunities to make different and greater use of the technology’s functionality, and to determine which business solutions might be recommended.

The Objectives

The first key objective for the HealthCheck was to provide senior architecture consulting, plus a broad assessment and review. The assessment and review would be used to provide flow documents and possible recommendations including, but not limited to:

  • Software and system-configuration changes or upgrades/support packs needed
  • Technical impact and performance analysis
  • Response to solutions impact
  • Written recommendations based on meetings with stakeholders
  • Stakeholder concerns mapped to specific infrastructure objectives

A second key objective for the HealthCheck was to analyze the WMQ environment and configuration to eliminate any potential performance, availability and security exposure, and to ensure there were no roadblocks for future growth and new applications. This included an analysis of current staffing structure, number and size.
The final objective was to analyze the proposed capacity-increase solutions and the impact of each on supporting business-process growth of up to 100% over existing transaction volumes. The analysis would help the client plan an effective MQ/CICS capacity-management strategy, identify technical dependencies, and identify costs and benefits.
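A 100% increase in transaction volume means doubling it. As a first-order illustration of what that implies for capacity, the Python sketch below assumes utilization scales linearly with volume, which is a simplification (systems often degrade non-linearly near saturation). The 40% utilization and 80% planning ceiling are hypothetical numbers, not figures from the engagement.

```python
def projected_utilization(current_util: float, volume_increase: float) -> float:
    """Utilization after a volume increase, assuming utilization scales linearly with volume."""
    return current_util * (1.0 + volume_increase)


def max_volume_increase(current_util: float, ceiling: float) -> float:
    """Largest fractional volume increase that keeps projected utilization <= ceiling."""
    return ceiling / current_util - 1.0


# Hypothetical figures: a region averaging 40% busy, with an 80% planning ceiling.
CURRENT_UTIL = 0.40
CEILING = 0.80

for growth in (0.25, 0.50, 1.00):  # 1.00 = the 100% increase in scope
    util = projected_utilization(CURRENT_UTIL, growth)
    flag = "OK" if util <= CEILING else "needs more capacity"
    print(f"+{growth:.0%} volume -> {util:.0%} busy ({flag})")

print(f"Headroom before the ceiling: +{max_volume_increase(CURRENT_UTIL, CEILING):.0%} volume")
```

Under these assumptions, any component already above 50% busy could not absorb a doubling without crossing an 80% ceiling, which is the kind of dependency the capacity analysis is meant to surface.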

The Response

TxMQ’s consultant, an MQ subject matter expert, spent a little more than a week onsite with the bank’s CICS/MQ team.
TxMQ began by diagnosing the current OS and middleware environment, which included CICS V3.2, WebSphere MQ V7.0.1 and DB2 V8. The client had more than 85 CICS regions deployed, many of them standalone regions used for development and testing.
TxMQ’s consultant reviewed the CICS parameters and specifications for the bank’s main production regions, and also assessed the causes of three system outages, all due to application issues, that had occurred over the prior 13 months.
In conjunction with the bank’s internal teams, the TxMQ consultant was able to deep-dive into the environment and architecture and create a list of observations and recommendations.

Findings

The HealthCheck revealed no major or current issues. The consultant’s general observations were as follows:

  • Sound architecture
  • Highly motivated and skilled professionals
  • Excellent communication among management, teams and members
  • Need for ongoing skill updates
  • Fast problem resolution
  • Formal structure for support and problem facilitation
  • Central change management
  • No performance issues
  • Management aware of and willing to address issues
  • Good planning

The three system outages were traced to application issues. TxMQ recommended two ways for the client to reduce future outages: 1) thorough application testing, and 2) use of a disaster-recovery system identical to the production environment.
Regarding CICS parameters, no networking issues were detected and only minor changes were suggested.
The client is proactively planning upgrade activities as well as additional training for its staff.
Photo courtesy of Flickr contributor Bob Mical

Case Study: z/OS Performance Check

Project Description

A national product-transport logistics company (the client) sought a performance assessment and analysis of its WebSphere z/OS environment to solve memory issues.

The Situation

The client experienced CPU spikes and service timeouts that caused application hangs and forced recycling of the WAS application servers. Workload volume was approaching the peak for the business process, so JVM configuration and settings changes needed to be addressed as a possible way to alleviate the CPU spikes.

The Analysis

Statement Of The Problem: The cause of the client’s WebSphere issue lay within one of the WebSphere servant regions. The region was running out of memory and performing so much garbage collection that it consumed most of the zAAP capacity. The question was: How could the client control this activity with WLM?
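One way to put a number on "a large volume of garbage collection" is GC overhead: the share of elapsed time spent in GC pauses. The Python sketch below assumes pause start times and durations have already been extracted from the verbose GC log (for example, with the ISAW tooling listed below); the GcPause type and the sample data are hypothetical, and the metric ignores concurrent GC work, which also consumes processor (here, largely zAAP) time.

```python
from dataclasses import dataclass


@dataclass
class GcPause:
    start_s: float   # seconds since JVM start when the pause began
    pause_ms: float  # stop-the-world pause length in milliseconds


def gc_overhead(pauses: list, window_start_s: float, window_end_s: float) -> float:
    """Fraction of the window spent in GC pauses that begin inside the window."""
    window_s = window_end_s - window_start_s
    if window_s <= 0:
        raise ValueError("window must have a positive length")
    paused_s = sum(
        p.pause_ms / 1000.0
        for p in pauses
        if window_start_s <= p.start_s < window_end_s
    )
    return paused_s / window_s


# Hypothetical sample: a 60-second window with a 1.5-second pause every 5 seconds.
sample = [GcPause(start_s=t, pause_ms=1500.0) for t in range(0, 60, 5)]
print(f"GC overhead: {gc_overhead(sample, 0.0, 60.0):.0%}")
```

A sustained overhead in the tens of percent usually means the heap is too small for the live data or the application allocates too aggressively, which matches the out-of-memory symptoms described here.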

Tools

TxMQ performed an analysis using the following tools:

  • z/OS RMF reports
  • Javacore analysis using the IBM Support Assistant Workbench (ISAW)
  • WebSphere GC log analysis using ISAW

The Assessment

TxMQ logged the following observations:

  • The WebSphere infrastructure was remarkably well-maintained and well-tuned for the hardware environment. In addition, TxMQ was impressed by the client’s knowledge base.
  • The environment was not technically constrained: it could do what the client “wanted it to do.” Instead, insufficient hardware and the application design were the issues.
  • The z/OS infrastructure was problematic. The main issue: What does one do when the zAAP goes away?
  • The situation pointed to an application-design issue. All tests pointed to a condition whereby the application was attempting to do too much, based upon less-than-optimal design and execution.

Javacore Analysis

Following is a summary of notes from the Javacore analysis:

  • Number of threads allocated to ORB should be investigated
  • Recommended: -Xmxcl setting (only for IBM Java 5.0, up to and including Service Refresh 4)
  • Thread-status analysis: 26 threads were actually running, and their code pointed to the ORB and EJBs
  • No thread was waiting on a monitor, yet there were 91 waiters. It appeared the code was not doing the same thing on the other threads. (A monitor is Java’s mechanism for synchronizing threads’ access to a shared object.)
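Tallying thread states across a javacore by hand is tedious. The Python sketch below counts them, assuming J9-style `3XMTHREADINFO` lines that carry a `state:` field (for example R for runnable, CW for condition wait, B for blocked). Javacore layouts vary by JVM level, so the regular expression is an assumption to check against real files, and the sample lines are synthetic.

```python
import re
from collections import Counter

# Assumed J9-style javacore thread line, e.g.:
# 3XMTHREADINFO      "ORB.thread.pool : 0" J9VMThread:0x..., ..., state:R, prio=5
THREAD_LINE = re.compile(r'^3XMTHREADINFO\s+"(?P<name>[^"]*)".*?\bstate:(?P<state>[A-Z]+)')


def thread_states(javacore_text: str, name_prefix: str = "") -> Counter:
    """Count thread states for threads whose name starts with name_prefix."""
    counts = Counter()
    for line in javacore_text.splitlines():
        match = THREAD_LINE.match(line.strip())
        if match and match.group("name").startswith(name_prefix):
            counts[match.group("state")] += 1
    return counts


# Synthetic excerpt; real javacores carry many more lines per thread.
sample = "\n".join([
    '3XMTHREADINFO      "ORB.thread.pool : 0" J9VMThread:0x1, j9thread_t:0x2, java/lang/Thread:0x3, state:R, prio=5',
    '3XMTHREADINFO      "ORB.thread.pool : 1" J9VMThread:0x4, j9thread_t:0x5, java/lang/Thread:0x6, state:R, prio=5',
    '3XMTHREADINFO      "WebContainer : 0" J9VMThread:0x7, j9thread_t:0x8, java/lang/Thread:0x9, state:CW, prio=5',
    '3XMTHREADINFO1            (native thread ID:0xA, native priority:0x5, native policy:UNKNOWN)',
])
print(thread_states(sample))
print(thread_states(sample, name_prefix="ORB.thread.pool"))
```

Filtering by a name prefix such as the ORB thread pool makes it easy to see whether the running threads cluster in one component, as they did in this review.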

ORB thread pool stack: Almost all the running threads showed the same ORB thread-pool stack.

Memory Segment Analysis

This analysis showed the JVM to be running out of memory.

Recommendations

At the conclusion of the review, TxMQ discussed recommendations with the client, but the client remained concerned that the recommendations still did not address the issue of CPU overload. Some results from that conversation:

  • The client’s applications team decided to support a move to WAS v7.x with the expectation of increased performance.
  • WLM policy is an area that needs review, but any modifications (for example, to WebSphere priority levels) are problematic.
  • The additional zAAP is preventing a system heart attack. The general-processor (GP) and zAAP costs must be accounted for before the zAAP expires in the next 6 weeks.
  • Best value-for-spend would be an application-review of the heap usage.

Client To-Do List

Client takeaway from performance review:

  • Add more ORB threads
  • Add more memory to the JVMs
  • Look into the JVM -Xmxcl setting
  • Investigate how EJBs are using connection pools
  • Get the GP and keep the zAAP
  • Set a GC-thread limit on the WebSphere JVM so that GC does not take all CPUs
  • Profile the application and tune it with a view to system resources, parsing and messaging
  • Investigate a DataPower appliance as a way to offload SOAP services, XML parsing and messaging
  • Choose a database subject matter expert to review the design, tuning and capacity planning of the database subsystem.
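Several of these items (more JVM memory, the -Xmxcl setting, a GC-thread limit) end up as JVM options on the application server, for example in its generic JVM arguments. The Python sketch below assembles such a string using the IBM JVM options -Xmx, -Xgcthreads and -Xmxcl. The heap size, the "leave one processor free" rule and any class-loader cap are hypothetical illustrations, not tuned recommendations, and option availability should be confirmed in the IBM SDK documentation for the JVM level in use.

```python
from typing import Optional


def gc_thread_cap(available_cpus: int, reserve: int = 1) -> int:
    """Hypothetical policy: leave `reserve` processors free during GC, keeping at least one GC thread."""
    if available_cpus < 1:
        raise ValueError("available_cpus must be at least 1")
    return max(1, available_cpus - reserve)


def generic_jvm_args(max_heap_mb: int, gc_threads: int,
                     max_classloaders: Optional[int] = None) -> str:
    """Assemble IBM JVM options as a single space-separated string."""
    args = [
        f"-Xmx{max_heap_mb}m",       # maximum Java heap size
        f"-Xgcthreads{gc_threads}",  # threads used for parallel GC work
    ]
    if max_classloaders is not None:
        args.append(f"-Xmxcl{max_classloaders}")  # cap on class loaders; check your JVM level
    return " ".join(args)


# Hypothetical values for a servant region on a 4-processor LPAR.
print(generic_jvm_args(max_heap_mb=1024, gc_threads=gc_thread_cap(4)))
```

Capping GC threads below the processor count trades somewhat longer GC pauses for leaving capacity to other work, which addresses the "GC takes all CPUs" symptom but not the underlying heap pressure.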

Photo courtesy of Flickr contributor xdxd_vs_xdxd

Case Study: CMS Site Design & Deployment

Project Description

An Indian nation (the client) sought to update its static-HTML website. TxMQ designed a dynamic site that could be easily updated through a content management system (CMS).

The Situation

As part of an overall plan to streamline and track customer data, the client needed a systems analyst to provide ongoing remote development work for the nation’s gaming website and several web properties within the site. There were 17 gaming locations in total, and the client’s two largest locations were most in need of an updated website.

The Objectives

The client’s static-HTML website was difficult to update frequently. To create a dynamic, ever-changing and easily updatable website to promote its casinos and gaming centers, the client needed to move the site to a JSP-based content management system (CMS).
Two websites needed a CMS implementation, and the client needed to evaluate and choose the right system for the job.

The Response

A TxMQ consultant was supplied with a fairly complete site design in HTML format. His first step was to convert the static HTML site into a CMS-enabled, dynamic JSP website.
The new site was built in a CMS instance on a development server, so all work orders could be completed within the development environment before live deployment.
Using the Alterian Content Management System (ACM), the TxMQ consultant defined the content types, templates and pages needed to build out the website. He then loaded all the resources (text content, images, XML, Flash, configuration and more) into the ACM repository.
Testing immediately followed to confirm all features and functionality of the CMS, after which it was turned over to an in-house Quality Assurance (QA) team. After the QA-requested changes were completed and enhancements made, the site was deployed to the staging server and then pushed to live.
The completed project allowed the client to clearly mine and track customer data, which led to 15% to 20% revenue growth for the commerce division.
Photo courtesy of Flickr contributor Peter Dutton