Case Study: WAS ND 6 Install & Configuration

Project Description

An American grocery chain (the client) operating approximately 1,300 supermarkets in 11 states needed assistance installing and configuring WebSphere Application Server Network Deployment (WAS ND) V6.

The Situation

The client was handling several open problem tickets with IBM while also dealing with an open issue in which the deployment manager was hanging. The client needed a Subject Matter Expert (SME) to oversee the project, determine the root cause, and provide recommendations and assistance with implementing changes in the WebSphere Application Server ND V6 configuration and installation. TxMQ was asked to provide project leadership and the SME.
The Project Manager was responsible for providing a system analysis of the WAS-ND application and system logs to identify the problem origin and recommend a course of action to resolve the issue. It was expected that the SME analyze WAS-ND security and Tivoli Directory setup, propose changes and technical explanations for potential solutions and then oversee the implementation of said solutions. The expectation at the end of the project was that all IBM open tickets and issues would be resolved and WAS-ND would be installed and configured properly.

The Response & Methodology

The first step on a long list of “to-dos” was to meet with the client’s on-staff system administrators to discuss all installation procedures and pre-install requirements. This step was imperative to ensure that procedures were agreed for resolving WebSphere issues via IBM PMRs. After the analysis of the project, a technical explanation was developed that outlined several possible solutions. Once the client had evaluated the recommendations and chosen a solution, that solution was implemented. Subsequent testing was necessary because the system needed to remain in compliance with the client’s change management policies and procedures. Throughout the project, the TxMQ consultants facilitated technical meetings with onsite resources and provided status reports to all parties involved.

The Result

The TxMQ consultant ensured that the project was completed and the configuration of WAS-ND remained compliant with all of the client’s change management policies and procedures.
Photo courtesy of Flickr contributor Jay Gooby

Case Study: Middleware Gap Analysis

Prepared By: Allan Bartleywood, TxMQ Subject Matter Expert, Senior Consultant and Architect, MQ

Project Description

“Regional Bank A” has a technical infrastructure supporting application integration through the use of an Enterprise Service Bus (“ESB”) serving as mediator between application endpoints and other backend systems. This tier consists of several of the IBM WebSphere products including WebSphere MQ, WTX and Message Broker.
Working together, these products provide data transformation and routing so that data exchange occurs in native-application formats in near-real-time conditions, with data transformation occurring primarily in WebSphere Message Broker with connectivity to WTX for EDI format map transformation following pre-packaged EDI standards. Message flows are created by “Regional Bank A” projects for defining routing and data delivery rules for new or changed applications.
This environment requires regular, ongoing development support as well as quarterly software maintenance to apply software patches for the Linux and Microsoft Windows operating systems.

Overview of Findings

The recent reviews conducted by the performing consultant include the infrastructure components indicated in subsection 1. In review of infrastructure best practices for the financial services industry, the following findings were noted:
2.1 Monitoring Optimization for Performance Management & Capacity Planning
Generally speaking, there is significant opportunity to improve the monitoring approach to attain monitoring and management objectives in a way that is considerably more cost-effective than what is presently being practiced.
2.2 Infrastructure Security Strategy Following Pre-Regulatory Standards
Notably for companies operating in the financial services industry, the security-regulatory environment has changed significantly in the past 10 years. The reported number of breaches in 2012 was astoundingly high at more than 1,000 occurrences. With hacking efforts focused so intensely on financial services companies, it is imperative that security vulnerabilities be addressed as a priority, and that the highest standards and practices be implemented to guard against such attacks.
The areas identified for improvement are reviewed in the Security subsection below. There are several major components that must be addressed for “Regional Bank A” in the very near future.
2.3 Standards & Best Practices
Within the WebSphere product portfolio, there are several IBM standards and recommendations for installation, configuration and performance tuning for the infrastructure stack. In particular, the standards around the middleware-messaging components (“MQ”) were found to be inconsistent and in need of configuration management. Additionally, Java applications brokered on WebSphere Application Server were found to be running on Java Virtual Machines (“JVM”) that were not configured according to best practices across the board.
This type of situation generally occurs when multiple people are involved with installation and configuration activities, without the guidance and oversight of a middleware architect who would generally ensure that such standards are applied and documented across the topology. More observations and recommendations are shared in the subsections below.
2.4 Software Distribution and Deployment Automation
A review of “Regional Bank A’s” application-release process – i.e., how changes are made to the middleware environment – found the current process to be very informal. Because the environment is small, implementing automation at this time will provide significant process improvement and position “Regional Bank A” for growth. Without this automation, the ongoing cost of development efforts will continue to increase without a matching increase in development output, due to the increasing complexity of changes and the effort required to manage so many moving parts. This has been identified as a strategic area of investment for “Regional Bank A” organization and application-growth enablement.

Monitoring Observations

For infrastructures that include an ESB, the standard monitoring approach should encompass the entire end-to-end view of the production technical components at both base server level and application level. This will capture whether end-to-end business transactions succeed or fail to complete, making it possible to identify where specific failures are occurring. The approach should also capture relevant data for capacity planning, for understanding and characterizing the behavior of the end-to-end system, and for middleware performance tuning.
“Regional Bank A’s” monitoring was found to be somewhat component focused with primary focus at the hardware level. Some stats are being captured at all levels, but not in a consistent way in terms of granularity or storage of information that would make the data useful for analysis.
Examples of what is being monitored today include:

  • Real-time usage by PID using TOP
  • Some collection of server stats in the O/S

The areas of suggested improvements include:

  • At Operating-System Level – Capture state and usage of each host (physical or virtual); if running virtually, it is critical that the state is known for the physical mapping to virtual.
  • At Application-Monitor Level – Critically available information depends on knowing the state (up/down/hung) of the application stack.
  • At Transaction-Monitor Level – Service management is dependent on knowing three things:
    1. Number of transactions completed in the SLA
    2. How many failed?
    3. How many were delayed?
  • It is also useful to know the service-response times, and stats concerning known bottlenecks such as page-load time, JVM utilization and metrics such as user-response time and invocation stats.
  • Proactive Monitoring – The plan for capacity high/low thresholds needs to be defined and regularly evaluated in response to events and situations where thresholds are exceeded but before an outage has actually occurred.
  • Performance Management & Capacity Planning – For effective cost management of this infrastructure, the initial implementation for the environments may be a subset of full capacity, with the intent to add to the environment as application growth occurs. To accompany this strategy, monitoring data must be captured and stored (using a data warehouse) for trending, tuning, and capacity-planning purposes.
  • “Regional Bank A” is currently not storing monitoring data for any significant length of time. Additionally, a data-maintenance strategy and centralized group to analyze and review performance data on a regular basis should be incorporated into the growth strategy.
  • Security – With recent regulatory changes, all unauthorized access of data must be reported. In order to comply, IT must have a logging strategy and log retention of security events expanded into this tier of infrastructure where application messages are currently passing through and security could be compromised.
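
The three transaction-monitor counts listed above can be illustrated with a minimal Python sketch. The `Transaction` record shape, the field names and the 2-second SLA are assumptions made for illustration only; they do not come from the review, and a real monitor would read these values from the ESB's own transaction logs.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """One end-to-end business transaction (hypothetical record shape)."""
    txn_id: str
    succeeded: bool
    elapsed_seconds: float


def summarize(transactions, sla_seconds):
    """Count transactions completed within the SLA, failed, and delayed."""
    summary = {"completed_in_sla": 0, "failed": 0, "delayed": 0}
    for txn in transactions:
        if not txn.succeeded:
            summary["failed"] += 1
        elif txn.elapsed_seconds <= sla_seconds:
            summary["completed_in_sla"] += 1
        else:
            # Completed, but slower than the SLA allows.
            summary["delayed"] += 1
    return summary


# Illustrative sample data, not measurements from the review.
sample = [
    Transaction("t1", True, 0.8),
    Transaction("t2", True, 3.5),
    Transaction("t3", False, 1.2),
    Transaction("t4", True, 1.9),
]
report = summarize(sample, sla_seconds=2.0)
print(report)
```

Storing one such summary per interval is one simple way to build the history needed for the trending and capacity planning described above.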

Security Observations

The IT Security components involved with this particular infrastructure include:

  • SSL Certificate Management
  • Operating System Level Security
  • Message Security
  • Secure Connection Management
  • General Application Level Security
  • Period of Access

4.1 SSL Certificate Management Observations
There does not appear to be a centralized authority to govern the way certificates are issued, installed and managed for “Regional Bank A.” A general certificate-management process includes certificate issuance (i.e., purchase and download), installation and configuration by an administrator, tracking and renewal of expired certificates, and a secure re-issuance process to avoid multiple use and/or counterfeit certificates.
SSL certificates were found in various directories on the server. Moving forward, the recommendation is that certificates be stored immediately upon receipt in a secure Key Store. Certificate files should then be deleted from all other locations and system files.
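As a rough illustration of how stray certificate files might be located before cleanup, the following Python sketch walks a directory tree and lists certificate-like files that sit outside a designated key-store directory. The file extensions, the directory layout and the function name `find_stray_certificates` are assumptions for illustration; they are not taken from the review.

```python
import os
import tempfile
from pathlib import Path

# Assumed extensions for certificate and key-store material.
CERT_SUFFIXES = {".pem", ".crt", ".cer", ".p12", ".pfx", ".jks", ".kdb"}


def find_stray_certificates(root, keystore_dir):
    """Return certificate-like files under root that are outside keystore_dir."""
    root = Path(root).resolve()
    keystore_dir = Path(keystore_dir).resolve()
    stray = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() not in CERT_SUFFIXES:
                continue
            if keystore_dir in path.parents:
                continue  # already in the approved location
            stray.append(path)
    return sorted(stray)


# Demonstration against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "keystore").mkdir()
    (base / "app" / "conf").mkdir(parents=True)
    (base / "keystore" / "server.p12").write_bytes(b"")
    (base / "app" / "conf" / "old_server.pem").write_bytes(b"")
    (base / "app" / "readme.txt").write_text("not a certificate")
    found = find_stray_certificates(base, base / "keystore")
    stray_names = [p.name for p in found]
    print(stray_names)
```

The sketch only reports files; deleting them is deliberately left as a manual, reviewed step.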

Message Security Observations
  • MQM group on UNIX should not contain any members other than system IDs
  • All application IDs and people-user IDs should be placed in other group IDs that are specifically configured for their access and usage alone
  • Root should never be a member of the MQM group
  • “Minimum privilege” groups should be created and used for “read” access and configured in MQ Security to the objects required for usage
  • In outsourced IT environments, support groups should have minimum access privileges to prevent outages related to accidental operational support activity
  • Best practice is to use an MQ Change Request ID to access the MQM ID via the Unix “sudo” command for applying any changes or maintenance to MQ objects. This approach is also commonly referred to as granting access using a “Firecall” ID for specific instances when access is actually required while fully logging all activities performed by the ID during the period of access.
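
The group-membership rules above lend themselves to a simple automated check. The Python sketch below is one possible shape for such an audit; the user names and the choice of which IDs count as system IDs are hypothetical.

```python
def audit_mqm_members(members, system_ids):
    """Return (user, reason) findings for members that break the mqm group rules."""
    findings = []
    for user in members:
        if user == "root":
            findings.append((user, "root must not be a member of mqm"))
        elif user not in system_ids:
            findings.append((user, "non-system ID; move to a dedicated access group"))
    return findings


# Hypothetical membership list for illustration.
members = ["mqm", "root", "payapp", "jsmith"]
findings = audit_mqm_members(members, system_ids={"mqm"})
for user, reason in findings:
    print(f"{user}: {reason}")
```

On a UNIX host the supplementary member list could be read with the Python standard library call `grp.getgrnam("mqm").gr_mem`, although that list does not include users whose primary group is mqm.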

4.2 Application Connectivity For Message Queuing
MQ Client connectivity provides access to applications running remotely (on the application servers) with the ability to put and get from MQ queues. During the review, it was suggested that all consumers of the MQ environment should use only a single Client Channel definition. This is not recommended and falls outside of best practice for the following reasons:

  • Lack of application association on each individual connect and disconnect.
  • Security Authorization Records become extremely difficult to manage (for example, identifying who had access when an actual breach occurred).
  • Operational support resolution will require longer and possibly multiple outages to identify root cause of connection issues (applications that are long running).
  • Heightened risk of outages to larger groups of users: When a single consumer encounters a connection issue, there is higher risk that all consumers will be “kicked off” while a channel bounce is done to resolve connectivity issues.

4.3 Application Server Management Observations include:

  • WAS processes are running on the servers under the root ID – this is a major security violation in the financial-services industry.
  • A “wasadmin” Unix non-expiry ID should be used for the running of all WAS processes.
  • Access to the “wasadmin” ID should be managed operationally, granting a “firecall ID” in the same manner as outlined above for access to the MQM ID for changes and support.

4.4 Middleware Security Using “sudo”
On UNIX, the sudo command can be configured to control access by group or user ID. Sudo rules can be restricted to explicit commands and options, and full auditing should always be enabled to log user activity.
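The shape of such a policy (explicit commands per group, with every decision logged) can be modeled in a few lines of Python. This is only a conceptual sketch of the policy, not sudo configuration: the group names, the queue manager name QM1 and the `is_allowed` helper are all hypothetical, and on a real host the rules would live in the sudoers configuration.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
audit_log = logging.getLogger("firecall.audit")

# Hypothetical policy: group name -> exact commands that group may run as mqm.
POLICY = {
    "mq_change": {"runmqsc QM1", "dspmq"},
    "mq_readonly": {"dspmq"},
}


def is_allowed(user, group, command):
    """Check a command against the group's explicit allowlist and log the decision."""
    allowed = command in POLICY.get(group, set())
    # Both permitted and denied requests are recorded for audit.
    audit_log.info("user=%s group=%s command=%r allowed=%s",
                   user, group, command, allowed)
    return allowed


read_ok = is_allowed("jsmith", "mq_readonly", "dspmq")
change_denied = is_allowed("jsmith", "mq_readonly", "runmqsc QM1")
print(read_ok, change_denied)
```

The point of the model is that access is denied by default and that denials are logged as well as grants.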

Standards and Best Practices

Throughout all aspects of the review, there appeared to be a disconnect between the “Regional Bank A” teams and the managed-services provider teams that were implementing and providing first-level support for both WAS and MQ. This disconnect can be resolved by:
1. Defining a single set of Standards, Practices and Guidelines issued by “Regional Bank A” that require adherence by the MSP as well as by internal teams;
2. Setting up regular reviews of such policies and standards on a quarterly or project-by-project basis.
Architecture standards should exist in an ESB Architecture Guide, including the security policies for connectivity and access.
Other concerns and best practice observations are as follows:
5.1 WebSphere MQ
All incoming and outgoing data for the ESB is controlled by the WebSphere MQ queue managers, which are the key resource managers. Queue manager base definitions were inconsistent: they varied from default settings for what appear to be arbitrary reasons, with many inconsistencies across system and application-object configurations. These configurations require cleanup and maintenance to meet best practices for environment management.
Use of NFS within the Linux/VM environment could be a regular source of compromise regarding high availability. When all other attempts have failed to resolve an NFS issue, the last resort is to bounce the NAS server, which results in immediate outage of all NAS services to all system consumers.
Instead, moving to a direct-storage product like Veritas™ Volume Manager is a cost-effective and reliable practice for ensuring high availability across clusters.
Also, consideration should be given to implementing MQ AMS (Advanced Message Security) to ensure compliance with PCI-DSS standards. This product is used to enforce encryption of messages at rest in the MQ queues to ensure that any and all access to queues will not provide access to readable message content. AMS in conjunction with MQ Security restriction of access will go far in preventing unauthorized access within this tier of the overall application architecture.
5.2 WebSphere Application Server
Several concerns were noted with the WAS implementation supporting “Regional Bank A’s” Java applications:

  • Operating systems not tuned according to minimum IBM standards
  • JVMs not tuned
  • Environment variables not being set
  • Single application/JVM profiles used on the assumption of securing data segregation of application data

Software Configuration Management and Deployment Automation

Whenever changes are introduced into the ESB (for software maintenance, for new applications, or to enable better performance or transaction growth), a key factor in problem reduction and ongoing stability is how those changes are introduced, tested and validated before deployment into production – the environment where business transactions run and where an interruption may involve loss of revenue for “Regional Bank A.”
Improvements in the following areas could be explored further for future engagement scope:
6.1 Software Configuration Management
How the application code is stored and version controlled is critical in the practice of software-configuration management. In addition, how the code is migrated to production is an area of extreme scrutiny for most financial-services companies. PCI compliance generally requires the demonstration of secure and formal access control around all source code and code-migration activities to production systems to ensure against introduction of rogue code or malware on financial systems.
Generally speaking, this is an area where best practice is quite mature as related to CMM and pre-Y2K efforts to manage the deployment of massive amounts of code change without business interruption – more from a stability and availability-management perspective.
Since most problems are related to changes made within the environment, most financial-services IT organizations are quite strict and process-oriented, with significant automation around the software-development life cycle (“SDLC”) to ensure against business disruption due to release testing in an environment that is not managed and controlled with the same configuration as production.
At “Regional Bank A,” application deployments are a highly manual effort with some use of homegrown scripts. These scripts are subject to human error, produce inconsistent configurations, and are time-consuming to manage and support.
Key concerns with the current software-distribution strategy include:

  • High degree of error and inconsistencies
  • High labor cost in deployment process
  • High risk of losing skills relating to custom-deployment process and administrator knowledge of each application’s configuration and deployment requirements

Use of automation tools should be considered where:

  • Application changes are packaged into a deployment “bundle” with clear associations between the configuration changes and application release dates to each environment
  • Automation tracks all individual components that constitute a “bundle”
  • Fully automated backend process (including automated back-out of changes)
  • Provides push-button controlled/approved and self-service levels of deployment process
  • Logging of changes for configuration-management auditing
  • Maximizes access control for PCI-DSS compliance
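
To make the “bundle” idea above concrete, here is a minimal Python sketch of a bundle that tracks its components, records a release date per environment, logs every change and backs out automatically on failure. The class names, component names, versions and dates are all hypothetical; a packaged deployment product would provide these capabilities itself.

```python
from dataclasses import dataclass, field


@dataclass
class Bundle:
    """Versioned components released together (hypothetical model)."""
    name: str
    components: dict  # component name -> version
    releases: dict = field(default_factory=dict)  # environment -> release date


class Environment:
    def __init__(self, name):
        self.name = name
        self.deployed = {}   # component -> version currently deployed
        self.audit_log = []  # entries for configuration-management auditing

    def deploy(self, bundle, release_date):
        """Apply every component in the bundle; restore the prior state on failure."""
        snapshot = dict(self.deployed)
        try:
            for component, version in bundle.components.items():
                if not version:
                    raise ValueError(f"{component} has no version")
                self.deployed[component] = version
        except ValueError as exc:
            self.deployed = snapshot  # automated back-out
            self.audit_log.append((release_date, bundle.name, f"backed out: {exc}"))
            return False
        bundle.releases[self.name] = release_date
        self.audit_log.append((release_date, bundle.name, "deployed"))
        return True


qa = Environment("QA")
good = Bundle("payments-1.4", {"payments.ear": "1.4", "PAY.IN queue": "1.0"})
bad = Bundle("payments-1.5", {"payments.ear": "1.5", "edi_map": ""})
ok_first = qa.deploy(good, "2014-01-10")
ok_second = qa.deploy(bad, "2014-02-01")
print(ok_first, ok_second, qa.deployed)
```

The second deployment fails partway through, and the environment is returned to the state left by the first bundle rather than being left half-updated.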

6.2 Deployment Automation
In addition to managing the source code repository itself, management of deployment through deployment automation that encompasses both application changes as well as system changes is considered a best practice.
Though it is conceivable that scripts could be written to accommodate all of the various types of changes to all of the possible WebSphere products and components involved in the “Regional Bank A” ESB configuration, it is not recommended due to the high complexity and amount of time required, which contributes to the overall cost of maintaining homegrown deployment scripts.
This reason alone is perhaps why “Regional Bank A” deployments continue to be manual in nature.
In evaluating the available tools and utilities for automating WebSphere deployments, consider that ESB changes are generally of two types:

  • Application Changes – Application changes include new message flows, new application queues, new .ear files or WTX maps, all of which have association with each other with regard to version control with the application bundle.
  • System Changes – System-level changes include applying hot fixes, fix packs and major-release software-version levels. They could also involve environment-configuration settings such as adding a new application ID, group access, database driver, resource-connection pooling configuration and other parameters that enable better performance and throughput. Additionally, WebSphere and Java version levels are somewhat independent of each other, yet they often have critical dependencies in terms of application functionality and thus require configuration management with application bundles.

As a result of the above, it is recommended that “Regional Bank A” consider packaged products that will automate systems as well as application deployments and manage technical dependencies without the use and ongoing maintenance of deployment scripts.
Because of the complexity of the ESB configuration, such products as Rational UDeploy in conjunction with Rational Team Concert are now considered a best-of-breed product combination for managing application configurations and software distribution for complex multi-product ESB customers.
In closing, the review and recommendations above should be considered for initiating infrastructure projects that will address and close the items of key concern. Additionally, the initiation of projects for addressing future automation, performance management and growth of the ESB should also be considered in both the near future and beyond for strategic reasons, as well as for ongoing compliance and growth supportability.
Photo courtesy of Flickr contributor “Info Cash”

Case Study: Middleware Compliance For Government Agencies

Project Description

Government client required oversight for proper implementation and development of integral middleware products.

The Situation

Our government client required a project manager to oversee, provide guidance to, and collaborate with the MIP team on MIP and non-MIP application and system issues, including the design, development and implementation of enterprise applications using a variety of middleware products.
It was imperative that the job run smoothly because our client’s engineering services provide highly efficient program management, financial management, systems engineering, and Integrated Logistic Support (ILS) to boost Navy crew safety and streamline processes.

Project Description

The TxMQ consultant’s responsibilities included managing a diversely skilled group of developers, including IBM subcontractors, and identifying roles and responsibilities so that the development team could correctly implement the MIP Hosting Site Migration and System Upgrade.
The guidance for the integration, design and development incorporated advice on using the following middleware products:

  • WebSphere Process Server
  • WebSphere Portal Server
  • DB2 Content Manager
  • IBM HTTP Server
  • Tivoli WebSEAL
  • Tivoli Access Manager
  • DB2
  • IBM Directory Server
  • Active Directory

The Response/Methodology

The initial step in the project included dividing and handing off development activities to members of the upgrade test team for MIP and Non-MIP applications.
Coordination and collaboration were imperative across internal cross-functional teams and external vendors. The project manager supervised these exchanges, ensuring that each party had the information needed to complete the task at hand. With the project manager overseeing the development team, managing expectations and providing direction, the process of code maintenance and technical deployment of new code and patches began.
Throughout the process our project manager updated the client via status meetings and written internal project reports.

Compliance

It was the responsibility of our project manager to ensure compliance with CMMI SW Level 3 processes for the work managed at ZAI.
Photo by Flickr contributor Andy Piper.

Case Study: WMQ HealthCheck

Project Description

Large regional bank commissioned TxMQ to conduct a WebSphere MQ environment review to maximize use of resources and technology.

The Situation

TxMQ’s client, a large regional bank, requested a WebSphere MQ-environment HealthCheck. The Bank uses WMQ as a shared middleware infrastructure for all critical business applications to send and receive critical data between them.
In addition, the bank requested that TxMQ identify a solution configuration to meet the bank’s needs.

The Objective

The first key objective for the HealthCheck was to provide senior architecture consulting and a broad assessment and review. The assessment and review would be used to provide flow documents and possible recommendations including, but not limited to:

  • Software and system-configuration changes or upgrades/support packs needed
  • Technical impact and performance analysis
  • Response to solutions impact
  • Written recommendations based on meetings with stakeholders
  • Stakeholder concerns mapped to specific infrastructure objectives

A second key objective for the HealthCheck was to analyze the WMQ environment and configuration to eliminate any potential performance, availability and security exposure, and to ensure there were no roadblocks for future growth and new applications. This included an analysis of current staffing structure, number and size.
The final objective was to analyze the proposed capacity-increase solutions and the impact of each solution to support increases in business processes up to 100% of the existing transaction volumes. The analysis would help the client effectively plan MQ-capacity management strategy, identify technical dependencies and assist with identifying the costs and benefits.

The Response

TxMQ’s consultant – an MQ Subject Matter Expert – spent a little more than a week onsite with the MQ team at the bank. TxMQ dove into a diagnosis of the current OS and middleware environment, which included CICS V3.2, WebSphere MQ V7.01 and DB2 V8.
The client had more than 23 MQ queue managers spread over Mainframe, AIX and Windows. Some queue managers were used for development and testing and were standalone. For production, many of the applications went through more than one queue manager.
In conjunction with the bank’s internal teams, the TxMQ consultant was able to deep-dive into the environment and architecture and create a list of observations and recommendations.

Findings

The HealthCheck revealed no major or current issues. Our consultant’s general observations were as follows:

  • Sound architecture and topology
  • WMQ V6 is due for upgrade
  • Uniform procedures and proper validation for software promotion
  • Add additional load-test environments to facilitate updates
  • Excellent logic and implementation of naming conventions
  • Offsite z/OS image available for regular testing
  • Offsite z/OS image available for regular disaster-recovery exercises
  • Solid business-continuity plans
  • Infrastructure is ready, should disaster recovery be needed
  • Tivoli OmegaMon XA used for monitoring
  • No high-level application and architectural documents readily available
  • System is well-tuned
  • Security is solid with only a small number of areas that need to be addressed
  • Excellent IT skills team with good problem-determination skills
  • Excellent working group with positive attitude
  • Staffing is lean compared to similar organizations

In the last 13 months there were three system outages due to application issues. There is no concern with the number of instances and it was determined that there is a well-designed and functioning system in place for tracking and logging instances.

Additional Recommendations

The TxMQ consultant provided the following additional recommendations to the bank:

  • Evaluate recent WMQ security features
  • Plan for WMQ migration to V 7.1 or 7.5
  • Modify the current WMQ recovery topology to handle possibility of improved recovery time for a mainframe outage
  • Implement shared queue when Sysplex is available
  • Evaluate additional ESB architecture
  • Review staffing

Photo courtesy of Flickr contributor Howard Lake.

Case Study: WebSphere Commerce 6.0 To 7.0 Migration

Project Description

A large retailer using WebSphere Commerce 6 had failed in a previous migration to WebSphere Commerce 6.1. The retailer now needed new features found in WebSphere Commerce 7. TxMQ, an IBM Premier Business Partner, added two brand-new clustered WebSphere 7 environments. The result: TxMQ implemented WebSphere 7 and delivered twice the processing capacity using half the number of clustered servers with no downtime to operations.

The Situation

A large retailer (the “client”) was using WebSphere Commerce 6 running clustered WebSphere Application Server (WAS) 6.0 and backend clustered services running WAS 6.0. Both WebSphere environments were on AIX 6. The frontend Commerce servers maintained the website, while the backend services performed shipping, email notifications, billing and inventory. The backend services also hosted a homegrown call-center application. The development environment staged 16 separate JVMs to manage disparate projects. The QA environment housed four clustered application servers in a three-tiered architecture that exactly mirrored production.

The Challenge

A previous migration to WebSphere 6.1 had failed and was subsequently abandoned. With the release of WebSphere Commerce 7, the client’s development team needed to utilize new features found in WebSphere Commerce 7. There was also no tolerance for downtime, whether it be in development or production. Automated deployment scripts were requested for each environment.

The Response

The key to project success was to analyze what had been unsuccessful in the past. From that initial conversation, TxMQ split the project by environment. The first environment was the client’s frontend WebSphere Commerce 6 environment. The second was the backend WAS services.
TxMQ attempted two separate approaches to tackle the frontend environment. The first approach was to migrate the Commerce 6.0 application into version 7 using the supplied IBM tools. When that failed, TxMQ went to the backup approach and installed WebSphere 7 on new, recently purchased hardware. The existing installation of WebSphere 6 was on old, about-to-be-retired hardware. While the original plan was to migrate to the new hardware once WebSphere 7 was installed, TxMQ improvised and installed version 7 only on the new hardware. This made it simple to meet the no-downtime requirement, because the team could build alongside the existing architecture. There was little to no downtime and testing proceeded smoothly without interruption to the legacy environment. Cutover was as simple as shutting off the legacy LPARs.
Within the second environment, the application was a bit simpler than the Commerce application. The IBM-supplied tools migrated that application successfully, and TxMQ was able to complete all tasks using the first approach of an in-place migration.
In hindsight, TxMQ would recommend the client perform a full reinstall of WebSphere 7 because of the significant changes to the SSL certificates between WebSphere 6 and 7. In order to re-enable security, TxMQ had to completely rebuild the client’s legacy certificates (that were transferred over by IBM tools). In a PMR to IBM, it was acknowledged that this was a flaw in the migration which resulted in corrupted security keys.

The Result

In summary, TxMQ added two brand-new clustered WebSphere 7 environments with little to no downtime to the client’s project schedule. The client’s development environment was perfectly mirrored from legacy and was switched with little to no impact to the existing schedule. Performance testing indicated that the client could run nearly twice the seasonal load on half of the usual hardware. Whereas the client could process 4,000 orders/hour with 8 clustered servers, the client could now process nearly 8,000 orders/hour with just 4 clustered servers.
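The throughput figures above can be restated per server with a short calculation. This Python sketch uses only the numbers quoted in the paragraph and treats “nearly 8,000” as exactly 8,000, so the resulting ratio is an upper bound rather than a measured value.

```python
def orders_per_server(orders_per_hour, servers):
    """Average hourly order throughput handled by each clustered server."""
    return orders_per_hour / servers


before = orders_per_server(4000, 8)  # legacy: 8 clustered servers
after = orders_per_server(8000, 4)   # new: 4 clustered servers, upper bound
ratio = after / before

print(f"before: {before:.0f} orders/hour per server")
print(f"after:  {after:.0f} orders/hour per server")
print(f"per-server improvement: up to {ratio:.0f}x")
```

Doubling total throughput while halving the server count corresponds to roughly a fourfold gain per server.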
TxMQ was able to provide extensive documentation on the installation process for both the backend and frontend environments. Additionally, TxMQ provided documentation for the deployment scripts, as well as updated architecture diagrams for both the deployment script and the new WebSphere 7.

Lessons Learned

Initially, TxMQ experienced problems with the 32-bit application: it would run out of memory because of an excessive number of class loaders. The solution lay in a parameter, buried deep in the infocenter, that accepted a value to limit the number of class loaders.
WebSphere Commerce is caching-intensive. TxMQ performed extensive performance tuning of the dynacache settings. Some new distributed-object caching methods introduced in WebSphere 7 helped achieve greater order throughput, but cachespec.xml had to be rewritten to use these features.
While migrations were usually successful, in the end everybody agreed that a clean install was faster.

Case Study: CICS HealthCheck

Project Description

Large regional bank commissioned TxMQ to conduct a CICS architecture and implementation review to maximize use of resources and technology.

The Situation

TxMQ’s client, a large regional bank, requested a CICS HealthCheck. Under the request, TxMQ would assess the current infrastructure and verify that the environments were running optimally (response times and availability), and assess how the client used them in comparison with other institutions of like size and structure.
In addition, TxMQ was asked to identify opportunities for different and greater functionality, as presented by the technology, and to determine what business solutions might be recommended.

The Objectives

The first key objective for the HealthCheck was to provide senior architecture consulting, plus a broad assessment and review. The assessment and review would be used to provide flow documents and possible recommendations including, but not limited to:

  • Software and system-configuration changes or upgrades/support packs needed
  • Technical impact and performance analysis
  • Response to solutions impact
  • Written recommendations based on meetings with stakeholders
  • Stakeholder concerns mapped to specific infrastructure objectives

A second key objective for the HealthCheck was to analyze the WMQ environment and configuration to eliminate any potential performance, availability and security exposure, and to ensure there were no roadblocks for future growth and new applications. This included an analysis of current staffing structure, number and size.
The final objective was to analyze the proposed capacity-increase solutions and the impact of each solution to support increases in business processes up to 100% of the existing transaction volumes. The analysis would help the client effectively plan MQ/CICS capacity-management strategy, identify technical dependencies and assist with an identification of the costs and benefits.

The Response

TxMQ’s consultant – an MQ Subject Matter Expert – spent a little more than a week onsite with the CICS/MQ team at the bank.
TxMQ dove into a diagnosis of the current OS and middleware environment, which included CICS V3.2, WebSphere MQ V7.0.1 and DB2 V8. The client had more than 85 CICS regions deployed, many of which were standalone and used for development and testing.
TxMQ’s consultant completed a review of CICS parameters and specifications for the bank’s main production regions. The consultant also assessed the source of three system outages over the prior 13 months due to application issues.
In conjunction with the bank’s internal teams, the TxMQ consultant was able to deep-dive into the environment and architecture and create a list of observations and recommendations.

Findings

The HealthCheck revealed no major or current issues. The consultant’s general observations were as follows:

  • Sound Architecture
  • Highly motivated and skilled professionals
  • Excellent communication among management, teams and members
  • Need for ongoing skill updates
  • Fast problem resolution
  • Formal structure for support and problem facilitation
  • Central change management
  • No performance issues
  • Management aware of and willing to address issues
  • Good planning

The three system outages were traced to application issues. TxMQ recommended that the client reduce the risk of further outages through 1) thorough application testing and 2) use of a disaster-recovery system identical to the production environment.
As for CICS parameters, no networking issues were detected and only minor changes were suggested.
The client is proactively planning upgrade activities as well as additional training for its staff.

Case Study: z/OS Performance Check

Project Description

National product-transport-logistics company (client) sought a performance assessment and analysis of its WebSphere z/OS environment to solve memory issues.

The Situation

The client experienced CPU spikes and service timeouts that caused application hangs and required the WAS application servers to be recycled. Workload volume was approaching the peak for the business process, so JVM configuration and settings changes needed to be evaluated as a way to alleviate the CPU spikes.

The Analysis

Statement Of The Problem: The cause of the client’s WebSphere issue was within one of the WebSphere Servant Regions. The region was running out of memory and performing so much garbage collection that it took over most of the zAAP capacity. The question was: How could the client control this activity with WLM?
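One way to put a number on "a large volume of garbage collection" is the share of wall-clock time spent in GC pauses. The sketch below assumes the pause durations have already been extracted from the GC logs; the function name, the sample pauses and the 10-second window are hypothetical illustration values, not data from the client's system.

```python
def gc_overhead(pauses_ms, window_ms):
    """Return the fraction of the observation window spent in GC pauses."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return sum(pauses_ms) / window_ms


# Hypothetical pause durations (ms) observed during a hypothetical 10-second window.
sample_pauses_ms = [400, 650, 800, 550, 600]
overhead = gc_overhead(sample_pauses_ms, window_ms=10_000)

print(f"GC overhead: {overhead:.0%}")
```

A consistently high overhead is typical of a heap that is too small for the workload, because the collector runs repeatedly while recovering little memory.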

Tools

TxMQ performed an analysis using the following tools:

  • z/OS RMF reports
  • Javacore analysis using the IBM Support Assistant Workbench (ISAW)
  • WebSphere GC logs analysis using the ISAW

The Assessment

TxMQ logged the following observations:

  • The WebSphere infrastructure was remarkably well-maintained and well-tuned for the hardware environment. In addition, TxMQ was impressed by the client’s knowledge base.
  • The environment was not technically constrained; it could do what the client “wanted it to do.” Instead, lack of hardware and application design were at issue.
  • The z/OS infrastructure was problematic. The main issue: What does one do when the zAAP goes away?
  • The situation pointed to an application-design issue. All tests pointed to a condition whereby the application was attempting to do too much, based upon less-than-optimal design and execution.

Javacore Analysis

Following is a summary of notes from the Javacore analysis:

  • Number of threads allocated to ORB should be investigated
  • Recommended: -Xmxcl setting (only for IBM Java 5.0, up to and including
    Service Refresh 4)
  • Thread-status analysis: Noticed 26 threads were actually running. Code pointed to ORB, EJBs
  • No threads were waiting on a monitor, yet there were 91 waiters. It appeared the code was not doing the same thing on the other threads. (A monitor is the lock Java uses to let threads share a resource safely, one thread at a time.)

ORB thread pool stack: Almost all the running threads look like this:

Memory Segment Analysis

This analysis showed the JVM to be running out of memory.

Recommendations

At the conclusion of the review, TxMQ discussed recommendations with the client, but the client remained concerned that the recommendations still did not address the issue of CPU overload. Some results from that conversation:

  • The client’s applications team decided to support a move to WASv7.x with the expectation of increased performance.
  • WLM management is an area that needs review, but any modifications (re: priority levels for WebSphere) are problematic.
  • The additional zAAP is preventing a system heart attack. The general-processor/zAAP costs must be accounted for before the zAAP expires in the next 6 weeks.
  • Best value-for-spend would be an application-review of the heap usage.

Client To-Do List

Client takeaway from performance review:

  • Add more ORB threads
  • Add more memory to the JVMs
  • Look into JVM -Xmxcl setting
  • Investigate how EJBs are using connection pools
  • Get the GP and keep the zAAP
  • Set a GC-threads limit on the WebSphere JVM so that GC does not take all CPUs
  • Profile the application and tune it with a view to system resources, parsing
    and messaging
  • Investigate a DataPower appliance as a method of offloading SOAP services, XML parsing and messaging
  • Choose a DB Subject Matter Expert to look over the design, tuning and capacity-planning of the subsystem.
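Several of the to-do items above map to IBM JVM command-line options. The following is a minimal sketch of how such a generic JVM argument string might be assembled. The helper name and all numeric values are hypothetical, chosen only for illustration; the options -Xmx, -Xgcthreads and -Xmxcl should be checked against the documentation for the installed SDK level before use.

```python
def build_jvm_args(max_heap_mb, gc_threads, max_class_loaders=None):
    """Assemble a generic JVM argument string (hypothetical helper)."""
    if max_heap_mb <= 0 or gc_threads <= 0:
        raise ValueError("heap size and GC thread count must be positive")
    args = [
        f"-Xmx{max_heap_mb}m",       # maximum heap size ("add more memory to the JVMs")
        f"-Xgcthreads{gc_threads}",  # cap on the number of GC threads
    ]
    if max_class_loaders is not None:
        # Per the notes above, relevant only to IBM Java 5.0 up to Service Refresh 4.
        args.append(f"-Xmxcl{max_class_loaders}")
    return " ".join(args)


# Hypothetical values; real settings depend on the workload and available CPUs.
jvm_args = build_jvm_args(max_heap_mb=1024, gc_threads=2, max_class_loaders=20000)
print(jvm_args)
```

Keeping the settings in one place makes it easier to apply the same values consistently across servant regions and to record what was changed during tuning.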

Case Study: CMS Site Design & Deployment

Project Description

Indian nation (client) sought update to current static-HTML website. TxMQ designed a site that was dynamic and easily updated via Content Management System.

The Situation

As part of an overall grand plan to streamline and track customer data, the client needed the assistance of a system analyst to provide ongoing remote-development work for the Indian nation’s gaming website and several web properties within the site. There were a total of 17 gaming locations, with two of the client’s largest locations most in need of an updated website.

The Objectives

The client’s current static-HTML website was difficult to update on a frequent basis. In order to create a dynamic, ever-changing and easily-updatable website, the client needed to transition the website to a JSP Content Management System (CMS) to promote casinos and gaming centers.
There were two websites in need of a CMS implementation and the client needed to evaluate and choose the proper system to complete the job.

The Response

A TxMQ consultant was supplied with a fairly complete site design in HTML format. His first step was to convert the static HTML instance to a CMS-enabled, dynamic-JSP website.
To create the new site, a CMS site was managed on a development server. Thus, all work orders and progress could be completed within the development environment before live deployment.
Using the Alterian Content Management System (ACM), the TxMQ consultant defined the required content types, templates and pages needed to build out the website. After that, he installed all the resources (text content, images, XML, Flash, configuration and more) into the ACM repository.
Testing immediately followed to confirm all features and functionality of the CMS, after which it was turned over to an in-house Quality Assurance (QA) team. After the QA-requested changes were completed and enhancements made, the site was deployed to the staging server and then pushed to live.
The completed project and all its components allowed the client to mine and track customer data clearly, which led to 15% to 20% growth in revenues for the commerce division.