TxMQ Staff Consultants Contributed To This Write-Up
NOTE ON SOURCES/DOCUMENT PURPOSE
All guide sources come from well-documented IBM or IBM partners' reference material. The purpose of this document is simple: to take all relevant sources and put their salient points into a single, comprehensive document for reliable setup and tuning of a z/Linux environment.
The ultimate point is to create a checklist and share best practices.
Assemble a team that can address all aspects of the performance of the software stack. The following skills are usually required:
- Overall Project Coordinator
- VM Systems Programmer. This person sets up all the Linux guests in VM.
- Linux Administrator. This person installs and configures Linux.
- WebSphere Administrator.
- Lead Application Programmer. This person can answer questions about what the application does and how it does it.
- Network Administrator.
TIP: Start from the outside and work inward toward the application.
The environment surrounding the application causes about half of the potential performance problems. The other half is caused by the application itself.
Start with the environment that the application runs in. This eliminates potential causes of performance problems. You can then work toward the application in the following manner.
1. LPAR. Things to look at: number of IFLs, weight, caps, total real memory, and memory allocation between cstore and xstore.
2. VM. Things to look at: communications configuration between Linux guests and other LPARs, paging space, and share settings.
3. Linux. Things to look at: virtual memory size, virtual CPUs, VM share and limits, swapping, swap file size, and kernel tuning.
4. WebSphere. Things to look at: JVM heap size, connection pool sizes, use of caches, and WebSphere application performance characteristics.
Defining LPAR resource allocation for CPU, memory, DASD, and network connections
Adjust these allocations to suit each environment (production, test, and so on).
USING A VIRTUAL SWITCH (VSWITCH) FOR ALL LINUX GUEST SYSTEMS IS A GOOD PRACTICE.
With VSWITCH, the routing function is handled directly by the z/VM Control Program (CP) instead of by a TCP/IP virtual machine. This can eliminate most of the CPU time used by the router it replaces, resulting in a significant reduction in total system CPU time.
– When a TCP/IP VM router was replaced with VSWITCH, decreases in total system CPU time ranging from 19% to 33% were observed.
– When a Linux router was replaced with VSWITCH, decreases ranging from 46% to 70% were observed.
NOTE: VSWITCH does not provide the same level of security as a dedicated firewall or an external router, so when the router function requires high security, consider using one of those instead of VSWITCH.
Z/VM VSWITCH LAN
The VSWITCH LAN configuration resulted in higher throughput than the Guest LAN feature.
Guest LAN is ring-based and can be much simpler to configure and maintain.
HIPERSOCKETS FOR LPAR-LPAR COMMUNICATION
Tips for Avoiding Eligible Lists:
- Set each Linux machine's virtual storage size only as large as it needs to be for the desired Linux application(s) to run. This suppresses the Linux guest's tendency to use its entire address space for file cache. If the Linux file system is hit largely by reads, you can make up for this with minidisk cache (MDC). Otherwise, turn MDC off, because it induces about an 11-percent instruction-path-length penalty on writes, consumes storage for the cached data, and pays off little because the read fraction is not high enough.
- Use whole volumes for VM paging instead of fractional volumes. In other words, never mix paging I/O and non-paging I/O on the same pack.
- Implement a one-to-one relationship between paging CHPIDs and paging volumes.
- Spread the paging volumes over as many DASD control units as possible.
- If the paging control units support non-volatile storage (NVS) or DASD fast write (DASDFW), turn that feature on (applies to RAID devices).
- Provide at least twice as much DASD paging space (CP QUERY ALLOC PAGE) as the sum of the Linux guests’ virtual storage sizes.
- Having at least one paging volume per Linux guest is beneficial. If the Linux guest is using synchronous page faults, exactly one volume per Linux guest will be enough. If the guest is using asynchronous page faults, more than one per guest might be appropriate; one per active Linux application will serve the purpose.
- In queued direct I/O (QDIO)-intensive environments, plan for 1.25MB of CP below-2GB free storage to be consumed per idling real QDIO adapter, for CP control blocks (shadow queues). If the adapter is being driven very hard, this number could rise to as much as 40MB per adapter. This tends to hit below-2GB storage hard. CP prefers to resolve below-2GB contention by using expanded storage (xstore).
- Consider configuring at least 2GB to 3GB of xstore to back up the below-2GB central storage, even if central storage is otherwise large.
- Try CP SET RESERVED to favor storage use toward specific Linux guests.
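Several of the tips above are arithmetic rules of thumb, so a quick planning calculation can help when sizing paging DASD and below-2GB storage. The Python sketch below applies three of them: paging space of at least twice the summed guest virtual storage, paging volumes per guest depending on the page-fault mode, and the QDIO below-2GB estimate (1.25MB per idling adapter, up to 40MB per heavily driven adapter). The guest names and sizes are hypothetical, and the functions are planning aids only; they do not query z/VM (CP QUERY ALLOC PAGE reports the actual allocation).

```python
from dataclasses import dataclass

# Rules of thumb taken from the tips above; treat them as starting points.
PAGING_MULTIPLIER = 2      # DASD paging space >= 2 x sum of guest virtual storage
QDIO_IDLE_MB = 1.25        # CP below-2GB storage per idling real QDIO adapter
QDIO_BUSY_MAX_MB = 40.0    # worst case per adapter that is driven very hard


@dataclass
class Guest:
    name: str                        # hypothetical guest name
    virtual_storage_mb: int          # guest virtual storage size
    async_page_faults: bool = False  # True if the guest uses asynchronous page faults
    active_apps: int = 1             # number of active Linux applications


def min_paging_space_mb(guests):
    """At least twice the sum of the guests' virtual storage sizes."""
    return PAGING_MULTIPLIER * sum(g.virtual_storage_mb for g in guests)


def suggested_paging_volumes(guests):
    """One volume per guest using synchronous page faults; with asynchronous
    page faults, one volume per active Linux application (at least one)."""
    return sum(max(1, g.active_apps) if g.async_page_faults else 1 for g in guests)


def qdio_below_2gb_mb(idle_adapters, busy_adapters):
    """Estimate CP below-2GB storage used for QDIO shadow queues, counting
    busy adapters at the 40MB upper bound."""
    return idle_adapters * QDIO_IDLE_MB + busy_adapters * QDIO_BUSY_MAX_MB


guests = [
    Guest("LNXWAS1", 1024),
    Guest("LNXWAS2", 2048, async_page_faults=True, active_apps=3),
]
print(min_paging_space_mb(guests))       # 6144
print(suggested_paging_volumes(guests))  # 4
print(qdio_below_2gb_mb(4, 1))           # 45.0
```

Counting busy adapters at the 40MB bound deliberately errs on the generous side, since below-2GB storage is the scarce resource here.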
Memory Management and Allocation
Add 200-256MB for WebSphere overhead per guest.
Configure 70% of real memory as central storage (cstore).
Configure 30% of real memory as expanded storage (xstore). Without xstore VM must page directly to DASD, which is much slower than paging to xstore.
CP SET RESERVED. Consider reserving some memory pages for one particular Linux VM, at the expense of all others. This can be done with a z/VM command (CP SET RESERVED).
If unsure, a good guess at VM size is the z/VM scheduler’s assessment of the Linux guest’s working set size.
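The memory rules in this list can be combined into a small calculation. The following Python sketch, assuming the 70/30 cstore/xstore split, the 200-256MB WebSphere overhead per guest, and the 2GB low end of the xstore recommendation from the tips above, shows how an LPAR's real memory and a guest's size might be estimated. The input values are hypothetical; real sizing should still be checked against the z/VM scheduler's working-set view of each guest.

```python
# Ratios from the list above; adjust them for your workload.
CSTORE_FRACTION = 0.70
WEBSPHERE_OVERHEAD_MB = (200, 256)  # extra memory per guest for WebSphere
MIN_XSTORE_MB = 2048                # low end of "at least 2GB to 3GB" of xstore


def split_real_memory(real_mb):
    """Split LPAR real memory roughly 70/30 into cstore and xstore."""
    cstore = round(real_mb * CSTORE_FRACTION)
    xstore = real_mb - cstore
    return cstore, xstore


def guest_size_range_mb(application_mb):
    """Add the 200-256MB WebSphere overhead to an application's own needs."""
    low, high = WEBSPHERE_OVERHEAD_MB
    return application_mb + low, application_mb + high


cstore, xstore = split_real_memory(10240)
print(cstore, xstore)             # 7168 3072
print(xstore >= MIN_XSTORE_MB)    # True
print(guest_size_range_mb(512))   # (712, 768)
```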
In memory overcommitment tests with z/VM, the overcommitment ratio was increased up to 3.2:1 without any throughput degradation.
Cooperative Memory Management (CMM1) and Collaborative Memory Management (CMM2) both regulate Linux memory requirements under z/VM. Both methods improve performance when z/VM hits a system memory constraint.
Using Named Saved Systems (NSS), the z/VM hypervisor makes operating system code in shared real memory pages available to z/VM guest virtual machines. With this update, multiple Red Hat Enterprise Linux guest operating systems on z/VM can boot from the NSS and run from a single copy of the Linux kernel in memory. (BZ#474646)
Why configure expanded storage for VM? Here are a few thoughts:
While configuring some xstore may result in more paging, it often results in more consistent or better response time. The paging algorithms in VM evolved around having a hierarchy of paging devices. Expanded storage is the high speed paging device and DASD the slower one where block paging is completed. This means expanded storage can act as a buffer for more active users as they switch slightly between working sets. These more active users do not compete with users coming from a completely paged out scenario.
The central-versus-expanded-storage issue stems from the different LRU algorithm implementations used when stealing pages from central storage and from expanded storage. In short, central storage relies on a reference bit, which gets reset fairly often. In expanded storage, CP has the luxury of an exact timestamp of each block's last use, which allows it to do a better job of selecting pages to page out to DASD.
In environments that page to DASD, transactions (as CP determines them) can be broken up by paging I/O. This can make a real-storage-only configuration appear to have a lower throughput rate.
Also configure some expanded storage, if needed, for guest testing. OS/390, VM, and Linux can all use expanded storage.
VM SCHEDULER RESOURCE SETTINGS
Linux is a long-running virtual machine, whereas VM is set up by default for short-running guests. As a result, the following changes to the VM scheduler settings should be made. Linux is a Q3 virtual machine, so changing the third value in these commands is the most important. Include these settings in the PROFILE EXEC for the operator machine or the AUTOLOG1 machine:
set srm storbuf 300 200 200
set srm ldubuf 100 100 100
DO I NEED PAGING SPACE ON DASD?
YES. One of the most common mistakes new VM customers make is ignoring paging space. The VM system, as shipped, contains enough page space to get the system installed and to run some small trial work. However, you should add DASD page space before doing real work. The z/VM CP Planning and Administration book has details on determining how much space is required.
Here are a few thoughts on page space:
If the system is not paging, you may not care where you put the page space. However, sooner or later the system grows to a point where it pages and then you’ll wish you had thought about it before this happens.
VM paging performs best when it has large, contiguous available space on volumes that are dedicated to paging. Therefore, do not mix page space with other space (user, t-disk, spool, etc.).
A rough starting point for page allocation is to add up the virtual machine sizes of the running virtual servers and multiply by 2. Keep an eye on the allocation percentage and the block read set size.
See: Understanding poor performance due to paging increases
USER CLASSES AND THEIR DESCRIPTIONS
If you have command privilege class E, issue the following CP command to view information about these user classes: INDICATE LOAD
A minimal Linux guest system fits onto a single 3390-3 DASD volume, and this is the recommended practice in the field. To keep the installed system this small, do not install the GNOME or KDE desktop environments. (The example does not follow this practice because we want to show the use of LVM and KDE.)
VM SHARED KERNEL SUPPORT
If your Linux distribution supports the “VM shared kernel support” configuration option, the Linux kernel can be generated as a shareable NSS (named saved system). Once this is done, any VM user can IPL LXSHR, and about 1.5MB of the kernel is shared among all users. The more Linux virtual machines are running, the greater the benefit of using the shared system.
The QUICKDSP option makes a virtual machine exempt from being held in an eligible list during scheduling when system memory or paging resources are constrained. Virtual machines with QUICKDSP set on go directly to the dispatch queue and are identified as Q0 users. We prefer that you control the formation of eligible lists by tuning the CP SRM values and allowing a reasonable overcommitment of memory and paging resources, rather than depending on QUICKDSP.
Each Linux guest is defined with an assigned number of virtual CPs and a SHARE setting that determines each virtual CP's share of the processor cycles available to z/VM.
When running WebSphere applications in Linux, you are typically able to overcommit memory at a 1.5:1 ratio. This means that for every 1000 MB of virtual memory needed by a Linux guest, VM needs only about 666 MB of real memory to back it. This ratio is a starting point and should be adjusted based on experience with your workload.
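The overcommitment arithmetic above can be expressed directly. This Python sketch, assuming the 1.5:1 starting ratio from the text, computes the real memory needed to back a given amount of guest virtual memory and the ratio actually achieved for a set of guests. The guest sizes in the example are hypothetical.

```python
def real_memory_needed_mb(virtual_mb, overcommit_ratio=1.5):
    """Real memory needed to back virtual memory at a given overcommit ratio."""
    return virtual_mb / overcommit_ratio


def overcommit_ratio(total_virtual_mb, real_mb):
    """Overcommit ratio: total guest virtual memory divided by real memory."""
    return total_virtual_mb / real_mb


print(int(real_memory_needed_mb(1000)))                  # 666 (666.67 before truncation)
guests_mb = [1024, 1024, 2048]                           # hypothetical guest sizes
print(round(real_memory_needed_mb(sum(guests_mb)), 1))   # 2730.7
print(overcommit_ratio(sum(guests_mb), 2048))            # 2.0
```

Tracking the achieved ratio over time makes it easier to adjust the 1.5:1 starting point based on experience with your workload.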
LINUX SWAP – WHERE SHOULD LINUX SWAP?
Try to avoid swapping in Linux whenever possible. It adds path length and causes a significant hit to response time. However, sometimes swapping is unavoidable. If you must swap, these are some pointers:
Prefer swap devices over swap files.
Do not enable MDC on Linux swap Mini-Disks. The read ratio is not high enough to overcome the write overhead.
We recommend a swap device size of approximately 15% of the Linux guest's VM size. For example, a 1 GB Linux VM should allocate 150 MB for the swap device.
Consider multiple swap devices rather than a single, large VDISK swap device. Using multiple swap devices with different priorities can alleviate stress on the VM paging system when compared to a single, large VDISK.
Linux assigns priorities to swap extents. For example, you can set up a small VDISK with a higher priority (higher numeric value), and it will be selected for swap as long as there is space on the VDISK to hold the data being swapped. Swap extents of equal priority are used in round-robin fashion. Equal prioritization can be used to spread swap I/O across CHPIDs and controllers, but if you do this, be careful not to put all the swap extents on Mini-Disks on the same physical DASD volume; otherwise you will not accomplish any spreading. Use swapon -p… to set swap extent priorities.
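To make the priority rules concrete, here is a simplified Python model, not the kernel's actual implementation: it fills the highest-priority tier first and rotates round-robin among extents of equal priority, at page granularity. It also applies the 15% swap sizing rule and renders an /etc/fstab line using the pri= option that swapon honors. The device paths and capacities are hypothetical; on a real guest, verify the resulting order with swapon -s.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class SwapExtent:
    device: str      # hypothetical device path
    size_pages: int  # capacity in pages
    priority: int    # higher value is used first (swapon -p / pri=)
    used: int = 0


def swap_size_mb(guest_vm_mb, fraction=0.15):
    """Rule of thumb from the text: swap of about 15% of the guest's VM size."""
    return round(guest_vm_mb * fraction)


def place_pages(extents, pages):
    """Simplified model: fill the highest-priority tier first, round-robin
    within a tier of equal priority. Pages that do not fit are dropped."""
    placed = []
    ordered = sorted(extents, key=lambda e: e.priority, reverse=True)
    for _, tier_iter in groupby(ordered, key=lambda e: e.priority):
        tier = list(tier_iter)
        i = 0
        while pages > 0 and any(e.used < e.size_pages for e in tier):
            extent = tier[i % len(tier)]
            i += 1
            if extent.used < extent.size_pages:
                extent.used += 1
                placed.append(extent.device)
                pages -= 1
        if pages == 0:
            break
    return placed


def fstab_line(extent):
    """Matching /etc/fstab entry; pri= sets the swap priority at boot."""
    return f"{extent.device} none swap pri={extent.priority} 0 0"


extents = [
    SwapExtent("/dev/dasdb1", 2, 10),  # small VDISK, highest priority
    SwapExtent("/dev/dasdc1", 3, 5),   # Mini-Disk on one physical volume
    SwapExtent("/dev/dasdd1", 3, 5),   # Mini-Disk on a different volume
]
print(swap_size_mb(1000))       # 150
print(place_pages(extents, 6))
# ['/dev/dasdb1', '/dev/dasdb1', '/dev/dasdc1', '/dev/dasdd1', '/dev/dasdc1', '/dev/dasdd1']
print(fstab_line(extents[0]))   # /dev/dasdb1 none swap pri=10 0 0
```

The two equal-priority Mini-Disks only spread I/O if they sit on different physical volumes, which is why the example comments place them separately.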
VDISK VS. DASD
The advantage of VDISK is that a very large swap area can be defined at very little expense. The VDISK is not allocated until the Linux server attempts to swap. Swapping to VDISK with the DIAGNOSE access method is faster than swapping to DASD or SCSI disk. In addition, when using a VDISK swap device, your z/VM performance management product can report swapping by a Linux guest.
Swapping to DCSS is the fastest known method. As with VDISK, however, this solution requires memory, and lack of memory is the reason for swapping in the first place. DCSS is therefore best used as a small, fast swap device for peak situations. The DCSS swap device should be the first in a cascade of swap devices, where the following devices can be bigger and slower (real disk). Swapping to DCSS adds complexity.
Create an EW/EN DCSS and configure the Linux guest to swap to the DCSS. This technique is useful for cases where the Linux guest is storage-constrained but the z/VM system is not. The technique lets the Linux guest dispose of the overhead associated with building channel programs to talk to the swap device. For one illustration of the use of swap-to-DCSS, read the paper here.
If the storage load on your Linux guest is large, the guest might need a lot of room for swap. One way to accomplish this is simply to ATTACH or DEDICATE an entire volume to Linux for swapping. If you have the DASD to spare, this can be a simple and effective approach.
Using a traditional Mini-Disk on physical DASD requires some setup and formatting the first time and whenever changes in size of swap space are required. However, the storage burden on z/VM to support Mini-Disk I/O is small, the controllers are well-cached, and I/O performance is generally very good. If you use a traditional Mini-Disk, you should disable z/VM Mini-Disk Cache (MDC) for that Mini-Disk (use MINIOPT NOMDC statement in the user directory).
A VM temporary disk (t-disk) could be used. This lets you define disks of various sizes with less consideration for placement (having to find ‘x’ contiguous cylinders by hand if you don’t have DIRMAINT or a similar product). However, a t-disk is temporary, so it needs to be configured (perhaps via PROFILE EXEC) whenever the Linux VM logs on. The storage and performance benefits of traditional Mini-Disk I/O apply. If you use a t-disk, you should disable Mini-Disk cache for that Mini-Disk.
A VM virtual disk in storage (VDISK) is transient, like a t-disk. However, VDISK is backed by a memory address space instead of by real DASD. While in use, VDISK blocks reside in central storage, which makes VDISK very fast. When not in use, VDISK blocks can be paged out to expanded storage or paging DASD. The use of VDISK for swapping is complex enough that it is covered on a separate tips page.
Attach expanded storage to the Linux guest and allow it to swap to this media. This can give good performance if the Linux guest makes good use of the memory, but it can waste valuable memory if Linux uses it poorly or not at all. In general, this is not recommended for use in a z/VM environment.
GC POLICY SETTINGS
The -Xgcpolicy options have these effects:
- optthruput: Disables concurrent mark. If you do not have pause-time problems (as seen by erratic application response times), you get the best throughput with this option. optthruput is the default setting.
- optavgpause: Enables concurrent mark with its default values. If you are having problems with erratic application response times that are caused by normal garbage collections, you can reduce those problems, at the cost of some throughput, by using the optavgpause option.
- gencon: Requests the combined use of concurrent and generational GC to help minimize the time that is spent in any garbage collection pause.
- subpool: Disables concurrent mark. It uses an improved object allocation algorithm to achieve better performance when allocating objects on the heap. This option might improve performance on SMP systems with 16 or more processors. The subpool option is available only on AIX®, Linux® PPC and zSeries®, z/OS®, and i5/OS®.
ANY FORM OF CACHING
Any form of caching resulted in a significant throughput improvement over the no-caching case; distributed map caching generated the highest throughput improvement.
DYNACACHE DISK-OFF LOAD
This feature is intended to significantly improve performance with small caches without additional CPU cost.
PERFORMANCE TUNING FOR WEBSPHERE
The following recommendations from the Washington Systems Center can improve the performance of your WebSphere applications:
- Use the same value for StartServers, MaxClients, and MaxSpareServers parameters in the httpd.conf file.
Identically defined values avoid starting additional servers as workload increases. The HTTP server error log displays a message if the value is too low. Use 40 as an initial value.
- Serve image content (JPG and GIF files) from the IBM HTTP Server (IHS) or Apache.
Do not use the file serving servlet in WebSphere. Use the DocumentRoot and <Directory> directives, or the Alias directive, to point to the image file directory.
- Cache JSPs and Servlets using the servletcache.xml file.
A sample definition is provided in the servletcache.sample.xml file. The URI defined in the
servletcache.xml file must match the URI found in the IHS access log. Look for GET statements, and create a definition for each JSP or servlet to cache.
- Eliminate servlet reloading in production.
Specify reloadingEnabled="false" in the ibm-web-ext.xml file located in the application's
- Use Resource Analyzer to tune parameter settings.
Additionally, examine the access, error, and native logs to verify applications are functioning correctly.
- Reduce WebSphere queuing.
To avoid flooding WebSphere queues, do not use an excessively large MaxClients value in the httpd.conf file. The Web Container Service General Properties MAXIMUM THREAD SIZE value should be two-thirds of the MaxClients value specified in the httpd.conf file. The Transport MAXIMUM KEEP ALIVE connections value should be five more than the MaxClients value.
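The Washington Systems Center ratios above tie several settings to a single MaxClients value, so deriving them together keeps them consistent when MaxClients changes. The Python sketch below assumes the suggested initial value of 40 and rounds two-thirds of MaxClients to the nearest whole thread; the keys for the WebSphere values are descriptive labels chosen for this example, not actual WebSphere property names.

```python
def websphere_queue_settings(max_clients=40):
    """Derive related settings from MaxClients using the WSC ratios above."""
    return {
        # httpd.conf: the same value for all three directives
        "StartServers": max_clients,
        "MaxClients": max_clients,
        "MaxSpareServers": max_clients,
        # descriptive labels (hypothetical), not WebSphere property names
        "web_container_max_threads": round(max_clients * 2 / 3),
        "transport_max_keep_alive": max_clients + 5,
    }


def httpd_conf_lines(settings):
    """Render the three httpd.conf directives as configuration lines."""
    return [f"{name} {settings[name]}"
            for name in ("StartServers", "MaxClients", "MaxSpareServers")]


settings = websphere_queue_settings()
print(httpd_conf_lines(settings))
# ['StartServers 40', 'MaxClients 40', 'MaxSpareServers 40']
print(settings["web_container_max_threads"], settings["transport_max_keep_alive"])
# 27 45
```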
- sysstat package with sadc, sar, iostat
- DASD statistics
- SCSI statistics
- z/VM Performance Toolkit (see the IBM Redbook Performance Toolkit for VM, SG24-6059)
THE WAS_APPSERVER.PL SCRIPT
This Perl script can help determine application memory usage. It displays the memory used by WebSphere as well as the memory usage of active WebSphere application servers. Using the Linux ps command, the script displays all processes containing the text “ActiveEJBServerProcess” (the WebSphere application server process). Using the RSS value for these processes, the script attempts to identify the amount of memory used by WebSphere applications.
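The Perl script itself is not reproduced here. As an illustration of the same approach, the following Python sketch (not the original was_appserver.pl) runs ps -eo pid,rss,args, keeps the processes whose command line contains “ActiveEJBServerProcess”, and sums their RSS, which ps reports in kilobytes. Because RSS counts shared pages in every process that maps them, the total can overstate the memory actually used, which is why the result is only an estimate.

```python
import subprocess

MARKER = "ActiveEJBServerProcess"  # WebSphere application server process text


def parse_ps(output, marker=MARKER):
    """Return (pid, rss_kb) for each `ps -eo pid,rss,args` line whose
    command line contains the marker text."""
    matches = []
    for line in output.splitlines()[1:]:   # first line is the ps header
        fields = line.split(None, 2)        # pid, rss, full command line
        if len(fields) == 3 and marker in fields[2]:
            matches.append((int(fields[0]), int(fields[1])))
    return matches


def report():
    """Run ps on this Linux guest and print per-process and total RSS."""
    out = subprocess.run(["ps", "-eo", "pid,rss,args"],
                         capture_output=True, text=True, check=True).stdout
    procs = parse_ps(out)
    for pid, rss_kb in procs:
        print(f"PID {pid}: {rss_kb / 1024:.1f} MB RSS")
    total_kb = sum(rss for _, rss in procs)
    print(f"Total: {total_kb / 1024:.1f} MB RSS across {len(procs)} process(es)")
```

Call report() on the Linux guest that runs WebSphere; parse_ps() is kept separate so it can be checked against captured ps output.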