Performance Considerations

Performance (CPU/memory utilization and throughput) is an important topic when using FLAM. The following points describe some major factors that contribute to the overall performance of FLAM. They should give you a good starting point if you experience performance issues.

Pre-load architecture

In order to minimize consumed CPU cycles during mass data processing, FLAM pre-calculates as much as possible before processing any data. This requires memory on the one hand and spends some CPU time up front on the other. If you process many small files with FLAM, this up-front cost can have a negative effect on performance: FLAM is not designed for processing small amounts of data.

Hardware accelerator support

FLAM uses special instructions of the respective hardware platform to accelerate some algorithms. On IBM mainframes, for example, this includes the use of CPACF and zEDC.

Offloading of computing work to zIIP

For IBM mainframe systems, it is possible to offload memory-to-memory functions (compression, conversion, formatting) as SRBs to so-called zIIP processors. Although this does not result in a performance advantage, it does save chargeable CPU time (MSU).

Record count and block size

FLAM processes data in self-contained segments, each representing a block or a certain number of records. The amount of data that is processed in one go determines the throughput.

It is important to understand that there is an overhead per segment, so the segments should not be too small. However, if too much data is put into a segment, the processing constantly takes place outside the caches, which also reduces performance. The block size or record count must therefore be chosen optimally for the respective platform.

For historical reasons, FLAM works with a default block size of 64 KiB and a record count of 512. These are good average values across all platforms, but not always optimal.

It should also be noted that such a segment expands after decompression, so in that case you should always start with a smaller block size.

On many systems, 256 KiB is now a good block size. The optimal record count is the optimal block size divided by the average record length.
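
As a rough illustration of this rule of thumb, assuming a target block size of 256 KiB and an average record length of 80 bytes (an example value only), the record count can be derived as follows:

# 256 KiB = 262144 bytes, divided by an assumed 80-byte average record length
echo $(( (256 * 1024) / 80 ))   # prints 3276, i.e. a record count of roughly 3300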

Character set conversion

Character set conversion can take up a lot of CPU time. By default, FLAM checks each character for validity and stops processing with an error if invalid or unconvertible characters are found (MODE=STOP). This form of error handling is relatively expensive in terms of CPU utilization, but ensures that incorrectly encoded text data is detected. Enabling an error-handling mode that replaces such invalid characters can speed up processing. Character set conversion is also faster if all characters of the input alphabet map to output data of the same length.

With the environment variable FL_DEFAULT_CHAR_MODE (or the system variable &FLCHARM on z/OS), the default mode for error handling during character set conversion can be changed from STOP to SUBSTITUTE, for example, in order to achieve higher performance.
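
On Unix-like systems, this can be done, for example, by exporting the environment variable before calling FLAM (a minimal sketch using the SUBSTITUTE mode named above):

# replace invalid or unconvertible characters instead of stopping with an error
export FL_DEFAULT_CHAR_MODE=SUBSTITUTE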

On z/OS, FLAM uses the IBM Unicode Services as long as they support the required character set conversion and are known to be faster than FLAM's own implementation. This only works for simple conversions without mappings, subsets, reporting or transliterations. The environment variable FL_IBM_UNICODE_SERVICE (or the system variable &FLIBMUS) can be used to enable or disable the use of the IBM Unicode Services. By setting the variable to ON instead of the default AUTO, the IBM Unicode Services are also used if STOP is selected as the character conversion mode. This saves a lot of CPU cycles, but at the cost of no longer detecting some errors in the character data.
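
A minimal sketch for Unix-like environments, using the documented values AUTO (the default) and ON:

# use the IBM Unicode Services even when the conversion mode is STOP
export FL_IBM_UNICODE_SERVICE=ON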

Logging

FLAM writes many log messages by default, which also costs CPU time and I/O. The QUIET or SILENT keywords and the MESSAGE object can be used to control the amount of output and thus limit or increase the amount of work involved. During the transition from test to production, people often forget to reduce FLAM's chattiness, which costs both disk space and CPU cycles in production.

Literal cache

FLAM uses a global literal cache, implemented as a hash table, to save memory and CPU time when comparing strings. Its size can be defined via the environment variable FL_LITERAL_CACHE_SIZE (or the system variable &FLITCS). The debug log contains the corresponding statistics, which may help you to adjust this value if necessary.
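
A minimal sketch for Unix-like environments; the value shown is purely illustrative, so check the FLAM reference for the valid unit and range:

# hypothetical hash table size - consult the FLAM reference for valid values
export FL_LITERAL_CACHE_SIZE=8192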

ZIP archives and PS-VB on Mainframes

When creating ZIP archives, offsets and statistics values must eventually be written into the member headers, which requires repositioning (fseek()) within the file.

This is possible for host datasets on z/OS via so-called RBA access. For a PS-VB file, this requires a tree in the background in which the OS keeps track of which relative byte address (RBA) lies in which block, in which record, and at which offset within the record. This is extremely time-consuming, and if there are many members in the ZIP archive, it can increase the runtime considerably. Maintaining this tree also costs a lot of CPU, which is not charged to the application.

To avoid this, you should use file types for ZIP archives that are suitable for RBA access. Under z/OS, these are primarily files in USS. Alternatively, FLAM also supports streamed ZIP archives, as developed for piping on Unix systems.

With the keyword STREAM, you can force a purely sequential write, which then also works with a PS-VB dataset, but you should check whether the ZIP tool used for extraction understands this streamed format.

With PS-FB datasets, RBA access is faster because no tree is needed, but there is padding at the end of the ZIP file. Since ZIP archives have to be read from the end, many other tools cannot cope with this padding because they expect the ZIP directory rather than zero bytes or blanks. For FLAM this is not a problem, but as already mentioned, many other tools then recognize the ZIP file as incomplete or not as a ZIP file at all.

z/OS CEEOPTS (from install.txt)

For memory allocation, do not use the realloc control feature of CEE:

_CEE_REALLOC_CONTROL=bound,percentage

FLAM is only tested with the default settings. With some memory allocation strategies, out-of-memory errors can occur.

For good performance and sufficient memory, we recommend the IBM default runtime options, except for the options set in the CEEOPTS below:

//CEEOPTS  DD *
ALL31(ON)
STACK(1M,1M,ANYWHERE,KEEP,512K,128K)
HEAP(1M,1M,ANYWHERE,KEEP,8K,4K)
/*

With ALL31(OFF) and HEAP(,,BELOW,,), the available memory is in some cases too small for the stack, especially when using the FLUC subsystem, table or large-record support. Since version 5.1.25, FLAM uses application-specific runtime option defaults, set via the CEEUOPT assembly language source program. The CEEUOPT3 member (used for 31 bit) and the CEEUOPT6 member (used for 64 bit) are located in hlq.FLAM.SRCLIB. We recommend building and linking the examples if you are going to write applications that use our APIs (e.g. FLCREC). For batch processing, we recommend creating a storage report (RPTSTG(ON)) and defining the CEEOPTS as recommended in the report.
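
Under z/OS UNIX, such a storage report can also be requested via the Language Environment variable _CEE_RUNOPTS (a minimal sketch; for batch jobs, add RPTSTG(ON) to a CEEOPTS DD as shown above instead):

# request an LE storage report to derive suitable STACK/HEAP settings
export _CEE_RUNOPTS="RPTSTG(ON)"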