Use of IBM Unicode Services
Since version 5.1.27, FLAM supports the use of IBM Unicode Services on
System z. The Unicode Services are used only if no user table,
transliteration, case mapping, subsets, whitespace handling, reporting,
normalization or other complex conversion is requested. For a few simple
conversions the FLAM implementation is faster, and FLAM continues to use
its own implementation in those cases. In all other cases, the IBM
Unicode Services are used by default to reduce CPU and memory
utilization.
It is important to note the differences in behavior of the IBM Unicode Services:
- If an incomplete character is encountered and the error handling is not
STOP, the FLAM character conversion module ignores or substitutes this
incomplete character. The IBM Unicode Services return an EINVAL error
in this case.
- IBM Unicode Services can produce errors for technically valid code
points when processing UTF-8 encoded data. The FLAM character
conversion module first decodes UTF-8 and verifies the code point
afterwards, independent of the encoding. FLAM therefore accepts a code
point encoded in more than one form (e.g. the character 0 as 0x30 or
as 0xFC80808080B0). The IBM Unicode Services support only the shortest
form (0x30) of an encoding for a code point. All other possible
encodings are rejected with an error (invalid character).
- Depending on the Unicode Services version (APAR) a BOM character in
the data might remain for UTF character sets or might be replaced by
a substitution character for single byte character sets.
When calling the Unicode Services, FLAM requests the BOM removal, and
FLAM also removes a BOM character itself if it causes an invalid
character error, but neither measure works with certain versions of the
Unicode Services.
- A wrong BOM (invalid character) in the middle of the data does not
cause an error, and a BOM change from LE to BE or vice versa is not
detected either. For data with mixed endianness, the Unicode Services
should be deactivated.
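The difference between "decode first, verify the code point afterwards" and "shortest form only" can be illustrated with a minimal Python sketch. This is not FLAM or IBM code. The function names decode_utf8_lenient and is_shortest_form are hypothetical, and Python's built-in strict UTF-8 decoder merely stands in for a converter that accepts only the shortest form.

```python
def decode_utf8_lenient(data: bytes) -> list:
    """Decode UTF-8 to code points, accepting overlong and legacy
    5/6-byte forms. The code point is verified only after decoding.
    Illustrative model only, not FLAM's implementation."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, value = 1, lead
        elif lead & 0xE0 == 0xC0:
            length, value = 2, lead & 0x1F
        elif lead & 0xF0 == 0xE0:
            length, value = 3, lead & 0x0F
        elif lead & 0xF8 == 0xF0:
            length, value = 4, lead & 0x07
        elif lead & 0xFC == 0xF8:
            length, value = 5, lead & 0x03
        elif lead & 0xFE == 0xFC:
            length, value = 6, lead & 0x01
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"incomplete character at offset {i}")
        for byte in data[i + 1:i + length]:
            if byte & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte 0x{byte:02X}")
            value = (value << 6) | (byte & 0x3F)
        # Verification happens here, independent of the encoded length.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"invalid code point U+{value:04X}")
        code_points.append(value)
        i += length
    return code_points


def is_shortest_form(data: bytes) -> bool:
    """Return True if a strict decoder (shortest form only) accepts data."""
    try:
        data.decode("utf-8")  # Python's decoder rejects overlong forms
        return True
    except UnicodeDecodeError:
        return False


short_form = bytes.fromhex("30")
long_form = bytes.fromhex("FC80808080B0")
print(decode_utf8_lenient(short_form), is_shortest_form(short_form))
print(decode_utf8_lenient(long_form), is_shortest_form(long_form))
```

Both inputs decode to the same code point (U+0030) in the lenient model, but only the one-byte form passes the strict check. Data containing such non-shortest forms is the kind of input for which the two conversion paths behave differently.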
To prevent these changes in behavior you must set the system symbol
&FLIBMUS or the environment variable FL_IBM_UNICODE_SERVICE to OFF,
but this will result in more CPU and memory utilization for simple
character conversions (see chapter Used Environment variables).
The environment variable's default is AUTO, which ensures that only the
minimal behavior changes above are in place. To use the Unicode Services
also for character conversion mode STOP, the environment variable has to
be set to ON. This will cause FLAM to use the round trip conversion
technique of the Unicode Services, which results in the behavior changes
below:
- Many non-convertible characters will be converted to a unique
character in the target character set without an error using the round
trip conversion scheme.
- A substitution character is only converted with activated substitution.
Without substitution, the substitution character 0x1A in ASCII/UTF or 0x3F
in EBCDIC will cause an error. For this reason, FLAM requests
substitution whenever the Unicode Services are used.
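The selection rules described in this section can be summarized as a small decision function. The following Python sketch is only a model of the documented behavior, not FLAM's implementation. The function name unicode_services_eligible, the feature labels and the error handling labels other than STOP are hypothetical, the system symbol &FLIBMUS is not modeled, and rejecting unknown values is an assumption of the sketch. A result of True means only that the Unicode Services may be used, because FLAM still prefers its own implementation in the few cases where it is faster.

```python
import os

# Hypothetical labels for the complex conversions listed in this section.
COMPLEX_FEATURES = frozenset({
    "user_table", "transliteration", "case_mapping", "subsets",
    "whitespace_handling", "reporting", "normalization",
})


def unicode_services_eligible(requested_features, error_handling, env=None):
    """Model of when the IBM Unicode Services may be used.

    requested_features: iterable of labels from COMPLEX_FEATURES
    error_handling: "STOP" or another mode (e.g. "IGNORE", "SUBSTITUTE";
                    these two labels are assumptions of this sketch)
    env: mapping used instead of os.environ (useful for testing)
    """
    env = os.environ if env is None else env
    # Documented default is AUTO.
    mode = env.get("FL_IBM_UNICODE_SERVICE", "AUTO").strip().upper()
    if mode not in ("OFF", "AUTO", "ON"):
        # Assumption: the sketch rejects unknown values.
        raise ValueError(f"unsupported FL_IBM_UNICODE_SERVICE value: {mode}")
    if mode == "OFF":
        return False  # Unicode Services deactivated
    if COMPLEX_FEATURES & set(requested_features):
        return False  # complex conversions stay in the FLAM module
    if error_handling.upper() == "STOP":
        return mode == "ON"  # STOP uses Unicode Services only with ON
    return True


print(unicode_services_eligible([], "IGNORE", env={}))
print(unicode_services_eligible([], "STOP", env={}))
print(unicode_services_eligible([], "STOP", env={"FL_IBM_UNICODE_SERVICE": "ON"}))
```

Passing the environment as a mapping keeps the sketch testable without changing the process environment.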
We recommend setting FL_IBM_UNICODE_SERVICE=ON to maximize performance
if the limitations described above are acceptable, because checking
each character for validity has a significant performance impact.