CHR

Synopsis

HELP:   Convert character sets (ASCII, EBCDIC, UNICODE)
TYPE:   OBJECT
SYNTAX: CHR(BUFSIZ=num,RECCNT=num,METHOD=AUTO/BLOCK/RECORD/KEEP/SKIP,CASMOD=UPPER/LOWER/FOLD/SUPPER/SLOWER/USRTAB,FROM='str'/DEFAULT/ASCII/EBCDIC/BOMUTF/BOMUCS/SYSTEM/LOCAL,MODE=STOP/IGNORE/SUBSTITUTE/IDENTITY/TRANSLIT,TO='str'/DEFAULT/ASCII/EBCDIC/SYSTEM/LOCAL,WRTBOM,HDLBOM/BOM,KEPBOM,ENL2LF,ELF2NL,SUBCHR[num/SYSTEM...],SYSTAB=ICONV,USRTAB='str'/NPAS/SEPA/DELA/DLAX,ONEMAP,CMBFRM=NFD/NFC/AUTO/ON/OFF,REPFIL='str'/STDOUT/STDERR,SKPBIN,SKPEQU)

Description

This component converts a text stream to a different character set after reading or before writing. Character conversion requires a source CCSID (FROM) and a target CCSID (TO). The keyword MODE defines the behavior when invalid or incomplete characters are encountered in the input data: the conversion can stop at, ignore or substitute unsupported characters. For substitution, you can define a substitution character list (SUBCHR) and/or a user substitution table (USRTAB) and/or use one of the pre-defined system substitution tables (SYSTAB) for transliteration. When using a system table, you can override some of the pre-defined code points with a user table. Each character must be defined as a Unicode code point in hexadecimal notation.
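The three basic MODE behaviors can be illustrated with a minimal Python sketch. It is not the component's implementation: the function `convert`, its parameters and the dictionary format of the user table are hypothetical names chosen for illustration, and Python's own codecs stand in for the CCSID tables. Only STOP, IGNORE and SUBSTITUTE are modeled.

```python
def convert(text, to="ascii", mode="STOP", subchr="?", usrtab=None):
    """Encode text to the target charset, handling unsupported characters.

    mode   -- "STOP", "IGNORE" or "SUBSTITUTE" (modeled on the MODE keyword)
    subchr -- fallback substitution character (assumption: a single one)
    usrtab -- hypothetical user table: {code point: replacement string}
    """
    usrtab = usrtab or {}
    out = bytearray()
    for index, char in enumerate(text):  # indexes start at zero
        try:
            out += char.encode(to)
        except UnicodeEncodeError:
            if mode == "STOP":
                raise ValueError(
                    f"unsupported character U+{ord(char):04X} at index {index}"
                )
            if mode == "IGNORE":
                continue  # drop the character
            # SUBSTITUTE: the user table wins over the fallback character
            out += usrtab.get(ord(char), subchr).encode(to)
    return bytes(out)


if __name__ == "__main__":
    print(convert("Grüße", mode="IGNORE"))
    print(convert("Grüße", mode="SUBSTITUTE"))
    print(convert("Grüße", mode="SUBSTITUTE", usrtab={0x00FC: "ue", 0x00DF: "ss"}))
```

In this sketch the user table is keyed by Unicode code point, mirroring the requirement that each character is defined as a code point in hexadecimal notation.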

Each ignore or substitution operation, including transliteration, can be recorded in a report file (REPFIL) for review. Moreover, you can convert to upper, lower or special case, or use case folding (CASMOD). The component also supports byte order marks (BOM); see the keywords WRTBOM, HDLBOM and KEPBOM.

You can differentiate between block- and record-oriented conversion by the keyword METHOD.

If the input text is record-oriented, record orientation is used by default, and you must explicitly enable block-oriented conversion if you do not intend to check the data at the end of each record. If you choose block orientation, the record length table is lost. Additionally, you can force incomplete character sequences at the end of a record to be skipped in order to keep the length information.
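The difference between the two orientations shows up when a multi-byte character is split across a record boundary. The following Python sketch uses UTF-8 input and Python's incremental decoder as a stand-in; the function names `convert_records` and `convert_block` and the `skip_incomplete` flag are hypothetical and only illustrate the idea.

```python
import codecs


def convert_records(records, skip_incomplete=False):
    """Record-oriented: each record is checked and converted on its own,
    so the number of records (and thus the length table) is preserved."""
    converted = []
    for index, record in enumerate(records):  # the first record is unit 0
        decoder = codecs.getincrementaldecoder("utf-8")()
        text = decoder.decode(record, final=False)
        pending, _ = decoder.getstate()  # bytes of an unfinished character
        if pending and not skip_incomplete:
            raise ValueError(f"incomplete character at end of record {index}")
        converted.append(text)  # with skip_incomplete the tail is dropped
    return converted


def convert_block(records):
    """Block-oriented: the records are treated as one byte stream, so a
    character may span a boundary, but record lengths are lost."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(record) for record in records]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


if __name__ == "__main__":
    split = [b"caf\xc3", b"\xa9"]          # "é" is split across two records
    print(convert_block(split))
    print(convert_records([b"caf\xc3", b"au lait"], skip_incomplete=True))
```

Note that the sketch only handles an incomplete sequence at the end of a record; a record that starts with a stray continuation byte still fails in record mode.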

In case of an error, all position and counter values printed to the error trace contain the current indexes. All indexes start at zero. The first block or record is unit 0.

The component supports replacing EBCDIC new line characters (0x15) with line feed characters (0x25) when character conversion takes place. When converting to UTF-8/16/32, this causes the new line characters to be converted to line feeds (0x0A) instead of UTF-8 next line characters (0xC285).
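The effect of this replacement can be reproduced with Python's `cp037` codec, used here as an assumed stand-in for a single-byte EBCDIC CCSID. The function `ebcdic_to_utf8` and its `nl_to_lf` flag are hypothetical names for this sketch, not part of the component.

```python
def ebcdic_to_utf8(data, nl_to_lf=False, codec="cp037"):
    """Convert single-byte EBCDIC data to UTF-8.

    With nl_to_lf=True, EBCDIC new line (0x15) is first replaced by
    EBCDIC line feed (0x25), so the result contains 0x0A instead of the
    UTF-8 next line sequence 0xC2 0x85.
    """
    if nl_to_lf:
        # Safe only for single-byte EBCDIC code pages (assumption)
        data = data.replace(b"\x15", b"\x25")
    return data.decode(codec).encode("utf-8")


if __name__ == "__main__":
    record = "AB".encode("cp037") + b"\x15"
    print(ebcdic_to_utf8(record))                 # ends with the NEL sequence
    print(ebcdic_to_utf8(record, nl_to_lf=True))  # ends with a line feed
```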

In read mode, the component supports auto-detection of charsets. This can be activated by using the keyword DEFAULT as CCSID. Auto-detection will analyze the first block of input data to determine if the input uses ASCII, EBCDIC or a UTF encoding. The statistics-based computation cannot determine the actual ASCII or EBCDIC character set. Instead, the final CCSID is derived from the language identifier of the environment variable LANG.
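To give an idea of how such a detection can work, here is a deliberately simple Python sketch. It is not the component's algorithm: the function `guess_family`, the scoring rule and the byte ranges are assumptions made for illustration. It only decides between the ASCII family, the EBCDIC family and a UTF encoding; the subsequent step of deriving the final CCSID from LANG is not shown.

```python
import codecs

# UTF-32 is tested before UTF-16 because their BOM prefixes overlap.
_BOMS = (
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
)

# Byte ranges of letters and digits shared by common EBCDIC code pages.
_EBCDIC_RANGES = (
    (0x81, 0x89), (0x91, 0x99), (0xA2, 0xA9),  # a-i, j-r, s-z
    (0xC1, 0xC9), (0xD1, 0xD9), (0xE2, 0xE9),  # A-I, J-R, S-Z
    (0xF0, 0xF9),                              # 0-9
)


def guess_family(block):
    """Guess the encoding family of the first block of input data."""
    for bom, name in _BOMS:
        if block.startswith(bom):
            return name
    if any(byte >= 0x80 for byte in block):
        try:
            block.decode("utf-8")
            return "UTF-8"  # non-ASCII bytes that form valid UTF-8
        except UnicodeDecodeError:
            pass
    # Count bytes that look like printable ASCII or like EBCDIC text
    # (EBCDIC space is 0x40).
    ascii_score = sum(0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D) for b in block)
    ebcdic_score = sum(
        b == 0x40 or any(lo <= b <= hi for lo, hi in _EBCDIC_RANGES) for b in block
    )
    return "EBCDIC" if ebcdic_score > ascii_score else "ASCII"


if __name__ == "__main__":
    print(guess_family(b"Hello World 123"))
    print(guess_family("Hello World 123".encode("cp037")))
```

As in the description above, a heuristic of this kind can tell the families apart but cannot identify the concrete ASCII or EBCDIC code page, and very short inputs may be misclassified.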

Character conversion can be skipped or enforced if the CCSIDs are equal or binary data is detected (see SKPEQU and SKPBIN).

To get a list of supported CCSIDs, use the command below:

   flcl INFO GET.CCSID

For more information, please have a look at the man pages for each parameter.

Arguments