
XML

Synopsis

HELP:   Read XML data from a file
TYPE:   OBJECT
SYNTAX: XML(NET.{},FILE['str'/STREAM/DUMMY...],BLKSIZE=num,NOCMNT,NODFLT,NOEMPD,ADDEMP,NOPINS,NODTD,CCDATA,DATLEN=num,LENERR,CCSID='str'/DEFAULT/ASCII/EBCDIC/BOMUTF/BOMUCS/SYSTEM/LOCAL,CHRMODE=STOP/IGNORE/SUBSTITUTE/IDENTITY/TRANSLIT,SKIPEQUAL,USRTABLE='str'/NPAS/SEPA/DELA/DLAX,ONEMAP,COMBINED=NFD/NFC/AUTO/ON/OFF,BOM,KEEPBOM,ENL2LF,RPLFFD[=num],DECRYPT[{}...],SUBSYSTEM(),FRCBLK,REMOVE,RENAME='str',LANG='str',PLATFORM=WIN/UNX/ZOS/USS/VSE/BS2/MAC,OWNER='str',ENVID='str',HASH(),SIGNATURE.{},CHECK,AVSCAN(),NOARCH,PREPROCESS[()...],POSTPROCESS/PSTPRO[()...])

Description

Read XML works on blocks of binary or text data. Character conversion takes place before the XML data is processed. If no CCSID is provided, the encoding is detected automatically; if UTF-8 is detected, character conversion is skipped. If a CCSID is supplied, the character data is converted from that CCSID to UTF-8 before XML processing. After conversion to UTF-8, line delimiters must be one of 0x0A, 0x0D or 0x0D0A. If the input data is encoded (e.g. Base64), encrypted, compressed or contains 4-byte length fields, it is automatically decoded, decrypted and/or decompressed to build a valid XML data block. During parsing, all line delimiters are normalized to line feed characters (0xA), as defined by the XML specification.
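The conversion step can be pictured with a short Python sketch. It is not FLAM's implementation: the BOM table, the function names and the use of Python codec names (for example `cp037`) in place of CCSID values are assumptions for illustration. It detects UTF-8/16/32 byte order marks, falls back to UTF-8 when none is found, and normalizes CR LF and lone CR to LF as section 2.11 of the XML 1.0 specification requires. Unlike the behavior described above, the sketch always decodes, even for UTF-8 input.

```python
import codecs
from typing import Optional, Tuple

# Hypothetical BOM table. UTF-32 entries come first because the UTF-32 LE BOM
# (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_encoding(data: bytes) -> Tuple[str, int]:
    """Return (codec name, BOM length); assume UTF-8 when no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return "utf-8", 0


def normalize_line_ends(text: str) -> str:
    """Map CR LF and lone CR to LF (XML 1.0, section 2.11)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_utf8_xml(data: bytes, codec: Optional[str] = None) -> bytes:
    """Decode with an explicit codec (standing in for a CCSID) or a detected
    one, normalize line ends and return UTF-8 bytes."""
    if codec is None:
        codec, bom_len = detect_encoding(data)
        data = data[bom_len:]  # drop the BOM before decoding
    text = data.decode(codec)
    return normalize_line_ends(text).encode("utf-8")
```

For example, `to_utf8_xml(raw, "cp037")` mimics a supplied EBCDIC CCSID, while `to_utf8_xml(raw)` relies on detection.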

The XML data is parsed using the Expat library and transformed into FLAM elements. This allows various formatting options when writing, including minimizing, pretty printing and restoring an equivalent of the original document.

Several switches (NOCMNT, NODFLT, NOEMPD, NOPINS and NODTD) exclude certain types of XML elements from the parsed element list.
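How an Expat-based reader can turn a document into a flat element list, and how exclusion switches prune it, is sketched below with Python's `xml.parsers.expat` binding. The `(kind, payload)` tuples and the `nocmnt`, `nopins` and `noempd` keyword arguments are hypothetical stand-ins for FLAM elements and the NOCMNT, NOPINS and NOEMPD switches; the real element format is not shown here.

```python
import xml.parsers.expat
from typing import Any, List, Tuple


def parse_elements(xml_bytes: bytes, nocmnt: bool = False,
                   nopins: bool = False, noempd: bool = False) -> List[Tuple[str, Any]]:
    """Flatten a document into (kind, payload) tuples (hypothetical format)."""
    elements: List[Tuple[str, Any]] = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # merge adjacent character data callbacks

    def start(name, attrs):
        elements.append(("start", (name, attrs)))

    def end(name):
        elements.append(("end", name))

    def data(text):
        # This sketch does not track CDATA sections, so the NOEMPD-like
        # filter also drops whitespace-only CDATA content.
        if noempd and not text.strip():
            return
        elements.append(("data", text))

    def comment(text):
        if not nocmnt:
            elements.append(("comment", text))

    def pi(target, text):
        if not nopins:
            elements.append(("pi", (target, text)))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.CommentHandler = comment
    parser.ProcessingInstructionHandler = pi
    parser.Parse(xml_bytes, True)
    return elements
```

Whitespace outside the root element is not reported through `CharacterDataHandler` in Expat, which is why a separate switch (NODFLT) covers it.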

When reading XML documents through the byte or record interface using the element formatter (format.element()), the character data between a start tag and its end tag may be split into multiple XML data elements, because this data can be of arbitrary length. The DATLEN parameter controls the minimum length a data element reaches before it is split (default 1024). Because this is a minimum, the returned data elements may actually be much larger; as a rule of thumb, their maximum length is close to the input buffer size (but may exceed it). To get all data between any pair of start and end tags as a single data element, set DATLEN to a large number. Be aware, however, that this may require considerable amounts of memory, depending on the maximum data length in tag bodies. With the LENERR switch, an error occurs if a data element exceeds the specified length.

If the ignore empty data (NOEMPD) switch is set, data elements containing only whitespace are suppressed. Additionally, the ADDEMP switch inserts an empty data element into the element list if an end tag directly follows a start tag.
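The interaction of DATLEN, LENERR and ADDEMP can be illustrated with a small Python sketch working on a hypothetical event stream of `start`, `chunk` and `end` events, roughly what Expat's start-element, character-data and end-element callbacks deliver. The function name, the event format and `DataLengthError` are assumptions; FLAM's internal buffering is not documented here.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # (kind, value); kind is "start", "chunk" or "end"


class DataLengthError(ValueError):
    """Hypothetical error raised in LENERR mode."""


def collect_data(events: Iterable[Event], datlen: int = 1024,
                 lenerr: bool = False, addemp: bool = False) -> List[Event]:
    out: List[Event] = []
    pending: List[str] = []  # character data chunks not yet emitted

    def flush() -> None:
        if pending:
            data = "".join(pending)
            pending.clear()
            if lenerr and len(data) > datlen:
                raise DataLengthError(
                    f"data element of {len(data)} chars exceeds DATLEN={datlen}")
            out.append(("data", data))

    prev = None
    for kind, value in events:
        if kind == "chunk":
            pending.append(value)
            # DATLEN is a minimum: emit once at least datlen characters are
            # collected, so an emitted element can be longer than datlen.
            if sum(len(c) for c in pending) >= datlen:
                flush()
        else:
            flush()  # any tag ends the current data element
            if addemp and kind == "end" and prev == "start":
                out.append(("data", ""))  # ADDEMP: empty element for <x></x>
            out.append((kind, value))
        prev = kind
    flush()
    return out
```

With `datlen=4`, the chunks `"abc"`, `"def"`, `"g"` produce the data elements `"abcdef"` and `"g"`: the first element is longer than DATLEN because splitting only happens once the minimum is reached.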

On some EBCDIC machines, when converting from an EBCDIC character set, the new line character (0x15) is not properly converted, causing the XML parser to fail. In this case, turn on the ENL2LF switch to convert new lines (0x15) to line feeds (0xA).
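The failure can be reproduced in Python, whose `cp037` codec maps the EBCDIC new line 0x15 to U+0085 (NEL), a character XML 1.0 does not treat as whitespace or a line end. The `enl2lf` argument below is a hypothetical stand-in for the ENL2LF switch, and `cp037` stands in for whatever EBCDIC CCSID is in use. The example places the new line between a tag name and an attribute, where only whitespace is allowed.

```python
import xml.parsers.expat


def ebcdic_to_utf8(data: bytes, codec: str = "cp037", enl2lf: bool = False) -> bytes:
    """Convert EBCDIC bytes to UTF-8, optionally mapping NEL to LF."""
    text = data.decode(codec)  # cp037 maps EBCDIC NL (0x15) to U+0085 (NEL)
    if enl2lf:
        text = text.replace("\x85", "\n")  # treat NEL as a line feed
    return text.encode("utf-8")


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if Expat parses the document without error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


# '<a' NL 'x="1"/>' in EBCDIC: the NL separates the tag name from an attribute.
raw = "<a".encode("cp037") + b"\x15" + 'x="1"/>'.encode("cp037")
```

Without the mapping, the converted bytes contain U+0085 inside the start tag and Expat rejects them; with it, the tag parses normally.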

When reading XML, the semantics of the write modes (write.*()) change as follows:

BINARY: The FLAM elements produced by this read mode are used to restore the original XML file as closely as possible (standard formatting). The XML will be in UTF-8 and lines are delimited by line feed (0xA).
CHARACTER: The FLAM elements are used to produce a minimized version of the XML data. Line breaks and whitespace are removed unless they are part of actual character data.
TEXT: Human readable XML data with proper indentation is produced (pretty printing). Line breaks are converted to a system-specific representation.
XML: Produces XML data using the specified formatting method. The XML data will be in UTF-8 and lines are delimited by line feed (0xA) unless explicit character conversion is enabled.
RECORD: Human readable XML data with proper indentation is produced (pretty printing). The produced XML data is separated into records, each record containing one line of XML.
FLAM4: Same as RECORD, but stored in a FLAM4 file.
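The difference between the minimized (CHARACTER), pretty-printed (TEXT) and line-per-record (RECORD) outputs can be approximated with Python's `xml.etree.ElementTree` (`ET.indent` needs Python 3.9 or later). This is only an analogy, not FLAM's formatter: the function names are hypothetical, and treating every whitespace-only text node as insignificant is a simplification, whereas FLAM keeps whitespace that belongs to actual character data. Encoding and line-break conversion are not modeled.

```python
import xml.etree.ElementTree as ET
from typing import List


def minimize(xml_text: str) -> str:
    """Drop whitespace-only text between tags (CHARACTER-style output)."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.text is not None and not el.text.strip():
            el.text = None
        if el.tail is not None and not el.tail.strip():
            el.tail = None
    return ET.tostring(root, encoding="unicode")


def pretty(xml_text: str, space: str = "  ") -> str:
    """Re-indent the document (TEXT-style output)."""
    root = ET.fromstring(minimize(xml_text))  # discard the old indentation first
    ET.indent(root, space=space)
    return ET.tostring(root, encoding="unicode")


def records(xml_text: str) -> List[str]:
    """One pretty-printed line per record (RECORD-style output)."""
    return pretty(xml_text).splitlines()
```

Note that ElementTree writes empty elements as `<b />`, which is equivalent XML but not byte-identical to the input.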

Known limitations:

If the attributes inside an XML tag are separated by more than one space character, the additional space characters are dropped (as per the XML specification).
Due to a parser limitation, declared entities that appear inside attributes are output in their substituted form.
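Both limitations follow from what Expat reports to its caller, as this Python sketch using `xml.parsers.expat` shows: a start tag arrives as a name plus an attribute dictionary, so the original spacing between attributes is gone, and an internal entity such as `&co;` is already replaced by its value. The helper names and the sample entity are illustrative only.

```python
import xml.parsers.expat
from xml.sax.saxutils import quoteattr


def start_tags(xml_bytes: bytes):
    """Collect (name, attributes) for every start tag Expat reports."""
    tags = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: tags.append((name, attrs))
    parser.Parse(xml_bytes, True)
    return tags


def rebuild_empty_tag(name: str, attrs: dict) -> str:
    """Serialize an empty element from what the parser reported."""
    parts = [name] + [f"{key}={quoteattr(value)}" for key, value in attrs.items()]
    return "<" + " ".join(parts) + "/>"


# Several spaces between attributes and an entity reference in an attribute.
doc = b'<!DOCTYPE r [<!ENTITY co "ACME Corp">]><r a="1"    b="&co;"/>'
```

Rebuilding the tag from the reported data yields `<r a="1" b="ACME Corp"/>`: single spaces and the substituted entity value.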

Arguments

STRING: FILE['str'/STREAM/DUMMY...] - Name/URL of file to read [''==stdin]
- STREAM - Read from stdin or write to stdout
- DUMMY - Read EOF or write nothing
SWITCH: NOCMNT - Ignore XML comments [FALSE]
SWITCH: NODFLT - Ignore XML default elements (i.e. whitespace before/after root tag) [FALSE]
SWITCH: NOEMPD - Ignore XML empty/whitespace data elements (not in CDATA section) [FALSE]
SWITCH: ADDEMP - Add an empty XML data element if an end tag directly follows a start tag [FALSE]
SWITCH: NOPINS - Ignore XML processing instructions [FALSE]
SWITCH: NODTD - Ignore XML document type definitions [FALSE]
SWITCH: CCDATA - Collect CDATA into simple data elements [FALSE]
NUMBER: DATLEN=num - Minimum length to collect data elements in one element [1024]
SWITCH: LENERR - Return an error if a data element exceeds the minimum data element length (DATLEN) [FALSE]
SWITCH: SKIPEQUAL - Skip conversion if formats are equal (e.g. UTF-8 to UTF-8)
SWITCH: KEEPBOM - Keep byte order mark for faster conversion
NUMBER: RPLFFD=num - Replace form feeds, filling the rest of the page with blank lines assuming num lines per page [60]
SWITCH: FRCBLK - Enforce block orientation on record oriented devices [FALSE]
SWITCH: NOARCH - Disable the attempt to read archives (prevent multiple opens to the same file) [FALSE]
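The RPLFFD replacement can be pictured with the following Python sketch. The exact FLAM behavior is not documented here, so the rules are assumptions: a form feed (`\f`) ends the current page, the remaining lines of that page are filled with blank lines, and a form feed on a page that is still empty is ignored. The function name `replace_form_feeds` is hypothetical.

```python
def replace_form_feeds(text: str, lines_per_page: int = 60) -> str:
    """Replace form feeds with blank lines up to the end of the current page."""
    if lines_per_page <= 0:
        raise ValueError("lines_per_page must be positive")
    out = []
    used = 0  # lines already written on the current page

    def emit(line: str) -> None:
        nonlocal used
        out.append(line)
        used = (used + 1) % lines_per_page  # wraps to 0 when a page is full

    for line in text.split("\n"):
        *before_ff, rest = line.split("\f")
        for segment in before_ff:
            if segment:
                emit(segment)  # text in front of the form feed stays on this page
            if used:  # pad only a page that already has content
                for _ in range(lines_per_page - used):
                    emit("")
        if rest or not before_ff:
            emit(rest)  # text after the last form feed starts the next page
    return "\n".join(out)
```

With four lines per page, `replace_form_feeds("a\nb\fc", 4)` returns `"a\nb\n\n\nc"`: the first page holds `a`, `b` and two blank lines, and `c` starts the second page.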