XML

Synopsis

HELP:   Read a data field from XML format
TYPE:   OBJECT
SYNTAX: XML(ROOT='str',PATH='str',VALUE='str',SCHEME=STRUCTURED/UNSTRUCTURED,PREFIX[=IGNOREALL/IGNEXCXML/IGNXMLNS],WARN[=MANDATORY/BRANCH/ATTRIBUTE/DATAFIELD/MISSING/NSPREFIX/ALL],ERROR[=BRANCH/ATTRIBUTE/DATAFIELD/MISSING/ALL],NAMESPACE/NS[()...],NOCHECK/NOCHEK/NOCHK)

Description

This object is used to transform an XML document into table format (i.e. rows and columns). If one column format is set to XML, then all column formats of all row specifications must be set to XML because the FMT.XML() object is used to build an XML element list for the provided XML document. This XML element list is transformed to table entries based on the paths specified for each column. The XML data is required to be in UTF-8.

All formatting, comments, processing instructions and document type definitions (DTD) from the XML document are ignored when reading. Only data contained within XML tags or attributes can be accessed.

The maximum data length for a column is 1024 bytes by default. If there is a column that may contain more than 1024 bytes of data, the MAXLEN parameter must be set accordingly.

XML is a very powerful data format, which is converted here into a less powerful form of representation. This means that not every XML format can be converted into table form. The table support only works for XML formats that contain exactly one unbounded repetition of data. Each of these repetitions becomes a row, and its beginning is marked by the root path. As soon as a second unbounded repetition of data occurs, the XML format cannot be converted into table form. Any further arrays must have a defined maximum number of entries, which is expressed by specifying n columns with the same path. It is not possible to define an unbounded number of columns per row.

Since version 5.1.25 of FLAM (namespace support), tag and attribute names are stored only once in a literal cache and reused. This saves memory and processing time, but only if the hash table is not too small. The size is given as a power of two: the default is 12, which corresponds to 4096 (2^12) entries. With the environment variable FL_LITERAL_CACHE_SIZE this can be adjusted between 8 (256 entries) and 20 (1,048,576 entries). For processing large XML documents we recommend FL_LITERAL_CACHE_SIZE=16 (65,536 entries).

To address XML elements which are to be transformed to table columns, there is a path notation to describe the location of a data field within the XML document. There are two kinds of path: the root path (the branch in the XML tree under which all data destined for the table is located) and the column-specific path (which locates an attribute or data node within the sub-tree under the root). The syntax is described in detail further below.

Schemes

There are two supported schemes of operation. The default scheme is the structured scheme. In this mode, each table row is enclosed by an XML tag. This is required to support default values for missing entries, header data, optional branch checking and other features.

The second supported scheme is the unstructured one. In this variant, table rows do not need to be enclosed in XML tags. Instead, a new row starts when a defined column is encountered in the XML data that has already been read for the current row. As a consequence, the order of columns within a row is arbitrary. There are, however, no optional columns in this mode. Each column must be present in the XML data for every row.

Since version 5.1.25 of FLAM the structured scheme also supports XML formats with only one level. In this case the format has only one root level and one or more path levels, and the whole document is a table with a single row. See the example below:

<header>
     <col1>Data1</col1>
     <col2 attr="attr2">Data2</col2>
     <deeper attr="attr3">
        <col4>Data4</col4>
     </deeper>
</header>

Formats containing only the root level and native data like below are not yet supported:

<root>
  Data
</root>

Let's continue with examples for both schemes:

Below is a simple XML example which is suitable for the structured mode:

<?xml version='1.0'?>
<root>
  <row>
     <col1>Data11</col1>
     <col2>Data12</col2>
  </row>
  <row>
     <col1>Data21</col1>
     <col2>Data22</col2>
  </row>
  <row>
     <col2>Data32</col2>
     <col1>Data31</col1>
  </row>
</root>

A table object that reads this XML document into a table is shown below. Since no path information is specified, the row and column names are used for the path.

FMT.TABLE(
   FORMAT=XML
#  DEFAULT(XMLSCM=STRUCTURED)  commented out because it is already the default#
   PATH='<root>'
   ROW(
      NAME='row'
      COLUMN(NAME='col1')
      COLUMN(NAME='col2')
   )
)

An XML document suitable for the unstructured variant may look like the following example:

<?xml version='1.0'?>
<root>
  <table1>
     <col1>Data11</col1> <col2>Data12</col2>
     <col2>Data22</col2> <col1>Data21</col1>
     <col1>Data31</col1> <col2>Data32</col2>
  </table1>
</root>

The tags "col1" and "col2" appear repeatedly, but in changing order. The XML could also be encoded in one line without any whitespace. The layout of the XML document is irrelevant to the parsing process. To turn this document into a table with three rows and two columns, the following table object can be used:

FMT.TABLE(
   FORMAT=XML
   DEFAULT(XMLSCM=UNSTRUCTURED)
   PATH='<root>'
   ROW(
      NAME='table1'
      COLUMN(NAME='col1')
      COLUMN(NAME='col2')
   )
)

The XML document must contain all columns for each specified row. The order of columns per row is arbitrary. If one column is missing, the row specification does not match and another row specification is tried. If no matching row specification is found, an error occurs.

A row specification need not contain all possible data fields and attributes of an XML document. You can read only what you need.

Root and Paths

Besides the scheme, a 'root' and a 'path' can be specified to locate elements in the XML document. In the structured scheme, 'root' is the path to a row and 'path' is a path within a row. In the unstructured scheme, 'root' is the path to a table and 'path' is a path within that table. The root and path must be specified in the syntax described below and can have any hierarchical depth up to 64 levels.

The structured example above could also be read like this:

FMT.TABLE(
   FORMAT=XML
   ROW(
      NAME='table1'
      COLUMN(NAME='col1' FORMAT.XML(ROOT='<root/row>' PATH='<col1>'))
      COLUMN(NAME='col2' FORMAT.XML(ROOT='<root/row>' PATH='<col2>'))
   )
)

The parameters 'path' and 'root' must be enclosed in angle brackets (<...>) for XML. To reduce redundancies in path specifications, the 'path' attributes on table, row and column level can be used. All paths are case sensitive. Do not use diacritical characters in tag names. If your UTF-8 encoded XML data contains decomposed combined characters, you must perform conversion to UTF-8 NFC before using table formatting.

To locate an attribute value in the XML, the last path element must be prefixed with the character '@'. The character '&' can be used as a code page independent alternative and is mainly intended for EBCDIC systems.
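As a minimal sketch (the tag name <item> and the attribute 'id' are made up for illustration), a document such as

<root>
  <row>
     <item id="A1">Data1</item>
  </row>
</root>

could be read into one column for the tag data and one column for the attribute value:

FMT.TABLE(
   FORMAT=XML
   PATH='<root>'
   ROW(
      NAME='row'
      COLUMN(NAME='item'   PATH='<item>')
      COLUMN(NAME='itemid' PATH='<item/@id>')
     ;COLUMN(NAME='itemid' PATH='<item/&id>')
   )
)

The commented line shows the same attribute path in the code page independent '&' notation.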

The last element in a path can also be prefixed with the '%' character. It is only valid for non-attributes (i.e. '%' and '@'/'&' cannot be used at the same time). It is only relevant when writing and encloses the data in a CDATA section. When reading, prefixing a non-attribute with '%' is valid syntax, but has no effect. This allows using the same paths without modification for reading and writing.
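A short sketch of the '%' prefix, assuming a hypothetical column 'text' whose content may contain markup characters:

COLUMN(NAME='text' PATH='<%text>')

When writing, a value such as a<b would then be expected to appear as <text><![CDATA[a<b]]></text>. When reading, the same column definition behaves like PATH='<text>'.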

Optional columns, default values and required fields

By default, all columns are mandatory, requiring them to be present in the XML document for every single row. To make a branch, a data value or an attribute value optional, the respective path element must be prefixed with a question mark (?). To use the same row specification for reading and writing, the minus (-) character can be used to make a branch optional only when reading and plus (+) when writing. This is useful, for example, if you have different tables in one XML document with a global required file header. This header must be optional when reading because you don't know which transaction type (table format) will be the first. The file header is required when writing so that it is written before each table. In such a case, the minus (-) would be the right choice.
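As an illustrative sketch of the read-only optional mark (the <header> branch and its <date> field are hypothetical names), a global file header could be declared so that it is optional when reading but required when writing:

FMT.TABLE(
   FORMAT=XML
   ROW(NAME='row' PATH='<root/table/row>'
      COLUMN(NAME='hdrdate' ROOT='<root>' PATH='<-header/date>')
      COLUMN(NAME='col1')
      COLUMN(NAME='col2')
   )
)

Replacing '-' with '?' would make the branch optional in both directions, and '+' would make it optional only when writing.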

In the example below, the attribute value 'att2' for <col> is optional, but the tag itself is mandatory.

FMT.TABLE(
   FORMAT=XML
   PATH='<root>'
   ROW(name='table1' path='<table/row>'
      COLUMN(NAME='col1' PATH='<col>')
      COLUMN(NAME='col2' PATH='<col/?@att2>')
   )
)

If a default value is specified for a column ('value' parameter) and no optional mark exists anywhere within the path to this column, this parameter has no effect. If no default value is set, but a column is marked optional, a zero-length value is used for string and binary columns and the number zero is used for integer and float data types.

In the example below, both values are optional, but the <row> tag must be present:

FMT.TABLE(
   FORMAT=XML
   PATH='<root>'
   ROW(name='table1' path='<table/row>'
      COLUMN(NAME='col1' PATH='<?col>' VALUE='DataXX')
      COLUMN(NAME='col2' PATH='<col/?@att2>' VALUE='DataYY')
   )
)

The example below reads data from XML attributes and shows the different possibilities to define a default value (alternative as line comment):

FMT.TABLE(
   FORMAT=XML
   PATH='<root>'
   ROW(name='table1' path='<table/row>' defaults(value='DataYY')
      COLUMN(NAME='col1' PATH='<?col>' VALUE='DataXX')
     ;COLUMN(NAME='col1' PATH='<?col>' FORMAT.XML(VALUE='DataXX'))
      COLUMN(NAME='col2' PATH='<?col/?@att2>')
   )
)

An example XML file which can be read by this table definition is presented below. If the <col> tag is missing, 'DataXX' is put into the table; an empty <col> tag yields an empty string. Missing attributes result in the value 'DataYY' in the table.

<?xml version='1.0'?>
<root>
  <table>
    <row>
      <col att2="Data12">Data11</col>
    </row>
    <row>
      <col att2="Data22"></col> <!-- '' for col1  is used-->
    </row>
    <row>
      <col>Data31</col> <!-- 'DataYY' for col2  is used-->
    </row>
    <row>
      <col></col> <!-- '' for col1 and 'DataYY' for col2  is used-->
    </row>
    <row>
      <col/> <!-- '' for col1 and 'DataYY' for col2  is used-->
    </row>
    <row>
    </row> <!-- 'DataXX' for col1 and 'DataYY' for col2  is used-->
  </table>
</root>

To get an error or a warning for an empty data field or attribute value the exclamation mark ('!' or '|') can be used at the last path component (leaf). The same character is used at write for path entries of header elements but is ignored at a leaf, so that the same row definition can be used for read and write.
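A minimal sketch, reusing the <col> tag and the 'att2' attribute from the examples above: the following columns request a check for empty content in the data field and in the attribute value. The position of '!' before '@' mirrors the '?@' notation used above and is an assumption, as is the expectation that the WARN and ERROR selections decide whether a warning or an error is raised.

COLUMN(NAME='col1' PATH='<!col>')
COLUMN(NAME='col2' PATH='<col/!@att2>')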

The next example shows an optional branch with mandatory fields and some changed default values if the complete optional branch is missing.

FMT.TABLE(
   FORMAT=XML
   PATH='<root>'
   ROW(name='table' PATH='<table1/row>'
      COLUMN(NAME='col0' PATH='<data>')
      COLUMN(NAME='col1' PATH='<?optbr/req1>' value='DataXX')
      COLUMN(NAME='col2' PATH='<?optbr/req2>')
      COLUMN(NAME='col3' PATH='<?optbr/?opt3>')
      COLUMN(NAME='col4' PATH='<?optbr/?opt4>' value='DataYY')
   )
)

In this example the column 'col0' is required. If the optional <optbr> branch is missing from the row in the XML document, 'col1' is set to its default value 'DataXX', 'col2' and 'col3' are set to an empty string and 'col4' is set to 'DataYY'. If the <optbr> branch exists, it must contain the tags <req1> (col1) and <req2> (col2). The tags <opt3> (col3) and <opt4> (col4) are optional and, if missing, are set to an empty string or 'DataYY', respectively.

<?xml version='1.0'?>
<root>
  <table1>
    <row>                   <!-- 1st row-->
      <data>Data11</data>
    </row>
    <row>                   <!-- 2nd row-->
      <data>Data11</data>
      <optbr>
        <req1>dd</req1>
        <req2>kk</req2>
      </optbr>
    </row>
    <row>                   <!-- 3rd row-->
      <data>Data11</data>
      <optbr>
        <req1>dd</req1>
        <req2>kk</req2>
        <opt3>xx</opt3>
        <opt4>yy</opt4>
      </optbr>
    </row>
    <row>                   <!-- 4th row-->
      <data>Data11</data>
      <optbr>
        <req1>dd</req1>
        <opt3>kk</opt3>
      </optbr>
    </row>
  </table1>
</root>

To prevent an error for 'col2' in the 4th row, you can deactivate the check of optional branches for mandatory fields (NOCHECK). If the switch is activated, the corresponding default values are used in the table and no error occurs. Alternatively, you can activate a warning for missing mandatory fields in optional branches.

If neither the field or attribute nor its containing XML branch is marked as optional, it must be present, otherwise an error occurs. If the optional mark is used on a branch (i.e. non-leaf XML node), the whole branch is optional. If a field in this branch is mandatory (no optional mark), then this field must be present if the branch is present, and a missing field results in an error by default. If the complete optional branch does not exist in the XML document, the mandatory field does not cause an error.

Checking for mandatory parts in an optional branch and resetting to the original default values at the beginning of each branch is CPU-expensive, and often the last determined values are what is required in the table anyway. Therefore, this kind of XML validation can be deactivated through the NOCHECK switch, if necessary. Effectively, every column is then treated as if it were marked as optional, and the last parsed values remain valid.
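As a sketch based on the synopsis above, the NOCHECK switch could be set in the XML format object of a column. Applied to the optional-branch example, the 4th row would then no longer raise an error for the missing <req2> tag:

COLUMN(NAME='col2' PATH='<?optbr/req2>' FORMAT.XML(NOCHECK))

Whether the switch has to be given per column or can be set once as a default for the whole row is not shown here and should be checked against the argument descriptions.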

If an XML branch is marked as optional in one column description, then this branch is implicitly also optional for all other column descriptions that refer to children of the same branch. In consequence, a branch needs to be marked as optional in only one column description to make it optional for all columns of the same branch. Please note that this does NOT imply that the columns themselves become optional, but only the marked branch.

Warnings and limitations

All other paths, attributes or values which do not match the provided column specifications are ignored by default. With the selection WARN you can activate several warnings to be sure you don't forget a column specification. The example below will still result in the same logical table as the samples above:

<?xml version='1.0'?>
<root>
  <ign>
    ignored
  </ign>
  <table1>
    <row>                   <!-- 1st row-->
      <data ign3="ignored">Data11</data>
    </row>
     <ign>
       ignored
     </ign>
    <row>                   <!-- 2nd row-->
      <data>Data11</data>
      <optbr>
        <req1>dd</req1>
        <req2>kk</req2>
      </optbr>
    </row>
    <row>                   <!-- 3rd row-->
      <data>Data11</data>
      <optbr>
        <req1>dd</req1>
        <req2>kk</req2>
        <opt3>xx</opt3>
        <opt4>yy</opt4>
      </optbr>
    </row>
  </table1>
  <ign1>
    <ign2>
       ignored
    </ign2>
    ignored
  </ign1>
</root>

There is no support for reading nested tables. The root path of different row definitions must be unique. Otherwise, undefined behavior and/or unpredictable errors are the result.

Here is an example of an XML document with nested tables:

<?xml version='1.0'?>
<root>
  <row1>
     <col1>Data11</col1>
     <other>
        <row2>
          <col1>Data112</col1>
          <col2>Data112</col2>
        </row2>
     </other>
     <col2>Data12</col2>
  </row1>
  <row1>
     <col1>Data21</col1>
     <other>
        <row2>
          <col1>Data212</col1>
          <col2>Data212</col2>
        </row2>
     </other>
     <col2>Data22</col2>
  </row1>
  <row1>
     <col2>Data32</col2>
     <other>
        <row2>
          <col1>Data312</col1>
          <col2>Data312</col2>
        </row2>
     </other>
     <col1>Data31</col1>
  </row1>
</root>

The root path (<root/row1>) must be unique. The second root path for this XML document would be <root/row1/other/row2>. However, this latter path is within a subtree of the first root path, which is not supported. The corresponding table object would look as follows, but such a specification will not work properly:

FMT.TABLE(
   FORMAT=XML
   PATH='<root>'
   ROW(
      NAME='row1' PATH='<row1>'
      COLUMN(NAME='col1')
      COLUMN(NAME='col2')
   )
   ROW(
      NAME='row' PATH='<row1/other/row2>'
      COLUMN(NAME='col1')
      COLUMN(NAME='col2')
   )
)

Only the first row specification is valid and can be used to read the XML document. You can extend this to a valid specification to read the document as a table with 4 columns:

FMT.TABLE(
   FORMAT=XML
   ROW(
      NAME='row1'
      COLUMN(NAME='col1' root='<root/row1>' path='<col1>')
      COLUMN(NAME='col2' root='<root/row1>' path='<col2>')
      COLUMN(NAME='col3' root='<root/row1>' path='<other/row2/col1>')
      COLUMN(NAME='col4' root='<root/row1>' path='<other/row2/col2>')
   )
)

The maximum hierarchical depth of the XML document is currently limited to 64 levels including the root tag for performance reasons. This limit can be extended if required. Please open an issue on 'www.flam.de' if you need support for more levels.

Header and trailer support

Since XML is a hierarchical format, there might be information on the same or higher level of the XML document tree that is relevant for every single row of a table. This information typically occurs only once in XML before the repetitive data that constitutes the actual table data. We call this type of data 'header data'. By converting the XML document to a two-dimensional table, we want to copy the mentioned header data into every row of the resulting table.

Header data may exist on the same or a higher level in the document tree and must occur before the table data. To specify a column that carries header data, the column MUST have the root parameter set to a path that is shorter than the containing row. For example, if the path to a row is <root/table/row>, then a column containing a header value must have a root parameter with either <root/table> or <root> as path. The path to locate the header field within this new root path may be arbitrarily complex. Specifying multiple columns with multiple different root paths is also supported.

The root path of a header column also defines the scope at which a header data value remains valid during parsing of the XML document. Whenever a header column's root path is left, i.e. a closing XML tag occurs that causes the current path to be shorter than the header's root path, the header column's data is reset to its default value (either explicitly specified by the 'value' parameter or implicitly set to an empty string or zero, depending on the data type).

If a header column is not marked as optional (?/-), the header element must be found in the XML document, otherwise an error occurs. You can use the column change detection functionality in your custom application to be notified about header changes.

Below is an example XML document with a list of customers, which own a specific product, grouped by country. For every country, the sales tax and a reduced tax is stored:

<?xml version='1.0'?>
<customers product="myProduct">
  <country name="Germany">
    <tax>
      <sales>19</sales>
      <reduced>7</reduced>
    </tax>
    <customer>
      <name>Cust1</name>
      <address>Street 1, City 1</address>
    </customer>
    <customer>
      <name>Cust2</name>
      <address>Street 2, City 2</address>
    </customer>
  </country>
</customers>

The goal is to create a customer table from this document with product, country name and the tax rates that apply to each customer. There are multiple ways to specify the columns:

FMT.TABLE(
   FORMAT=XML
   ROW(
      NAME='customers' path='<customers/country/customer>'
      COLUMN(name='product'    root='<customers>'         path='<@product>')
      COLUMN(name='country'    root='<customers/country>' path='<@name>')
      COLUMN(name='salesTax'   root='<customers/country>' path='<tax/sales>')
      COLUMN(name='reducedTax' root='<customers/country>' path='<tax/reduced>')
      COLUMN(name='name')
      COLUMN(name='address')
   )
)

or

FMT.TABLE(
   PATH='<customers>'
   FORMAT=XML
   ROW(
      NAME='customers' path='<country/customer>'
      COLUMN(name='product'    root='<customers>'         path='<@product>')
      COLUMN(name='country'    root='<customers/country>' path='<@name>')
      COLUMN(name='salesTax'   root='<customers/country>' path='<tax/sales>')
      COLUMN(name='reducedTax' root='<customers/country>' path='<tax/reduced>')
      COLUMN(name='name')
      COLUMN(name='address')
   )
)

or

FMT.TABLE(
   FORMAT=XML
   ROW(
      NAME='customers' path='<customers/country/customer>'
      COLUMN(name='product'    format.xml(root='<customers>'         path='<@product>'))
      COLUMN(name='country'    format.xml(root='<customers/country>' path='<@name>'))
      COLUMN(name='salesTax'   format.xml(root='<customers/country>' path='<tax/sales>'))
      COLUMN(name='reducedTax' format.xml(root='<customers/country>' path='<tax/reduced>'))
      COLUMN(name='name'       path='<name>')
      COLUMN(name='address'    path='<address>')
   )
)

or

FMT.TABLE(
   FORMAT=XML
   ROW(
      NAME='customers'
      COLUMN(name='product'    format.xml(root='<customers>'         path='<@product>'))
      COLUMN(name='country'    format.xml(root='<customers/country>' path='<@name>'))
      COLUMN(name='salesTax'   format.xml(root='<customers/country>' path='<tax/sales>'))
      COLUMN(name='reducedTax' format.xml(root='<customers/country>' path='<tax/reduced>'))
      COLUMN(name='name'       format.xml(root='<customers/country/customer>' path='<name>'))
      COLUMN(name='address'    format.xml(root='<customers/country/customer>' path='<address>'))
   )
)

Result:

product   | country | salesTax | reducedTax | name  | address
myProduct | Germany |       19 |          7 | Cust1 | Street 1, City 1
myProduct | Germany |       19 |          7 | Cust2 | Street 2, City 2

When writing using the same column specifications, header attributes and header data fields are written only once in front of the table, resulting in the original XML document.

A simple XML document with a complex header structure could look like the example below:

<?xml version='1.0'?>
<root xmlns="http://www.w3.org/1999/xhtml">
  <header>
    <value1>hv1</value1>
    <value2>hv2</value2>
    <value3>hv3</value3>
  </header>
  <table>
    <tabhdr>
      <header>
        <value1>hv1</value1>
        <value2>hv2</value2>
        <value3>hv3</value3>
      </header>
      <row><col1>Data11</col1><col2>Data21</col2></row>
      <row><col1>Data12</col1><col2>Data22</col2></row>
      <row><col1>Data13</col1><col2>Data23</col2></row>
      <row><col1>Data14</col1><col2>Data24</col2></row>
    </tabhdr>
  </table>
  <trailer>
    <value1>hv1</value1>
    <value2>hv2</value2>
    <value3>hv3</value3>
  </trailer>
</root>

For this document, it makes sense to define a row specification for the dedicated header, one for the <table> subtree and a third one for the dedicated trailer. The corresponding table object for the example above could look like this:

FMT.TABLE(NAME='XMLDOC' FORMAT=XML PATH='<root>'
   ROW(NAME='header'
      COLUMN(NAME='rootns'  root='<root>' path='<@xmlns>')
      COLUMN(NAME='value1')
      COLUMN(NAME='value2')
      COLUMN(NAME='value3')
   )
   ROW(NAME='table' PATH='<table/tabhdr/row>'
      COLUMN(NAME='tabhdrval1'  root='<root/table/tabhdr>' path='<header/value1>')
      COLUMN(NAME='tabhdrval2'  root='<root/table/tabhdr>' path='<header/value2>')
      COLUMN(NAME='tabhdrval3'  root='<root/table/tabhdr>' path='<header/value3>')
      COLUMN(NAME='col1')
      COLUMN(NAME='col2')
      COLUMN(NAME='col3' PATH='<?col3>')
   )
   ROW(NAME='trailer'
      COLUMN(NAME='value1')
      COLUMN(NAME='value2')
      COLUMN(NAME='value3')
   )
)

For XML documents which are structured like in the example above, but without the dedicated trailer, it may be desirable to create only one table with more columns. To achieve this, the dedicated header can be added as short roots (header columns) to the row specification of the table.

FMT.TABLE(NAME='XMLDOC' FORMAT=XML PATH='<root>'
   ROW(NAME='table' PATH='<table/tabhdr/row>'
      COLUMN(NAME='rootns'      root='<root>' path='<@xmlns>')
      COLUMN(NAME='hdrval1'     root='<root>' path='<header/value1>')
      COLUMN(NAME='hdrval2'     root='<root>' path='<header/value2>')
      COLUMN(NAME='hdrval3'     root='<root>' path='<header/value3>')
      COLUMN(NAME='tabhdrval1'  root='<root/table/tabhdr>' path='<header/value1>')
      COLUMN(NAME='tabhdrval2'  root='<root/table/tabhdr>' path='<header/value2>')
      COLUMN(NAME='tabhdrval3'  root='<root/table/tabhdr>' path='<header/value3>')
      COLUMN(NAME='col1')
      COLUMN(NAME='col2')
      COLUMN(NAME='col3' PATH='<?col3>')
   )
)

The trailer could also be added to this row specification. In this case, a last record is produced containing the trailer elements, but table (the row tag) and header elements must be optional.

Header elements are only supported when using the structured scheme of XML parsing, because the end of a row must be known. In the unstructured version, there is no capability to store such header information in the columns.

Non-leaf data and array support

Data fields which are not in leaves of the XML tree, but somewhere on the path (<root><table>data<row>...</row>other data</...) can also be addressed, except data on root tag level. If more than one data element is present and only one column is defined, the last value is used.

Example:

<?xml version='1.0'?>
<root xmlns="http://www.w3.org/1999/xhtml">
  <table>
    SomeData1
    <row><col1>Data11</col1><col2>Data21</col2></row>
    <row><col1>Data12</col1><col2>Data22</col2></row>
    <row><col1>Data13</col1><col2>Data23</col2></row>
    <row><col1>Data14</col1><col2>Data24</col2></row>
    SomeData2
    <row><col1>Data15</col1><col2>Data25</col2></row>
    <row><col1>Data16</col1><col2>Data26</col2></row>
    <row><col1>Data17</col1><col2>Data27</col2></row>
    <row><col1>Data18</col1><col2>Data28</col2></row>
    SomeData3
  </table>
</root>

When addressing '<root><?table>' as one of your table columns, then the first four rows contain 'SomeData1' and the following four rows contain 'SomeData2' in this column. For 'SomeData3', an additional row with default values for missing items is created except if there is a mandatory field in the row. So, when specifying a column with a path which addresses non-leaf data, the column will always contain the last chunk of data encountered within that non-leaf XML node.
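A table object for this case might be sketched as follows. The column name 'tabdata' is hypothetical, and splitting the address into ROOT='<root>' and PATH='<?table>' is an assumption derived from the header column notation:

FMT.TABLE(
   FORMAT=XML
   ROW(NAME='row' PATH='<root/table/row>'
      COLUMN(NAME='tabdata' ROOT='<root>' PATH='<?table>')
      COLUMN(NAME='col1')
      COLUMN(NAME='col2')
   )
)

Because 'col1' and 'col2' are mandatory in this sketch, no additional row would be expected for 'SomeData3'.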

It is also possible to address data and attributes of multiple consecutive XML tags with the same name on the same level by simply specifying multiple columns consecutively with the same path. If the XML data contains more consecutive tags than columns with identical paths are specified, the data of these additional tags is lost (ignored). For this case, a warning can be activated.

Example:

<?xml version='1.0'?>
<tables xmlns="http://www.w3.org/1999/xhtml" rootattr1="RootAttr1">
 <header>
   <hdat hatr="hatr1xx">hdat1xx</hdat>
   <hdat hatr="hatr2xx">hdat2xx</hdat>
   <hdat hatr="hatr3xx">hdat3xx</hdat>
   <hdat hatr="hatr4xx">hdat4xx</hdat>
   <row>
    <rdat ratr1="ratr11xx" ratr2="ratr21xx">rdat1xx</rdat>
    <rdat ratr1="ratr12xx" ratr2="ratr22xx">rdat2xx</rdat>
    <rdat ratr1="ratr13xx" ratr2="ratr23xx">rdat3xx</rdat>
   </row>
   <row>
    <rdat ratr1="ratr11xx" ratr2="ratr21xx">rdat1xx</rdat>
    <rdat ratr1="ratr12xx" ratr2="ratr22xx">rdat2xx</rdat>
    <rdat ratr1="ratr13xx" ratr2="ratr23xx">rdat3xx</rdat>
   </row>
   <row>
    <rdat ratr1="ratr11xx" ratr2="ratr21xx">rdat1xx</rdat>
    <rdat ratr1="ratr12xx" ratr2="ratr22xx">rdat2xx</rdat>
    <rdat ratr1="ratr13xx" ratr2="ratr23xx">rdat3xx</rdat>
   </row>
   <row>
    <rdat ratr1="ratr11xx" ratr2="ratr21xx">rdat1xx</rdat>
    <rdat ratr1="ratr12xx" ratr2="ratr22xx">rdat2xx</rdat>
    <rdat ratr1="ratr13xx" ratr2="ratr23xx">rdat3xx</rdat>
   </row>
 </header>
</tables>

Using multiple columns with identical paths, this XML document can be read as follows:

table(format=XML PATH='<tables>' defaults(witspc=collaps)
   row(name=multi path='<header/row>'
       col(name=hdat__1 PATH='<hdat>'       ROOT='<tables/header>')
       col(name=hatr__1 PATH='<hdat/&hatr>' ROOT='<tables/header>')
       col(name=hdat__2 PATH='<hdat>'       ROOT='<tables/header>')
       col(name=hatr__2 PATH='<hdat/&hatr>' ROOT='<tables/header>')
       col(name=hdat__3 PATH='<hdat>'       ROOT='<tables/header>')
       col(name=hatr__3 PATH='<hdat/&hatr>' ROOT='<tables/header>')
       col(name=rdat__1 PATH='<rdat>')
       col(name=ratr1_1 PATH='<rdat/&ratr1>')
       col(name=ratr2_1 PATH='<rdat/&ratr2>')
       col(name=rdat__2 PATH='<rdat>')
       col(name=ratr1_2 PATH='<rdat/&ratr1>')
       col(name=ratr2_2 PATH='<rdat/&ratr2>')
       col(name=rdat__3 PATH='<rdat>')
       col(name=ratr1_3 PATH='<rdat/&ratr1>')
       col(name=ratr2_3 PATH='<rdat/&ratr2>')
   ))

Name space support

XML name spaces must normally be handled by the application. Since version 5.1.24, the selection 'PREFIX' allows name space abbreviations to be excluded from tag comparisons. When writing, this requires that only tree-wide name space definitions are used, and the application must define them as attributes for this tree.

In a later version the name space object can be used to define the name space URI for each column. At read this URI will be validated unless the prefix selection is defined. At write the name space will be defined if possible per tree and only for elements where a local name space is required a name space abbreviation is used.

Other things and recommendations

When using the structured scheme (default), a default value can be specified (see one of the examples above), which is used if an element marked as optional ('/?' or '/-') is not found in the row.

Sometimes it may be useful to get a warning instead of an error if a mandatory field is missing. For this purpose, the selection WARN with keyword MANDATORY exists. If the selection is activated, a warning is written to the log for each mandatory column that is not found in the document. Additionally, it is possible to activate warnings or errors for missing branches, data fields or attributes in the column specifications.
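A sketch of this selection, assuming it is given in the XML format object of the column as listed in the synopsis (the column name and path are only illustrative):

COLUMN(NAME='col2' FORMAT.XML(PATH='<col2>' WARN=MANDATORY))

With this definition, a row without <col2> would be expected to produce a log warning instead of an error.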

To implement/validate choices (this or that branch) you have two possibilities. First, you can work with different row specifications matching the different XML formats. Alternatively, you can make such branches optional and validate correctness within your application.

Using different row specifications you can define different tables in the same XML document. For example, a certain header and/or trailer could be a table with one row. In between, there could be different table formats with many rows for the different transaction types. In an extreme case, the whole document could be a single row with lots of columns.

With the parameter PREFIX the handling of XML prefixes in front of a colon can be defined. There are currently 3 methods implemented to ignore such prefixes at tag comparison. The first method ignores all prefixes if a colon is found in the tag and compares only the remaining tag behind the colon. The second ignores all prefixes except those starting with 'xml', and the third method works like the second one, but only for 'xmlns:', and it also disables the missing warning or error for name space definitions. To be backward compatible with existing row definitions, the ignore methods are only used if the tag does not match. If you specify an XML prefix in a column definition, then the tag must still match, but with one of the supported ignore methods the prefix need not be defined anymore. Be aware that this prefix handling is only relevant at read and that you must ensure at write that the required XML prefix definitions are made. For name space handling the better approach is to use the name space support introduced with FLAM version 5.1.25.
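As a sketch, assume a document whose tags carry a hypothetical prefix 'ns':

<ns:root>
  <ns:row><ns:col1>Data11</ns:col1></ns:row>
</ns:root>

With the first ignore method, the column could then be specified without the prefix. Placing the selection in the XML format object follows the synopsis; that it applies to both the root and the path given there is an assumption:

COLUMN(NAME='col1' FORMAT.XML(ROOT='<root/row>' PATH='<col1>' PREFIX=IGNOREALL))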

There is no possibility to mix XML data with other supported table formats. An error will occur if you define an XML column and another type of column format. If you need such mixed formats, then the tag value delimiter (TVD) support can be used to define sections of XML or JSON within a record or data structure.

Arguments