flcl_manual-flcl_commands-archive-filter-dataset

DATASET

Synopsis

HELP:   Build signatures for clear records
TYPE:   OBJECT
SYNTAX: DATASET[(NAME='str',METHOD=AUTO/BLOCK/RECORD,FORMAT=FIX/LFD/DLM/ARY/VAR/CSV/TVD/XML,DEFAULTS(),ROW(),RECORD['str'...],IGNRST,PATH/ROOT='str',SUBSET[=SEGMENT/RECORD/AUTO],MULTIRECS=INFO/WARNING/ERROR,RNNOTFND=INFO/WARNING/ERROR,CNNOTFND=INFO/WARNING/ERROR,NOINDEX=INFO/WARNING/ERROR,INVSIG=INFO/WARNING/ERROR,SIGNWC=INFO/WARNING/ERROR,INCASE=INFO/WARNING/ERROR)...]

Description

The DATASET object can be used to set a format specification (row specification) and a clear data set (record) in this format for selecting segments. The corresponding parameter file can later be used to form a subset of compressed and encrypted segments by the archive carrier as a result set, where the clear search information is now provided by the owner of the data. The resulting signature consists of hash codes for a Bloom filter, which does not allow any conclusions to be drawn about the data.

Any number of filters can be defined, which can be transferred as an array to the COPY, DECODE or other commands. If more than one filter is passed, these are implemented as an OR link. In other words, all data is selected where one or the other filter applies.

By specifying a unique and non-empty name for a filter, it is counted across all search queries, so that a record that matches more than one search query is recorded several times. Only if the filter names are all empty (NAME='') or not provided the first hit will stop processing for this record.

If the filter names are set, a column of type STRING with the name 'FILTER' is added to the end of each matching record, where the name of the respective search is listed. If you do not want this column or want it in a different position (for example at the beginning), you must enter a corresponding row specification where the 'FILTER' column is not accessed or is in a different position.

A format specification (ROW) can be defined for each filter, where the search values can be specified mainly for the indexed columns. If more than one column is defined, the AND link applies here. This means that only records are selected where all specified columns match. You can work with wildcards for columns of type STRING, but this is not possible for all other types, as a wildcard is not permitted when converting to the neutral binary representation of this type. This means that for all these types you can only compare against concrete values. If you want to be able to search with wildcards in amounts or similar, then you have to define these of type STRING.

For performance reasons, indexed columns should be used for the search, but non-indexed columns are also permitted (use the NOINDEX parameter to change the error handling to a warning or to stop processing in this case). However, the latter only has an effect if the clear data is specified in the DECODE, where this is used to compare the records. If only the signature is available for comparison, the records of a segment are only compared against this signature in DECODE. With a DECODE, the compressed and encrypted segments concerned are always determined using the signature. If only non-indexed values are searched for, this means that all segments for a member must be read. It then depends on whether only a signature (FILTER) or a record (DATASET) was transferred for the search.

Wildcards can also be used for the STRING data type for data comparison. If these wildcards occur in indexed areas, then this partial signature is not used for the signature comparison, which means that such a column is treated like a non-indexed column. The SIGNWC parameter can be used to control the error handling for this case, which leads to a informational by default. If you index only a part of a column using a data offset, then you can use escape characters and the question mark (?) in front of the signed area. As soon as you use a plus (+) or asterisk (*) here, i.e. no longer have a fixed offset, then this column is completely regarded as a wildcard for the signature comparison and the indexing no longer plays a role. The record comparison against the data is not affected by this.

The first two examples below do not result in a warning, but the third does, because the wildcards are part of the partial signature generation.

    ...
    COLUMN(NAME='ARCHTRX-LTWBIC')
    RECORD='...,"UK*"'
    ...
    COLUMN(NAME='ARCHTRX-LTWBIC' INDEX(METHOD=BF1 SIGLEN=2 DATLEN=2))
    RECORD='...,"UK*"'
    ...
    COLUMN(NAME='ARCHTRX-LTWBIC' INDEX(METHOD=BF3 SIGLEN=128))
    RECORD='...,"UK*"'

All three examples could lead to other warnings if the indexing provided doesn't match the indexing in the archive for that column.

If no indexing for a column given and a archives is accessed with MATCH.DATASET() the indexing is taken from the row specification stored in the archive. This means that you do not have to specify the indexing of the individual columns, as long as they can be clearly determined from the row specifications.

The name of the row specification (logical format name) must be defined for the search. Only segments where this identifier is the same are used, as the corresponding signature for the search can only be created for a specific format. FLAM also supports wildcards for the format name, which makes it possible to use the same search for different row specifications. This is particularly useful if you want to search for specific values across various column names.

You can use a row specification for the data in the archive and you must only specify the columns whose content you want to search for. This is best done in CSV format, as this is the easiest way to specify the clear record with the search values. If you specify several records here, they must both be in the same segment for a match to occur. As the segment boundaries are not usually known, this AND link should be avoided, which is also indicated by a warning. With the parameter MULTIRECS the default error handling can be defined for this case.

The following is an example of such a filter specification.

  (FORMAT=CSV ROW(NAME='BBK50'
      COLUMN(NAME='*' INDEX(METHOD=BF2 SIGLEN=64))
      COLUMN(NAME='*' INDEX(METHOD=BF2 SIGLEN=64))
      COLUMN(NAME='*' INDEX(METHOD=BF2 SIGLEN=64))
      COLUMN(NAME='*' INDEX(METHOD=BF2 SIGLEN=64))
    )
    RECORD='"20110815","0000002","0000000","CRERET"'
    SUBSET=RECORD)

This example searches for 4 values across all columns (*) that use the same signature method and signature length and correspond to the format 'BBK50'. In this case the indexing of the columns must be defined, because determination based on the column names are not possible. Searching across all columns requires the most effort.

If the name of the columns given the offset for the indexing and partial signature of a column is determined on the basis of the column name and the specified value for this column must occur in exactly this column.

  (NAME='SEARCH01'FORMAT=CSV ROW(NAME='BBK50'
      COLUMN(NAME='ATRX-BTAG'        INDEX(METHOD=BF2 SIGLEN=64 ))
      COLUMN(NAME='ATRX-OUTBKTXNBR'  INDEX(METHOD=BF3 SIGLEN=128))
      COLUMN(NAME='ATRX-OUTERRCNTR'  INDEX(METHOD=BF2 SIGLEN=64 ))
      COLUMN(NAME='ATRX-PMNTTYPE-IN' INDEX(METHOD=BF2 SIGLEN=64 ))
      COLUMN(NAME='ARCHTRX-LTWBIC'                               )
    RECORD='"20110815","0000002","0000000","CRERET","UK*"'
    SUBSET=RECORD)

If this is done against an open archive, the indexing of the respective column is also applied if this has not been specified. If the indexing is explicitly specified, a plausibility check is carried out, where the error handling can be changed from a warning to an info or an error (cancellation).

  (NAME='SEARCH01'FORMAT=CSV ROW(NAME='BBK50'
      COLUMN(NAME='ATRX-BTAG'       )
      COLUMN(NAME='ATRX-OUTBKTXNBR' )
      COLUMN(NAME='ATRX-OUTERRCNTR' )
      COLUMN(NAME='ATRX-PMNTTYPE-IN')
      COLUMN(NAME='ARCHTRX-LTWBIC'  )
    RECORD='"20110815","0000002","0000000","CRERET","UK*"'
    SUBSET=RECORD)

A filter can be used to select an entire member that contains the information you are looking for. If you only want to receive the segments or records that match the search queries (filter), then you have to request this subset using the parameter of the same name. The parameters for generating a partial signature should match the parameters used for this in the archive. If this is not the case, this column is not included in the signature comparison and a warning is written to the log. Error handling can also be influenced here via the INVSIG parameter. Being tolerant here makes it possible to switch to a new indexing and to be able to search in old archives with such a query.

When accessing an archive (DECODE, COPY, ...), the indexing does not have to be specified explicitly when specifying the column names (without wildcards). The RNNOTFND selection can be used to forces an error if no corresponding row name is found in the archive. The same applies to the FILTER command if a ROWLIST is specified. In these cases, the indexing is taken from the format specifications noted in the archive. If these are specified as above, a plausibility check is carried out and error handling can be controlled via the INVSIG parameter.

The CNNOTFND selection can be used to forces an error if no corresponding column name is found in the data for any column name in the search query. If this error not requested, this column is interpreted as a wildcard, i.e. this column always leads to a match. The fact that no error is generated here by default is useful if search fields are added later that do not exist in earlier archives, then these columns that were not indexed at the time must always lead to a match so that the sequential search over the data can then select the correct rows. In such a case, the search will simply be slower because this column was not yet indexed in this archive. Beside ERROR the selection supports also WARNING and by default an informational is written to the log.

The comparison of strings can be set to non-case sensitive via the NOCASE parameter. When indexing a column, you must specify when writing whether the comparison should be case sensitive or not. If a case sensitive column is not to be compared later, this is adjusted and a warning is issued. You can use the INCASE parameter to turn this into an INFO message or an error with cancellation.

The SUBSET parameter copies only matching segments. If SUBSET is not defined, the whole member is selected. For DECODE, the SUBSET=RECORD is default and forces only matching records written into the output stream. With SUBSET=AUTO (must be explicit set per filter), the individual records would only be placed in the output if a record is specified via the DATASET object. If there is only a signature (FILTER object), all records of this segment are transferred, which is generally done with SUBSET=SEGMENT, which is the default for the COPY command. The default value for the UPDATE command is SUBSET=RECORD. In this case, it does not make sense to define a different value, so the default parameter is not available for this and the per-column parameter should not be used.

For the UPDATE command, there is also the UPDATE flag per column, which indicates that the value from this column is to be set for the records that hit it. These columns are ignored during the search (not part of the signature and the value comparison) even if these columns are indexed or even marked as primary keys.

  (FORMAT=CSV ROW(NAME='BBK50'
      COLUMN(NAME='ATRX-BTAG'       )
      COLUMN(NAME='ATRX-OUTBKTXNBR' )
      COLUMN(NAME='ATRX-OUTERRCNTR' )
      COLUMN(NAME='ATRX-PMNTTYPE-IN')
      COLUMN(NAME='ARCHTRX-INBKTXNBR' UPDATE)
    RECORD='"20110815","0000002","0000000","CRERET","0000005"'
    SUBSET=RECORD)

This example illustrates a DATASET specification for updating a column (ARCHTRX-INBKTXNBR with "0000005") of records that correspond to the search criteria ("20110815" for ATRX-BTAG, "0000002" for ATRX-OUTBKTXNBR, "0000000" for ATRX-OUTERRCNTR and "CRERET" for column ATRX-PMNTTYPE-IN).

Arguments

STRING: NAME='str' - Logical name for this filter/search (used for input output name mapping)
NUMBER: METHOD=AUTO/BLOCK/RECORD - Method for handling incoming data [AUTO]
- AUTO - Use block for blocks and record for records
- BLOCK - Parse the blocks
- RECORD - Parse records
STRING: RECORD['str'...] - Records in row specification format to build signature for search (and update)
SWITCH: IGNRST - Ignore a remaining rest at the end of the data [FALSE]
STRING: PATH/ROOT='str' - Define path to table (root) in a structured format (prefix for higher level)
NUMBER: SUBSET=SEGMENT/RECORD/AUTO - Select only matched segments or records to build a subset [AUTO]
- SEGMENT - Select complete segment
- RECORD - Select matching records
- AUTO - Select segment at COPY and records at DECO
NUMBER: MULTIRECS=INFO/WARNING/ERROR - Handling if more than one record is specified for filter matching (otherwise AND for signature / OR for record matching) [WARNING]
- INFO - Informational
- WARNING - Warning
- ERROR - Error
NUMBER: RNNOTFND=INFO/WARNING/ERROR - Handling if row name is not found in the archive (otherwise no adjustment done) [WARNING]
- INFO - Informational
- WARNING - Warning
- ERROR - Error
NUMBER: CNNOTFND=INFO/WARNING/ERROR - Handling if column name is not found in the archive (otherwise handled as wildcard) [WARNING]
- INFO - Informational
- WARNING - Warning
- ERROR - Error
NUMBER: NOINDEX=INFO/WARNING/ERROR - Handling if column is not indexed (signed) in the archive (otherwise handled as wildcard) [INFO]
- INFO - Informational
- WARNING - Warning
- ERROR - Error
NUMBER: INVSIG=INFO/WARNING/ERROR - Handling if generated partial signature in archive are not identical to filter definition (otherwise handled as wildcard) [WARNING]
- INFO - Informational
- WARNING - Warning
- ERROR - Error
NUMBER: SIGNWC=INFO/WARNING/ERROR - Handling if wildcards are used in signed/indexed data portion (otherwise handled as wildcard) [INFO]
- INFO - Informational
- WARNING - Warning
- ERROR - Error
NUMBER: INCASE=INFO/WARNING/ERROR - Handling if case sensitivity is inconsistent (otherwise adjust) [WARNING]
- INFO - Informational
- WARNING - Warning
- ERROR - Error