
INDEX

Synopsis

HELP:   Activate and define indexing for the column
TYPE:   OBJECT
SYNTAX: INDEX(METHOD=BF1/BF2/BF3/BF4,DATOFS=num,DATLEN=num,SIGLEN=num/SL1/SL2/SL4/SL8/SL16/SL32/SL64/SL128/SL256/SL512/SL1024/SL2048/SL4096/SL8192,PRIKEY)

Description

This object can be used to activate indexing for a column. Bloom filters are used for indexing; their quality, but also their cost, increases with the number of different hash methods used to calculate the signature.
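The following Python sketch illustrates how a Bloom filter signature of this kind can be built and tested. It is not FLAM's implementation: the hash variants (classic DJB2, 32-bit FNV-1a, SDBM), the mapping of a hash value to a bit position, and the function names are assumptions made for illustration, and the fourth hash (Murmur2) is omitted for brevity.

```python
def djb2(data: bytes) -> int:
    """Classic DJB2 hash, truncated to 32 bits."""
    h = 5381
    for b in data:
        h = (h * 33 + b) & 0xFFFFFFFF
    return h


def fnv1a(data: bytes) -> int:
    """32-bit FNV-1a hash (the exact FNV variant is an assumption)."""
    h = 0x811C9DC5
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h


def sdbm(data: bytes) -> int:
    """SDBM hash, truncated to 32 bits."""
    h = 0
    for b in data:
        h = (b + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


HASHES = (djb2, fnv1a, sdbm)  # Murmur2 left out of this sketch


def _bit_positions(value: bytes, nbits: int, hashes: int):
    """Yield one bit position per hash function (assumed mapping: hash mod nbits)."""
    if not 1 <= hashes <= len(HASHES):
        raise ValueError("this sketch supports 1 to 3 hashes")
    for fn in HASHES[:hashes]:
        yield fn(value) % nbits


def add_value(signature: bytearray, value: bytes, hashes: int = 1) -> None:
    """Set one bit per hash function in the signature."""
    for bit in _bit_positions(value, len(signature) * 8, hashes):
        signature[bit // 8] |= 1 << (bit % 8)


def may_contain(signature: bytes, value: bytes, hashes: int = 1) -> bool:
    """Return False if the value is definitely absent, True if it may be present."""
    return all(
        signature[bit // 8] & (1 << (bit % 8))
        for bit in _bit_positions(value, len(signature) * 8, hashes)
    )
```

More hashes set more bits per value, which reduces false positives as long as the signature is long enough, but each additional hash has to be computed for every row.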

METHOD specifies whether 1, 2, 3 or 4 different hashes are calculated; the lowest effort (1 hash) is the default. Since each hash contributes one bit to the signature, the optimal signature length in bytes is the number of rows in a matrix/segment multiplied by the number of bits (1, 2, 3 or 4) and divided by 8, depending on the variation of the data. The signature length may not fall below 1 byte and may reach a maximum of 8 KiB. Since the optimum can lead to a large overhead or a bad match rate, the signature length used for a column must be specified explicitly so that it is known for verification. If it is not specified, the standard length of 4 bytes (32 bits) is used. The signature length must be a power of 2.
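As a rough aid for choosing SIGLEN, the rule of thumb above can be written as a small Python helper. The function name is hypothetical, and rounding up to the next power of 2 is an assumption of this sketch; the variation of the data, which the rule also depends on, is not modelled.

```python
import math


def suggested_siglen(rows: int, hashes: int) -> int:
    """Estimate a signature length in bytes from rows * hashes / 8.

    The result is rounded up to a power of 2 (assumption) and clamped
    to the documented range of 1 byte to 8192 bytes.
    """
    if rows < 1 or hashes not in (1, 2, 3, 4):
        raise ValueError("rows must be >= 1 and hashes must be 1, 2, 3 or 4")
    raw = max(1, math.ceil(rows * hashes / 8))
    # Round up to the next power of 2.
    length = 1 << (raw - 1).bit_length()
    return min(length, 8192)
```

For example, 8 rows with METHOD=BF1 give 1 byte, and 1000 rows with METHOD=BF4 give 500 bytes, which this sketch rounds up to 512.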

Normally the entire content of a column is hashed per row. To restrict this, you can define an offset for the starting point and a length so that the hash is calculated over only a part of the data. This is mainly intended for defining a single column for whole records and performing indexing much like a VSAM KSDS.
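The effect of DATOFS and DATLEN can be pictured as a byte slice taken from the column before hashing. The Python sketch below uses a hypothetical helper name; treating a length of 0 as "to the end" follows the DATLEN default, while the behaviour for offsets beyond the end of the data is an assumption (the slice is simply truncated).

```python
def hashed_part(record: bytes, datofs: int = 0, datlen: int = 0) -> bytes:
    """Return the bytes that would take part in the hash calculation.

    datofs is zero-based; datlen == 0 means "up to the end of the data".
    """
    if datofs < 0 or datlen < 0:
        raise ValueError("datofs and datlen must not be negative")
    if datlen == 0:
        return record[datofs:]
    return record[datofs:datofs + datlen]
```

With a record such as `b"KEY0001payload"`, `hashed_part(record, 0, 7)` selects only the key `b"KEY0001"`, so rows with the same key produce the same signature bits regardless of the payload.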

If a KSDS or any other record-oriented dataset is read with KEYLEN and KEYPOS (note: KEYPOS starts at 1, whereas DATOFS starts at 0) and no explicit row specification has been given, the following row specification is added automatically, provided the record is still unchanged:

   ROW(NAME='STDREC' COLUMN(NAME='RECORD' INDEX(METHOD=BF4 DATLEN=llll DATOFS=oooo SIGLEN=SL128 PRIKEY) LOOKUP))

This means that the indexing of a KSDS is automatically transferred (DATLEN=KEYLEN, DATOFS=KEYPOS-1) to the archive. If this is not desired, a corresponding row specification must be specified for table name STDREC.
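The conversion from the one-based KEYPOS to the zero-based DATOFS is easy to get wrong by one. The Python sketch below fills the template shown above for a given key position and length; the function name is hypothetical, and the plain decimal formatting of the two numbers is an assumption.

```python
def default_row_spec(keypos: int, keylen: int) -> str:
    """Build the automatically added row specification for a keyed dataset.

    keypos is one-based (as in KEYPOS); DATOFS is zero-based, hence keypos - 1.
    """
    if keypos < 1 or keylen < 1:
        raise ValueError("keypos and keylen must be >= 1")
    datofs = keypos - 1
    return (
        "ROW(NAME='STDREC' COLUMN(NAME='RECORD' "
        f"INDEX(METHOD=BF4 DATLEN={keylen} DATOFS={datofs} SIGLEN=SL128 PRIKEY) LOOKUP))"
    )
```

A key that starts in the first byte of the record (KEYPOS=1) therefore results in DATOFS=0.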

Several indexed columns can represent a primary key for a data record, which is important for inserts, updates and deletes. This can be indicated using the PRIKEY keyword. All indexed columns marked in this way are later used to create a signature in order to update, insert or delete a record. If the LOOKUP switch has also been set for the same columns, the record comparison is carried out via a hash table when a matrix has been hit. This is much faster with a large number of records in the matrix, but also consumes more memory.
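The trade-off behind the LOOKUP switch can be illustrated with a generic Python sketch: once a matrix has been hit, the matching record is found either by comparing row by row or through a hash table built over the primary-key columns. The data layout (rows as tuples, key columns as positions) and all function names are assumptions for illustration and do not describe FLAM's internal structures.

```python
def primary_key(row: tuple, key_columns: tuple) -> tuple:
    """Extract the primary-key values (the PRIKEY columns) from a row."""
    return tuple(row[i] for i in key_columns)


def find_by_scan(rows: list, key_columns: tuple, key: tuple):
    """Linear comparison: no extra memory, time grows with the number of rows."""
    for pos, row in enumerate(rows):
        if primary_key(row, key_columns) == key:
            return pos
    return None


def build_lookup(rows: list, key_columns: tuple) -> dict:
    """Hash table over the primary key: more memory, near-constant lookup time."""
    return {primary_key(row, key_columns): pos for pos, row in enumerate(rows)}
```

Building the table costs one pass over the rows and one entry per row, which pays off when many records in a large matrix have to be compared.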

To search the archive later, you only need to enter the same name for the row specification and describe each column required for the signature calculation in the same way. This is often done in CSV format, as it is the easiest way to enter the data. Non-indexed columns can be omitted, but the column names must match so that the correct offset for the partial signature can be determined.

The examples below show the two different ways to match 2 of n indexed columns:

  #SAMPLE 1: with only the 2 required columns and correct column names#
  MATCH.DATA(FORMAT=CSV ROW(NAME='BBK50'
      COLUMN(NAME='ARCHTRX-MSGID-IN' )
      COLUMN(NAME='ARCHTRX-MSGID-OUT')
    )
    RECORD='"NOLADE20XXX110815102443SCT001","20110815-600000-001"'
    SUBSET)
    
  #SAMPLE 2: with only the 2 values matching all columns of this kind#
  MATCH.DATA(FORMAT=CSV ROW(NAME='*'
      COLUMN(NAME='*' INDEX(METHOD=BF4 SIGLEN=128))
      COLUMN(NAME='*' INDEX(METHOD=BF4 SIGLEN=128))
    )
    RECORD='"NOLADE20XXX110815102443SCT001","20110815-600000-001"'
    SUBSET)

If you specify the format and column names, FLAM determines the indexing based on the format information from the archive. If you search across all columns and formats, the indexing for the columns of the formats cannot be determined and therefore the indexing must be specified.

Arguments

NUMBER: METHOD=BF1/BF2/BF3/BF4 - Method used for indexing [BF1 (Bloom filter with 1 hash (simplest form))]
- BF1 - Bloom filter with 1 hash (DJB2)
- BF2 - Bloom filter with 2 hashes (DJB2 and FNV)
- BF3 - Bloom filter with 3 hashes (DJB2, FNV and SDBM)
- BF4 - Bloom filter with 4 hashes (DJB2, FNV, SDBM and Murmur2)
NUMBER: DATOFS=num - Offset in the data from which the hash calculation starts [0 (from beginning)]
NUMBER: DATLEN=num - Length of the data over which the hash is calculated [0 (to the end)]
NUMBER: SIGLEN=num/SL1/SL2/SL4/SL8/SL16/SL32/SL64/SL128/SL256/SL512/SL1024/SL2048/SL4096/SL8192 - Length of the partial signature for the segment [4 Byte (32 Bit)]
- SL1 - Signature length of 1 (e.g. used up to maximum of 8 different values per column with BF1)
- SL2 - Signature length of 2 (e.g. used up to maximum of 16 different values per column with BF1)
- SL4 - Signature length of 4 (standard)
- SL8 - Signature length of 8
- SL16 - Signature length of 16
- SL32 - Signature length of 32
- SL64 - Signature length of 64
- SL128 - Signature length of 128
- SL256 - Signature length of 256
- SL512 - Signature length of 512
- SL1024 - Signature length of 1024
- SL2048 - Signature length of 2048
- SL4096 - Signature length of 4096
- SL8192 - Signature length of 8192 (maximum)
SWITCH: PRIKEY - Define indexed column as part of the primary key [FALSE]