
FLAM5 Segment Format

The following description of the segment format is intended to allow you to program against the data format yourself, thereby avoiding dependencies with regard to persistence.

A FLAM5 segment is the serialized representation of a FLAM5 matrix, which holds the neutral FLAM5 elements in memory and allows random access to the data regardless of its format (XML, CSV, FB/VB/...).

For the persistent storage of these neutral data matrices in our FLAM5 archives, the matrices are serialized into segments. This involves compression, encryption and signing as well as indexing of the data. The serialized segments are self-sufficient images of a data matrix in memory and are used to rebuild the matrix when read.

The following describes the structure of these self-sufficient segments, which are persistently managed by our various segment storage implementations (flat file, folder, cloud).

For serialization, half bytes (4 bits), bytes (8 bits), integers (32 or 64 bits), variable-length strings and variable-length binary data blobs are used. Integers are stored in big-endian byte order and strings are encoded in UTF-8.
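A minimal C sketch of these encoding rules follows. The helper names (`fl5_get_u32`, `fl5_get_u64`, `fl5_put_u32`, `fl5_split_nibbles`) are hypothetical, and the assumption that the first-listed half byte of a pair is stored in the high nibble is ours; the text above does not state the nibble order.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a big-endian 32-bit unsigned integer from p[0..3]. */
uint32_t fl5_get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a big-endian 64-bit unsigned integer from p[0..7]. */
uint64_t fl5_get_u64(const uint8_t *p)
{
    return ((uint64_t)fl5_get_u32(p) << 32) | fl5_get_u32(p + 4);
}

/* Write v as a big-endian 32-bit integer to p[0..3]. */
void fl5_put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Split one byte into two half bytes.
 * Assumption: the first-listed 4-bit field is the high nibble. */
void fl5_split_nibbles(uint8_t b, uint8_t *high, uint8_t *low)
{
    *high = (uint8_t)(b >> 4);
    *low  = (uint8_t)(b & 0x0F);
}
```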

Such a segment basically consists of 3 parts: the segment header, the segment attributes and the segment data. The segment header is divided into a static and a variable part. The static part, as the name suggests, is constant and is therefore stored in a separate matrix for global format descriptions. The variable part contains the values that can differ for each segment, for example the original and compressed lengths. The index to the static part of the header, the variable part of the segment header and the segment attributes are stored by the archive management in the associated member matrix, whose entries thus always describe an entire file in the archive. The segment data consist only of a checksum and the encrypted, compressed data for the individual components of the defined pots.

This results in the following schematic structure of a segment, given in Backus-Naur form with bit lengths in parentheses, square brackets for optional elements, an at sign (@) for the empty set and a hash (#) for comments:


FL5-Segment        = Segment-Header Segment-Attributes Segment-Data
Segment-Header     = Static-Segment-Header Variable-Segment-Header
Segment-Attributes = Key-Derivation-Data Segment-Verification-Data
Segment-Data       = [Type-Buffer] Segment-Data-Pot-List

Static-Segment-Header   = Static-Base-Header Static-Column-List Static-Pot-List
Variable-Segment-Header = Variable-Base-Header Variable-Pot-List

Static-Column-List = Static-Header-Column Static-Column-List
                   | @
Static-Pot-List    = Static-Header-Pot Static-Pot-List
                   | @
Variable-Pot-List  = Variable-Header-Pot Variable-Pot-List
                   | @

Static-Base-Header   = MB(8) VN(4) INTPRT(4) CRYSUT(8) CMPSUT(8) MATTYP(8) MATFLG(8) TYPCNF(8) TYPMOD(8) COLCNT(32) POTCNT(32) MAXREC(32) SIGLEN(32)
Static-Header-Column = COLFLG(8) DATTYP(8) SIGMTD(8) POTIND(8) [SIGLEN(32) SIGOFS(32) DATOFS(32) DATLEN(32)]
Static-Header-Pot    = FLGCNF(8) FLGLVL(8) LENCNF(8) LENLVL(8) DATCNF(8) DATLVL(8)

Variable-Base-Header = RECCNT(32) BYTCNT(32) # for all matrix types except streams
                     | RECCNT(32) BYTCNT(32) CMPTYP(32) # only for streams (XML, JSON, ...)
Variable-Header-Pot  = POTFLG(32) ELMCNT(32) DATLEN(32) Variable-Header-Pot-Compressed-Length Variable-Header-Pot-Equal-Data
Variable-Header-Pot-Compressed-Length = [CMPFLG(32)] [[[[CMPLN3(32)] CMPLN2(32)] CMPLN1(32)] CMPLN0(32)] [CMPDAT(32)]
Variable-Header-Pot-Equal-Data = [ELMFLG(8)] [ELMLEN(32) [ELMDAT(var)]]

Key-Derivation-Data       = HCS(32) NUM(32) RND(32) KTV(32)
Segment-Verification-Data = MAC(64) DCS(64)

Segment-Data-Pot-List = Segment-Data-Pot Segment-Data-Pot-List
                      | @

Segment-Data-Pot = Flag-Buffer Length-Buffer Data-Buffer

Type-Buffer      = PCS(32) DAT(var) # only for streams (XML, JSON, ...)
                 | @

Flag-Buffer      = PCS(32) DAT(var)
                 | @

Length-Buffer    = PCS(32) DAT(var)
                 | @

Data-Buffer      = PCS(32) DAT(var)
                 | @

General
HCS    - 32 Bit checksum over the complete segment header
NUM    - 32 Bit sequence number (different for each segment)
RND    - 32 Bit random number
KTV    - 32 Bit key test value of the current key
PCS    - 32 Bit pot checksum calculated over the compressed and encrypted buffers
DCS    - 64 Bit data checksum over all pot checksums (PCS)
MAC    - 64 Bit message authentication code (CBC MAC over the DCS and the encrypted data)
DAT    - variable length binary compressed and encrypted data

Static-Base-Header
MB     - Magic byte (0x46 ('F') in ASCII/UTF-8)
VN     - Version (0x05)
INTPRT - Integrity protection
CRYSUT - Encryption suite
CMPSUT - Compression suite
MATTYP - Matrix type
MATFLG - Matrix flag (high-order 8 bits of the matrix flag word)
TYPCNF - Confidentiality for type buffer
TYPMOD - Compression mode for type buffer
COLCNT - Column count for this matrix
POTCNT - Pot count for this matrix (maximum is 256)
MAXREC - Maximum number of records that this matrix should hold
SIGLEN - Signature length (sum of all Bloom filters of all indexed columns)
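The Static-Base-Header has a fixed size of 24 bytes (2 bytes for MB, VN and INTPRT, 6 single-byte fields and 4 × 32-bit counters). The following sketch decodes it under the assumption that VN occupies the high nibble of the second byte; the struct and function names are illustrative only, and the checks against the documented magic byte, version and maximum pot count are a plausibility test, not FLAM's own validation logic.

```c
#include <stdint.h>
#include <stddef.h>

/* 8+4+4 bits, 6 x 8 bits and 4 x 32 bits = 24 bytes. */
#define FL5_STATIC_BASE_LEN 24

/* Hypothetical decoded form of the Static-Base-Header. */
typedef struct {
    uint8_t  mb, vn, intprt, crysut, cmpsut, mattyp, matflg, typcnf, typmod;
    uint32_t colcnt, potcnt, maxrec, siglen;
} Fl5StaticBaseHeader;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or implausible. */
int fl5_parse_static_base(const uint8_t *buf, size_t len, Fl5StaticBaseHeader *h)
{
    if (len < FL5_STATIC_BASE_LEN) return -1;
    h->mb     = buf[0];
    h->vn     = (uint8_t)(buf[1] >> 4);    /* assumption: VN is the high nibble */
    h->intprt = (uint8_t)(buf[1] & 0x0F);
    h->crysut = buf[2];
    h->cmpsut = buf[3];
    h->mattyp = buf[4];
    h->matflg = buf[5];
    h->typcnf = buf[6];
    h->typmod = buf[7];
    h->colcnt = be32(buf + 8);
    h->potcnt = be32(buf + 12);
    h->maxrec = be32(buf + 16);
    h->siglen = be32(buf + 20);
    if (h->mb != 0x46 || h->vn != 0x05) return -1;  /* 'F' and version 5 */
    if (h->potcnt > 256) return -1;                 /* documented maximum */
    return 0;
}
```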

Static-Header-Column
COLFLG - High-order 8 bits of the column flag word
DATTYP - Data type for this column
SIGMTD - Signature method (Bloom-Filter-1/2/3/4)
POTIND - Pot index for this column
SIGLEN - Signature length for this column
SIGOFS - Signature offset for this column (in the signature field)
DATOFS - Data offset for signature calculation for this column
DATLEN - Data length for signature calculation for this column

Static-Header-Pot
FLGCNF - Confidentiality for flag buffer
FLGLVL - Compression mode for flag buffer
LENCNF - Confidentiality for length buffer
LENLVL - Compression mode for length buffers
DATCNF - Confidentiality for data buffer
DATLVL - Compression mode for data buffer

Variable-Base-Header
RECCNT - Number of records in the matrix (for stream matrix types this equals the number of elements; such a matrix has only one column, but the column table represents the element types)
BYTCNT - Total byte count in this matrix (for RBA access)
CMPTYP - Length of the compressed type buffer (only for stream matrix types; the most significant bit indicates compression)
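A sketch of reading the Variable-Base-Header, which is 8 bytes long, or 12 bytes for stream matrices. How a stream matrix is recognized from MATTYP is not described here, so the caller passes `is_stream`; treating the lower 31 bits of CMPTYP as the length is our assumption based on the note that the most significant bit indicates compression. All names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical decoded form of the Variable-Base-Header. */
typedef struct {
    uint32_t reccnt;  /* records (elements for stream matrices) */
    uint32_t bytcnt;  /* total byte count (for RBA access)      */
    uint32_t cmptyp;  /* stream matrices only, otherwise 0      */
} Fl5VarBaseHeader;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the bytes consumed (8, or 12 for streams) or 0 if buf is too short. */
size_t fl5_parse_var_base(const uint8_t *buf, size_t len, int is_stream,
                          Fl5VarBaseHeader *h)
{
    size_t need = is_stream ? 12 : 8;
    if (len < need) return 0;
    h->reccnt = be32(buf);
    h->bytcnt = be32(buf + 4);
    h->cmptyp = is_stream ? be32(buf + 8) : 0;  /* CMPTYP sits at offset 8 */
    return need;
}

/* The most significant bit of CMPTYP indicates a compressed type buffer. */
int fl5_type_buffer_compressed(uint32_t cmptyp)
{
    return (cmptyp & 0x80000000U) != 0;
}

/* Assumption: the remaining 31 bits carry the length of the type buffer. */
uint32_t fl5_type_buffer_length(uint32_t cmptyp)
{
    return cmptyp & 0x7FFFFFFFU;
}
```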

Variable-Header-Pot
POTFLG - The flag word for this pot
ELMCNT - The element count for this pot (original length for flag and length buffers)
DATLEN - The original data buffer length for this pot 

Variable-Header-Pot-Compressed-Length
CMPFLG - Length of compressed flag buffer
CMPLN3 - Length of compressed length buffer 3 (Bit 31-24 of element length)
CMPLN2 - Length of compressed length buffer 2 (Bit 23-16 of element length)
CMPLN1 - Length of compressed length buffer 1 (Bit 15- 8 of element length)
CMPLN0 - Length of compressed length buffer 0 (Bit  7- 0 of element length)
CMPDAT - Length of compressed data buffer 

Variable-Header-Pot-Equal-Data
ELMFLG - The equal flag byte (most significant 8 bits of the element flag word)
ELMLEN - The equal length (all elements of the matrix have the same length)
         If the data pointer is NULL, the first of the 32 bits is set to mark initialization at read time
ELMDAT - The equal data (all elements of the matrix are identical and shorter than 128 bytes)

An optimization has been implemented for flags, lengths and data that are the same for all elements in the matrix. This occurs, for example, for the lengths when archiving an FB data set. If all flags are the same, a single flag is encoded in the variable part of the header and no flag buffer is compressed and encrypted for the pot. The same applies to the lengths. The data is a special case: first, the optimization is not used if encryption is active, so that the data is always encrypted with the data key and not the member key. Second, this mechanism is only used for data elements shorter than 128 bytes. As soon as longer elements occur, the buffers are compressed and encrypted even if all elements are identical.
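The equality checks behind this optimization could look roughly like the sketch below. `Fl5Element` and `fl5_equal_flags` are hypothetical; the flag constants are the pot flags listed later in this chapter. Treating equal data as requiring equal lengths is an assumption that follows from the requirement that all elements are identical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Pot flag values as listed later in this chapter. */
#define FLMPOT_FLAG_EQUFLG 0x00010000U
#define FLMPOT_FLAG_EQULEN 0x00020000U
#define FLMPOT_FLAG_EQUDAT 0x00040000U

/* Hypothetical in-memory element: high-order flag byte, length, data. */
typedef struct {
    uint8_t        flag;
    uint32_t       len;
    const uint8_t *dat;
} Fl5Element;

/* Returns the EQU* pot flags that may be set for the n elements of a pot. */
uint32_t fl5_equal_flags(const Fl5Element *e, size_t n, int encrypted)
{
    if (n == 0) return 0;
    uint32_t flags = FLMPOT_FLAG_EQUFLG | FLMPOT_FLAG_EQULEN;
    for (size_t i = 1; i < n; i++) {
        if (e[i].flag != e[0].flag) flags &= ~FLMPOT_FLAG_EQUFLG;
        if (e[i].len != e[0].len)   flags &= ~FLMPOT_FLAG_EQULEN;
    }
    /* Equal data: never with encryption and only below 128 bytes.
     * Assumption: identical data implies identical lengths (EQULEN). */
    if ((flags & FLMPOT_FLAG_EQULEN) && !encrypted && e[0].len < 128) {
        int same = 1;
        for (size_t i = 1; i < n && same; i++)
            same = e[0].len == 0 || memcmp(e[i].dat, e[0].dat, e[0].len) == 0;
        if (same) flags |= FLMPOT_FLAG_EQUDAT;
    }
    return flags;
}
```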

To save memory, the static part of the header is managed by the archive management in a separate format matrix, as already mentioned, so that it is not kept per segment. The same applies to the row specification string, which describes how the original table is restored; it is also managed in the separate format matrix so that this long string does not have to be kept in each segment header. The signature, which contains the Bloom filters for the indexed columns, is also not part of the segment header. It is managed separately and stored in the member matrix together with the format index for the static header and the row specification (both are referenced by a single 4-byte link field), the variable part of the segment header and the segment attributes. The structure of the member matrix is defined as follows.


CLPROW_FL5MBR_ORG = ROW(NAME='FL5MBR' MATTYP=FL5MBR
    COL(NAME='SEGIND' TYPE.BINARY() POT='VALPOT')
    COL(NAME='SEGLNK' TYPE.BINARY() POT='VALPOT')
    COL(NAME='SEGVAR' TYPE.BINARY() POT='VARPOT')
    COL(NAME='SEGATR' TYPE.BINARY() POT='CRYPOT')
    COL(NAME='SIGNAT' TYPE.BINARY() POT='SIGPOT')
    POT(NAME='CRYPOT' CMPLEVEL=COPY)
    POT(NAME='VALPOT' CMPLEVEL=ARITHMETIC)
    POT(NAME='VARPOT')
    POT(NAME='SIGPOT')
  )

SEGIND(64)  - Index to the segment data
SEGLNK(32)  - 32 bit index to row specification and static header (format)
SEGVAR(var) - variable part of segment header
SEGATR(256) - Segment attributes
SIGNAT(var) - Signature of Bloom filters to search in the compressed and encrypted data

When a member is closed, this member matrix is itself serialized (and thus compressed and encrypted), and its segment header is recorded together with the other values in the directory matrix of the FLAM5 archive. When the archive is closed, the directory matrix is in turn serialized (and thus compressed and encrypted) together with the two special matrices that record the formats (row specifications (CLP strings) and static portions of the segment header) and the literal strings. The table definitions for the directory, format and literal matrices follow:


CLPROW_FL5DIR_ORG = ROW(NAME='FL5DIR' MATTYP=FL5DIR
    COL(NAME='SEGIND'  TYPE.BINARY() POT='VALPOT')
    COL(NAME='SEGLNK'  TYPE.BINARY() POT='VALPOT')
    COL(NAME='SEGVAR'  TYPE.BINARY() POT='VARPOT')
    COL(NAME='SEGATR'  TYPE.BINARY() POT='CRYPOT')
    COL(NAME='FILSTR'  TYPE.STRING() POT='STRPOT' LOOKUP)
    COL(NAME='STASTR'  TYPE.STRING() POT='STRPOT')
    COL(NAME='MBRSTA'  TYPE.BINARY() POT='VALPOT')
    COL(NAME='COMMENT' TYPE.STRING() POT='STRPOT')
    COL(NAME='BINBLOB' TYPE.BINARY() POT='VARPOT')
    POT(NAME='CRYPOT' CMPLEVEL=COPY)
    POT(NAME='VALPOT' CMPLEVEL=ARITHMETIC)
    POT(NAME='VARPOT')
    POT(NAME='STRPOT')
  )

CLPROW_FL5FMT_ORG = ROW(NAME='FL5FMT' MATTYP=FL5FMT NOCHANGE
    COL(NAME='ROWSTR' TYPE.STRING() POT='STRPOT' LOOKUP)
    COL(NAME='BINHDR' TYPE.BINARY() POT='BINPOT' LOOKUP)
    POT(NAME='STRPOT')
    POT(NAME='BINPOT')
  )

CLPROW_FL5LIT_ORG = ROW(NAME='FL5LIT' MATTYP=FL5LIT NOCHANGE
    COL(NAME='LITERAL' TYPE.BINARY() POT='BINPOT' LOOKUP)
    POT(NAME='BINPOT')
  )

SEGIND(64)   - Index to the segment data or special entry type (directory, empty file, pipe, ...)
SEGLNK(32)   - 32 bit index to row specification and to static header (member matrix format)
SEGVAR(var)  - variable part of segment header
SEGATR(256)  - Segment attributes
FILSTR(var)  - File name
STASTR(var)  - File state (CLP-String (see state string documentation))
MBRSTA(fix)  - 72 byte structure of member statistics encoded in big endian
COMMENT(var) - User comment
BINBLOB(var) - User header

ROWSTR(var)  - Row specification string
BINHDR(var)  - Static portion of segment header

LITERAL(var) - Literal strings (represented as binary blobs)

The statistics for a member have the following 72-byte structure:

VsnDmy(32) - ID for this version of the structure (0x52000000U)
MatCnt(32) - Number of matrices (formats) used for this member
SegCnt(64) - Number of segments used for this member
RecCnt(64) - Number of records processed for this member
ElmCnt(64) - Number of elements processed for this member
OrgLen(64) - Original data length for this member
CmpLen(64) - Compressed data length for this member
OvrHed(64) - Overhead processed for this member
CpuTim(64) - CPU time used for this member
RunTim(64) - Elapsed time used for this member

This structure is treated as a binary blob whose length (72 bytes) and layout are currently fixed. Both may change in the future; such changes are indicated by the length and by the version ID at the beginning.
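Because the layout is fixed, the statistics blob can be decoded with plain offsets. The sketch below is illustrative; the struct and function names are hypothetical, and the units of CpuTim and RunTim are not specified in this chapter, so they are returned as raw values.

```c
#include <stdint.h>
#include <stddef.h>

#define FL5_MBRSTA_LEN     72
#define FL5_MBRSTA_VERSION 0x52000000U

/* Hypothetical decoded form of MBRSTA; time units are not specified. */
typedef struct {
    uint32_t vsn, matcnt;
    uint64_t segcnt, reccnt, elmcnt, orglen, cmplen, ovrhed, cputim, runtim;
} Fl5MemberStats;

/* Read an n-byte big-endian unsigned integer. */
static uint64_t be_n(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++) v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if the length or the version ID do not match. */
int fl5_parse_mbrsta(const uint8_t *buf, size_t len, Fl5MemberStats *s)
{
    if (len != FL5_MBRSTA_LEN) return -1;
    s->vsn = (uint32_t)be_n(buf, 4);
    if (s->vsn != FL5_MBRSTA_VERSION) return -1;
    s->matcnt = (uint32_t)be_n(buf + 4, 4);
    /* Eight 64-bit counters follow at offsets 8, 16, ..., 64. */
    uint64_t *f[8] = { &s->segcnt, &s->reccnt, &s->elmcnt, &s->orglen,
                       &s->cmplen, &s->ovrhed, &s->cputim, &s->runtim };
    for (int i = 0; i < 8; i++) *f[i] = be_n(buf + 8 + 8 * i, 8);
    return 0;
}
```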

Thus, a FLAM5 archive has the following basic structure. This data structure is the root element and is again compressed, encrypted and protected against changes by the same means as a matrix.

Again, the data is divided into several parts. A clear header describes the structure of the data; only its integrity is protected. If it is changed, the keys that protect the compressed and encrypted part are destroyed. The message authentication code (MAC) of the archive is additionally calculated over another part, which contains the segment attributes of the 3 archive matrices; these 96 bytes are not compressed because they contain only random values. This is followed by the compressed and encrypted data of the 3 archive matrices. After the static header, as the second part of this root segment, an optional key management packet follows, which contains the encrypted session keys. This packet is completely self-sufficient, so the archive can be re-keyed (access to the data changed) without touching the data.


FLAM5-Archive = MBVN(32) Archive-Header-Segment Archive-Data MBVN(32)

Archive-Header-Segment   = Archive-Clear-Header Archive-Crypto-Header-List Archive-Protected-Header
Archive-Crypto-Header-List = Archive-Crypto-Header Archive-Crypto-Header-List
                           | Archive-Crypto-Header

Archive-Clear-Header     = CRYSUT( 8) CMPSUT( 8) ARCTYP( 4) INTPRT( 4) CRYCNF( 4) CMPMOD( 4) ORGLEN(32) CMPLEN(32) ATRLEN(32)
Archive-Crypto-Header    = FKMEID(32) [KEYLEN(32) [KEYSET(var)]]
Archive-Protected-Header = ATR(256) PCS(32) CMPENC(Archive-Headers Archive-Info) MAC(Archive-Attributes)
Archive-Headers          = DIRHLN(32) DIRATR(32) DIRDLN(32) DIRHDR(var) FMTHLN(32) FMTATR(32) FMTDLN(32) FMTHDR(var) LITHLN(32) LITATR(32) LITDLN(32) LITHDR(var)
Archive-Info             = CMTLEN(32) USRCMT(var) HDRLEN(32) USRHDR(var)
Archive-Attributes       = DIRATR(256) FMTATR(256) LITATR(256)
Archive-Data             = DIRPCS(32) DIRDAT(var) FMTPCS(32) FMTDAT(var) LITPCS(32) LITDAT(var)

Archive-Clear-Header (20 Bytes used for header checksum)
MBVN   - 32 bit Magic bytes and version ("FL50" in ASCII/UTF-8)
CRYSUT -  8 bit crypto suite used to protect directory header
CMPSUT -  8 bit compression suite used to compress the directory header
ARCTYP -  4 bit archive type (1-FULL, 2-SubSet (search results))
INTPRT -  4 bit integrity protection used for archive header segment
CRYCNF -  4 bit confidentiality mode used for archive header segment
CMPMOD -  4 bit compression mode used for archive header segment
ORGLEN - 32 bit original length of the archive headers (used for compression)
CMPLEN - 32 bit compressed length of the archive headers (used for encryption and MAC calculation)
ATRLEN - 32 bit length of segment attributes of the archive data segments (used for MAC calculation)
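The clear part at the start of the archive (MBVN plus the Archive-Clear-Header) adds up to the 20 bytes mentioned above. A decoding sketch follows; the names are hypothetical, the first-listed 4-bit field of each pair is assumed to be the high nibble, and `fl5_archive_headers_copied` assumes that the rule about equal original and compressed lengths stated later in this chapter applies to ORGLEN and CMPLEN.

```c
#include <stdint.h>
#include <stddef.h>

#define FL5_MBVN 0x464C3530U  /* "FL50" in ASCII/UTF-8 */

/* Hypothetical decoded form of MBVN + Archive-Clear-Header. */
typedef struct {
    uint8_t  crysut, cmpsut, arctyp, intprt, crycnf, cmpmod;
    uint32_t orglen, cmplen, atrlen;
} Fl5ArchiveClearHeader;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the 20 bytes are missing or MBVN is wrong.
 * Assumption: the first-listed 4-bit field of each pair is the high nibble. */
int fl5_parse_archive_clear(const uint8_t *buf, size_t len,
                            Fl5ArchiveClearHeader *h)
{
    if (len < 20 || be32(buf) != FL5_MBVN) return -1;
    h->crysut = buf[4];
    h->cmpsut = buf[5];
    h->arctyp = (uint8_t)(buf[6] >> 4);
    h->intprt = (uint8_t)(buf[6] & 0x0F);
    h->crycnf = (uint8_t)(buf[7] >> 4);
    h->cmpmod = (uint8_t)(buf[7] & 0x0F);
    h->orglen = be32(buf + 8);
    h->cmplen = be32(buf + 12);
    h->atrlen = be32(buf + 16);
    return 0;
}

/* Assumption: equal ORGLEN and CMPLEN mean the headers were only copied. */
int fl5_archive_headers_copied(const Fl5ArchiveClearHeader *h)
{
    return h->orglen == h->cmplen;
}
```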

Archive-Crypto-Header (holds optionally encrypted data keys)
FKMEID - 32 bit FKME identifier to know the right FKME at read
         - if 0 no FKME used and KEYLEN will not be encoded
         - the high-order bit indicates an additional archive crypto header
KEYLEN - 32 bit length of the keyset (former FKMC); not encoded if no encryption is used, depending on FKMEID
KEYSET - variable length keyset (encrypted meta and data keys (former FKMC)) for the archive

Archive-Protected-Header (compressed and encrypted archive headers)
ATR       - 32 byte (256 bit) segment attributes (including the archive MAC)
PCS       - 32 bit pot checksum calculated over the compressed and encrypted buffers
CMPENC    - variable length compressed and encrypted archive headers of directory, global format and literal matrix (including MAC calculation)
MAC       - only signed (MAC calculation) archive attributes of directory, global format and literal matrix

Archive-Attributes
DIRATR    - 32 byte (256 bit) segment attributes of directory matrix (only for MAC calculation)
FMTATR    - 32 byte (256 bit) segment attributes of global format matrix (only for MAC calculation)
LITATR    - 32 byte (256 bit) segment attributes of global literal matrix (only for MAC calculation)

Archive-Headers (the segment data length fields and the segment headers for the 3 root matrices (directory, formats and literals))
DIRHLN - 32 bit length of the directory segment header
DIRATR - 32 bit length of the directory segment attributes
DIRDLN - 32 bit length of the directory data segment (including PCS)
DIRHDR - variable length directory segment header
FMTHLN - 32 bit length of the global format segment header
FMTATR - 32 bit length of the global format segment attributes
FMTDLN - 32 bit length of the global format data segment (including PCS)
FMTHDR - variable length global format segment header
LITHLN - 32 bit length of the global literal segment header (0 if not used)
LITATR - 32 bit length of the global literal segment attributes (0 if not used)
LITDLN - 32 bit length of the global literal data segment (0 if not used, including PCS)
LITHDR - variable length global literal segment header

Archive-Info (User comment and header (both could be 0 length))
CMTLEN - 32 bit length of the global user comment
USRCMT - variable length global user comment in UTF-8
HDRLEN - 32 bit length of the user header
USRHDR - variable length global user header

Archive-Data (the segment data of the 3 root matrices)
DIRPCS - 32 bit pot checksum calculated over the compressed and encrypted DIRDAT
DIRDAT - variable length binary compressed and encrypted data of directory matrix
FMTPCS - 32 bit pot checksum calculated over the compressed and encrypted FMTDAT
FMTDAT - variable length binary compressed and encrypted data of global format matrix
LITPCS - 32 bit pot checksum calculated over the compressed and encrypted LITDAT
LITDAT - variable length binary compressed and encrypted data of global literal matrix

The following bit masks are used to interpret the FKMEID:

    // high order 8 bit used as bit mask
    #define FKMEID_BITMASK                          0xFF000000U // bit mask
    #define FKMEID_DATKEYSET                        0x01000000U // data key access
    #define FKMEID_MBRKEYSET                        0x02000000U // member key access
    #define FKMEID_DIRKEYSET                        0x04000000U // directory key access
    #define FKMEID_PROTECTDAT                       0x10000000U // data key protection
    #define FKMEID_PROTECTMBR                       0x20000000U // member key protection
    #define FKMEID_PROTECTDIR                       0x40000000U // directory key protection
    #define FKMEID_CONTINUE                         0x80000000U // an additional crypto header will follow

    // bits 16 to 19 used for the method
    #define FKMEID_METHOD                           0x000F0000U // Continuously numbering (1,2,3,...,15) per variant
    #define FKMEID_PWD_OTP_PBKDF2_10000_SHA3_512    0x00010000U // key derivation method for passphrases
    #define FKMEID_PGP_SEIP_AES256_SHA512           0x00010000U // encryption method for PGP

    // bits 8 to 15 used for flags
    #define FKMEID_FLAGS                            0x0000FF00U // bit mask
    #define FKMEID_FULDATACS                        0x00000100U // full data access possible with this keyset
    #define FKMEID_FULMBRACS                        0x00000200U // full member access possible with this keyset
    #define FKMEID_FULDIRACS                        0x00000400U // full directory access possible with this keyset
    #define FKMEID_FULACCESS                        0x00000800U // full access possible with this keyset
    #define FKMEID_LIMDATRGT                        0x00001000U // limited data rights assigned for this keyset
    #define FKMEID_LIMMBRRGT                        0x00002000U // limited member rights assigned for this keyset
    #define FKMEID_LIMDIRRGT                        0x00004000U // limited directory rights assigned for this keyset
    #define FKMEID_LIMRIGHTS                        0x00008000U // limited key rights assigned for this keyset

    // low order 4 bit used to identify the key management engine
    #define FKMEID_VARIANT                          0x0000000FU // Continuously numbering (1,2,3,...,15)
    #define FKMEID_PWD                              0x00000001U // passphrase protection used
    #define FKMEID_PGP                              0x00000002U // PGP (RFC 4880) protection used
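As an illustration of how these masks combine, the sketch below splits an FKMEID into its fields. The struct and function are hypothetical; the constants are copied from the list above, and KEYLEN is treated as present whenever the FKMEID is non-zero, as stated for the Archive-Crypto-Header.

```c
#include <stdint.h>
#include <stdbool.h>

/* Constants copied from the FKMEID definitions above. */
#define FKMEID_BITMASK  0xFF000000U
#define FKMEID_CONTINUE 0x80000000U
#define FKMEID_METHOD   0x000F0000U
#define FKMEID_FLAGS    0x0000FF00U
#define FKMEID_VARIANT  0x0000000FU
#define FKMEID_PWD      0x00000001U
#define FKMEID_PGP      0x00000002U

/* Hypothetical decoded form of one FKMEID. */
typedef struct {
    uint32_t access;     /* high-order 8 bits: key set and protection bits */
    uint32_t method;     /* 1..15, numbered per variant                    */
    uint32_t flags;      /* access and rights flags (bits 8 to 15)         */
    uint32_t variant;    /* key management engine, e.g. FKMEID_PWD         */
    bool     more;       /* another Archive-Crypto-Header follows          */
    bool     has_keylen; /* KEYLEN is encoded (FKMEID != 0)                */
} Fl5Fkmeid;

Fl5Fkmeid fl5_decode_fkmeid(uint32_t id)
{
    Fl5Fkmeid d;
    d.access     = id & FKMEID_BITMASK;
    d.method     = (id & FKMEID_METHOD) >> 16;
    d.flags      = id & FKMEID_FLAGS;
    d.variant    = id & FKMEID_VARIANT;
    d.more       = (id & FKMEID_CONTINUE) != 0;
    d.has_keylen = id != 0;
    return d;
}
```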

If the original length equals the compressed length of the directory segment, no compression was possible and the data was copied unchanged.

The segment attributes of the directory, the global format and the literal matrix cannot be compressed because they are random values. Therefore, these 16 bytes are only added behind the segment header for the MAC calculation.

The segment attributes have a constant length of 32 bytes, whereby the MAC calculation includes the derivation data (second 16 bytes), which means that these segment attributes must always be located directly before the segment data. The segment data itself starts with a 32 bit checksum called PCS. Therefore, the attributes and the segment data are simply treated as one unit.

The derivation data part of the attribute field (first 16 bytes) contains the header checksum. For re-keying, only the 16-byte archive clear header portion is used for the header checksum; the archive crypto header is not included.

Segment data compression

For compression, the self-contained one-shot (buffer-to-buffer) compression routines of the supported standard methods (GZIP, BZIP2, LZ4, ZSTD, ...) are used. See the compression code later in this chapter.

Self-contained buffers for the high-order flag byte, the individual length bytes and the data are created for each pot. The 6 buffers (flags, 4 × length and data) are filled for each element assigned to the respective pot. The original length of the flag buffer and of each of the 4 length buffers is given by the element count of the pot. The data elements are written directly one after another into a separate buffer for each pot. The compressed lengths of these buffers are optionally encoded in the variable part of the header. If compression would result in expansion, or if copy mode is defined, the corresponding compression flags are not set and the compressed length is not encoded.

Depending on the maximum length of the individual data elements, only those of the 4 length buffers whose bytes are not all 0x00 are compressed and encoded. Which length buffers have been encoded is noted in the flag word of the pot. There is a further mechanism to save compression effort: if all flags are the same, or all lengths are the same, or all data elements are identical, shorter than 128 bytes and not to be encrypted (ENC), the respective buffer is not compressed. Instead, the common value is written to the variable part of the segment header, which is indicated by the pot flags listed below.
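The selection of the length buffers can be sketched as follows: the maximum element length of a pot determines the DATLN value, and each length is split into one byte per length buffer (buffer 0 holds bits 7-0, buffer 3 bits 31-24, matching CMPLN0 to CMPLN3). The function names are hypothetical, and deriving DATLN from the maximum length is our reading of the rule above.

```c
#include <stdint.h>
#include <stddef.h>

/* DATLN values from the pot flag list below. */
#define FLMPOT_FLAG_DATLN0 0x00000000U
#define FLMPOT_FLAG_DATLN1 0x00000001U
#define FLMPOT_FLAG_DATLN2 0x00000002U
#define FLMPOT_FLAG_DATLN3 0x00000003U

/* Selects DATLN from the maximum element length of the pot. */
uint32_t fl5_datln_for(uint32_t maxlen)
{
    if (maxlen > 0x00FFFFFFU) return FLMPOT_FLAG_DATLN3;
    if (maxlen > 0x0000FFFFU) return FLMPOT_FLAG_DATLN2;
    if (maxlen > 0x000000FFU) return FLMPOT_FLAG_DATLN1;
    return FLMPOT_FLAG_DATLN0;
}

/* Splits n element lengths into the length buffers 0..datln; lenbuf[b]
 * receives bits 8*b+7 .. 8*b of each length and must hold n bytes.
 * Buffers above datln are not touched (they would be all 0x00). */
void fl5_split_lengths(const uint32_t *len, size_t n, uint32_t datln,
                       uint8_t *lenbuf[4])
{
    for (size_t i = 0; i < n; i++)
        for (uint32_t b = 0; b <= datln; b++)
            lenbuf[b][i] = (uint8_t)(len[i] >> (8 * b));
}
```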

Pot flags:

 #define FLMPOT_FLAG_EQUFLG    0x00010000U // all flag bytes are the same
 #define FLMPOT_FLAG_EQULEN    0x00020000U // all element lengths are the same
 #define FLMPOT_FLAG_EQUDAT    0x00040000U // all data elements are equal

 #define FLMPOT_FLAG_DATLN0    0x00000000U // only the low-order 8 bits of the length are encoded
 #define FLMPOT_FLAG_DATLN1    0x00000001U // only the low-order 16 bits of the length are encoded
 #define FLMPOT_FLAG_DATLN2    0x00000002U // only the low-order 24 bits of the length are encoded
 #define FLMPOT_FLAG_DATLN3    0x00000003U // full 32 bits of the length are encoded (all 4 buffers)
 #define FLMPOT_FLAG_CMPLN0    0x00000010U // compressed length for the first length buffer is encoded
 #define FLMPOT_FLAG_CMPLN1    0x00000020U // compressed length for the second length buffer is encoded
 #define FLMPOT_FLAG_CMPLN2    0x00000040U // compressed length for the third length buffer is encoded
 #define FLMPOT_FLAG_CMPLN3    0x00000080U // compressed length for the fourth length buffer is encoded
 #define FLMPOT_FLAG_CMPFLG    0x00000100U // compressed length for the flag buffer is encoded
 #define FLMPOT_FLAG_CMPDAT    0x00000200U // compressed length for the data buffer is encoded
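Putting the grammar and the pot flags together, a parser for one Variable-Header-Pot might look like the sketch below. It is not FLAM's implementation, and all names are hypothetical. Two points are assumptions: ELMLEN is read when either EQULEN or EQUDAT is set (because ELMDAT is nested inside ELMLEN in the grammar), and the marker bit mentioned for ELMLEN is masked off before ELMLEN is used as the length of ELMDAT.

```c
#include <stdint.h>
#include <stddef.h>

/* Pot flag values copied from the list above. */
#define FLMPOT_FLAG_EQUFLG 0x00010000U
#define FLMPOT_FLAG_EQULEN 0x00020000U
#define FLMPOT_FLAG_EQUDAT 0x00040000U
#define FLMPOT_FLAG_CMPLN0 0x00000010U
#define FLMPOT_FLAG_CMPLN1 0x00000020U
#define FLMPOT_FLAG_CMPLN2 0x00000040U
#define FLMPOT_FLAG_CMPLN3 0x00000080U
#define FLMPOT_FLAG_CMPFLG 0x00000100U
#define FLMPOT_FLAG_CMPDAT 0x00000200U

/* Hypothetical decoded form of one Variable-Header-Pot. */
typedef struct {
    uint32_t potflg, elmcnt, datlen;
    uint32_t cmpflg, cmpln[4], cmpdat;  /* 0 if not encoded */
    uint8_t  elmflg;
    uint32_t elmlen;
    const uint8_t *elmdat;              /* points into the input, NULL if absent */
} Fl5VarPot;

/* Reads a big-endian uint32 at *off if 4 bytes remain; advances *off. */
static int take32(const uint8_t *b, size_t len, size_t *off, uint32_t *out)
{
    if (len - *off < 4) return 0;
    const uint8_t *p = b + *off;
    *out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    *off += 4;
    return 1;
}

/* Returns the number of bytes consumed, or 0 if the input is truncated. */
size_t fl5_parse_var_pot(const uint8_t *b, size_t len, Fl5VarPot *p)
{
    static const uint32_t lnflag[4] = { FLMPOT_FLAG_CMPLN0, FLMPOT_FLAG_CMPLN1,
                                        FLMPOT_FLAG_CMPLN2, FLMPOT_FLAG_CMPLN3 };
    size_t off = 0;
    *p = (Fl5VarPot){ 0 };
    if (!take32(b, len, &off, &p->potflg) || !take32(b, len, &off, &p->elmcnt) ||
        !take32(b, len, &off, &p->datlen))
        return 0;

    /* Compressed lengths in grammar order: CMPFLG, CMPLN3 ... CMPLN0, CMPDAT. */
    if ((p->potflg & FLMPOT_FLAG_CMPFLG) && !take32(b, len, &off, &p->cmpflg)) return 0;
    for (int i = 3; i >= 0; i--)
        if ((p->potflg & lnflag[i]) && !take32(b, len, &off, &p->cmpln[i])) return 0;
    if ((p->potflg & FLMPOT_FLAG_CMPDAT) && !take32(b, len, &off, &p->cmpdat)) return 0;

    /* Equal values: ELMFLG, then ELMLEN, then ELMDAT (nested inside ELMLEN). */
    if (p->potflg & FLMPOT_FLAG_EQUFLG) {
        if (off >= len) return 0;
        p->elmflg = b[off++];
    }
    if ((p->potflg & (FLMPOT_FLAG_EQULEN | FLMPOT_FLAG_EQUDAT)) &&
        !take32(b, len, &off, &p->elmlen))
        return 0;
    if (p->potflg & FLMPOT_FLAG_EQUDAT) {
        uint32_t n = p->elmlen & 0x7FFFFFFFU;  /* assumption: mask the marker bit */
        if (len - off < n) return 0;
        p->elmdat = b + off;
        off += n;
    }
    return off;
}
```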

If a single buffer would expand or is not to be compressed in the first place (CPY), its compressed length is not encoded and the associated flag is not set. In this case the data is copied, and its length equals the length of the original data or the element count.

A special case of compression occurs with stream matrices. Such a matrix is a list of elements in which the element type determines the column (as an index into the column table). In this case, a type buffer with one entry per element must be written for the entire matrix; each type occupies at most one byte (8 bits). For a stream matrix, the number of records equals the number of elements, and the length of the compressed type buffer (CMPTYP) is encoded at offset 8 of the variable part of the header. If this length is not smaller than the record count, compression would have expanded the type buffer, so it is copied instead.

The algorithms used for compression depend on the respective suite and the corresponding compression mode. In copy (CPY) mode, the data is simply copied, which makes sense whenever the data has no redundancy. In fast (FST) mode, a low compression level is generally used. The dynamic (DYN) mode uses the default medium compression level and the compact (CPT) mode uses the highest compression level; see the sections below for details. The arithmetic (ART) mode is a special mode in which BZIP2 with 900 KiB blocks is always used, so that an arithmetic compression for numbers and certain binary data is available in every suite. Consequently, in the BZIP2 suite, CPT mode is identical to ART mode.

The compression mode can be specified generally for the segment, specifically for each pot, or for each of the buffers (flag, lengths, data) of a particular pot. Arithmetic compression can be particularly useful for the lengths. However, the default mode in all cases is the dynamic (medium) mode.

All buffers (types, flags, lengths and data) are compressed independently for each segment with an atomic call against the respective suite as described below.

GZIP compression suite

The GZIP suite is based on zlib and is called as follows with the specified levels.

 MODE=FAST       -> LEVEL=2
 MODE=DYNAMIC    -> LEVEL=6
 MODE=COMPACT    -> LEVEL=9
 MODE=ARITHMETIC -> BZIP2(900KiB)

 err = deflateInit2(&stStream, LEVEL, Z_DEFLATED, GZIP_NO_HDR, GZIP_MEM_LEVEL, Z_DEFAULT_STRATEGY);
 err = deflate(&stStream, Z_FINISH);
 deflateEnd(&stStream);

BZIP2 compression suite

The BZIP2 suite is based on libbzip2 and is called as follows with the specified levels.

 MODE=FAST       -> LEVEL=1(100KiB)
 MODE=DYNAMIC    -> LEVEL=6(600KiB)
 MODE=COMPACT    -> LEVEL=9(900KiB)
 MODE=ARITHMETIC -> BZIP2(900KiB)

 err = BZ2_bzBuffToBuffCompress((char*)pcOutDat, &ol, (char*)pcInDat, uiInLen, LEVEL, 0, 30);

LZ4 compression suite

The LZ4 suite is based on liblz4 and is called as follows with the specified levels.

 MODE=FAST       -> LEVEL=16
 MODE=DYNAMIC    -> LEVEL=7
 MODE=COMPACT    -> LEVEL=1
 MODE=ARITHMETIC -> BZIP2(900KiB)

 err = LZ4_compress_fast((const char*)pcInDat, (char*)pcOutDat, uiInLen, uiInLen, LEVEL);

ZSTD compression suite

The ZSTD suite is based on libzstd and is called as follows with the specified levels.

 MODE=FAST       -> LEVEL=1
 MODE=DYNAMIC    -> LEVEL=9
 MODE=COMPACT    -> LEVEL=19
 MODE=ARITHMETIC -> BZIP2(900KiB)

 err = ZSTD_compress(pcOutDat, ol, pcInDat, uiInLen, LEVEL);
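The level tables of the four suites can be summarized in one lookup, sketched below with hypothetical enum and function names. For LZ4 the value is the `acceleration` argument of `LZ4_compress_fast`, where larger values mean faster and weaker compression, which is why the order is reversed compared with the other suites. ARITHMETIC always maps to BZIP2 with 900 KiB blocks.

```c
/* Hypothetical identifiers for the four suites and four modes. */
typedef enum { FL5_GZIP, FL5_BZIP2, FL5_LZ4, FL5_ZSTD } Fl5Suite;
typedef enum { FL5_FAST, FL5_DYNAMIC, FL5_COMPACT, FL5_ARITHMETIC } Fl5Mode;

/* Library actually called and the level passed to it. */
typedef struct { Fl5Suite suite; int level; } Fl5Codec;

Fl5Codec fl5_codec(Fl5Suite s, Fl5Mode m)
{
    /* Rows: suite; columns: FAST, DYNAMIC, COMPACT (values from the tables above). */
    static const int level[4][3] = {
        {  2, 6,  9 },  /* GZIP:  zlib compression level            */
        {  1, 6,  9 },  /* BZIP2: blockSize100k (100 to 900 KiB)    */
        { 16, 7,  1 },  /* LZ4:   acceleration of LZ4_compress_fast */
        {  1, 9, 19 },  /* ZSTD:  compression level                 */
    };
    if (m == FL5_ARITHMETIC)                 /* always BZIP2, 900 KiB blocks */
        return (Fl5Codec){ FL5_BZIP2, 9 };
    return (Fl5Codec){ s, level[s][m] };
}
```

With this table, the statement that CPT and ART are identical in the BZIP2 suite falls out directly: both resolve to BZIP2 with level 9.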

Segment data protection

The description of the security mechanisms is not only for the programmer to reproduce but also for the public to check the security mechanisms.

The protection of the segment data is determined by the segment integrity setting (INTPROT) and the segment confidentiality setting (CRYCONF). In the following, the maximum cryptographic protection is described first (INTPROT=MAC and CRYCONF=ENC). If lower levels of confidentiality or integrity are selected, the corresponding calculations are omitted.

The segment attributes are divided into derivation data and verification data. The 16-byte derivation data consists of the header checksum (32 bit CRC or FNV depending on the crypto suite), a 32 bit sequence number for each segment, a 32 bit random number and the 32 bit key test value for the current member-specific data key (a 4-byte CMAC over the complete 64-byte key material, keyed with a corresponding part of the data key). The verification data consists of a 64-bit or 128-bit checksum (CRC or FNV depending on the crypto suite) and a 64-bit MAC if one is calculated (INTPROT=AUTH/MAC). With INTPROT=AUTH, no complete AES-CMAC is calculated over the compressed data; only the simple checksum is encrypted using the AES-CMAC standard. With INTPROT=CHKS, the CRC32 or FNV32 checksum is expanded to 128 bits. In all other cases, the CRC32 or FNV32 data checksums are expanded to 64 bits. The header checksum and the pot checksum are 32 bits long and are also calculated using standard CRC32 or FNV32 routines. The use of standard routines for checksums, encryption and integrity protection allows hardware acceleration (e.g. CPACF) to be used where available.
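The derivation data occupies the first 16 bytes of the 32-byte segment attributes, in the order HCS, NUM, RND, KTV given by the grammar. A sketch of filling it in big-endian order follows; the function name is hypothetical, and how NUM is assigned and where RND comes from are left to the caller because the text does not specify them.

```c
#include <stdint.h>

/* Write v as a big-endian 32-bit integer. */
static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Fills the Key-Derivation-Data, i.e. bytes 0..15 of the segment attributes.
 * The verification data (bytes 16..31) is not touched here. */
void fl5_build_drvdata(uint8_t atr[32], uint32_t hcs, uint32_t num,
                       uint32_t rnd, uint32_t ktv)
{
    put32(atr + 0,  hcs);  /* header checksum               */
    put32(atr + 4,  num);  /* per-segment sequence number   */
    put32(atr + 8,  rnd);  /* random number (caller source) */
    put32(atr + 12, ktv);  /* key test value                */
}
```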

The AES key length (128, 192 or 256 bits) for standard CBC mode encryption and AES-CMAC integrity protection is determined by the crypto suite used. The 16-byte IV and the 16-, 24- or 32-byte AES key for CBC encryption and/or AES-CMAC integrity protection are taken from the correspondingly derived 80-byte key material at predefined offsets.

Since we work exclusively with self-sufficient segments, all calls to derivation functions, encryption routines, MAC or hash calculations are atomic and no chaining takes place. The completeness of the segments is protected simply by the fact that the higher-level member matrix, which manages the data segments of this member, contains not only the segment headers but also the segment attributes of each data segment of this member, with their checksums and MACs. These checksums and MACs are therefore included in the checksum and MAC calculation of the higher-level matrix. The same applies to the directory matrix, which includes the segment attributes of all member matrices. This has the advantage that only 3 matrices need to be updated when a particular data record is updated.

The basic functions used for the processes described here are described below.

Header checksum (HCS)

The header checksum (CRC32 or FNV32) is calculated over the complete segment header (static and variable part). The 32-bit header checksum is written to the derivation data as part of the segment attributes.

   CRC32/FNV32(headerLen, header, 4, hcs);

The header checksum (HCS) is part of the derivation data and is used for simple validation to ensure that the segment header has not been accidentally corrupted. Since the header checksum is included in the key derivation, a manipulation would prevent access to the data with cryptographic certainty, because wrong segment key values would be derived.
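For orientation, the two standard 32-bit checksum families named above are sketched below. We assume the common reflected CRC-32 (polynomial 0xEDB88320, as used by zlib) and the FNV-1a variant of FNV32; the text does not state which FNV variant or CRC parameters FLAM uses, nor which crypto suite selects which function. In practice a table-driven or hardware-accelerated routine would replace this bitwise loop.

```c
#include <stdint.h>
#include <stddef.h>

/* Reflected CRC-32 (polynomial 0xEDB88320), bitwise for brevity. */
uint32_t fl5_crc32(const uint8_t *p, size_t n)
{
    uint32_t c = 0xFFFFFFFFU;
    for (size_t i = 0; i < n; i++) {
        c ^= p[i];
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320U & (0U - (c & 1U)));
    }
    return c ^ 0xFFFFFFFFU;
}

/* 32-bit FNV-1a (assumption: FLAM's FNV32 may use a different variant). */
uint32_t fl5_fnv1a32(const uint8_t *p, size_t n)
{
    uint32_t h = 0x811C9DC5U;           /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x01000193U;               /* FNV prime */
    }
    return h;
}
```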

Generation of key test vale (KTV)

The key test value is generated on the basis of the respective main key and serves to verify it. It is similar to key derivation and also represents a one-way hash function where the key encrypts itself. For this purpose, the CMAC is calculated in length 4 with the key at offset 24 and the IV at offset 8 over the 64 bytes of the master key, which is shown schematically here.

   AESCMAC(16/24/32, mainKey+24, 16, mainKey+8, 64, mainKey, 4, ktv);

The key test value is used to detect early and unambiguously that something is wrong with the password or the higher-level key management before the data is accessed. It is also part of the derivation data, which therefore is specific to the respective main key as well.

Key derivation

The key derivation uses the derivation data from the segment attributes as the 16-byte IV for CBC mode encryption and encrypts the 64 bytes of the corresponding main key (data, member or directory key). The AES key is again taken from the main key, starting at offset 8, in the length specified by the crypto suite.

   AESCBCENC(16/24/32, mainKey+8, 16, drvData, 64, mainKey, drvdKey);

The function above schematically shows the call of AES in CBC mode. Because the key is part of the data being encrypted (it encrypts itself), the key derivation is a one-way function. This ensures cryptographically that no conclusions about the main key can be drawn from the derivation result.

The derivation data is always unique for each segment because of the counter. The random number adds further variability, and the inclusion of the header checksum means that any falsification of the segment header destroys the segment key, which guarantees the integrity of the header data. Including the KTV in the derivation data likewise ensures that this value cannot be falsified, thus protecting its integrity.

Segment encryption

Segment encryption is performed over the compressed data of all pots, which are simply concatenated. Since the lengths of the individual compressed parts are encoded in the variable part of the segment header, they can be packed one after the other and encrypted with a single call. Before this, however, the segment-specific encryption key must be derived.
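Because the compressed lengths of the pots are known from the variable part of the segment header, a reader can locate each pot in the decrypted, concatenated data area with a running sum. The helper below is a hypothetical sketch of that bookkeeping, not part of any FLAM API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the start offset of each pot in the
   concatenated data area from the compressed lengths stored in the
   variable part of the segment header. Returns the total length, i.e.
   the length passed to the single encryption call. */
static size_t pot_offsets(const uint32_t *potLens, size_t potCount, size_t *offsets)
{
    size_t pos = 0;
    for (size_t i = 0; i < potCount; i++) {
        offsets[i] = pos;      /* pot i starts where pot i-1 ended */
        pos += potLens[i];
    }
    return pos;
}
```

An empty pot simply gets the same offset as the pot that follows it.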

OFB mode is used to encrypt the segment data. For this purpose, the key is taken from the derived key material at offset 0 and the initialization vector at offset 40. The call is shown schematically below.

   XOR(drvdKey+0,segNum); XOR(drvdKey+40,segNum);
   AESOFBENC(16/24/32, drvdKey+0, 16, drvdKey+40, dataLen, data);
   XOR(drvdKey+0,segNum); XOR(drvdKey+40,segNum);

Encryption in OFB mode is carried out in place, i.e. the data is overwritten at the same memory location, so the original plaintext does not have to be erased separately. OFB mode also has the advantage that no padding is necessary, so the total length of the encrypted compressed data does not change.
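The in-place property and the absence of padding follow directly from how OFB works: a keystream is produced by repeatedly encrypting the IV and is XORed onto the data. The sketch below hides AES behind a hypothetical block-encryption callback; the toy block function exists only to make the example runnable and is not a secure cipher.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback that encrypts one 16-byte block (AES in FLAM).
   'in' and 'out' may point to the same buffer. */
typedef void (*block_encrypt_fn)(const void *key, const uint8_t *in, uint8_t *out);

/* OFB mode, in place: keystream blocks E(IV), E(E(IV)), ... are XORed
   onto the data. No padding; the same call also decrypts. */
static void ofb_crypt_inplace(block_encrypt_fn enc, const void *key,
                              const uint8_t iv[16], uint8_t *data, size_t len)
{
    uint8_t ks[16];
    memcpy(ks, iv, sizeof ks);
    for (size_t pos = 0; pos < len; pos += 16) {
        enc(key, ks, ks);                          /* next keystream block */
        size_t n = (len - pos < 16) ? len - pos : 16;
        for (size_t i = 0; i < n; i++)
            data[pos + i] ^= ks[i];                /* last block may be partial */
    }
}

/* Toy block function for testing only (NOT a secure cipher):
   rotate the block by one byte and XOR it with the 16-byte key. */
static void toy_encrypt(const void *key, const uint8_t *in, uint8_t *out)
{
    const uint8_t *k = key;
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (uint8_t)(in[(i + 1) % 16] ^ k[i]);
    memcpy(out, tmp, sizeof tmp);
}
```

Since only a keystream is XORed onto the data, a second call with the same key and IV restores the plaintext, and a 37-byte input stays 37 bytes long.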

Before the derived segment-specific key material is used as key or IV, the segment number (4 bytes in network byte order, i.e. big endian) is XORed into it at the respective offset. This is reversed afterwards to restore the key material to its original form.
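A minimal sketch of this tweak, assuming that the 4 bytes at the respective offset (for example 0 for the OFB key and 40 for the OFB IV) are XORed with the big-endian segment number; the function name is hypothetical.

```c
#include <stdint.h>

/* XOR the segment number, as 4 big-endian bytes, into the key material
   at 'p' (e.g. drvdKey+0 for the key, drvdKey+40 for the IV).
   Applying it a second time restores the original bytes. */
static void xor_segnum(uint8_t *p, uint32_t segNum)
{
    p[0] ^= (uint8_t)(segNum >> 24);
    p[1] ^= (uint8_t)(segNum >> 16);
    p[2] ^= (uint8_t)(segNum >> 8);
    p[3] ^= (uint8_t)segNum;
}
```

Because XOR is its own inverse, the same function serves both for applying the tweak before the cryptographic call and for removing it afterwards.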

Pot checksum (PCS)

The pot checksum (CRC32 or FNV32) is calculated over the already encrypted compressed data of the individual pots. The 32-bit pot checksum is written directly before the compressed and, if applicable, encrypted data stream.

   CRC32/FNV32(dataLen, data, 4, data-4/*PCS*/);

The pot checksum is part of the segment data and is used for simple validation to ensure that the segment data has not been accidentally corrupted.
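The following sketch writes a pot checksum directly in front of the encrypted data. It assumes that FNV32 denotes 32-bit FNV-1a (the text does not say whether FNV-1 or FNV-1a is used) and that the caller has reserved the 4 bytes before `data`; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a (assumption: FNV-1 would multiply before the XOR). */
static uint32_t fnv1a32(const uint8_t *buf, size_t len)
{
    uint32_t h = 0x811C9DC5u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x01000193u;              /* FNV prime */
    }
    return h;
}

/* Write the pot checksum in big-endian order into the 4 bytes directly
   before the (already encrypted) data; the caller reserves that space. */
static void write_pcs(uint8_t *data, size_t dataLen)
{
    uint32_t pcs = fnv1a32(data, dataLen);
    data[-4] = (uint8_t)(pcs >> 24);
    data[-3] = (uint8_t)(pcs >> 16);
    data[-2] = (uint8_t)(pcs >> 8);
    data[-1] = (uint8_t)pcs;
}
```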

Data checksum (DCS)

The data checksum (CRC32 or FNV32) is calculated simply over the 4-byte pot checksum and expanded to 8 or 16 bytes, depending on the integrity mode.

   CRC32/FNV32(4, data-4/*PCS*/, 8/16, data-12/20/*DCS*/);

The data checksum is part of the segment attributes, which are stored together with the segment header in the higher-level matrix and, like the pot checksum, are simply used to detect random changes to the data.

Segment integrity protection

The cryptographic integrity protection of a segment is carried out via an AES-CMAC calculation, whereby the key is taken from offset 32 and the IV from offset 16 of the derived key material (64 bytes). The CMAC is calculated over the 8-byte data checksum, which is located directly before the pot checksum, followed by the pot checksum and the already encrypted data.

   XOR(drvdKey+32,segNum); XOR(drvdKey+16,segNum);
   AESCMAC(16/24/32, drvdKey+32, 16, drvdKey+16, dataLen+12, data-12/*DCS*/, data-20/*MAC*/);
   XOR(drvdKey+32,segNum); XOR(drvdKey+16,segNum);

The data checksum (8 bytes) and the pot checksum (4 bytes) are included in the CMAC, so the calculation starts 12 bytes before the compressed data. The first half of the resulting MAC is written before the data checksum, i.e. at offset -20 from the beginning of the compressed data, whereby the first 8 bytes are filled with the second half of the segment attributes.

If INTPROT=AUTH is used instead of INTPROT=MAC, the CMAC is calculated only over the 8-byte data checksum.

   XOR(drvdKey+32,segNum); XOR(drvdKey+16,segNum);
   AESCMAC(16/24/32, drvdKey+32, 16, drvdKey+16, 8, data-12/*DCS*/, data-20/*MAC*/);
   XOR(drvdKey+32,segNum); XOR(drvdKey+16,segNum);

This saves the MAC calculation over the compressed data. Willful manipulations are still recognized, but not by means of a cryptographic checksum over the data itself.
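The offsets involved in the two integrity modes can be summarized as follows for an 8-byte data checksum. This is a hypothetical sketch derived from the offsets given above (first MAC half at -20, DCS at -12, PCS at -4, relative to the compressed data); the enum and function names are not FLAM identifiers.

```c
#include <stddef.h>

/* Offsets relative to the start of the compressed data, for an 8-byte
   data checksum. */
enum {
    OFF_MAC = -20,   /* first half of the CMAC (8 bytes) */
    OFF_DCS = -12,   /* data checksum (8 bytes) */
    OFF_PCS = -4     /* pot checksum (4 bytes) */
};

/* Hypothetical names for the two INTPROT settings. */
typedef enum { INTPROT_MAC, INTPROT_AUTH } intprot_t;

/* Offset (relative to the data) and length of the CMAC input. */
static void cmac_input(intprot_t mode, size_t dataLen, ptrdiff_t *off, size_t *len)
{
    *off = OFF_DCS;                       /* input always starts at the DCS */
    *len = (mode == INTPROT_MAC)
         ? 8 + 4 + dataLen                /* DCS + PCS + encrypted data */
         : 8;                             /* AUTH: DCS only */
}
```

With INTPROT=AUTH the CMAC input no longer depends on the data length, which is where the saving comes from.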

FLAM5 key management

For a better understanding, it is important to know that a FLAM segment is, on the one hand, encrypted and, on the other hand, protected against changes by a key-dependent cryptographic checksum (MAC). Depending on the crypto suite, the necessary keys, ICVs, etc. are determined from 64 bytes (512 bits) of key material plus 16 bytes (128 bits) for dynamisation, i.e. 80 bytes in total.

In total, FLAM works with the following three crypto keys:

to protect the net data (what really has to be encrypted)
to protect the member index (incl. the Bloom filters over the data)
to protect the directory (table of contents of the archive)

The last two therefore protect the metadata in the archive. Consequently, there are three 80-byte random session keys, which must in turn be protected by the respective key encryption method.

Access to the contents of the archive can be controlled through this three-way partitioning. Without access to any of the 3 keys, you cannot do anything with the archive except store it. With access to the directory key, you can only list the contents. With access to the member key, you can additionally search via the index (the Bloom filter signatures) and collect the matching result sets, which remain compressed and encrypted. Only with access to the data key can the actual content, i.e. the clear data, be accessed.
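The resulting access levels can be sketched as a simple decision. The sketch assumes, as the 240-, 160- and 80-byte key sets in the password protection structures suggest, that whoever holds a stronger key set also holds the weaker ones; all names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical access levels, from weakest to strongest. */
typedef enum {
    ACCESS_STORE_ONLY,   /* no key: the archive can only be stored */
    ACCESS_LIST,         /* directory key: list the table of contents */
    ACCESS_SEARCH,       /* member key: search via the Bloom filter index */
    ACCESS_READ          /* data key: access the clear data */
} access_level_t;

/* Assumes nested key sets: a stronger key set includes the weaker ones. */
static access_level_t access_level(bool hasDirKey, bool hasMbrKey, bool hasDatKey)
{
    if (hasDatKey) return ACCESS_READ;
    if (hasMbrKey) return ACCESS_SEARCH;
    if (hasDirKey) return ACCESS_LIST;
    return ACCESS_STORE_ONLY;
}
```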

To ensure that this works accordingly, the data key, the member key and the directory key are protected with a higher-level key encryption key or password.

For searching in compressed and encrypted archives, a higher-level member key is used, which protects the member and directory keys. Finally, there can be a key encryption key (KEK) for the directory key alone, if a user is only to have access to the table of contents. Depending on the procedure, this package of the 3 session keys is signed so that its originator can be authenticated.

This three-way split means that the key management procedures offered here always support one key for accessing the data, one for searching via the index (member key) and one for listing the directory contents.

When the session key material is encrypted, access rights can additionally be restricted by a parameter string. This CLP string is encoded in UTF-8, encrypted together with the session keys and protected against manipulation, which provides a cryptographic binding. If such an additional restriction of rights to certain members, formats or columns has been defined, this user cannot assign any further rights and is limited in the archive to the rights bound here. It is the task of the respective FKME implementation to ensure the cryptographic binding of the rights.

The FLAM syntax allows several self-sufficient encrypted key packages to follow one another (FKMEID_CONTINUE); these are created by default during implicit or explicit re-keying. The key sets are only overwritten if KEYMODE=REPLACE is specified explicitly, in which case the previous rights are lost in the new version of the archive.

Password protection

The one, two or three 80-byte key sets are encrypted with the password in GCM mode, whereby the FKMEID and the three key package lengths, each a 32-bit big-endian value (16 bytes in total), are included as header data (additional authenticated data) in the GCM MAC. The 256-bit AES key for GCM encryption and MAC calculation is derived from the passphrase using PBKDF2 with a 16-byte random salt, 10000 rounds and SHA3_512 as the hash method. Of the 48 derived bytes, 12 are used as the IV, 4 as the KTV and 32 as the AES key.
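The split of the derived bytes and the 16-byte GCM header can be sketched as follows. The offsets of the IV (0) and the KTV (12) come from the text; placing the AES key at offset 16, directly after the KTV, is an assumption based on the order in which the parts are listed. The PBKDF2 step itself is left out; with OpenSSL it could, for example, be done with `PKCS5_PBKDF2_HMAC` and `EVP_sha3_512()`. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Parts of the 48 bytes derived from the passphrase. */
typedef struct {
    uint8_t iv[12];    /* offset 0 */
    uint8_t ktv[4];    /* offset 12 */
    uint8_t key[32];   /* offset 16 (assumed), AES-256 key */
} pwd_derived_t;

static void split_derived(const uint8_t dk[48], pwd_derived_t *out)
{
    memcpy(out->iv, dk, 12);
    memcpy(out->ktv, dk + 12, 4);
    memcpy(out->key, dk + 16, 32);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* 16-byte header authenticated by GCM: FKMEID, DATKL, MBRKL, DIRKL. */
static void build_gcm_header(uint8_t aad[16], uint32_t fkmeid,
                             uint32_t datkl, uint32_t mbrkl, uint32_t dirkl)
{
    put_be32(aad + 0, fkmeid);
    put_be32(aad + 4, datkl);
    put_be32(aad + 8, mbrkl);
    put_be32(aad + 12, dirkl);
}
```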

The key and IV derived from the passphrase are then used for GCM encryption: the 16-byte header is authenticated, and the rights string (with its length) and the key data are encrypted. The resulting 16-byte MAC is stored directly after the SALT and KTV and therefore before the encrypted data. The following diagram shows the structures used:

 FKMC      -> Header EncDatKey EncMbrKey EncDirKey
 Header    -> FKMEID DATKL MBRKL DIRKL
 EncXxxKey -> SALT KTV MAC Enc
 Enc       -> RL RS KEY
    
    FKMEID -  32 bit - FKME ID
    DATKL  -  32 bit - Encrypted data key package length
    MBRKL  -  32 bit - Encrypted member key package length
    DIRKL  -  32 bit - Encrypted directory key package length

    SALT   - 128 bit - Salt for key derivation (random generated)
    KTV    -  32 bit - Key test value (offset 12 of derived data)
    MAC    - 128 bit - Message authentication code of GCM encryption
    
    RL     -  32 bit - Length of right string
    RS     -  var    - Right string in UTF-8
    KEY    - 240 byte- Data key set
           - 160 byte- Member key set
           -  80 byte- Directory key set
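A reader of this structure might split an FKMC as in the sketch below. It assumes that DATKL, MBRKL and DIRKL give the total length of each EncXxxKey package (SALT + KTV + MAC + Enc) and that a length of 0 means the package is absent; neither is stated explicitly above. The function returns the number of bytes consumed (0 on error), so that a following FKMC (FKMEID_CONTINUE) could be parsed next. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* View of one EncXxxKey package: SALT(16) KTV(4) MAC(16) Enc(rest). */
typedef struct {
    const uint8_t *salt, *ktv, *mac, *enc;
    size_t encLen;
} enc_pkg_t;

static size_t parse_fkmc(const uint8_t *buf, size_t len, uint32_t *fkmeid, enc_pkg_t pkg[3])
{
    if (len < 16) return 0;
    *fkmeid = get_be32(buf);
    size_t pos = 16;                                 /* after the header */
    for (int i = 0; i < 3; i++) {
        uint32_t pl = get_be32(buf + 4 + 4 * i);     /* DATKL, MBRKL, DIRKL */
        if (pl == 0) { pkg[i] = (enc_pkg_t){0}; continue; }
        if (pl < 36 || pl > len - pos) return 0;     /* inconsistent length */
        pkg[i].salt = buf + pos;
        pkg[i].ktv = buf + pos + 16;
        pkg[i].mac = buf + pos + 20;
        pkg[i].enc = buf + pos + 36;
        pkg[i].encLen = pl - 36;
        pos += pl;
    }
    return pos;
}
```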

PGP protection

PGP encryption with integrity protection and optional signing is carried out per user ID with the cipher suite SEIP_AES256_SHA512 in accordance with RFC 4880, whereby the key material described below is treated as a self-sufficient data stream.

 EncData   -> FKMEID RL RS KEY
    
    FKMEID -  32 bit - FKME ID (big endian format)
    RL     -  32 bit - Length of right string (big endian format)
    RS     -  var    - Right string in UTF-8
    KEY    - 240 byte- Data key set
           - 160 byte- Member key set
           -  80 byte- Directory key set

The encrypted data packets are simply encoded one after the other, each with its length followed by the corresponding encrypted data.
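The plaintext handed to the PGP encryption could be assembled as follows. The helper name is hypothetical, and it is assumed that the rights string is stored without a terminating zero byte, since its length is given by RL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Build FKMEID RL RS KEY into 'out'. keyLen is 240, 160 or 80 bytes.
   Returns the number of bytes written, or 0 if 'cap' is too small. */
static size_t build_pgp_plain(uint8_t *out, size_t cap, uint32_t fkmeid,
                              const char *rights, const uint8_t *key, size_t keyLen)
{
    size_t rl = strlen(rights);            /* UTF-8 bytes, no terminator */
    if (cap < 8 || rl > cap - 8 || keyLen > cap - 8 - rl) return 0;
    put_be32(out, fkmeid);                 /* FKMEID, big endian */
    put_be32(out + 4, (uint32_t)rl);       /* RL, big endian */
    memcpy(out + 8, rights, rl);           /* RS */
    memcpy(out + 8 + rl, key, keyLen);     /* KEY */
    return 8 + rl + keyLen;
}
```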

FKMEID definitions

Below you can find the FKMEID definitions used.

 // high order 8 bit used as bit mask
 #define FKMEID_BITMASK                          0xFF000000U // bit mask
 #define FKMEID_DATKEYSET                        0x01000000U // data key access
 #define FKMEID_MBRKEYSET                        0x02000000U // member key access
 #define FKMEID_DIRKEYSET                        0x04000000U // directory key access
 #define FKMEID_PROTECTDAT                       0x10000000U // data key protection
 #define FKMEID_PROTECTMBR                       0x20000000U // member key protection
 #define FKMEID_PROTECTDIR                       0x40000000U // directory key protection
 #define FKMEID_CONTINUE                         0x80000000U // an additional crypto header will follow
 #define FKMEID_COMPLETE                         0x7FFFFFFFU

 // bit 12 to 16 used for the method (till 15, FKMEID_ZERO must be adjusted for more)
 #define FKMEID_METHOD                           0x000F0000U // Continuously numbering (1,2,3,...,15) per variant
 #define FKMEID_PWD_OTP_PBKDF2_10000_SHA3_512    0x00010000U // key derivation method for passphrases
 #define FKMEID_PGP_SEIP_AES256_SHA512           0x00010000U // encryption method for PGP

 // bit 16 to 24 used for flags
 #define FKMEID_FLAGS                            0x0000FF00U // bit mask
 #define FKMEID_FULDATACS                        0x00000100U // full data access possible with this key set
 #define FKMEID_FULMBRACS                        0x00000200U // full member access possible with this key set
 #define FKMEID_FULDIRACS                        0x00000400U // full directory access possible with this key set
 #define FKMEID_FULACCESS                        0x00000800U // full access possible with this key set
 #define FKMEID_LIMDATRGT                        0x00001000U // limited data rights assigned for this key set
 #define FKMEID_LIMMBRRGT                        0x00002000U // limited member rights assigned for this key set
 #define FKMEID_LIMDIRRGT                        0x00004000U // limited directory rights assigned for this key set
 #define FKMEID_LIMRIGHTS                        0x00008000U // limited key rights assigned for this key set

 // low order 4 bit used to identify the key management engine (till 15, FKMEID_ZERO must be adjusted for more)
 #define FKMEID_VARIANT                          0x0000000FU // Continuously numbering (1,2,3,...,15)
 #define FKMEID_PWD                              0x00000001U // passphrase protection used
 #define FKMEID_PGP                              0x00000002U // PGP (RFC4880) protection used
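To illustrate how the fields of an FKMEID are laid out, the following sketch decodes an ID with the masks above (a subset is repeated so that the example compiles on its own). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Subset of the FKMEID definitions above. */
#define FKMEID_DATKEYSET 0x01000000U
#define FKMEID_MBRKEYSET 0x02000000U
#define FKMEID_DIRKEYSET 0x04000000U
#define FKMEID_CONTINUE  0x80000000U
#define FKMEID_METHOD    0x000F0000U
#define FKMEID_FLAGS     0x0000FF00U
#define FKMEID_VARIANT   0x0000000FU

typedef struct {
    unsigned variant;   /* 1 = passphrase (FKMEID_PWD), 2 = PGP (FKMEID_PGP) */
    unsigned method;    /* 1..15, numbered per variant */
    unsigned flags;     /* FULxxxACS / LIMxxxRGT bits, shifted down by 8 */
    int hasDat, hasMbr, hasDir, more;
} fkmeid_info_t;

static fkmeid_info_t decode_fkmeid(uint32_t id)
{
    fkmeid_info_t info;
    info.variant = id & FKMEID_VARIANT;
    info.method  = (id & FKMEID_METHOD) >> 16;
    info.flags   = (id & FKMEID_FLAGS) >> 8;
    info.hasDat  = (id & FKMEID_DATKEYSET) != 0;
    info.hasMbr  = (id & FKMEID_MBRKEYSET) != 0;
    info.hasDir  = (id & FKMEID_DIRKEYSET) != 0;
    info.more    = (id & FKMEID_CONTINUE) != 0;  /* another FKMC follows */
    return info;
}
```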

The LIST command displays a list of the encrypted key sets and their respective FKMEIDs. The DELKEY command can be used to remove these encrypted key sets (FKMCs) from a new version of the archive, either by index or by specifying the FKMEID.