The following description of the segment format is intended to allow you to program against the data format yourself, so that dependencies with regard to persistence are avoided.
A FLAM5 segment is the serialized representation of a FLAM5 matrix which holds the neutral FLAM5 elements in memory and allows random access to the data regardless of their format (XML, CSV, FB/VB/...).
For the persistent storage of these neutral data matrices in our FLAM5 archives, the matrices are serialized to segments. This involves compression, encryption and signing as well as indexing of the data. The serialized segments are self-sufficient images of a data matrix in the memory and serve to build it up when read.
The following describes the structure of these self-sufficient segments, which are persistently managed by our various segment storage implementations (flat file, folder, cloud).
For serialization, half bytes (4 bit), bytes (8 bit), integers (32 bit or 64 bit), variable length string or binary data blobs of variable length are used. The integers are represented in Big Endian format and strings are encoded in UTF-8.
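The primitive encodings above can be read with a few small helpers. The following is a minimal sketch, not the FLAM5 implementation; the function names are hypothetical, and which half byte holds which field is an assumption the caller has to decide.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit integer stored in Big Endian (network) byte order. */
static uint32_t fl5_get_be32(const uint8_t *buf) {
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Read a 64-bit integer stored in Big Endian byte order. */
static uint64_t fl5_get_be64(const uint8_t *buf) {
    return ((uint64_t)fl5_get_be32(buf) << 32) | fl5_get_be32(buf + 4);
}

/* Write a 32-bit integer in Big Endian byte order. */
static void fl5_put_be32(uint8_t *buf, uint32_t v) {
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Split one byte into its two half bytes (4 bit each). */
static uint8_t fl5_hi_nibble(uint8_t b) { return (uint8_t)(b >> 4); }
static uint8_t fl5_lo_nibble(uint8_t b) { return (uint8_t)(b & 0x0F); }
```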
Such a segment basically consists of 3 parts: the segment header, the segment attributes and the segment data. The segment header is divided into a static and a variable part. The static part is constant and is therefore stored in a separate matrix for global format descriptions. The variable part contains the values that can differ for each segment, for example the original and compressed lengths. The index to the static part of the header, the variable part of the segment header and the segment attributes are stored by the archive management in the associated member matrix, whose entries thus always describe an entire file in the archive. The segment data consist only of a checksum and the encrypted compressed data for the individual components of one of the defined pots. This results in the following schematic structure of a segment (as a Backus-Naur diagram with the bit length in parentheses, square brackets for optional parts, the at sign (@) for the empty set and the hash (#) for comments):
FL5-Segment = Segment-Header Segment-Attributes Segment-Data
Segment-Header = Static-Segment-Header Variable-Segment-Header
Segment-Attributes = Key-Derivation-Data Segment-Verification-Data
Segment-Data = [Type-Buffer] Segment-Data-Pot-List
Static-Segment-Header = Static-Base-Header Static-Column-List Static-Pot-List
Variable-Segment-Header = Variable-Base-Header Variable-Pot-List
Static-Column-List = Static-Header-Column Static-Column-List
Static-Pot-List = Static-Header-Pot Static-Pot-List
Variable-Pot-List = Variable-Header-Pot Variable-Pot-List
Static-Base-Header = MB(8) VN(4) INTPRT(4) CRYSUT(8) CMPSUT(8) MATTYP(8) MATFLG(8) TYPCNF(8) TYPMOD(8) COLCNT(32) POTCNT(32) MAXREC(32) SIGLEN(32)
Static-Header-Column = COLFLG(8) DATTYP(8) SIGMTD(8) POTIND(8) [SIGLEN(32) SIGOFS(32) DATOFS(32) DATLEN(32)]
Static-Header-Pot = FLGCNF(8) FLGLVL(8) LENCNF(8) LENLVL(8) DATCNF(8) DATLVL(8)
Variable-Base-Header = RECCNT(32) BYTCNT(32) # for all matrix types except streams
  | RECCNT(32) BYTCNT(32) CMPTYP(32) # only for streams (XML, JSON, ...)
Variable-Header-Pot = POTFLG(32) ELMCNT(32) DATLEN(32) Variable-Header-Pot-Compressed-Length Variable-Header-Pot-Equal-Data
Variable-Header-Pot-Compressed-Length = [CMPFLG(32)] [[[[CMPLN3(32)] CMPLN2(32)] CMPLN1(32)] CMPLN0(32)] [CMPDAT(32)]
Variable-Header-Pot-Equal-Data = [ELMFLG(8)] [ELMLEN(32) [ELMDAT(var)]]
Key-Derivation-Data = HCS(32) NUM(32) RND(32) KTV(32)
Segment-Verification-Data = MAC(64) DCS(64)
Segment-Data-Pot-List = Segment-Data-Pot Segment-Data-Pot-List | @
Segment-Data-Pot = Flag-Buffer Length-Buffer Data-Buffer
Type-Buffer = PCS(32) DAT(var) # only for streams (XML, JSON, ...)
  | @
Flag-Buffer = PCS(32) DAT(var) | @
Length-Buffer = PCS(32) DAT(var) | @
Data-Buffer = PCS(32) DAT(var) | @

General
HCS - 32 bit checksum over the complete segment header
NUM - 32 bit sequence number (different for each segment)
RND - 32 bit random number
KTV - 32 bit key test value of the current key
PCS - 32 bit pot checksum calculated over the compressed and encrypted buffers
DCS - 64 bit data checksum over the several pot checksums (PCS)
MAC - 64 bit message authentication code (CBC MAC over the DCS and the encrypted data)
DAT - variable length binary compressed and encrypted data

Static-Base-Header
MB - Magic byte (0x46 ('F') in ASCII/UTF-8)
VN - Version (0x05)
INTPRT - Integrity protection
CRYSUT - Encryption suite
CMPSUT - Compression suite
MATTYP - Matrix type
MATFLG - Matrix flag (high order 8 bit of the matrix flag word)
TYPCNF - Confidentiality for type buffer
TYPMOD - Compression mode for type buffer
COLCNT - Column count for this matrix
POTCNT - Pot count for this matrix (maximum is 256)
MAXREC - Maximum number of records this matrix should hold
SIGLEN - Signature length (sum of all Bloom filters of all indexed columns)

Static-Header-Column
COLFLG - First 8 high value bits of the column flag word
DATTYP - Data type for this column
SIGMTD - Signature method (Bloom-Filter-1/2/3/4)
POTIND - Pot index for this column
SIGLEN - Signature length for this column
SIGOFS - Signature offset for this column (in the signature field)
DATOFS - Data offset for signature calculation for this column
DATLEN - Data length for signature calculation for this column

Static-Header-Pot
FLGCNF - Confidentiality for flag buffer
FLGLVL - Compression mode for flag buffer
LENCNF - Confidentiality for length buffers
LENLVL - Compression mode for length buffers
DATCNF - Confidentiality for data buffer
DATLVL - Compression mode for data buffer

Variable-Base-Header
RECCNT - Number of records in the matrix (equal to the number of elements for streamed matrix types, which have only one column in the matrix, while the column table represents the element types)
BYTCNT - Total byte count in this matrix (for RBA access)
CMPTYP - Length of the compressed type buffer (only for streamed matrix types; the most significant bit indicates compression)

Variable-Header-Pot
POTFLG - The flag word for this pot
ELMCNT - The element count for this pot (original length for flag and length buffers)
DATLEN - The original data buffer length for this pot

Variable-Header-Pot-Compressed-Length
CMPFLG - Length of compressed flag buffer
CMPLN3 - Length of compressed length buffer 3 (Bit 31-24 of element length)
CMPLN2 - Length of compressed length buffer 2 (Bit 23-16 of element length)
CMPLN1 - Length of compressed length buffer 1 (Bit 15- 8 of element length)
CMPLN0 - Length of compressed length buffer 0 (Bit 7- 0 of element length)
CMPDAT - Length of compressed data buffer

Variable-Header-Pot-Equal-Data
ELMFLG - The equal flag byte (most significant 8 bit of the element flag word)
ELMLEN - The equal length (all elements of the matrix have the same length). If the data pointer is NULL, the first of the 32 bits is set to mark the initialization at read
ELMDAT - The equal data (all elements of the matrix are identical and smaller than 128 byte)
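To illustrate the fixed part of the grammar, the following sketch parses the 24 bytes of the Static-Base-Header (8 bytes of small fields followed by four 32-bit integers). It is an illustration under stated assumptions rather than the reference parser: the type and function names are hypothetical, and placing VN in the high half byte and INTPRT in the low half byte is assumed from the field order in the grammar.

```c
#include <stddef.h>
#include <stdint.h>

#define FL5_STATIC_BASE_LEN 24u /* 8 bytes of small fields + 4 * 32 bit */

typedef struct {
    uint8_t  vn, intprt, crysut, cmpsut, mattyp, matflg, typcnf, typmod;
    uint32_t colcnt, potcnt, maxrec, siglen;
} Fl5StaticBaseHeader; /* hypothetical in-memory form */

static uint32_t fl5_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or the magic byte is wrong. */
static int fl5_parse_static_base(const uint8_t *buf, size_t len,
                                 Fl5StaticBaseHeader *h) {
    if (len < FL5_STATIC_BASE_LEN || buf[0] != 0x46 /* 'F' */) return -1;
    h->vn     = (uint8_t)(buf[1] >> 4);   /* assumption: VN in high half byte */
    h->intprt = (uint8_t)(buf[1] & 0x0F); /* assumption: INTPRT in low half byte */
    h->crysut = buf[2];
    h->cmpsut = buf[3];
    h->mattyp = buf[4];
    h->matflg = buf[5];
    h->typcnf = buf[6];
    h->typmod = buf[7];
    h->colcnt = fl5_be32(buf + 8);
    h->potcnt = fl5_be32(buf + 12);
    h->maxrec = fl5_be32(buf + 16);
    h->siglen = fl5_be32(buf + 20);
    return 0;
}
```

The column and pot lists that follow the base header would then be read COLCNT and POTCNT times, respectively.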
An optimization has been implemented for flags, lengths and data if these are the same for all elements in the matrix. This occurs, for example, for the lengths when archiving an FB dataset. If all flags are the same, then one flag is encoded in the variable part of the header and no buffer is compressed and encrypted for the pot. The same applies to the lengths. For the data, two restrictions apply. First, the optimization is not used if encryption is used, so that the data is encrypted with the data key and not the member key. Second, the mechanism is only used for data smaller than 128 bytes. For larger lengths, the buffers are compressed and encrypted even if all elements are the same.
To save memory, the static part of the header is managed by the archive management in a separate format matrix, as already mentioned, so that this part is not kept per segment. The same applies to the row specification string, which describes the restoration of the original table. It is also managed in the separate format matrix so that this long string does not have to be kept in each segment header. The signature, which includes the Bloom filters for the indexed columns, is also not part of the segment header. It is managed separately and placed in the member matrix together with the format index for the static header and the row specification (both are referenced by one 4 byte link field), the variable part of the segment header and the segment attributes. The structure of the member matrix is defined as follows.
CLPROW_FL5MBR_ORG = ROW(NAME='FL5MBR' MATTYP=FL5MBR
  COL(NAME='SEGIND' TYPE.BINARY() POT='VALPOT')
  COL(NAME='SEGLNK' TYPE.BINARY() POT='VALPOT')
  COL(NAME='SEGVAR' TYPE.BINARY() POT='VARPOT')
  COL(NAME='SEGATR' TYPE.BINARY() POT='CRYPOT')
  COL(NAME='SIGNAT' TYPE.BINARY() POT='SIGPOT')
  POT(NAME='CRYPOT' CMPLEVEL=COPY)
  POT(NAME='VALPOT' CMPLEVEL=ARITHMETIC)
  POT(NAME='VARPOT')
  POT(NAME='SIGPOT')
)

SEGIND(64) - Index to the segment data
SEGLNK(32) - 32 bit index to row specification and static header (format)
SEGVAR(var) - variable part of segment header
SEGATR(256) - Segment attributes
SIGNAT(var) - Signature of Bloom filters to search in the compressed and encrypted data
This member matrix is serialized, and thus compressed and encrypted again, when a member is closed. Its segment header is then recorded with the other values in the directory matrix of the FLAM5 archive. When the archive is closed, the directory matrix is in turn serialized, and thus compressed and encrypted, together with the two special matrices for recording the formats (row specifications (CLP string) and static portion of the segment header) and the literal strings. Below you will find the table definitions for the directory, the format and the literal matrix:
CLPROW_FL5DIR_ORG = ROW(NAME='FL5DIR' MATTYP=FL5DIR
  COL(NAME='SEGIND' TYPE.BINARY() POT='VALPOT')
  COL(NAME='SEGLNK' TYPE.BINARY() POT='VALPOT')
  COL(NAME='SEGVAR' TYPE.BINARY() POT='VARPOT')
  COL(NAME='SEGATR' TYPE.BINARY() POT='CRYPOT')
  COL(NAME='FILSTR' TYPE.STRING() POT='STRPOT' LOOKUP)
  COL(NAME='STASTR' TYPE.STRING() POT='STRPOT')
  COL(NAME='MBRSTA' TYPE.BINARY() POT='VALPOT')
  COL(NAME='COMMENT' TYPE.STRING() POT='STRPOT')
  COL(NAME='BINBLOB' TYPE.BINARY() POT='VARPOT')
  POT(NAME='CRYPOT' CMPLEVEL=COPY)
  POT(NAME='VALPOT' CMPLEVEL=ARITHMETIC)
  POT(NAME='VARPOT')
  POT(NAME='STRPOT')
)
CLPROW_FL5FMT_ORG = ROW(NAME='FL5FMT' MATTYP=FL5FMT NOCHANGE
  COL(NAME='ROWSTR' TYPE.STRING() POT='STRPOT' LOOKUP)
  COL(NAME='BINHDR' TYPE.BINARY() POT='BINPOT' LOOKUP)
  POT(NAME='STRPOT')
  POT(NAME='BINPOT')
)
CLPROW_FL5LIT_ORG = ROW(NAME='FL5LIT' MATTYP=FL5LIT NOCHANGE
  COL(NAME='LITERAL' TYPE.BINARY() POT='BINPOT' LOOKUP)
  POT(NAME='BINPOT')
)

SEGIND(64) - Index to the segment data or special entry type (directory, empty file, pipe, ...)
SEGLNK(32) - 32 bit index to row specification and to static header (member matrix format)
SEGVAR(var) - variable part of segment header
SEGATR(256) - Segment attributes
FILSTR(var) - File name
STASTR(var) - File state (CLP string (see state string documentation))
MBRSTA(fix) - 72 byte structure of member statistics encoded in big endian
COMMENT(var) - User comment
BINBLOB(var) - User header
ROWSTR(var) - Row specification string
BINHDR(var) - Static portion of segment header
LITERAL(var) - Literal strings (represented as binary blob)
The statistics for a member have the following 72 byte structure:
VsnDmy(32) - ID for this version of the structure (0x52000000U)
MatCnt(32) - Number of matrices (formats) used for this member
SegCnt(64) - Number of segments used for this member
RecCnt(64) - Number of records processed for this member
ElmCnt(64) - Number of elements processed for this member
OrgLen(64) - Original data length for this member
CmpLen(64) - Compressed data length for this member
OvrHed(64) - Overhead processed for this member
CpuTim(64) - Used CPU time for this member
RunTim(64) - Elapsed time used for this member
This structure is treated as a binary blob whose length (72) and structure are fixed, but may change in the future, as indicated by the length and ID/version at the beginning.
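A reader of the MBRSTA blob could decode it along these lines. This is a sketch under the assumption that the ten fields are stored back to back in the listed order (2 * 4 + 8 * 8 = 72 bytes); the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FL5_MBRSTA_LEN 72u
#define FL5_MBRSTA_VSN 0x52000000U /* ID for this version of the structure */

typedef struct {
    uint32_t vsn_dmy, mat_cnt;
    uint64_t seg_cnt, rec_cnt, elm_cnt, org_len, cmp_len, ovr_hed, cpu_tim, run_tim;
} Fl5MemberStats; /* hypothetical in-memory form */

static uint32_t fl5_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t fl5_be64(const uint8_t *p) {
    return ((uint64_t)fl5_be32(p) << 32) | fl5_be32(p + 4);
}

/* Returns 0 on success, -1 if length or version ID do not match this layout. */
static int fl5_parse_member_stats(const uint8_t *buf, size_t len, Fl5MemberStats *s) {
    if (len != FL5_MBRSTA_LEN || fl5_be32(buf) != FL5_MBRSTA_VSN) return -1;
    s->vsn_dmy = fl5_be32(buf);
    s->mat_cnt = fl5_be32(buf + 4);
    uint64_t *fields[8] = { &s->seg_cnt, &s->rec_cnt, &s->elm_cnt, &s->org_len,
                            &s->cmp_len, &s->ovr_hed, &s->cpu_tim, &s->run_tim };
    for (size_t i = 0; i < 8; i++)
        *fields[i] = fl5_be64(buf + 8 + 8 * i); /* eight consecutive 64-bit values */
    return 0;
}
```

Checking both the length and the version ID first means a future layout is rejected rather than misread.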
Thus, a FLAM5 archive has the following basic structure. This data structure is the root/main element and is again compressed, encrypted and protected from changes using the same means as a matrix.
Here too, the data begins with a clear header that describes the structure of the data and is protected only in its integrity. Changing it destroys the keys that protect the compressed and encrypted part. The message authentication code (MAC) of the archive additionally covers a further part, which contains the segment attributes of the 3 archive matrices. These 96 bytes are not compressed, as they contain only random values. This is followed by the compressed and encrypted data of the 3 archive matrices. After the static header, as the second part of this root segment, an optional key management packet follows, which contains the encrypted session keys. This packet is completely self-sufficient, so that the archive can be re-keyed (that is, access to the data can be changed) without touching the data.
FLAM5-Archive = MBVN(32) Archive-Header-Segment Archive-Data MBVN(32)
Archive-Header-Segment = Archive-Clear-Header Archive-Crypto-Header-List Archive-Protected-Header
Archive-Crypto-Header-List = Archive-Crypto-Header Archive-Crypto-Header-List | Archive-Crypto-Header
Archive-Clear-Header = CRYSUT( 8) CMPSUT( 8) ARCTYP( 4) INTPRT( 4) CRYCNF( 4) CMPMOD( 4) ORGLEN(32) CMPLEN(32) ATRLEN(32)
Archive-Crypto-Header = FKMEID(32) [KEYLEN(32) [KEYSET(var)]]
Archive-Protected-Header = ATR(256) PCS(32) CMPENC(Archive-Headers) MAC(Archive-Attributes)
Archive-Headers = DIRHLN(32) DIRATR(32) DIRDLN(32) DIRHDR(var) FMTHLN(32) FMTATR(32) FMTDLN(32) FMTHDR(var) LITHLN(32) LITATR(32) LITDLN(32) LITHDR(var)
Archive-Attributes = DIRATR(256) HDRATR(256) LITATR(256)
Archive-Data = DIRPCS(4) DIRDAT(var) FMTPCS(4) FMTDAT(var) LITPCS(4) LITDAT(var)

Archive-Clear-Header (20 bytes used for header checksum)
MBVN - 32 bit magic bytes and version ("FL50" in ASCII/UTF-8)
CRYSUT - 8 bit crypto suite used to protect the directory header
CMPSUT - 8 bit compression suite used to compress the directory header
ARCTYP - 4 bit archive type (1-FULL, 2-SubSet (search results))
INTPRT - 4 bit integrity protection used for archive header segment
CRYCNF - 4 bit confidential mode used for archive header segment
CMPMOD - 4 bit compression mode used for archive header segment
ORGLEN - 32 bit original length of the archive headers (used for compression)
CMPLEN - 32 bit compressed length of the archive headers (used for encryption and MAC calculation)
ATRLEN - 32 bit length of segment attributes of the archive data segments (used for MAC calculation)

Archive-Crypto-Header (holds optionally encrypted data keys)
FKMEID - 32 bit FKME identifier to know the right FKME at read
  - if 0, no FKME is used and KEYLEN will not be encoded
  - the high order bit indicates an additional archive crypto header
KEYLEN - 32 bit length of the keyset (not encoded if no encryption is used (former FKMC), depending on FKMEID)
KEYSET - variable length keyset (encrypted meta and data keys (former FKMC)) for the archive

Archive-Protected-Header (compressed and encrypted archive headers)
ATR - 32 byte (256 bit) segment attributes (including the archive MAC)
PCS - 32 bit pot checksum calculated over the compressed and encrypted buffers
CMPENC - variable length compressed and encrypted archive headers of directory, global format and literal matrix (including MAC calculation)
MAC - only signed (MAC calculation) archive attributes of directory, global format and literal matrix

Archive-Attributes
DIRATR - 32 byte (256 bit) segment attributes of directory matrix (only for MAC calculation)
FMTATR - 32 byte (256 bit) segment attributes of global format matrix (only for MAC calculation)
LITATR - 32 byte (256 bit) segment attributes of global literal matrix (only for MAC calculation)

Archive-Headers (the 4 segment headers and segment data length for the 4 root matrixes)
DIRHLN - 32 bit length of the directory segment header
DIRATR - 32 bit length of the directory segment attributes
DIRDLN - 32 bit length of the directory data segment (including PCS)
DIRHDR - variable length directory segment header
FMTHLN - 32 bit length of the global format segment header
FMTATR - 32 bit length of the global format segment attributes
FMTDLN - 32 bit length of the global format data segment (including PCS)
FMTHDR - variable length global format segment header
STRHLN - 32 bit length of the global literal segment header (0 if not used)
STRATR - 32 bit length of the global literal segment attributes (0 if not used)
STRDLN - 32 bit length of the global literal data segment (0 if not used, including PCS)
STRHDR - variable length global literal segment header

Archive-Data (the segment data of the 4 root matrixes)
DIRPCS - 32 bit pot checksum calculated over the compressed and encrypted DIRDAT
DIRDAT - variable length binary compressed and encrypted data of directory matrix
FMTPCS - 32 bit pot checksum calculated over the compressed and encrypted FMTDAT
FMTDAT - variable length binary compressed and encrypted data of global format matrix
STRPCS - 32 bit pot checksum calculated over the compressed and encrypted STRDAT
STRDAT - variable length binary compressed and encrypted data of global literal matrix
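As an illustration of the archive layout, the sketch below reads the leading MBVN and the Archive-Clear-Header (4 + 16 = 20 bytes). It is not the FLAM5 reader: the names are hypothetical, and for each pair of 4-bit fields it is assumed that the first-named field occupies the high half byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  crysut, cmpsut, arctyp, intprt, crycnf, cmpmod;
    uint32_t orglen, cmplen, atrlen;
} Fl5ArchiveClearHeader; /* hypothetical in-memory form */

static uint32_t fl5_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or MBVN is not "FL50". */
static int fl5_parse_archive_clear(const uint8_t *buf, size_t len,
                                   Fl5ArchiveClearHeader *h) {
    if (len < 20 || memcmp(buf, "FL50", 4) != 0) return -1;
    h->crysut = buf[4];
    h->cmpsut = buf[5];
    h->arctyp = (uint8_t)(buf[6] >> 4);   /* assumption: ARCTYP in high half byte */
    h->intprt = (uint8_t)(buf[6] & 0x0F);
    h->crycnf = (uint8_t)(buf[7] >> 4);   /* assumption: CRYCNF in high half byte */
    h->cmpmod = (uint8_t)(buf[7] & 0x0F);
    h->orglen = fl5_be32(buf + 8);
    h->cmplen = fl5_be32(buf + 12);
    h->atrlen = fl5_be32(buf + 16);
    return 0;
}
```

The Archive-Crypto-Header list would be read next, one FKMEID at a time.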
The following bit masks are used to interpret the FKMEID:
// high order 8 bits used as bit mask
#define FKMEID_BITMASK 0xFF000000U // bit mask
#define FKMEID_DATKEYSET 0x01000000U // data key access
#define FKMEID_MBRKEYSET 0x02000000U // member key access
#define FKMEID_DIRKEYSET 0x04000000U // directory key access
#define FKMEID_PROTECTDAT 0x10000000U // data key protection
#define FKMEID_PROTECTMBR 0x20000000U // member key protection
#define FKMEID_PROTECTDIR 0x40000000U // directory key protection
#define FKMEID_CONTINUE 0x80000000U // an additional crypto header will follow
// bits 16 to 19 used for the method
#define FKMEID_METHOD 0x000F0000U // continuous numbering (1,2,3,...,15) per variant
#define FKMEID_PWD_OTP_PBKDF2_10000_SHA3_512 0x00010000U // key derivation method for passphrases
#define FKMEID_PGP_SEIP_AES256_SHA512 0x00010000U // encryption method for PGP
// bits 8 to 15 used for flags
#define FKMEID_FLAGS 0x0000FF00U // bit mask
#define FKMEID_FULDATACS 0x00000100U // full data access possible with this keyset
#define FKMEID_FULMBRACS 0x00000200U // full member access possible with this keyset
#define FKMEID_FULDIRACS 0x00000400U // full directory access possible with this keyset
#define FKMEID_FULACCESS 0x00000800U // full access possible with this keyset
#define FKMEID_LIMDATRGT 0x00001000U // limited data rights assigned for this keyset
#define FKMEID_LIMMBRRGT 0x00002000U // limited member rights assigned for this keyset
#define FKMEID_LIMDIRRGT 0x00004000U // limited directory rights assigned for this keyset
#define FKMEID_LIMRIGHTS 0x00008000U // limited key rights assigned for this keyset
// low order 4 bits used to identify the key management engine
#define FKMEID_VARIANT 0x0000000FU // continuous numbering (1,2,3,...,15)
#define FKMEID_PWD 0x00000001U // passphrase protection used
#define FKMEID_PGP 0x00000002U // PGP (RFC 4880) protection used
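Decoding an FKMEID then comes down to masking and shifting. The sketch below repeats only the masks it needs; the struct and function names are hypothetical, and the rule that KEYLEN follows whenever FKMEID is non-zero is a simplification of the description above.

```c
#include <stdint.h>

#define FKMEID_CONTINUE 0x80000000U /* an additional crypto header will follow */
#define FKMEID_METHOD   0x000F0000U /* method number per variant */
#define FKMEID_FLAGS    0x0000FF00U /* access and rights flags */
#define FKMEID_VARIANT  0x0000000FU /* key management engine */

typedef struct {
    unsigned variant;      /* e.g. 1 = passphrase, 2 = PGP */
    unsigned method;       /* 1..15, meaning depends on the variant */
    uint32_t flags;        /* FKMEID_FUL... and FKMEID_LIM... bits, unshifted */
    int      more_headers; /* non-zero if another Archive-Crypto-Header follows */
    int      has_keylen;   /* simplification: KEYLEN follows unless FKMEID is 0 */
} Fl5FkmeInfo; /* hypothetical decoded form */

static Fl5FkmeInfo fl5_decode_fkmeid(uint32_t id) {
    Fl5FkmeInfo info;
    info.variant      = (unsigned)(id & FKMEID_VARIANT);
    info.method       = (unsigned)((id & FKMEID_METHOD) >> 16);
    info.flags        = id & FKMEID_FLAGS;
    info.more_headers = (id & FKMEID_CONTINUE) != 0;
    info.has_keylen   = (id != 0);
    return info;
}
```

A reader would loop over the crypto headers for as long as `more_headers` is set.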
If the length of the original corresponds to the length of the compressed directory segment, then no compression was possible and the data was copied or taken over unchanged.
The segment attributes for the directory, the global format and literal matrix cannot be compressed because they are random values. Therefore, these 16 bytes are only appended after the segment header for MAC calculation.
The segment attributes have a constant length of 32 bytes, whereby the MAC calculation includes the derivation data (second 16 bytes), which means that these segment attributes must always be located directly before the segment data. The segment data itself starts with a 32 bit checksum called PCS. Therefore, they are simply treated as one unit of segment data.
The derivation data part of the attribute field (first 16 bytes) contains the header checksum. For re-keying, only the first 16 bytes of the archive clear header are used for the header checksum. The archive crypto header is not included.
For compression, the self-sufficient bulk compression routines of the supported standard methods (GZIP, BZIP2, LZ4, ZSTD, ...) are used. See the compression code later in this chapter.
Self-contained buffers for the high-order flag byte, the respective length bytes and the data are created for each pot. The 6 buffers (flags, 4 * length and data) are filled for each element assigned to the respective pot. The original length of the flag buffer and of each of the 4 length buffers is determined by the element count of the pot. The data is written contiguously into a separate buffer for each pot. The compressed lengths of these buffers are optionally encoded in the variable portion of the header. If compression would result in an expansion or if the copy mode is defined, the corresponding compression flags are not set and the length of the compressed data is not encoded.
Depending on the maximum length of the individual data elements, only those of the 4 length buffers whose bytes are not all 0x00 are compressed and encoded. Which length buffers have been encoded is noted in the flag word for the pot. There is another mechanism for saving compression effort here. If all flags are the same, if all lengths are the same, or if all data elements are the same, shorter than 128 bytes and do not have to be encrypted (ENC), the respective compressed buffer is not encoded. Instead, the equal value is written to the variable part of the segment header, which is also indicated by the pot flags listed below.
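How the flag and length buffers of a pot are filled can be sketched as follows. The element struct and the function name are hypothetical; the sketch assumes that length buffer k receives byte k of each 32-bit element length (buffer 0 holds bits 7-0, buffer 3 holds bits 31-24) and that the flag buffer receives the most significant byte of the element flag word.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t flags;  /* element flag word; only the high 8 bits are stored */
    uint32_t length; /* element data length in bytes */
} Fl5Element; /* hypothetical element descriptor */

/*
 * Fill the flag buffer and the 4 length buffers of one pot.
 * Each output buffer must provide room for n bytes (the element count).
 */
static void fl5_split_elements(const Fl5Element *elm, size_t n,
                               uint8_t *flg, uint8_t *ln[4]) {
    for (size_t i = 0; i < n; i++) {
        flg[i] = (uint8_t)(elm[i].flags >> 24);
        for (int k = 0; k < 4; k++)
            ln[k][i] = (uint8_t)(elm[i].length >> (8 * k));
    }
}
```

Splitting the lengths into byte planes means that the high planes are all 0x00 for short elements, so they compress well or can be omitted entirely.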
Pot flags:
#define FLMPOT_FLAG_EQUFLG 0x00010000U // all flag bytes are the same
#define FLMPOT_FLAG_EQULEN 0x00020000U // all element lengths are the same
#define FLMPOT_FLAG_EQUDAT 0x00040000U // all data elements are equal
#define FLMPOT_FLAG_DATLN0 0x00000000U // only the low order 8 bits of the length are encoded
#define FLMPOT_FLAG_DATLN1 0x00000001U // only the low order 16 bits of the length are encoded
#define FLMPOT_FLAG_DATLN2 0x00000002U // only the low order 24 bits of the length are encoded
#define FLMPOT_FLAG_DATLN3 0x00000003U // full 32 bits of the length are encoded (all 4 buffers)
#define FLMPOT_FLAG_CMPLN0 0x00000010U // compressed length for the first length buffer is encoded
#define FLMPOT_FLAG_CMPLN1 0x00000020U // compressed length for the second length buffer is encoded
#define FLMPOT_FLAG_CMPLN2 0x00000040U // compressed length for the third length buffer is encoded
#define FLMPOT_FLAG_CMPLN3 0x00000080U // compressed length for the fourth length buffer is encoded
#define FLMPOT_FLAG_CMPFLG 0x00000100U // compressed length for the flag buffer is encoded
#define FLMPOT_FLAG_CMPDAT 0x00000200U // compressed length for the data buffer is encoded
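The DATLNx values can be read as a 2-bit field that says how many length buffers are present. The sketch below rests on that assumption (the mask 0x3 is not named in the definitions above) and uses hypothetical function names. It picks the DATLNx value for a given maximum element length and reassembles an element length from the stored byte planes.

```c
#include <stddef.h>
#include <stdint.h>

#define FLMPOT_FLAG_DATLN1 0x00000001U
#define FLMPOT_FLAG_CMPLN0 0x00000010U
#define FL5_DATLN_MASK     0x00000003U /* assumption: DATLN0..3 form a 2-bit field */

/* Number of length buffers stored for a pot (1 to 4). */
static unsigned fl5_len_buffer_count(uint32_t potflg) {
    return (unsigned)(potflg & FL5_DATLN_MASK) + 1u;
}

/* DATLNx value needed so that the largest element length fits. */
static uint32_t fl5_datln_for_max(uint32_t maxlen) {
    if (maxlen > 0x00FFFFFFu) return 3u;
    if (maxlen > 0x0000FFFFu) return 2u;
    if (maxlen > 0x000000FFu) return 1u;
    return 0u;
}

/* Rebuild the length of element i; ln[k] holds byte k of every length. */
static uint32_t fl5_element_length(uint32_t potflg,
                                   const uint8_t *const *ln, size_t i) {
    uint32_t len = 0;
    unsigned count = fl5_len_buffer_count(potflg);
    for (unsigned k = 0; k < count; k++)
        len |= (uint32_t)ln[k][i] << (8 * k);
    return len;
}
```

Buffers above the indicated count are treated as all zero and are never touched.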
If a single buffer expands or is not to be compressed in the first place (CPY), the compressed length is not encoded and the associated flag is not set. The data is copied in this case and its length corresponds to the length of the original data or the element count.
A special case of compression occurs with stream matrices. Such a matrix is a list of elements where the element type determines the column (as index in the column table). In this case, a type buffer must be written for the entire matrix with one entry per element, whereby the type takes up a maximum of one byte (8 bits). With a stream matrix, the number of records corresponds to the number of elements and the length of the compressed type buffer is encoded at offset 8 from the start of the variable part of the header. If this length is not smaller than the record count, compression would have expanded the type buffer, so it was copied instead.
The algorithms used for compression depend on the respective suite and the corresponding compression mode. In copy mode, the data is simply copied, which always makes sense if the data has no redundancies. In fast (FST) mode, a low compression level is generally used. The dynamic (DYN) mode uses the default medium compression level and the Compact (CPT) mode uses the highest compression level. See chapter below for more information. The Arithmetic (ART) mode is a special mode where BZIP2 with 900KiB blocking is always used so that an arithmetic compression for numbers and certain binary data is available in each suite. This means that in the BZIP2 suite, the CPT mode is identical to the ART mode.
The compression mode can be specified generally for the segment or specifically for each pot or for each of the buffers (flag, lengths, data) of the certain pot. It can make sense to use arithmetic compression for the lengths in particular. However, the default mode in all cases is the dynamic, i.e. medium variant.
All buffers (types, flags, lengths and data) are compressed independently for each segment with an atomic call against the respective suite as described below.
The GZIP suite is based on zlib and is called as follows with the specified levels.
MODE=FAST -> LEVEL=2
MODE=DYNAMIC -> LEVEL=6
MODE=COMPACT -> LEVEL=9
MODE=ARITHMETIC -> BZIP2(900KiB)
err = deflateInit2(&stStream, LEVEL, Z_DEFLATED, GZIP_NO_HDR, GZIP_MEM_LEVEL, Z_DEFAULT_STRATEGY);
err = deflate(&stStream, Z_FINISH);
deflateEnd(&stStream);
The BZIP2 suite is based on libbzip2 and is called as follows with the specified levels.
MODE=FAST -> LEVEL=1(100KiB)
MODE=DYNAMIC -> LEVEL=6(600KiB)
MODE=COMPACT -> LEVEL=9(900KiB)
MODE=ARITHMETIC -> BZIP2(900KiB)
err = BZ2_bzBuffToBuffCompress((char*)pcOutDat, &ol, (char*)pcInDat, uiInLen, LEVEL, 0, 30);
The LZ4 suite is based on liblz4 and is called as follows with the specified levels.
MODE=FAST -> LEVEL=16
MODE=DYNAMIC -> LEVEL=7
MODE=COMPACT -> LEVEL=1
MODE=ARITHMETIC -> BZIP2(900KiB)
err = LZ4_compress_fast((const char*)pcInDat, (char*)pcOutDat, uiInLen, uiInLen, LEVEL);
The ZSTD suite is based on libzstd and is called as follows with the specified levels.
MODE=FAST -> LEVEL=1
MODE=DYNAMIC -> LEVEL=9
MODE=COMPACT -> LEVEL=19
MODE=ARITHMETIC -> BZIP2(900KiB)
err = ZSTD_compress(pcOutDat, ol, pcInDat, uiInLen, LEVEL);
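The mode tables of the four suites can be condensed into one lookup. The following sketch only selects the codec and level and does not call any compression library; the enum, struct and function names are hypothetical, and the copy mode is left out because it involves no codec.

```c
typedef enum { FL5_SUITE_GZIP, FL5_SUITE_BZIP2, FL5_SUITE_LZ4, FL5_SUITE_ZSTD } Fl5Suite;
typedef enum { FL5_MODE_FAST, FL5_MODE_DYNAMIC, FL5_MODE_COMPACT, FL5_MODE_ARITHMETIC } Fl5Mode;

typedef struct {
    Fl5Suite codec; /* library that performs the compression */
    int      level; /* level (for LZ4: value passed to LZ4_compress_fast) */
} Fl5Codec; /* hypothetical result type */

/* Map suite and mode to the codec and level listed in the tables above. */
static Fl5Codec fl5_select_codec(Fl5Suite suite, Fl5Mode mode) {
    /* rows: GZIP, BZIP2, LZ4, ZSTD; columns: FAST, DYNAMIC, COMPACT */
    static const int levels[4][3] = {
        {  2, 6,  9 },
        {  1, 6,  9 },
        { 16, 7,  1 },
        {  1, 9, 19 },
    };
    Fl5Codec c;
    if (mode == FL5_MODE_ARITHMETIC) {
        /* ARITHMETIC always means BZIP2 with 900KiB blocks, in every suite */
        c.codec = FL5_SUITE_BZIP2;
        c.level = 9;
    } else {
        c.codec = suite;
        c.level = levels[suite][mode];
    }
    return c;
}
```

With this mapping, COMPACT and ARITHMETIC give the same result in the BZIP2 suite.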
The description of the security mechanisms is not only for the programmer to reproduce but also for the public to check the security mechanisms.
The protection of the segment data is determined by the specification of the segment integrity (INTPROT) and the specification of the segment confidentiality (CRYCONF). In the following, the maximum cryptographic protection is described first (INTPROT=MAC and CRYCONF=ENC). If the confidentiality and integrity are selected lower, the corresponding calculations are omitted.
The segment attributes are divided into derivation data and verification data. The 16 byte long derivation data consists of the header checksum (32 bit CRC or FNV depending on the crypto suite), a 32 bit wide sequence number for each segment, a 32 bit wide random number and the 32 bit wide key test value for the current member-specific data key (4 byte CMAC via the complete key material (64 byte) with a corresponding part from data key). The verification data consists of a 64-bit or 128-bit checksum (CRC or FNV depending on the crypto suite) and a 64-bit MAC if this is calculated (INTPROT=AUTH/MAC). With INTPROT=AUTH, no complete AES-CMAC is calculated over the compressed data, but only the simple checksum is encrypted using the AES_CMAC standard. With INTPROT=CHKS, the CRC32 or FNV32 checksum is expanded to 128 bits. In all other cases, the CRC32 or FNV32 data checksums are expanded to 64 bits. The header checksum and the pot checksum are 32 bits in length and are also calculated using standard CRC32 or FNV32 routines. The use of standard routines for checksums, encryption and integrity protection allows hardware acceleration (e.g. CPACF) to be used where available.
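For the common case of a 64-bit MAC and a 64-bit data checksum, the 32-byte segment attributes could be split as in the sketch below. The struct and function names are hypothetical, and the field order (HCS, NUM, RND, KTV, MAC, DCS) is taken from the segment grammar; the variant with a 128-bit checksum (INTPROT=CHKS) is not covered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FL5_SEGATR_LEN 32u

typedef struct {
    uint32_t hcs;    /* header checksum */
    uint32_t num;    /* sequence number of the segment */
    uint32_t rnd;    /* random number */
    uint32_t ktv;    /* key test value */
    uint8_t  mac[8]; /* message authentication code */
    uint8_t  dcs[8]; /* data checksum */
} Fl5SegmentAttributes; /* hypothetical in-memory form */

static uint32_t fl5_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is not exactly 32 bytes long. */
static int fl5_parse_segatr(const uint8_t *buf, size_t len, Fl5SegmentAttributes *a) {
    if (len != FL5_SEGATR_LEN) return -1;
    a->hcs = fl5_be32(buf);            /* derivation data: bytes 0..15 */
    a->num = fl5_be32(buf + 4);
    a->rnd = fl5_be32(buf + 8);
    a->ktv = fl5_be32(buf + 12);
    memcpy(a->mac, buf + 16, 8);       /* verification data: bytes 16..31 */
    memcpy(a->dcs, buf + 24, 8);
    return 0;
}
```

The first 16 bytes can be handed on unchanged as the IV for the key derivation.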
The AES key length (128, 192 or 256 bits) for standard CBC mode encryption and AES-CMAC integrity protection is determined by the crypto suite used. The 16-byte IV and the 16, 24 or 32-byte AES key for CBC encryption and/or AES-CMAC integrity protection are taken from the correspondingly derived 80-byte key material at pre-defined offsets.
Since we work exclusively with self-sufficient segments, all calls to derivation functions, encryption routines, MAC or hash calculation are atomic and no chaining takes place. The completeness of the segments is protected simply because the member matrix above, which manages the data segments of this member, includes not only the segment headers but also the segment attributes of each data segment, with their checksums and MACs. These checksums and MACs are therefore included in the checksum and MAC calculation of the matrix above. The same applies to the directory matrix, which includes the segment attributes of all member matrices. This has the advantage that only 3 matrices need to be updated when a certain data record is updated.
The basic functions used for the processes described here are described below.
The header check sum (CRC32 or FNV32) is calculated over the complete segment header (static and variable part). The header checksum (32 bits) is written as part of the segment attributes to the derivation data.
CRC32/FNV32(headerLen, header, 4, hcs);
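Which CRC32 and FNV32 variants the crypto suites use is not stated here. As an illustration only, the sketch below assumes the widely used CRC-32 with the reflected polynomial 0xEDB88320 (as in zlib) and the 32-bit FNV-1a hash; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed variant: CRC-32 (IEEE 802.3, reflected), computed bit by bit. */
static uint32_t fl5_crc32(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Assumed variant: 32-bit FNV-1a (XOR the byte first, then multiply by the prime). */
static uint32_t fl5_fnv32(const uint8_t *data, size_t len) {
    uint32_t hash = 2166136261u;     /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 16777619u;           /* FNV prime */
    }
    return hash;
}
```

A table-driven or hardware-accelerated CRC would produce the same values. The 4-byte result would then be stored in Big Endian order as HCS.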
The header checksum (HCS) is part of the derivation data and is used for simple validation to ensure that the segment header has not been accidentally corrupted. Since the header checksum is included in the key derivation, manipulation would prevent access to the data with cryptographic security, since wrongly derived segment key values would be used.
The key test value is generated on the basis of the respective main key and serves to verify it. It is similar to key derivation and also represents a one-way hash function where the key encrypts itself. For this purpose, the CMAC is calculated in length 4 with the key at offset 24 and the IV at offset 8 over the 64 bytes of the master key, which is shown schematically here.
AESCMAC(16/24/32, mainKey+24, IVLEN, mainKey+8, 64, mainKey, 4, ktv);
The key test value is used for early and unambiguous detection that something is wrong with the password or the higher-level key management before the data is accessed. It is also part of the derivation data, which means that it is also specific to the respective master key.
The key derivation uses the derivation data of the segment attributes as IV (16 bytes) for the CBC mode encryption and encrypts the 64 bytes of the corresponding main key (data, member or directory). The AES key is in turn taken from the main key at offset 8, in the length specified by the crypto suite.
AESCBCENC(16/24/32, mainKey+8, 16, drvData, 64, mainKey, drvdKey);
The function above schematically shows the call of the CBC mode encryption with AES. As the key is part of the data here and is therefore self-encrypting, the key derivation is a one-way function. This ensures cryptographically that no conclusions can be drawn about the master key on the basis of the derivation result.
The derivation data is always unique for each segment due to the counter. The random number makes this even more dynamic, and the inclusion of the header checksum means that falsification of the segment header leads to the destruction of the segment key, which guarantees the integrity protection of the header data. The inclusion of the KTV in the derivation data ensures that this value cannot be falsified, thus protecting its integrity.
The segment encryption is performed over the compressed buffers of all pots, which are simply concatenated. The lengths of the compressed buffers are encoded in the variable part of the segment header. Therefore, all compressed buffers can simply be packed one after the other and encrypted with one call. To do this, however, the segment-specific encryption key must first be derived.
OFB mode is used to encrypt the segment data. For this purpose, the key is taken from the derived key material at offset 0 and the initialization vector at offset 40. The call is shown schematically below.
XOR(drvdKey+0,segNum); XOR(drvdKey+40,segNum);
AESOFBENC(16/24/32, drvdKey+0, 16, drvdKey+40, dataLen, data);
XOR(drvdKey+0,segNum); XOR(drvdKey+40,segNum);
Encryption in OFB mode is carried out in place and overwrites the data at the same location in memory, so that the original data does not have to be deleted separately. OFB mode also has the advantage that no padding is necessary and the total length of the encrypted compressions does not change.
Before the derived segment-specific key material is used as a key or IV, the respective segment number is XOR'ed onto it at the respective offset in length 4, in network byte order (big endian). After use, the same XOR is applied again to restore the key material to its original form.
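The XOR step described above can be sketched as follows. This is a minimal illustration, not the FLAM5 implementation: the function name `flam_xor_segnum` and the representation of the key material as a plain byte array are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR the 32-bit segment number, in big-endian byte
 * order, onto the 4 bytes of key material that start at `offset`.
 * Calling it a second time with the same arguments restores the
 * original bytes, because XOR is its own inverse. */
void flam_xor_segnum(uint8_t *drvdKey, size_t offset, uint32_t segNum)
{
    drvdKey[offset + 0] ^= (uint8_t)(segNum >> 24);
    drvdKey[offset + 1] ^= (uint8_t)(segNum >> 16);
    drvdKey[offset + 2] ^= (uint8_t)(segNum >> 8);
    drvdKey[offset + 3] ^= (uint8_t)segNum;
}
```

For the OFB encryption, such a helper would be called for offsets 0 and 40 before the encryption call and again for the same offsets afterwards.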
The pot checksum (CRC32 or FNV32) is calculated over the previously encrypted compressions of the individual pots. The pot checksum (32 bits) is written directly before the compressed and, if applicable, encrypted data stream.
CRC32/FNV32(dataLen, data, 4, data-4/*PCS*/);
The pot checksum is part of the segment data and is used for simple validation to ensure that the segment data has not been accidentally corrupted.
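As an illustration of the pot checksum placement, the following sketch computes a CRC32 over the data and stores it in the 4 bytes directly before it. Two points are assumptions, because the text does not state them: the CRC32 variant (here the common reflected CRC-32 with polynomial 0xEDB88320, as used by zlib) and the big-endian storage of the checksum (chosen because the format stores its integers that way). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (zlib variant).
 * ASSUMPTION: the CRC32 variant used by FLAM5 is not specified here. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Write the pot checksum (PCS) over `dataLen` bytes at `data` into the
 * 4 bytes directly before `data`. The caller must own those 4 bytes.
 * ASSUMPTION: the checksum is stored in big-endian byte order. */
void write_pot_checksum(uint8_t *data, size_t dataLen)
{
    uint32_t sum = crc32_ieee(data, dataLen);
    uint8_t *pcs = data - 4;
    pcs[0] = (uint8_t)(sum >> 24);
    pcs[1] = (uint8_t)(sum >> 16);
    pcs[2] = (uint8_t)(sum >> 8);
    pcs[3] = (uint8_t)sum;
}
```

A bitwise CRC is used here only to keep the sketch short; a table-driven version would be the usual choice for large segments.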
The data checksum (CRC32 or FNV32) is calculated simply over the pot checksum (4 bytes) and expanded to 8 or 16 bytes depending on the integrity mode.
CRC32/FNV32(4, data-4/*PCS*/, 8/16, data-12/20/*DCS*/);
The data checksum is part of the segment attributes, which are stored together with the segment header in the higher-level matrix and, like the pot checksum, are simply used to detect random changes to the data.
The cryptographic protection of the integrity of a segment is carried out via an AES CMAC calculation, whereby the key is taken at offset 32 and the IV at offset 16 of the derived key material (64 bytes). The CMAC calculation starts at the 8-byte data checksum, which is located directly before the pot checksum, and continues over the pot checksum and the data that has already been encrypted.
XOR(drvdKey+32,segNum); XOR(drvdKey+16,segNum);
AESCMAC(16/24/32, drvdKey+32, 16, drvdKey+16, dataLen+12, data-12/*DCS*/, data-20/*MAC*/);
XOR(drvdKey+32,segNum); XOR(drvdKey+16,segNum);
The data checksum (8 bytes) and the pot checksum (4 bytes) are included in the CMAC, which moves the start of the calculation 12 bytes in front of the compressions and extends its length by 12 bytes. The first half of the resulting MAC is written before the data checksum, which means an offset of -20 from the beginning of the compressions, whereby the first 8 bytes are filled with the second half of the segment attributes.
If INTPROT=AUTH is used instead of INTPROT=MAC, the CMAC calculation is only carried out using the 8-byte long data checksum.
XOR(drvdKey+32,segNum); XOR(drvdKey+16,segNum);
AESCMAC(16/24/32, drvdKey+32, 16, drvdKey+16, 8, data-12/*DCS*/, data-20/*MAC*/);
XOR(drvdKey+32,segNum); XOR(drvdKey+16,segNum);
This saves the MAC calculation over the compressions. Willful manipulations are still recognized, but not via a cryptographic checksum over the data itself.
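The offsets and lengths used in the two CMAC calls can be summarized in a small sketch. It only computes where the CMAC input starts, how long it is, and where the MAC is written, all relative to the start of the compressions; it performs no cryptography. The type and function names are hypothetical, and the sketch assumes the 8-byte data checksum used in the calls above.

```c
#include <stddef.h>

enum { SEG_PCS_LEN = 4, SEG_DCS_LEN = 8, SEG_MAC_LEN = 8 };

/* Hypothetical names for the two integrity modes mentioned in the text. */
typedef enum { INTPROT_MAC, INTPROT_AUTH } intprot_t;

typedef struct {
    ptrdiff_t macOffset;   /* where the MAC is written, relative to data  */
    ptrdiff_t inputOffset; /* where the CMAC input starts (data checksum) */
    size_t    inputLen;    /* number of bytes covered by the CMAC         */
} cmac_range_t;

/* Derive the CMAC range for a segment with `dataLen` bytes of
 * (encrypted) compressions. */
cmac_range_t cmac_range(intprot_t mode, size_t dataLen)
{
    cmac_range_t r;
    r.macOffset   = -(ptrdiff_t)(SEG_PCS_LEN + SEG_DCS_LEN + SEG_MAC_LEN); /* -20 */
    r.inputOffset = -(ptrdiff_t)(SEG_PCS_LEN + SEG_DCS_LEN);               /* -12 */
    r.inputLen    = (mode == INTPROT_MAC)
                  ? dataLen + SEG_PCS_LEN + SEG_DCS_LEN  /* DCS + PCS + data */
                  : (size_t)SEG_DCS_LEN;                 /* DCS only         */
    return r;
}
```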
For a better understanding, it is important to know that a FLAM segment is encrypted on the one hand and, on the other hand, protected against changes with a key-dependent cryptographic checksum (MAC). Depending on the crypto suite, the necessary keys, ICVs, etc. are determined from 64 bytes (512 bit) of key material and 16 bytes (128 bit) for dynamisation, i.e. a total of 80 bytes.
Furthermore, FLAM works here with the following crypto cores in total.
So the last two protect the meta-data in the archive. Therefore, there are three 80-byte random session keys that must be protected by the respective encryption object of this overlay.
Access to the contents of the archive can be controlled through this three-way split. Without access to any of the 3 keys, you cannot do anything with the archive except store it. With access to the directory key, you can only list the contents. With access to the member key, you can additionally collect compressed and encrypted result sets for a search against the Bloom filter signature via indexing. Only with access to the data key can the actual content, i.e. the clear data, be accessed.
To ensure that this works accordingly, the data key, the member key and the directory key are protected with a higher-level key encryption key or password.
For searching in the compressed and encrypted archives, a superordinate member key is used, which protects the directory and member key, and finally there can be a key encryption key (KEK) only for the directory key, if one is only to have access to the table of contents. Depending on the procedure, this package of the 3 session keys is signed so that the originator can be authenticated.
This three-way split means that the key management procedures offered here always support one key for accessing the data, one for searching (for the index (member key)) and one for listing the directory content.
When encrypting the session key material, access rights can be additionally restricted using a parameter string. This CLP string is encoded in UTF-8 and encrypted together with the session keys and protected against manipulation, thus providing a cryptographic binding. If such an additional restriction of rights to certain members, formats or columns has been defined, then this user cannot assign any further rights himself and his rights in the archive are limited to the rights bound here. It is the task of the respective FKME implementation to ensure the cryptographic binding of the rights.
The FLAM syntax allows several self-sufficient encrypted key packages to occur one after the other (FKMEID_CONTINUE), which are created by default during implicit or explicit re-keying. The keysets are only overwritten with the explicit specification of KEYMODE=REPLACE and the previous rights are lost in the new version of the archive.
The password encryption of the one, two or three times 80 byte long key material takes place in GCM mode, whereby the FKMEID and the three key lengths are each entered as 32 bit big endian values (together 16 bytes) as header data in the GCM-MAC. The AES 256 bit key for GCM encryption with MAC calculation is derived from the passphrase according to PBKDF2 with a 16 byte random salt over 10000 rounds and SHA3_512 as the hash method. In total 48 bytes are derived: 12 bytes as IV, 4 bytes as KTV and 32 bytes for the AES key.
The key and IV derived in this way are then used for GCM mode encryption over the 16-byte header, the right string and the key data, whereby the resulting MAC (16 bytes) is stored directly after the SALT and KTV and therefore before the encrypted data. The following diagram shows the structures used:
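The following sketch shows how the 16-byte GCM header data and the split of the 48 derived bytes could look in C. It covers neither PBKDF2 nor GCM itself. All names are hypothetical. The order of the three lengths (data, member, directory) follows the structure diagram of the key package, and the KTV position at offset 12 is taken from that diagram as well; the positions of the IV (offset 0) and the AES key (offset 16) are assumptions based on the order in which the text lists them.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in big-endian byte order. */
void put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Build the 16 bytes that enter the GCM MAC as header data:
 * FKMEID followed by the three key package lengths. */
void build_gcm_header(uint8_t hdr[16], uint32_t fkmeid,
                      uint32_t datkl, uint32_t mbrkl, uint32_t dirkl)
{
    put_be32(hdr + 0,  fkmeid);
    put_be32(hdr + 4,  datkl);
    put_be32(hdr + 8,  mbrkl);
    put_be32(hdr + 12, dirkl);
}

/* The 48 bytes produced by the passphrase key derivation. */
typedef struct {
    uint8_t iv[12];   /* ASSUMED at offset 0           */
    uint8_t ktv[4];   /* offset 12 (per the diagram)   */
    uint8_t key[32];  /* ASSUMED at offset 16          */
} pwd_material_t;

/* Split the derived bytes into IV, KTV and AES-256 key. */
void split_derived(const uint8_t derived[48], pwd_material_t *m)
{
    memcpy(m->iv,  derived,      sizeof m->iv);
    memcpy(m->ktv, derived + 12, sizeof m->ktv);
    memcpy(m->key, derived + 16, sizeof m->key);
}
```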
FKMC      -> Header EncDatKey EncMbrKey EncDirKey
Header    -> FKMEID DATKL MBRKL DIRKL
EncXxxKey -> SALT KTV MAC Enc
Enc       -> RL RS KEY
FKMEID -  32 bit  - FKME ID
DATKL  -  32 bit  - Encrypted data key package length
MBRKL  -  32 bit  - Encrypted member key package length
DIRKL  -  32 bit  - Encrypted directory key package length
SALT   - 128 bit  - Salt for key derivation (random generated)
KTV    -  32 bit  - Key test value (offset 12 of derived data)
MAC    - 128 bit  - Message authentication code of GCM encryption
RL     -  32 bit  - Length of right string
RS     - var      - Right string in UTF-8
KEY    - 240 byte - Data key set
       - 160 byte - Member key set
       -  80 byte - Directory key set
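A reader for this structure could locate the fields of the first encrypted key package as sketched below. The names are hypothetical, and the sketch assumes that DATKL counts the complete package (SALT, KTV, MAC and the encrypted part) and that the data key package directly follows the header; neither is stated explicitly in the description.

```c
#include <stddef.h>
#include <stdint.h>

enum { FKMC_HDR_LEN = 16, FKMC_SALT_LEN = 16, FKMC_KTV_LEN = 4, FKMC_MAC_LEN = 16 };

/* Pointers into one EncXxxKey package; nothing is copied. */
typedef struct {
    const uint8_t *salt, *ktv, *mac, *enc;
    size_t encLen;  /* length of the encrypted RL/RS/KEY part */
} enc_key_view_t;

/* Read a 32-bit big-endian value. */
uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Split one package into SALT, KTV, MAC and Enc. Returns 0 or -1. */
int view_enc_key(const uint8_t *pkg, size_t pkgLen, enc_key_view_t *v)
{
    const size_t fixed = FKMC_SALT_LEN + FKMC_KTV_LEN + FKMC_MAC_LEN;
    if (pkgLen < fixed) return -1;
    v->salt   = pkg;
    v->ktv    = pkg + FKMC_SALT_LEN;
    v->mac    = pkg + FKMC_SALT_LEN + FKMC_KTV_LEN;
    v->enc    = pkg + fixed;
    v->encLen = pkgLen - fixed;
    return 0;
}

/* Read the FKMEID and locate the data key package of an FKMC.
 * ASSUMPTION: DATKL is the total length of that package. */
int view_data_key(const uint8_t *fkmc, size_t len,
                  uint32_t *fkmeid, enc_key_view_t *v)
{
    if (len < FKMC_HDR_LEN) return -1;
    *fkmeid = get_be32(fkmc);
    uint32_t datkl = get_be32(fkmc + 4);
    if (datkl > len - FKMC_HDR_LEN) return -1;
    return view_enc_key(fkmc + FKMC_HDR_LEN, datkl, v);
}
```

The member and directory packages would follow in the same way, each starting where the previous package ends.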
PGP encryption with integrity protection and optional signing is carried out per user ID with the cipher suite SEIP_AES256_SHA512 in accordance with RFC 4880, where the key material described below is treated as a self-sufficient data stream.
EncData -> FKMEID RL RS KEY
FKMEID -  32 bit  - FKME ID (big endian format)
RL     -  32 bit  - Length of right string (big endian format)
RS     - var      - Right string in UTF-8
KEY    - 240 byte - Data key set
       - 160 byte - Member key set
       -  80 byte - Directory key set
The encrypted data packets are simply encoded one after the other with their length and the corresponding encrypted data.
Below you can find the FKMEID definitions used.
// high order 8 bit used as bit mask
#define FKMEID_BITMASK     0xFF000000U // bit mask
#define FKMEID_DATKEYSET   0x01000000U // data key access
#define FKMEID_MBRKEYSET   0x02000000U // member key access
#define FKMEID_DIRKEYSET   0x04000000U // directory key access
#define FKMEID_PROTECTDAT  0x10000000U // data key protection
#define FKMEID_PROTECTMBR  0x20000000U // member key protection
#define FKMEID_PROTECTDIR  0x40000000U // directory key protection
#define FKMEID_CONTINUE    0x80000000U // an additional crypto header will follow
#define FKMEID_COMPLETE    0x7FFFFFFFU
// bit 12 to 16 used for the method (till 15, FKMEID_ZERO must be adjusted for more)
#define FKMEID_METHOD      0x000F0000U // Continuously numbering (1,2,3,...,15) per variant
#define FKMEID_PWD_OTP_PBKDF2_10000_SHA3_512 0x00010000U // key derivation method for passphrases
#define FKMEID_PGP_SEIP_AES256_SHA512        0x00010000U // encryption method for PGP
// bit 16 to 24 used for flags
#define FKMEID_FLAGS       0x0000FF00U // bit mask
#define FKMEID_FULDATACS   0x00000100U // full data access possible with this key set
#define FKMEID_FULMBRACS   0x00000200U // full member access possible with this key set
#define FKMEID_FULDIRACS   0x00000400U // full directory access possible with this key set
#define FKMEID_FULACCESS   0x00000800U // full access possible with this key set
#define FKMEID_LIMDATRGT   0x00001000U // limited data rights assigned for this key set
#define FKMEID_LIMMBRRGT   0x00002000U // limited member rights assigned for this key set
#define FKMEID_LIMDIRRGT   0x00004000U // limited directory rights assigned for this key set
#define FKMEID_LIMRIGHTS   0x00008000U // limited key rights assigned for this key set
// low order 4 bit used to identify the key management engine (till 15, FKMEID_ZERO must be adjusted for more)
#define FKMEID_VARIANT     0x0000000FU // Continuously numbering (1,2,3,...,15)
#define FKMEID_PWD         0x00000001U // passphrase protection used
#define FKMEID_PGP         0x00000002U // PGP (RFC4880) protection used
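As a usage illustration, an FKMEID value can be taken apart with the masks defined above. The helper functions below are hypothetical and not part of FLAM5; the sketch repeats only the definitions it needs so that it compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the FKMEID definitions, repeated to keep the sketch
 * self-contained. */
#define FKMEID_DATKEYSET 0x01000000U
#define FKMEID_CONTINUE  0x80000000U
#define FKMEID_METHOD    0x000F0000U
#define FKMEID_FLAGS     0x0000FF00U
#define FKMEID_FULDATACS 0x00000100U
#define FKMEID_VARIANT   0x0000000FU
#define FKMEID_PWD       0x00000001U

/* Key management engine (1 = passphrase, 2 = PGP). */
unsigned fkmeid_variant(uint32_t id) { return id & FKMEID_VARIANT; }

/* Method number (1..15) within the variant. */
unsigned fkmeid_method(uint32_t id) { return (id & FKMEID_METHOD) >> 16; }

/* True if the key set gives access to the data key. */
bool fkmeid_has_data_key(uint32_t id) { return (id & FKMEID_DATKEYSET) != 0; }

/* True if another encrypted key package follows this one. */
bool fkmeid_more_follow(uint32_t id) { return (id & FKMEID_CONTINUE) != 0; }
```

For example, the value 0x81010101 decodes to variant 1 (passphrase), method 1, with data key access and a further key package following.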
The LIST command displays a list of the encrypted key sets and the respective FKMEID. The DELKEY command can be used to remove these encrypted key sets (FKMCs) from a new version using the respective index or by specifying the FKMEID.