Input to Output Name Mapping

When creating or extracting an archive, converting files, or (more generally) doing batch processing, one might wish to automatically generate new filenames based on a pattern applied to the input filenames.

Example: You might want to read a member from a FLAMFILE which was created on a mainframe and contains dataset names. To convert the dataset names into path-based file names, you can define a pattern for the output file or member name.

Pattern syntax:

An output filename pattern can consist of simple text strings and tokens. Text strings are copied into the output filename as-is. Tokens are enclosed in square brackets ([]) and describe processing rules applied to the original file or member name. The pattern can consist of an arbitrary number of text strings and tokens (including none at all). A token can contain one or more processing rules; multiple processing rules are separated by a pipe character (|). Every processing rule within a token receives the output of the previous rule as its input, which allows chaining of processing rules.
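This chaining can be sketched as follows. The rule table below is a hypothetical stand-in that mirrors rule names used in the examples later in this document; it does not show the tool's actual implementation:

```python
# Illustrative sketch of rule chaining inside a token: each rule in a
# pipe-separated chain receives the previous rule's output as input.
# The rule table is a hypothetical stand-in, not the tool's code.
RULES = {
    "upper": str.upper,
    "lower": str.lower,
    "cut5": lambda s: s[:5],   # keep the first five characters
}

def apply_token(name: str, token: str) -> str:
    """Apply the pipe-separated processing rules of one token."""
    result = name
    for rule in token.split("|"):
        result = RULES[rule](result)
    return result
```

For example, applying the token cut5|upper to /path/to/file.ext first truncates the name to /path and then upcases it to /PATH.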

Characters inside tokens can be escaped by a preceding caret (^) to avoid interpretation as pattern syntax. Outside of tokens, carets are treated literally, except in front of square brackets, where they cause the bracket to be treated literally rather than as a token delimiter. The token keywords listed below are not case sensitive.

This is the list of supported processing rules:

Using regular expressions

Regular expressions (short: regex) can be used to apply arbitrarily complex matching rules and substitute these matches using the syntax [regex:/pattern/replace/modifiers]. They can be chained like any other processing rules using the pipe character, e.g. [regex:/pattern/replace/modifiers|upper]

Regex delimiters

The character immediately following the keyword regex: is a delimiter that separates the pattern, the replacement string and the optional modifiers. The delimiter can be any single-byte character that does not occur in the pattern or replacement string and is expected to occur exactly three times. You can also use pairs of brackets as delimiters, in which case the matching closing bracket is expected after each opening bracket. A few examples using different delimiters, all performing the same replacement:
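Following the stated rules, for instance, [regex:/\.ext$/.bin/] and [regex:#\.ext$#.bin#] perform the same replacement with different delimiters. The parsing convention for the single-character delimiter form can be sketched like this (an illustration of the syntax, not the tool's implementation):

```python
def parse_regex_token(token: str):
    """Split 'regex:<d>pattern<d>replace<d>modifiers' at the delimiter,
    which is the first character after 'regex:'. This sketch covers only
    the single-character delimiter form; bracket pairs work analogously."""
    body = token[len("regex:"):]
    delim = body[0]
    # The delimiter must not occur in the pattern or replacement string,
    # so this split yields exactly the pattern, the replacement and the
    # (possibly empty) modifiers.
    pattern, replace, modifiers = body[1:].split(delim)
    return pattern, replace, modifiers
```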

The pattern and replacement strings must not be escaped with carets (^), even though they are inside a token ([]).

Regex patterns

Regular expression patterns in the Perl flavor are matched against the input string to find the substrings that are to be replaced. Writing regular expressions may seem daunting if you have never done so before, but understanding the basics is not as complicated as it may seem. You can find a tutorial that will get you started quickly here:
https://www.regular-expressions.info/quickstart.html

Countless other tutorials and documentation for regular expressions can be found on the web. We recommend consulting these resources to familiarize yourself with regular expressions.

Advanced users can find the complete syntax documentation for writing regular expression patterns by following this URL:
https://www.pcre.org/current/doc/html/pcre2pattern.html

Regex replacement strings

Replacement strings are mostly string literals, except for the dollar character. A dollar character is an escape character that can specify the insertion of characters from capture groups and names from (*MARK) or other control verbs in the pattern. In other words, it can be used to insert parts of the match into the replacement string. Capture groups in a regex pattern are sub-expressions in parentheses. These can be accessed by their numerical index starting at 1 ($1, $2, ...). The capture group accessed by $0 always references the whole match. Capture groups can also be referenced by name. A named group in a regex pattern uses the syntax (?<name>subpattern). This group can then be referenced in the replacement with ${name}.
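The same replacement semantics can be reproduced with Python's re module for experimentation. Note that Python writes backreferences as \1 and \g<name> in the replacement string and named groups as (?P<name>...), where the replacement strings described here use $1 and ${name}:

```python
import re

# Numbered capture group: $1 in the replacement described above
# corresponds to \1 in Python's replacement syntax.
renamed = re.sub(r'file\.(\w+)$', r'data.\1', '/path/to/file.ext')
# -> '/path/to/data.ext'

# Named capture group: ${base} corresponds to \g<base> in Python,
# and (?<base>...) corresponds to (?P<base>...).
rebased = re.sub(r'(?P<base>[^/.]+)\.ext$', r'\g<base>.bin',
                 '/path/to/file.ext')
# -> '/path/to/file.bin'
```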

Regex modifiers

Modifiers change the behavior of regular expression matching and/or replacement. Modifiers are optional and must be placed after the final delimiter. Multiple modifiers can be combined.

Currently, these modifiers are supported:

Examples for using filename patterns:

 infile='/path/to/file.ext'   pattern='text_without_tokens'     outfile='text_without_tokens'
 infile='/path/to/file.ext'   pattern='text^[path^]'            outfile='text[path]'
 infile='/path/to/file.ext'   pattern='[PATH]'                  outfile='/path/to'
 infile='/path/to/file.ext'   pattern='[Name]'                  outfile='file.ext'
 infile='/path/to/file.ext'   pattern='[ext]'                   outfile='ext'
 infile='/path/to/file.ext'   pattern='[UPPER]'                 outfile='/PATH/TO/FILE.EXT'
 infile='/path/to/file.ext'   pattern='[cut5]'                  outfile='/path'
 infile='/path/to/file.ext',  pattern='FILE[ind4]'              outfile='FILE0001'
 infile='/path/to/file.ext',  pattern='FILE[rnd4]'              outfile='FILE0815'
 infile='/path/to/file.ext'   pattern='[/2]'                    outfile='to'
 infile='/path/to/file.ext'   pattern='[/-1]'                   outfile='file.ext'
 infile='/path/to/file.ext'   pattern='[/2-]'                   outfile='to/file.ext'
 infile='/path/to/file.ext'   pattern='[/3-.1]'                 outfile='file'
 infile='/path/to/file.ext'   pattern='[path=directory]'        outfile='/directory/to/file.ext'
 infile='/path/to/file.ext'   pattern='[regex:/\.ext$/.bin/]'   outfile='/path/to/file.bin'
 infile='/path/to/file.ext'   pattern='[regex:#/([^/]+)$#$1#R]' outfile='file.ext'
 infile='USER.DATA.PDS(MYMEMBER)' pattern='[member]'            outfile='MYMEMBER'
 infile='USER.DATA.PDS(MYMEMBER)' pattern='[.0|lower]/[.1|lower]/[member|lower].[ext|(0|lower]' outfile='user/data/mymember.pds'
 infile='/verylongpath/to/longfilename.ext' pattern='[path|verylong=|/=*.|.1-|upper].[base|cut-8|upper]' outfile='PATH.TO.FILENAME'
 infile='/path/to/*.txt'      pattern='<SYSUID>.GDG(+[ind0])' outfile='USER.GDG(+n)' whereby n starts with 1 and is incremented for each file

The last example shows a wildcard used on read, where the matching files are written into a generation data group as several generations. The file index with the minimal number of digits ('[ind0]') can be used to address the next generation ('(+[ind0])').
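The slash-indexing rules shown in the examples ([/2], [/-1], [/2-]) select 1-based path segments. Their observable behavior can be mirrored in Python as follows; this is a sketch derived from the sample outputs above, not from the tool itself:

```python
# Split the input path into its non-empty segments.
parts = [p for p in '/path/to/file.ext'.split('/') if p]
# parts == ['path', 'to', 'file.ext']

second = parts[2 - 1]            # [/2]  -> second segment: 'to'
last = parts[-1]                 # [/-1] -> last segment: 'file.ext'
tail = '/'.join(parts[2 - 1:])   # [/2-] -> from second segment on: 'to/file.ext'
```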

More information:

Patterns can be used for output file and member names. This mapping mechanism is very powerful. For example, it can be used to compress each directory into a separate FLAMFILE, with each FLAMFILE containing all files of that directory.

 write.flam(file='[path]/?[name]')

If an output filename pattern results in the same path being generated more than once, the second and subsequent occurrences append their data to the previously written file.

If no pattern is defined, all matching input files are written to one output file or archive; the append flag is automatically enabled starting with the second file.