Input to Output Name Mapping

When creating or extracting an archive, converting files, or (more generally) doing batch processing, one might wish to automatically generate new filenames based on a pattern applied to the input filenames.

Example: You might want to read a member from a FLAMFILE which was created on a mainframe and contains dataset names. To convert the dataset names into path-based file names, you can define a pattern for the output file or member name.

Pattern syntax:

An output filename pattern can consist of simple text strings and tokens. Text strings are copied into the output filename as-is. Tokens are enclosed in square brackets ([]) and describe processing rules applied to the original file or member name. The pattern can consist of an arbitrary number of text strings and tokens (including none at all). A token can contain one or more processing rules; multiple processing rules are separated by a pipe character (|). Every processing rule within a token receives the output of the previous rule as its input, which allows chaining of processing rules.
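This chaining can be sketched as follows. The rule table below is a hypothetical stand-in that mirrors rule names used in the examples later in this document; it does not show the tool's actual implementation:

```python
# Illustrative sketch of rule chaining inside a token: each rule in a
# pipe-separated chain receives the previous rule's output as input.
# The rule table is a hypothetical stand-in, not the tool's code.
RULES = {
    "upper": str.upper,
    "lower": str.lower,
    "cut5": lambda s: s[:5],   # keep the first five characters
}

def apply_token(name: str, token: str) -> str:
    """Apply the pipe-separated processing rules of one token."""
    result = name
    for rule in token.split("|"):
        result = RULES[rule](result)
    return result
```

For example, applying the token cut5|upper to /path/to/file.ext first truncates the name to /path and then upcases it to /PATH.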

Characters inside tokens can be escaped by a preceding caret (^) to avoid interpretation as pattern syntax. Outside of tokens, carets are treated literally, except in front of square brackets, where they cause the bracket to be treated literally rather than as a token delimiter. The token keywords listed below are not case sensitive.

This is the list of supported processing rules:

Using regular expressions

Regular expressions (short: regex) can be used to apply arbitrarily complex matching rules and substitute these matches using the syntax [regex:/pattern/replace/modifiers]. They can be chained like any other processing rules using the pipe character, e.g. [regex:/pattern/replace/modifiers|upper]

Regex delimiters

The character immediately following the keyword regex: is a delimiter that separates the pattern, the replacement string and the optional modifiers. The delimiter can be any single-byte character that does not occur in the pattern or replacement string and is expected to occur exactly three times. You can also use pairs of brackets as delimiters, in which case the matching closing bracket is expected after each opening bracket. A few examples using different delimiters, all performing the same replacement:
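Following the stated rules, for instance, [regex:/\.ext$/.bin/] and [regex:#\.ext$#.bin#] perform the same replacement with different delimiters. The parsing convention for the single-character delimiter form can be sketched like this (an illustration of the syntax, not the tool's implementation):

```python
def parse_regex_token(token: str):
    """Split 'regex:<d>pattern<d>replace<d>modifiers' at the delimiter,
    which is the first character after 'regex:'. This sketch covers only
    the single-character delimiter form; bracket pairs work analogously."""
    body = token[len("regex:"):]
    delim = body[0]
    # The delimiter must not occur in the pattern or replacement string,
    # so this split yields exactly the pattern, the replacement and the
    # (possibly empty) modifiers.
    pattern, replace, modifiers = body[1:].split(delim)
    return pattern, replace, modifiers
```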

The pattern and replacement strings must not be escaped with carets (^), even though they are inside a token ([]).

Regex patterns

Regular expression patterns in the Perl flavor are matched against the input string to find the substrings that are to be replaced. Writing regular expressions may seem daunting if you have never done so before, but understanding the basics is not as complicated as it may seem. You can find a tutorial that will get you started quickly here:
https://www.regular-expressions.info/quickstart.html

Countless other tutorials and documentation for regular expressions can be found on the web. We recommend consulting these resources to familiarize yourself with regular expressions.

Advanced users can find the complete syntax documentation for writing regular expression patterns by following this URL:
https://www.pcre.org/current/doc/html/pcre2pattern.html

Regex replacement strings

Replacement strings are mostly string literals, except for the dollar character. A dollar character is an escape character that can specify the insertion of characters from capture groups and names from (*MARK) or other control verbs in the pattern. In other words, it can be used to insert parts of the match into the replacement string. Capture groups in a regex pattern are sub-expressions in parentheses. These can be accessed by their numerical index starting at 1 ($1, $2, ...). The capture group accessed by $0 always references the whole match. Capture groups can also be referenced by name. A named group in a regex pattern uses the syntax (?<name>subpattern). This group can then be referenced in the replacement with ${name}.
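The same replacement semantics can be reproduced with Python's re module for experimentation. Note that Python writes backreferences as \1 and \g<name> in the replacement string and named groups as (?P<name>...), where the replacement strings described here use $1 and ${name}:

```python
import re

# Numbered capture group: $1 in the replacement described above
# corresponds to \1 in Python's replacement syntax.
renamed = re.sub(r'file\.(\w+)$', r'data.\1', '/path/to/file.ext')
# -> '/path/to/data.ext'

# Named capture group: ${base} corresponds to \g<base> in Python,
# and (?<base>...) corresponds to (?P<base>...).
rebased = re.sub(r'(?P<base>[^/.]+)\.ext$', r'\g<base>.bin',
                 '/path/to/file.ext')
# -> '/path/to/file.bin'
```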

Regex modifiers

Modifiers change the behavior of regular expression matching and/or replacement. Modifiers are optional and must be placed after the final delimiter. Multiple modifiers can be combined.

Currently, these modifiers are supported:

Examples for using filename patterns:

 infile='/path/to/file.ext'   pattern='text_without_tokens'     outfile='text_without_tokens'
 infile='/path/to/file.ext'   pattern='text^[path^]'            outfile='text[path]'
 infile='/path/to/file.ext'   pattern='[PATH]'                  outfile='/path/to'
 infile='/path/to/file.ext'   pattern='[Name]'                  outfile='file.ext'
 infile='/path/to/file.ext'   pattern='[ext]'                   outfile='ext'
 infile='/path/to/file.ext'   pattern='[UPPER]'                 outfile='/PATH/TO/FILE.EXT'
 infile='/path/to/file.ext'   pattern='[cut5]'                  outfile='/path'
 infile='/path/to/file.ext',  pattern='FILE[ind4]'              outfile='FILE0001'
 infile='/path/to/file.ext',  pattern='FILE[rnd4]'              outfile='FILE0815'
 infile='/path/to/file.ext'   pattern='[/2]'                    outfile='to'
 infile='/path/to/file.ext'   pattern='[/-1]'                   outfile='file.ext'
 infile='/path/to/file.ext'   pattern='[/2-]'                   outfile='to/file.ext'
 infile='/path/to/file.ext'   pattern='[/3-.1]'                 outfile='file'
 infile='/path/to/file.ext'   pattern='[path=directory]'        outfile='/directory/to/file.ext'
 infile='/path/to/file.ext'   pattern='[regex:/\.ext$/.bin/]'   outfile='/path/to/file.bin'
 infile='/path/to/file.ext'   pattern='[regex:#/([^/]+)$#$1#R]' outfile='file.ext'
 infile='USER.DATA.PDS(MYMEMBER)' pattern='[member]'            outfile='MYMEMBER'
 infile='USER.DATA.PDS(MYMEMBER)' pattern='[.0|lower]/[.1|lower]/[member|lower].[ext|(0|lower]' outfile='user/data/mymember.pds'
 infile='/verylongpath/to/longfilename.ext' pattern='[path|verylong=|/=*.|.1-|upper].[base|cut-8|upper]' outfile='PATH.TO.FILENAME'
 infile='/path/to/*.txt'      pattern='<SYSUID>.GDG(+[ind0])' outfile='USER.GDG(+n)' whereby n starts with 1 and is incremented for each file

The last example shows a wildcard used on read, where the matching files are written into a generation data group as several generations. The file index with the minimal number of digits ('[ind0]') can be used to address the next generation ('(+[ind0])').
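The slash-indexing rules shown in the examples ([/2], [/-1], [/2-]) select 1-based path segments. Their observable behavior can be mirrored in Python as follows; this is a sketch derived from the sample outputs above, not from the tool itself:

```python
# Split the input path into its non-empty segments.
parts = [p for p in '/path/to/file.ext'.split('/') if p]
# parts == ['path', 'to', 'file.ext']

second = parts[2 - 1]            # [/2]  -> second segment: 'to'
last = parts[-1]                 # [/-1] -> last segment: 'file.ext'
tail = '/'.join(parts[2 - 1:])   # [/2-] -> from second segment on: 'to/file.ext'
```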

More information:

Patterns can be used for output file and member names. This mapping mechanism is very powerful. For example, it can be used to compress each directory into a separate FLAMFILE, with each FLAMFILE containing all files of that directory.

 write.flam(file='[path]/?[name]')

If an output filename pattern results in the same path being generated more than once, the second and subsequent occurrences append their data to the previously written file.

If no pattern is defined, all matching input files are written to one output file or archive; the append flag is automatically enabled starting with the second file.