When creating/extracting an archive, converting files or (more generally) doing batch processing, one might wish to automatically generate new filenames based on a pattern applied to the input filenames.
Example: You might want to read a member from a FLAMFILE which was created on a mainframe and contains dataset names. You wish to convert the dataset name into a path-based file name. You can achieve this by defining a pattern for the output file or member name.
An output filename pattern can consist of simple text strings and tokens. Text strings are copied as-is into the output filename. Tokens are enclosed in square brackets ([]) and describe processing rules applied to the original file or member name. The pattern can consist of an arbitrary number of text strings and tokens (including none at all).
A token can consist of one or more processing rules. Multiple processing rules are separated by a pipe character (|). Every processing rule within a token receives the output of the previous rule as its input. This allows chaining of processing rules; for example, the token [name|upper] first extracts the file name and then converts it to upper case.
Characters inside tokens can be escaped by a preceding caret (^) to avoid interpretation as pattern syntax. Outside of tokens, carets are treated literally, except in front of square brackets, where they cause the bracket to be treated literally rather than as a token delimiter. The keywords of the tokens below are not case sensitive.
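To make this structure concrete, the following Python sketch illustrates one possible way to split a pattern into literal text and tokens, and each token into its chained rules. It is an illustration only, not FLAM's actual parser; caret escaping is omitted for brevity, and the rule names used in the sample pattern (name, upper, ext, lower) are taken from the examples further below.

import re

def tokenize(pattern):
    # Illustrative sketch only (not FLAM's parser); caret escaping is ignored
    # for brevity. Splits a pattern into literal text and [token] parts and
    # splits each token into its chained rules at the pipe character.
    parts = re.split(r'(\[[^\]]*\])', pattern)
    result = []
    for part in parts:
        if not part:
            continue
        if part.startswith('[') and part.endswith(']'):
            result.append(('token', part[1:-1].split('|')))
        else:
            result.append(('text', part))
    return result

# 'FILE[name|upper].[ext|lower]' -> literal 'FILE', token ['name', 'upper'],
# literal '.', token ['ext', 'lower']
print(tokenize('FILE[name|upper].[ext|lower]'))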
This is the list of supported processing rules:
Regular expressions (short: regex) can be used to apply arbitrarily complex matching rules and substitute these matches using the syntax [regex:/pattern/replace/modifiers]. They can be chained like any other processing rule using the pipe character, e.g. [regex:/pattern/replace/modifiers|upper].
The character immediately following the keyword regex: is a delimiter that separates the pattern, replacement string and optional modifiers. The delimiter can be any single-byte character that does not occur in the pattern or replacement string and is expected exactly three times. You can also use pairs of brackets as delimiters, in which case the matching closing bracket is expected to follow an opening bracket. A few examples using different delimiters, all performing the same replacement:
[regex:/pattern/replace/modifiers]
[regex:#pattern#replace#modifiers]
[regex:{pattern}{replace}modifiers]
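As an illustration of the delimiter rule for the non-bracket forms above, the following Python sketch shows one way such a rule could be split into pattern, replacement and modifiers. It is an assumption for illustration only, not FLAM's implementation, and it does not handle bracket-pair delimiters.

def split_regex_rule(rule):
    # Illustrative sketch only; handles the form regex:DpatternDreplaceDmods,
    # where D is a single-byte delimiter that occurs exactly three times.
    # Bracket-pair delimiters are not handled here.
    body = rule[len('regex:'):]
    delimiter = body[0]
    parts = body[1:].split(delimiter)
    if len(parts) != 3:
        raise ValueError('delimiter must occur exactly three times')
    return tuple(parts)  # (pattern, replace, modifiers)

print(split_regex_rule('regex:#pattern#replace#modifiers'))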
The pattern and replacement strings must not be escaped with carets (^), even though they are inside a token ([]).
Regular expression patterns in the Perl flavor are matched against the
input string to find the substrings that are to be replaced. Writing
regular expressions may seem like a daunting task if you have never done so
before. Understanding the basics is not as complicated as it may seem.
You can find a tutorial that will get you started quickly here:
https://www.regular-expressions.info/quickstart.html
Countless other tutorials and documentation for regular expressions can be found on the web. We recommend consulting these resources to familiarize yourself with regular expressions.
Advanced users can find the complete syntax documentation for writing
regular expression patterns by following this URL:
https://www.pcre.org/current/doc/html/pcre2pattern.html
Replacement strings are mostly string literals, except for the dollar character. A dollar character is an escape character that can specify the insertion of characters from capture groups and names from (*MARK) or other control verbs in the pattern. In other words, it can be used to insert parts of the match into the replacement string. Capture groups in a regex pattern are sub-expressions in parentheses. These can be accessed by their numerical index starting at 1 ($1, $2, ...). The capture group accessed by $0 always references the whole match. Capture groups can also be referenced by name. A named group in a regex pattern uses the syntax (?<name>subpattern). This group can then be referenced in the replacement with ${name}.
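The following Python sketch illustrates what numbered and named capture groups do in a substitution. It is only an approximation: FLAM uses PCRE2 with $1/${name} and (?<name>...) syntax, while Python's re module writes back-references as \1/\g<name> and named groups as (?P<name>...).

import re

# Numbered groups: swap the base name and the extension ($2.$1 in the syntax
# described above corresponds to \2.\1 in Python).
print(re.sub(r'([^./]+)\.([^./]+)$', r'\2.\1', 'file.ext'))               # ext.file

# Named group: keep only the last path component (${base} corresponds to
# \g<base> in Python).
print(re.sub(r'^.*/(?P<base>[^/]+)$', r'\g<base>', '/path/to/file.ext'))  # file.ext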
Modifiers change the behavior of regular expression matching and/or replacement. Modifiers are optional and must be placed after the final delimiter. Multiple modifiers can be combined.
Currently, these modifiers are supported:
g: Enables global matching. The regular expression is applied repeatedly and all matches are replaced. By default, regular expression matching stops after the first match has been replaced.

E: Fail with an error if the regular expression does not produce any match. By default, the input string remains unchanged if there is no match.

R: Instead of replacing matches in the input string, only the replacement string is returned.

i: Enables case insensitive matching.

m: Enables multi-line matching. By default, ^ and $ only match at the start/end of the entire input string. With multi-line matching, they also match after/before newlines. Note, however, that unless the s modifier is set, the "any character" metacharacter (.) does not match at a newline.

s: If set, a dot metacharacter in the pattern matches any character, including one that indicates a newline. However, it only ever matches one character, even if newlines are coded as CRLF. By default, a dot does not match at a newline.

x: Enables extended matching mode. Most whitespace characters in the pattern are ignored except when escaped or inside a character class. Characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored, which makes it possible to include comments inside complicated patterns.

n: Disables numbered capturing groups (inside parentheses) in the pattern. Any opening parenthesis that is not followed by ? behaves as if it were followed by ?:, but named parentheses can still be used for capturing (and they acquire numbers in the usual way).

Examples:

infile='/path/to/file.ext' pattern='text_without_tokens' outfile='text_without_tokens'
infile='/path/to/file.ext' pattern='text^[path^]' outfile='text[path]'
infile='/path/to/file.ext' pattern='[PATH]' outfile='/path/to'
infile='/path/to/file.ext' pattern='[Name]' outfile='file.ext'
infile='/path/to/file.ext' pattern='[ext]' outfile='ext'
infile='/path/to/file.ext' pattern='[UPPER]' outfile='/PATH/TO/FILE.EXT'
infile='/path/to/file.ext' pattern='[cut5]' outfile='/path'
infile='/path/to/file.ext' pattern='FILE[ind4]' outfile='FILE0001'
infile='/path/to/file.ext' pattern='FILE[rnd4]' outfile='FILE0815'
infile='/path/to/file.ext' pattern='[/2]' outfile='to'
infile='/path/to/file.ext' pattern='[/-1]' outfile='file.ext'
infile='/path/to/file.ext' pattern='[/2-]' outfile='to/file.ext'
infile='/path/to/file.ext' pattern='[/3-.1]' outfile='file'
infile='/path/to/file.ext' pattern='[path=directory]' outfile='/directory/to/file.ext'
infile='/path/to/file.ext' pattern='[regex:/\.ext$/.bin/]' outfile='/path/to/file.bin'
infile='/path/to/file.ext' pattern='[regex:#/([^/]+)$#$1#R]' outfile='file.ext'
infile='USER.DATA.PDS(MYMEMBER)' pattern='[member]' outfile='MYMEMBER'
infile='USER.DATA.PDS(MYMEMBER)' pattern='[.0|lower]/[.1|lower]/[member|lower].[ext|(0|lower]' outfile='user/data/mymember.pds'
infile='/verylongpath/to/longfilename.ext' pattern='[path|verylong=|/=*.|.1-|upper].[base|cut-8|upper]' outfile='PATH.TO.FILENAME'
infile='/path/to/*.txt' pattern='<SYSUID>.GDG(+[ind0])' outfile='USER.GDG(+n)', where n starts at 1 and is incremented for each file
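For readers who want to experiment with the regex rules outside of FLAM, the following Python sketch approximates the two regex examples from the list above. It is only an illustration: Python's re module is not PCRE2 and uses \1 instead of $1 in replacements.

import re

# '[regex:/\.ext$/.bin/]' replaces the '.ext' suffix with '.bin'.
print(re.sub(r'\.ext$', '.bin', '/path/to/file.ext'))      # /path/to/file.bin

# '[regex:#/([^/]+)$#$1#R]' uses the R modifier, so only the replacement
# (here: the captured last path component) is returned.
m = re.search(r'/([^/]+)$', '/path/to/file.ext')
print(m.group(1) if m else '/path/to/file.ext')            # file.ext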
The last example shows a wildcard being used on read, where the different targets are written into a generation data group as successive generations. The file index with the minimal number of digits, '[ind0]', can be used to address the next generation: '(+[ind0])'.
Patterns can be used for output file and member names. This mapping mechanism is very powerful. For example, it can be used to compress each directory into a separate FLAMFILE, with each FLAMFILE containing all files of that directory:
write.flam(file='[path]/?[name]')
If you define an output filename pattern that results in the same path being generated more than once, the second and further occurrences will append the data to previously written files.
If no pattern is defined, all matching input files are written to one output file or archive; the append flag is automatically enabled starting with the second file.