Convert EBCDIC binary files from mainframe systems to CSV — on Linux, without a Spark cluster, without Informatica. Output is a ZIP archive containing one CSV per table type. From €59/month.
Extracting data from mainframe binary files is harder than it should be. Every existing approach forces painful compromises.
Hand-rolled scripts handle simple cases fine — until the copybook has COMP-3 packed decimal fields, nested REDEFINES, or OCCURS DEPENDING ON clauses. Then they silently produce wrong output or crash outright. Every new copybook means weeks of rework.
Informatica PowerExchange solves the problem — but it requires a full ETL platform license starting at $24,500 per year, dedicated infrastructure, and weeks of professional services to get a single pipeline running. Completely disproportionate for teams that just need CSV output.
Cobrix is the most widely used open-source alternative, but it runs only inside Apache Spark. Spinning up a Spark cluster for a 500 MB VSAM extract adds infrastructure cost, operational overhead, and latency — with no support for IMS or complex REDEFINES structures.
flbcsv handles the full spectrum of mainframe binary formats — from simple EBCDIC copybooks to complex IMS database unloads — in a single command-line tool. Output is always a ZIP archive containing one CSV file per logical table.
Full parser for COBOL copybook layouts: COMP, COMP-1/2 (HFP), COMP-3 (packed decimal), COMP-4, COMP-5, PIC X/A (EBCDIC strings), PIC 9/S9 DISPLAY, REDEFINES, and OCCURS n TIMES (flat columns). Fixed and variable-length records.
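To make this concrete, here is a minimal sketch: a copybook exercising COMP-3 and REDEFINES, converted in one call. Only the INPUT, FORMAT, and OUTPUT parameters are taken from this page; the exact invocation syntax of your flbcsv build may differ.

```sh
# Minimal sketch, not authoritative syntax: write a copybook, convert one file.
cat > payroll.cbl <<'EOF'
       01 EMPLOYEE-REC.
          05 EMP-ID     PIC 9(6).
          05 EMP-NAME   PIC X(30).
          05 EMP-PAY    PIC S9(7)V99 COMP-3.
          05 EMP-RAW REDEFINES EMP-PAY PIC X(5).
EOF
flbcsv INPUT=payroll.bin FORMAT=payroll.cbl OUTPUT=payroll.zip
```

Each REDEFINES alternative becomes its own CSV member in the output ZIP, as described below.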
Pass any .pli or .pl1 file as
FORMAT — flbcsv selects the PLI2ROW compiler automatically.
Supported declarations: FIXED BIN(p) (1/2/4/8 byte
two's-complement), FIXED DEC(p[,q]) (IBM packed BCD),
FLOAT DEC(p) (HFP32/64/128), CHAR(n) (EBCDIC
string), CHAR(n) VARYING, BIT(n),
PICTURE (zoned and display), UNION (one CSV
member per alternative), and DIMENSION(n) array expansion.
ALIGNED padding is inserted automatically; FILLER and PTR fields are
silently skipped. Combine with ENDIAN=BIG for z/OS-produced
data on open-systems hosts.
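A comparable sketch for the PL/I path, using only declarations from the list above; file names and the exact command syntax are illustrative.

```sh
# Assumed invocation: a FORMAT file ending in .pli triggers PLI2ROW.
cat > trade.pli <<'EOF'
 DCL 1 TRADE,
       2 ID    FIXED BIN(31),
       2 PRICE FIXED DEC(9,2),
       2 DESC  CHAR(20);
EOF
flbcsv INPUT=trade.bin FORMAT=trade.pli ENDIAN=BIG OUTPUT=trade.zip
```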
COMP-5 is "native endian" by COBOL standard, so z/OS emits big-endian
halfwords while Linux/Windows emit little-endian. flbcsv takes an explicit
ENDIAN={SYSTEM,BIG,LITTLE,LOCAL} parameter to force the byte
order at schema-compile time. Set ENDIAN=BIG when reading a
mainframe unload on Linux — the generated rowspec carries the endian
qualifier into the COMP-5 decoder. Regular COMP / COMP-4 remains fixed
big-endian per the COBOL standard, independent of the parameter.
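For example (a sketch; only the ENDIAN keyword and its values are documented here, file names are placeholders):

```sh
# Force big-endian COMP-5 decoding for a z/OS unload read on Linux;
# COMP / COMP-4 fields keep their fixed big-endian layout regardless.
flbcsv INPUT=unload.bin FORMAT=record.cbl ENDIAN=BIG OUTPUT=out.zip
```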
The COMPILE sub-object now supports a CHRSET parameter
that resolves the source character set for PIC X/A string columns
when no explicit CCSID is given.
CHRSET=EBCDIC (default) reads FL_DEFAULT_EBCDIC_CCSID
from the environment (e.g. IBM-1141 on German z/OS);
CHRSET=ASCII reads FL_DEFAULT_ASCII_CCSID.
An explicit CCSID= always takes priority over
CHRSET.
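A sketch of the resolution chain. The parenthesized sub-object notation mirrors STYLE() and ZIP(STREAM) elsewhere on this page, but the exact COMPILE(...) spelling is an assumption.

```sh
# CHRSET=EBCDIC falls back to the environment for the concrete codepage:
export FL_DEFAULT_EBCDIC_CCSID=IBM-1141      # e.g. German z/OS
flbcsv INPUT=data.bin FORMAT=rec.cbl 'COMPILE(CHRSET=EBCDIC)' OUTPUT=out.zip

# An explicit CCSID always takes priority over CHRSET:
flbcsv INPUT=data.bin FORMAT=rec.cbl 'COMPILE(CCSID=IBM-037)' OUTPUT=out.zip
```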
IMS_EOF catch-all rows validate their EOF sentinel bytes via
type.binary(match(...)) instead of silently draining any
leftover record. IMSL checks PCF=X'00' at offset 4 of the
20-byte prefix; IMSS checks the full 4-byte X'00040000' prefix.
Format corruption that would otherwise be hidden by the catch-all is now
surfaced as a hard error — exactly what you want when verifying a
mainframe transfer.
PIC N fields (USAGE NATIONAL) are decoded as UTF-16 Big-Endian and emitted as UTF-8 CSV. PIC G fields (EBCDIC DBCS with SO/SI) are passed through as binary — preserving the raw encoding for downstream CCSID conversion.
A single UNLOAD= keyword selects the unload wrapper around
your copybooks. Available today: NONE (plain fixed-format,
default), IMSL (IMS DFSURGU0 LONG, IBM z/OS — 20-byte prefix,
SEGNAM auto-routing), IMSS (IMS DFSURGU0 SHORT — 4-byte prefix,
fixlen routing) and HDU (IMS HD Unload / DFSURGL0 — physical
database image with 36-byte SEG-CODE prefix, multi-segment routing via
record-length match). Header/trailer records (PCF=X'00', IMS-EOF,
HDU-UNKNOWN) are drained silently via catch-all row spec.
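For instance (illustrative invocation; file names are placeholders):

```sh
# IMS DFSURGU0 LONG unload: segments are auto-routed via SEGNAM.
flbcsv INPUT=ims_unload.bin FORMAT=segments.cbl UNLOAD=IMSL OUTPUT=ims.zip
```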
The UNLOAD parameter already reserves keywords for three
Db2 unload formats arriving in Wave 2: DB2U (DSNUTILB
UNLOAD FORMAT UNLOAD — RECFM=VB with 2-byte OBID and inline
X'FF' null indicator), DB2S (DSNTIAUL sample program — RECFM=FB
with grouped X'6F' null indicators at row front), and DB2IX
(PC/IXF self-describing format — no copybook needed, schema embedded in
H/T/C/D records). Plus IMSH for the HP IMS variant.
Today these keywords parse and return a clean "not yet supported" error —
existing CI scripts will work unchanged once the Wave 2 implementation
ships.
Compressed input is auto-detected by magic bytes and decompressed on
the fly before the schema compiler sees the data. Supported today:
gzip, bzip2, xz/lzma,
zstd, lz4, lzip. TERSE/AMATERSE
(the classic IBM z/OS transport format) arrives in Wave 2 via the
bundled CNVTRS converter. No CLP flag to set — uncompressed input
passes through without any overhead, compressed input just works.
Base-encoded or encrypted inputs remain the domain of
flcl conv (separate pre-processing step).
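In practice the same command works for compressed and uncompressed transfers alike (sketch; file names are placeholders):

```sh
# No extra flag: gzip is detected from the magic bytes and decompressed
# on the fly before the schema compiler sees the data.
flbcsv INPUT=unload.bin.gz FORMAT=rec.cbl OUTPUT=out.zip
```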
Point flbcsv at a single member of a ZIP or FLAM4 (FL4)
archive using the URL-member syntax
INPUT='archive.zip/?MEMBER_NAME' — no need to extract first.
The archive type is auto-detected from magic bytes (ZIP or FL4), so the
same URL works unchanged against either container. Ideal for bundled
deliveries from mainframe shops: one transportable ZIP produced by a
single JCL step on z/OS replaces dozens of individually-transferred
binaries, while still honouring per-member compression (DEFLATE,
BZIP2, LZMA, ZSTD, or STORED) and record-length hints embedded in the
member filename (.F<lrecl>.bin).
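For example (member and file names are placeholders):

```sh
# Read one member straight out of the delivery ZIP; the .F80 suffix is the
# embedded record-length hint (LRECL=80) described above.
flbcsv INPUT='delivery.zip/?PAYROLL.F80.bin' FORMAT=payroll.cbl OUTPUT=out.zip
```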
The FMT2TAB infrastructure (Open/Add/Fin/Close pattern, CLPSTR
accumulator, compiler-probe array) is fully abstracted and prepared for
additional source languages. COBOL copybooks (COB2ROW) and
PL/I structure files (PLI2ROW) are in production today.
PLI2ROW compiles DCL/DECLARE blocks with level numbers 1–15:
FIXED BIN/DEC, FLOAT DEC (IBM HFP32/64/128), CHAR(n) [VARYING],
BIT(n), PICTURE, UNION (one row per alternative), DIMENSION expansion,
and ALIGNED padding. Pass any .pli or .pl1
file as FORMAT — the compiler is selected automatically by file extension.
Roadmap: C992ROW (C99 struct / union with
#pragma pack and typedef chains) and
XSD2ROW (XML Schema → tabular conversion for SEPA/XBRL/
SWIFT/ISO 20022). Available now: ASM2ROW (z/OS HLASM DSECT/DS/DC
for legacy messaging APIs) and DFDL2ROW (IBM's Data Format
Description Language).
All output is written into a ZIP archive. When the input contains multiple segment or table types (IMS multi-segment, REDEFINES alternatives), each logical table becomes its own CSV member — cleanly separated, no interleaving.
Converts all FL5-supported EBCDIC codepages to UTF-8 output, including IBM-037 (US), IBM-273 (German), IBM-1141 (German/Euro), and 60+ others. CCSID is specified per run; auto-detection is not required.
STYLE object: separator (, ; | TAB SPACE),
quote character (any Unicode codepoint), quoting policy (ALWAYS / IFREQ / NONE),
record delimiter (CRLF / LF), header row on/off, escape handling, format checks.
TRIM: NONE / TRAILING / LEADING / BOTH. COLPATH: FULL hierarchical or LEAF names.
COLSEP: UNDERSCORE / DOT / HYPHEN / NONE. FILLER: COLUMN / ALIGN / SKIP. Fixed
record length override (RECLEN) for non-standard block sizes.
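A combined sketch; the keyword spellings inside STYLE() are assumptions based on the option list above, not confirmed syntax:

```sh
# Semicolon-separated, quote only when required, LF line ends, header on;
# leaf column names joined with underscores, FILLERs skipped, LRECL forced.
flbcsv INPUT=data.bin FORMAT=rec.cbl OUTPUT=out.zip \
  'STYLE(SEPARATOR=; QUOTING=IFREQ DELIMITER=LF HEADER=ON)' \
  TRIM=BOTH COLPATH=LEAF COLSEP=UNDERSCORE FILLER=SKIP RECLEN=120
```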
Compile your COBOL copybook once with ROWOUT=schema.tab, then
feed it back thousands of times with ROWIN=schema.tab — skips
the parse + type-resolution stage and starts streaming immediately. Ideal
for CI/CD pipelines, high-throughput batch jobs, and cloud functions with
a strict cold-start budget.
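The pattern looks like this (sketch; ROWOUT/ROWIN per this page, the loop is plain shell):

```sh
# Compile the copybook once and persist the rowspec...
flbcsv INPUT=first.bin FORMAT=payroll.cbl ROWOUT=schema.tab OUTPUT=first.zip
# ...then stream every further file without re-parsing:
for f in batch/*.bin; do
  flbcsv INPUT="$f" ROWIN=schema.tab OUTPUT="${f%.bin}.zip"
done
```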
Wildcard-pattern overrides without re-editing the copybook:
COLTYPE='*-KEY/&binary' forces key fields to raw bytes;
COLTYPE='*-TEXT-*/&IBM-1141' switches a column family to a
different CCSID. Useful for legacy copybooks where PIC X really holds binary
or mixed-encoding data.
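Both overrides can be combined in one run (sketch; whether COLTYPE may be repeated is an assumption):

```sh
flbcsv INPUT=data.bin FORMAT=rec.cbl OUTPUT=out.zip \
  COLTYPE='*-KEY/&binary' COLTYPE='*-TEXT-*/&IBM-1141'
```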
Wildcard INPUT supports directory trees with per-pattern
EXCLUDE. After successful conversion, source files can be
REMOVEd or RENAMEd via pattern (e.g. moved to
archive/ with a timestamp suffix) — enabling idempotent pipeline runs
without external glue code.
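A hypothetical sweep; EXCLUDE and RENAME appear on this page, but the concrete pattern syntax shown here is assumed:

```sh
# Convert every .bin under incoming/, skip temp files, then move the
# processed sources to archive/ so a re-run picks up only new files.
flbcsv INPUT='incoming/*.bin' EXCLUDE='*.tmp' FORMAT=rec.cbl \
  OUTPUT=out.zip RENAME='archive/*'
```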
ZIP sub-object ALGO: DEFLATE (default, universal compatibility),
COPY (stored uncompressed — fastest, inline-ready), BZIP2 (best ratio on text),
ZSTD (best speed/ratio trade-off, Zip64 only), LZMA. LEVEL
NORMAL / FAST / BEST. FORMAT AUTO / Z32 / Z64 — Zip64 extension
support for archives > 4 GiB or > 65k members.
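For example (keywords per the list above; sub-object spelling as elsewhere on this page):

```sh
# Zip64 archive with zstd-compressed members, best-ratio level:
flbcsv INPUT=data.bin FORMAT=rec.cbl OUTPUT=big.zip \
  'ZIP(ALGO=ZSTD LEVEL=BEST FORMAT=Z64)'
```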
NORUN parses all parameters and validates the schema +
output layout without touching any data — ideal for CI gates and infra-as-code
validation. MAXOCCURS caps OCCURS-expansion so a pathological
copybook (OCCURS 100000) cannot blow up the column matrix.
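A typical CI gate (sketch; NORUN and MAXOCCURS per this page, their placement is assumed):

```sh
# Validate parameters, schema, and output layout; exits non-zero on error
# without reading any data.
flbcsv NORUN INPUT=data.bin FORMAT=rec.cbl OUTPUT=out.zip MAXOCCURS=1000
```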
Read source files or write the output ZIP archive via SSH URLs —
e.g. OUTPUT='ssh://server/export/payroll.zip'.
On z/OS: read DSN locally, push ZIP to remote Linux server via SSH.
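For example:

```sh
# Convert locally, push the result ZIP to a remote Linux host over SSH:
flbcsv INPUT=payroll.bin FORMAT=payroll.cbl \
  OUTPUT='ssh://server/export/payroll.zip'
```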
Ships as a statically linked executable. No JVM, no Python environment, no shared libraries to manage. Deploy by copying one file to any Linux, z/OS, Windows, AIX, or Solaris server.
PIC U fields (USAGE UTF-8, IBM z/OS extension) are handled natively: the n-byte UTF-8 payload is passed through without re-encoding and emitted directly as UTF-8 CSV. No intermediate EBCDIC conversion required.
COBOL COPY directives are resolved at parse time: the named
member is located in the same directory (or a configured search path) and
inlined into the record layout. COPY REPLACING is rejected with
a clear error — use preprocessed copybooks for that case.
Each CSV member inside the output ZIP is named
[name].[table].csv by default — e.g.
payroll.EMPLOYEE.csv. For IMS multi-segment unloads every
segment type gets its own predictably named member, ready for direct
COPY INTO in Snowflake or DuckDB.
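For example, a member can be piped straight into DuckDB without unpacking the archive to disk (assumes the default member naming above):

```sh
unzip -p payroll.zip payroll.EMPLOYEE.csv \
  | duckdb -c "SELECT COUNT(*) FROM read_csv_auto('/dev/stdin')"
```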
When the same row-spec can fire multiple times per input file
(OCCURS DEPENDING, multiple REDEFINES alternatives, recurring segments),
the optional [midN] pattern token adds a zero-padded
archive-local member counter — yielding unique member names accepted
by strict ISO-21320 readers such as Java ZipFile.
Example: MEMBER='[mid4].[name].[table].csv' emits
0001.payroll.EMP.csv, 0002.payroll.DEPT.csv, …
Row-specs that never match (drain rows, sentinel records) would
normally leave empty members in the archive. The EMPTY
parameter (DELETE by default in flbcsv) silently
removes them from the central directory. ZIP(STREAM) enables
sequential-only writes required for z/OS sequential datasets —
no seek back, no rewrite, MVS-DASD friendly.
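A sequential-target sketch (ZIP(STREAM) per the paragraph above; EMPTY=DELETE is the documented default, spelled out here for clarity):

```sh
# Write-once, no seek-back: suitable for z/OS sequential datasets.
flbcsv INPUT=unload.bin FORMAT=rec.cbl OUTPUT=out.zip 'ZIP(STREAM)' EMPTY=DELETE
```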
A direct comparison against the tools data engineers reach for when facing mainframe binary files.
| Tool | Cost | CLI / scriptable | COBOL REDEFINES | IMS support | No Spark / cluster needed |
|---|---|---|---|---|---|
| Informatica PowerExchange | ~$2,000/mo | ✗ GUI / server only | ✓ | ✗ | ✗ dedicated platform |
| Cobrix OSS | Free | ✗ API only | partial | ✗ | ✗ Spark required |
| pycobol2csv | Free | ✓ CLI | ✗ incomplete | ✗ | ✓ |
| flbcsv | €59/mo | ✓ CLI | ✓ fully supported | ✓ IMS DFSURGU0 | ✓ |
Mainframe data integration is not a niche problem. It is a market under pressure to modernize — and running out of easy options.
One tool, multiple platforms. Pay only for the platform you deploy on. Annual subscriptions available with a 15% discount.
- Linux and Windows: x86-64, 32- and 64-bit
- z/OS: native z/OS and z/OS USS
- AIX and Solaris: Power and SPARC platforms
Annual subscription: 15% discount (nearly two months free)
Free trial: up to 1 MiB input, no registration required.
Contact us to get your trial binary.
Answers to the questions we hear most often from data engineers evaluating flbcsv.
Fixed OCCURS n TIMES arrays are expanded into flat, numbered columns
(FIELD_1, FIELD_2, ..., FIELD_n).
OCCURS DEPENDING ON (variable-length arrays) is not supported in Wave 1 and
requires a full FLAM license.
For IMS unloads, provide one copybook per segment type, named after the
segment (e.g. EMPLOYEE.cbl for segment EMPLOYEE).
Each segment type is written as a separate CSV member inside the output ZIP.
Wave-1 limitation: LONG (IMSL), SHORT (IMSS), and HD Unload (HDU) formats are
supported; XSHORT (no segment name in the record) and HALDB are deferred to
Wave 2.
Each CSV member inside the output ZIP is named
[copybook].[table].csv by default (e.g.
payroll.EMPLOYEE.csv). For a single-segment COBOL file the
ZIP contains one CSV member; for IMS multi-segment unloads each segment
type becomes its own predictably named CSV member — ready to extract
individually or pass directly to DuckDB, pandas, or Snowflake's COPY INTO.
The CSV separator (SEP / CSV / TAB /
PSV), quoting, header row, trim mode, and EBCDIC-to-UTF-8
conversion are all configurable via the STYLE() parameter.
Free trial up to 1 MiB. No registration, no credit card.
Request Free Trial