
COBOL Copybook to CSV Converter

Convert EBCDIC binary files from mainframe systems to CSV — on Linux, without a Spark cluster, without Informatica. Output is a ZIP archive containing one CSV per table type. From €59/month.

# Local binary → CSV in ZIP archive
flbcsv FORMAT=payroll.cbl INPUT=payroll.bin OUTPUT=payroll.zip

# German EBCDIC (IBM-1141), pipe-separated, no header
flbcsv FORMAT=payroll.cbl CCSID=IBM-1141 STYLE(SEPCHR=PIPE NOHDLN) \
   INPUT=payroll.bin OUTPUT=payroll.zip

# IMS DFSURGU0 LONG (IBM z/OS) → one CSV per segment type
flbcsv FORMAT='./ims_copybooks/*.cbl' UNLOAD=IMSL CCSID=IBM-1140 \
   INPUT=ims_unload.bin OUTPUT=ims_data.zip

# z/OS COMP-5 on Linux: force big-endian decoding of mainframe halfwords
flbcsv FORMAT=payroll.cbl CCSID=IBM-1140 ENDIAN=BIG \
   INPUT=payroll_from_zos.bin OUTPUT=payroll.zip

# IMS SHORT variant (4-byte prefix, no segment name — fixlen routing)
flbcsv FORMAT='./ims_copybooks/*.cbl' UNLOAD=IMSS CCSID=IBM-1140 \
   INPUT=ims_short.bin OUTPUT=ims_data.zip

# ISO/IEC 21320-1 conformant ZIP with unique member names via [midN]
flbcsv FORMAT=mixed.cbl INPUT=mixed.bin OUTPUT=out.zip \
   ZIP(MEMBER='[mid4].[name].[table].csv')

# PL/I DECLARE with German EBCDIC, big-endian
flbcsv FORMAT=customer.pli CCSID=IBM-1141 ENDIAN=BIG \
   INPUT=customer.bin OUTPUT=customer.zip
COBOL Copybook · COMP-3 · REDEFINES · IMS LONG + SHORT (IMSL/IMSS) · Db2 roadmap: DB2U / DB2S / DB2IX · PL/1 DECLARE (PLI2ROW) live · HLASM DSECT live · C in progress · XSD / DFDL planned · PIC N/G/U · 64+ CCSIDs · ISO/IEC 21320-1 ready · No Spark required · Linux · z/OS · Windows · AIX · Solaris

The Data Engineer's Nightmare

Extracting data from mainframe binary files is harder than it should be. Every existing approach forces painful compromises.

🐍

Custom Python breaks at COMP-3 and REDEFINES

Hand-rolled scripts handle simple cases fine — until the copybook has COMP-3 packed decimal fields, nested REDEFINES, or OCCURS DEPENDING ON clauses. Then they silently produce wrong output or crash outright. Every new copybook means weeks of rework.

💸

Informatica costs $24,500+/year

Informatica PowerExchange solves the problem — but it requires a full ETL platform license starting at $24,500 per year, dedicated infrastructure, and weeks of professional services to get a single pipeline running. Completely disproportionate for teams that just need CSV output.

Cobrix requires a Spark cluster

Cobrix is the most widely used open-source alternative, but it runs only inside Apache Spark. Spinning up a Spark cluster for a 500 MB VSAM extract adds infrastructure cost, operational overhead, and latency — with no support for IMS or complex REDEFINES structures.

Everything you need, nothing you don't

flbcsv handles the full spectrum of mainframe binary formats — from simple EBCDIC copybooks to complex IMS database unloads — in a single command-line tool. Output is always a ZIP archive containing one CSV file per logical table.

📋

COBOL Copybook parser (all COMP types)

Full parser for COBOL copybook layouts: COMP, COMP-1/2 (HFP), COMP-3 (packed decimal), COMP-4, COMP-5, PIC X/A (EBCDIC strings), PIC 9/S9 DISPLAY, REDEFINES, and OCCURS n TIMES (flat columns). Fixed and variable-length records.
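
A minimal sketch of the common case; the copybook layout and file names are illustrative:

# employee.cbl (illustrative):
#   01 EMPLOYEE-REC.
#      05 EMP-ID      PIC 9(6).
#      05 EMP-NAME    PIC X(30).
#      05 EMP-SALARY  PIC S9(7)V99 COMP-3.
flbcsv FORMAT=employee.cbl CCSID=IBM-037 INPUT=employee.bin OUTPUT=employee.zip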

📋

PL/I DECLARE compiler (PLI2ROW)

Pass any .pli or .pl1 file as FORMAT — flbcsv selects the PLI2ROW compiler automatically. Supported declarations: FIXED BIN(p) (1/2/4/8 byte two's-complement), FIXED DEC(p[,q]) (IBM packed BCD), FLOAT DEC(p) (HFP32/64/128), CHAR(n) (EBCDIC string), CHAR(n) VARYING, BIT(n), PICTURE (zoned and display), UNION (one CSV member per alternative), and DIMENSION(n) array expansion. ALIGNED padding is inserted automatically; FILLER and PTR fields are silently skipped. Combine with ENDIAN=BIG for z/OS-produced data on open-systems hosts (Linux, Windows).
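
For illustration, a small structure that exercises several of the supported declarations (file names and layout are hypothetical):

# customer.pli (illustrative):
#   DCL 1 CUSTOMER,
#         2 ID      FIXED BIN(31),
#         2 NAME    CHAR(30),
#         2 BALANCE FIXED DEC(9,2);
flbcsv FORMAT=customer.pli CCSID=IBM-1141 ENDIAN=BIG \
   INPUT=customer.bin OUTPUT=customer.zip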

🔁

COMP-5 on cross-platform hosts — ENDIAN parameter

COMP-5 is defined as native-endian binary, so z/OS emits big-endian halfwords while Linux/Windows emit little-endian. flbcsv takes an explicit ENDIAN={SYSTEM,BIG,LITTLE,LOCAL} parameter to force the byte order at schema-compile time. Set ENDIAN=BIG when reading a mainframe unload on Linux — the generated rowspec carries the endian qualifier into the COMP-5 decoder. Regular COMP / COMP-4 stays fixed big-endian (the IBM convention), independent of the parameter.

🌍

CHRSET — automatic CCSID resolution for string columns

The COMPILE sub-object now supports a CHRSET parameter that resolves the source character set for PIC X/A string columns when no explicit CCSID is given. CHRSET=EBCDIC (default) reads FL_DEFAULT_EBCDIC_CCSID from the environment (e.g. IBM-1141 on German z/OS); CHRSET=ASCII reads FL_DEFAULT_ASCII_CCSID. An explicit CCSID= always takes priority over CHRSET.
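
A sketch of the resolution order; the COMPILE(CHRSET=...) spelling below is inferred from the parameter names above:

# No explicit CCSID: PIC X/A columns fall back to the environment default
export FL_DEFAULT_EBCDIC_CCSID=IBM-1141
flbcsv FORMAT=payroll.cbl COMPILE(CHRSET=EBCDIC) \
   INPUT=payroll.bin OUTPUT=payroll.zip
# An explicit CCSID always wins over CHRSET:
flbcsv FORMAT=payroll.cbl CCSID=IBM-273 INPUT=payroll.bin OUTPUT=payroll.zip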

🛡️

Precise EOF sentinel matching

IMS_EOF catch-all rows validate their EOF sentinel bytes via type.binary(match(...)) instead of silently draining any leftover record. IMSL checks PCF=X'00' at offset 4 of the 20-byte prefix; IMSS checks the full 4-byte X'00040000' prefix. Format corruption that would otherwise be hidden by the catch-all is now surfaced as a hard error — exactly what you want when verifying a mainframe transfer.

🌐

Unicode: PIC N (National/UCS-2) + PIC G (DBCS)

PIC N fields (USAGE NATIONAL) are decoded as UTF-16 Big-Endian and emitted as UTF-8 CSV. PIC G fields (EBCDIC DBCS with SO/SI) are passed through as binary — preserving the raw encoding for downstream CCSID conversion.
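
Illustrative copybook fragment showing how the two field classes are treated:

#   05 CUST-NAME  PIC N(20).                  <- NATIONAL: UTF-16BE, emitted as UTF-8
#   05 CUST-NOTE  PIC G(10) USAGE DISPLAY-1.  <- DBCS: passed through as raw bytes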

🧭

UNLOAD parameter — mainframe unload formats

A single UNLOAD= keyword selects the unload wrapper around your copybooks. Available today: NONE (plain fixed-format, default), IMSL (IMS DFSURGU0 LONG, IBM z/OS — 20-byte prefix, SEGNAM auto-routing), IMSS (IMS DFSURGU0 SHORT — 4-byte prefix, fixlen routing) and HDU (IMS HD Unload / DFSURGL0 — physical database image with 36-byte SEG-CODE prefix, multi-segment routing via record-length match). Header/trailer records (PCF=X'00', IMS-EOF, HDU-UNKNOWN) are drained silently via catch-all row spec.
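
The HDU variant, by analogy with the IMSL/IMSS examples above (paths hypothetical):

# IMS HD Unload: multi-segment routing via record-length match
flbcsv FORMAT='./ims_copybooks/*.cbl' UNLOAD=HDU CCSID=IBM-1140 \
   INPUT=hd_unload.bin OUTPUT=ims_data.zip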

🗃️

Db2 unload formats — Wave 2 roadmap

The UNLOAD parameter already reserves keywords for three Db2 unload formats arriving in Wave 2: DB2U (DSNUTILB UNLOAD FORMAT UNLOAD — RECFM=VB with 2-byte OBID and inline X'FF' null indicator), DB2S (DSNTIAUL sample program — RECFM=FB with grouped X'6F' null indicators at row front), and DB2IX (PC/IXF self-describing format — no copybook needed, schema embedded in H/T/C/D records), plus IMSH for the IMS High Performance Unload variant. Today these keywords parse and return a clean "not yet supported" error — existing CI scripts will work unchanged once the Wave 2 implementation ships.

📥

Transparent decompression — no flag, no surprise

Compressed input is auto-detected by magic bytes and decompressed on the fly before the schema compiler sees the data. Supported today: gzip, bzip2, xz/lzma, zstd, lz4, lzip. TERSE/AMATERSE (the classic IBM z/OS transport format) arrives in Wave 2 via the bundled CNVTRS converter. No CLP flag to set — uncompressed input passes through without any overhead, compressed input just works. Base-encoded or encrypted inputs remain the domain of flcl conv (separate pre-processing step).
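
The same invocation works on compressed and uncompressed input alike; for example:

gzip -k payroll.bin    # produces payroll.bin.gz, keeps the original
flbcsv FORMAT=payroll.cbl INPUT=payroll.bin.gz OUTPUT=payroll.zip   # gzip detected by magic bytes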

🗂️

Read directly from ZIP or FLAM archive members

Point flbcsv at a single member of a ZIP or FLAM4 (FL4) archive using the URL-member syntax INPUT='archive.zip/?MEMBER_NAME' — no need to extract first. The archive type is auto-detected from magic bytes (ZIP or FL4), so the same URL works unchanged against either container. Ideal for bundled deliveries from mainframe shops: one transportable ZIP produced by a single JCL step on z/OS replaces dozens of individually-transferred binaries, while still honouring per-member compression (DEFLATE, BZIP2, LZMA, ZSTD, or STORED) and record-length hints embedded in the member filename (.F<lrecl>.bin).
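
A sketch of the member URL; the member name and its .F80 record-length hint are hypothetical:

# Read one member straight out of a delivery archive, no prior unzip
flbcsv FORMAT=payroll.cbl CCSID=IBM-1141 \
   INPUT='delivery.zip/?PAYROLL.F80.bin' OUTPUT=payroll.zip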

🧰

Multi-language schema compilers — COBOL · PL/1 · HLASM live · C in progress · XSD · DFDL planned

The FMT2TAB infrastructure (Open/Add/Fin/Close pattern, CLPSTR accumulator, compiler-probe array) is fully abstracted and prepared for additional source languages. COBOL copybooks (COB2ROW) and PL/I structure files (PLI2ROW) are in production today. PLI2ROW compiles DCL/DECLARE blocks with level numbers 1–15: FIXED BIN/DEC, FLOAT DEC (IBM HFP32/64/128), CHAR(n) [VARYING], BIT(n), PICTURE, UNION (one row per alternative), DIMENSION expansion, and ALIGNED padding. Pass any .pli or .pl1 file as FORMAT — the compiler is selected automatically by file extension. Also available now: ASM2ROW (z/OS HLASM DSECT/DS/DC for legacy messaging APIs). Roadmap: C992ROW (C99 struct/union with #pragma pack and typedef chains), XSD2ROW (XML Schema → tabular conversion for SEPA/XBRL/SWIFT/ISO 20022), and DFDL2ROW (IBM's Data Format Description Language).

📦

ZIP archive output (one CSV per table)

All output is written into a ZIP archive. When the input contains multiple segment or table types (IMS multi-segment, REDEFINES alternatives), each logical table becomes its own CSV member — cleanly separated, no interleaving.

🔤

EBCDIC to UTF-8 (64+ CCSIDs)

Converts all FL5-supported EBCDIC codepages to UTF-8 output, including IBM-037 (US), IBM-273 (German), IBM-1141 (German/Euro), and 60+ others. The CCSID is specified explicitly per run; flbcsv does not attempt auto-detection.

⚙️

RFC-4180 CSV with full formatting control

STYLE object: separator (comma, semicolon, pipe, TAB, SPACE), quote character (any Unicode codepoint), quoting policy (ALWAYS / IFREQ / NONE), record delimiter (CRLF / LF), header row on/off, escape handling, format checks. TRIM: NONE / TRAILING / LEADING / BOTH. COLPATH: FULL hierarchical or LEAF names. COLSEP: UNDERSCORE / DOT / HYPHEN / NONE. FILLER: COLUMN / ALIGN / SKIP. Fixed record-length override (RECLEN) for non-standard block sizes.
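
Combining a few of these knobs; only SEPCHR and NOHDLN appear verbatim in the examples above, so treat the other keyword spellings as illustrative:

# Semicolon-separated, LF line ends, trailing blanks trimmed
flbcsv FORMAT=payroll.cbl CCSID=IBM-1141 \
   STYLE(SEPCHR=SEMICOLON DELIM=LF TRIM=TRAILING) \
   INPUT=payroll.bin OUTPUT=payroll.zip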

Pre-compiled schema cache (ROWIN / ROWOUT)

Compile your COBOL copybook once with ROWOUT=schema.tab, then feed it back thousands of times with ROWIN=schema.tab — skips the parse + type-resolution stage and starts streaming immediately. Ideal for CI/CD pipelines, high-throughput batch jobs, and cloud functions with a strict cold-start budget.
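
The two-phase pattern in practice (that FORMAT can be dropped once ROWIN is given is an assumption here):

# Day 1: convert normally and persist the compiled schema
flbcsv FORMAT=payroll.cbl ROWOUT=schema.tab INPUT=day1.bin OUTPUT=day1.zip
# Day 2..n: reuse the cache, skip parse + type resolution
flbcsv ROWIN=schema.tab INPUT=day2.bin OUTPUT=day2.zip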

🎯

Per-field type override (COLTYPE)

Wildcard-pattern overrides without re-editing the copybook: COLTYPE='*-KEY/&binary' forces key fields to raw bytes; COLTYPE='*-TEXT-*/&IBM-1141' switches a column family to a different CCSID. Useful for legacy copybooks where PIC X really holds binary or mixed-encoding data.

🔁

Directory-walk, EXCLUDE, REMOVE, RENAME

Wildcard INPUT supports directory trees with per-pattern EXCLUDE. After successful conversion, source files can be REMOVEd or RENAMEd via pattern (e.g. moved to archive/ or given a timestamp suffix) — enabling idempotent pipeline runs without external glue code.
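
A sketch of an idempotent nightly run; the EXCLUDE and RENAME spellings below are assumptions based on the parameter names above:

# Walk ./incoming, skip temp files, move sources aside on success
flbcsv FORMAT=payroll.cbl INPUT='./incoming/*.bin' EXCLUDE='*.tmp' \
   OUTPUT=payroll.zip RENAME='archive/[name].bin'   # pattern syntax illustrative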

📦

Five compression algorithms in ZIP

ZIP sub-object ALGO: DEFLATE (default, universal compatibility), COPY (stored uncompressed — fastest, inline-ready), BZIP2 (best ratio on text), ZSTD (best speed/ratio trade-off, Zip64 only), LZMA. LEVEL NORMAL / FAST / BEST. FORMAT AUTO / Z32 / Z64 — Zip64 extension support for archives > 4 GiB or > 65k members.
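
For example, a Zip64 archive compressed with ZSTD (sub-object spelling inferred from the parameter names above):

flbcsv FORMAT=payroll.cbl INPUT=payroll.bin OUTPUT=payroll.zip \
   ZIP(ALGO=ZSTD LEVEL=BEST FORMAT=Z64)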

NORUN preflight + MAXOCCURS safety cap

NORUN parses all parameters and validates the schema + output layout without touching any data — ideal for CI gates and infra-as-code validation. MAXOCCURS caps OCCURS-expansion so a pathological copybook (OCCURS 100000) cannot blow up the column matrix.
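
A typical CI gate; the MAXOCCURS value syntax is assumed:

# Validates copybook, parameters and output layout, touches no input bytes
flbcsv FORMAT=payroll.cbl INPUT=payroll.bin OUTPUT=payroll.zip NORUN MAXOCCURS=1000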

🔗

SSH-URL input/output

Read source files or write the output ZIP archive via SSH URLs — e.g. OUTPUT='ssh://server/export/payroll.zip'. On z/OS: read DSN locally, push ZIP to remote Linux server via SSH.

🏎️

Single binary, no runtime dependencies

Ships as a statically linked executable. No JVM, no Python environment, no shared libraries to manage. Deploy by copying one file to any Linux, z/OS, Windows, AIX, or Solaris server.

🔡

PIC U — native UTF-8 strings

PIC U fields (USAGE UTF-8, IBM z/OS extension) are handled natively: the n-byte UTF-8 payload is passed through without re-encoding and emitted directly as UTF-8 CSV. No intermediate EBCDIC conversion required.

📂

COPY directives — copybook includes

COBOL COPY directives are resolved at parse time: the named member is located in the same directory (or a configured search path) and inlined into the record layout. COPY REPLACING is rejected with a clear error — use preprocessed copybooks for that case.
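
Illustrative copybook with an include (member and field names hypothetical):

# payroll.cbl:
#   01 PAYROLL-REC.
#      05 EMP-ID   PIC 9(6).
#      COPY ADDRBLK.              <- inlined from ADDRBLK in the search path
#      05 NET-PAY  PIC S9(7)V99 COMP-3.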

🗂️

Automatic ZIP member naming

Each CSV member inside the output ZIP is named [name].[table].csv by default — e.g. payroll.EMPLOYEE.csv. For IMS multi-segment unloads every segment type gets its own predictably named member, ready for direct COPY INTO in Snowflake or DuckDB.

🔐

ISO/IEC 21320-1 conformance via [midN] token

When the same row-spec can fire multiple times per input file (OCCURS DEPENDING, multiple REDEFINES alternatives, recurring segments), the optional [midN] pattern token adds a zero-padded archive-local member counter — yielding unique member names accepted by strict ISO-21320 readers such as Java ZipFile. Example: MEMBER='[mid4].[name].[table].csv' emits 0001.payroll.EMP.csv, 0002.payroll.DEPT.csv, …

📭

Empty-member policy + stream mode

Row-specs that never match (drain rows, sentinel records) would normally leave empty members in the archive. The EMPTY parameter (DELETE by default in flbcsv) silently removes them from the central directory. ZIP(STREAM) enables sequential-only writes required for z/OS sequential datasets — no seek back, no rewrite, MVS-DASD friendly.
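
For z/OS output the two combine naturally (the EMPTY=DELETE spelling is assumed):

# Sequential-only ZIP writes, never-matched members dropped from the directory
flbcsv FORMAT=payroll.cbl INPUT=payroll.bin OUTPUT=payroll.zip \
   ZIP(STREAM) EMPTY=DELETE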

How does flbcsv compare?

A direct comparison against the tools data engineers reach for when facing mainframe binary files.

Tool                       | Cost       | CLI / scriptable  | COBOL REDEFINES | IMS support  | No Spark needed
Informatica PowerExchange  | ~$2,000/mo | GUI / server only | yes             | yes          | yes
Cobrix (OSS)               | Free       | API only          | partial         | no           | Spark required
pycobol2csv                | Free       | CLI               | incomplete      | no           | yes
flbcsv                     | €59/mo     | CLI               | fully supported | IMS DFSURGU0 | no Spark

Why now?

Mainframe data integration is not a niche problem. The numbers show a market under pressure to modernize — and running out of easy options.

220 bn
Lines of COBOL code still running in production worldwide
25%
Of German enterprises actively use COBOL systems in core business processes
$8.4 bn
Projected mainframe modernization market size in 2025
Nov. 2025
AWS Mainframe Modernization stopped accepting new customers — closing a major migration pathway

Simple, transparent pricing

One tool, multiple platforms. Pay only for the platform you deploy on. Annual subscriptions available with a 15% discount.

Linux / Windows

x86-64, 32- and 64-bit

€59
per month, per server
  • Linux x86-64 and x86-32 binaries
  • Windows x64 and x86 binaries
  • Full COBOL copybook parser (FIX + VAR records)
  • COMP-3 / packed decimal; COMP, COMP-4, COMP-5; COMP-1/2 (HFP)
  • REDEFINES + OCCURS n TIMES
  • PIC N (National/UCS-2) + PIC G (DBCS) + PIC U (UTF-8)
  • IMS DFSURGU0 HD Unload support
  • COPY directives resolved at parse time
  • ZIP output; auto member naming; configurable delimiter + header
  • Email support
IBM z/OS

Native on UNIX System Services

€449
per month, per LPAR
  • Runs natively on z/OS USS (UNIX System Services)
  • Reads VSAM and sequential datasets directly
  • Full copybook and IMS DBD support
  • EBCDIC-to-UTF-8 / ASCII conversion included
  • No mainframe ISV middleware required
  • Priority email and phone support

AIX / SPARC

Power and Solaris platforms

€99
per month, per server
  • IBM AIX on Power (32- and 64-bit)
  • Solaris SPARC and Solaris x86
  • Same feature set as Linux edition
  • Ideal for midrange data pipelines
  • Email support

Annual subscription: 15% discount (equivalent to one free month)
Free trial: up to 1 MiB input, no registration required. Contact us to get your trial binary.

Frequently asked questions

Answers to the questions we hear most often from data engineers evaluating flbcsv.

Does flbcsv support REDEFINES and OCCURS?
REDEFINES is fully supported: each REDEFINES alternative generates its own row specification, and the correct variant is selected at runtime based on the data value of a discriminator field. OCCURS n TIMES (fixed-count arrays) is also fully supported — each occurrence becomes a separate column (e.g. FIELD_1, FIELD_2, ..., FIELD_n). OCCURS DEPENDING ON (variable-length arrays) is not supported in Wave 1 and requires a full FLAM license.
Do I need a mainframe or Spark cluster to use flbcsv?
No. flbcsv runs as a standard command-line binary on Linux x86-64, Windows, AIX, or Solaris — no mainframe connection, no Spark, no JVM, no Python environment. You only need the binary file exported from the mainframe (via FTP, NFS, or any file transfer method) and the COBOL copybook that describes its layout. The tool handles everything else locally.
What is COMP-3 / packed decimal?
COMP-3 (also called packed decimal or BCD) is a binary encoding used by COBOL for numeric fields. Each decimal digit is stored in 4 bits, with the sign in the last 4 bits of the final byte. A field defined as PIC S9(7)V99 COMP-3 occupies 5 bytes on disk but represents a signed 9-digit number with 2 decimal places. Misinterpreting COMP-3 fields as raw bytes is one of the most common causes of garbage output from home-grown parsers. flbcsv decodes all COMP variants (COMP through COMP-5) correctly.
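Worked example: PIC S9(7)V99 COMP-3 storing +1234567.89 packs the nine digits 123456789 two per byte, followed by the sign nibble (C for positive, D for negative), giving X'123456789C': exactly 5 bytes.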
Can flbcsv handle IMS database unload files?
Yes. flbcsv reads IMS DFSURGU0 unload files. In the LONG format (UNLOAD=IMSL) segment types are auto-detected from the standard unload prefix (bytes 6–13 = 8-character segment name); the SHORT format (UNLOAD=IMSS), which carries no segment name, is routed by fixed record length, and HD Unload images (UNLOAD=HDU) are routed by record-length match. Each segment type is matched to the COBOL copybook whose filename matches the segment name (e.g. EMPLOYEE.cbl for segment EMPLOYEE) and written as a separate CSV member inside the output ZIP. Wave-1 limitation: XSHORT and HALDB are deferred to Wave 2.
What is the difference between flbcsv and Cobrix?
Cobrix is an open-source Spark datasource library written in Scala. It reads COBOL copybook files inside a Spark job — which means you need a running Spark cluster and a JVM to use it at all. Cobrix does not support IMS DFSURGU0 unloads, and its REDEFINES support is incomplete for complex nested structures. flbcsv is a standalone command-line binary with no runtime dependencies. It runs on Linux, z/OS, Windows, AIX, and Solaris. It handles all COMP types, REDEFINES, OCCURS n TIMES, and IMS DFSURGU0 HD unloads without any infrastructure requirements.
What is the output format?
flbcsv always produces a ZIP archive as its output. Each CSV member is named [copybook].[table].csv by default (e.g. payroll.EMPLOYEE.csv). For a single-segment COBOL file the ZIP contains one CSV member; for IMS multi-segment unloads each segment type becomes its own predictably named CSV member — ready to extract individually or pass directly to DuckDB, pandas, or Snowflake's COPY INTO. The CSV separator (SEP / CSV / TAB / PSV), quoting, header row, trim mode, and EBCDIC-to-UTF-8 conversion are all configurable via the STYLE() parameter.

Start converting mainframe data today

Free trial up to 1 MiB. No registration, no credit card.

Request Free Trial