
COBOL Copybook to CSV Converter

Convert EBCDIC binary files from mainframe systems to CSV — on Linux, without a Spark cluster, without Informatica. Output is a ZIP archive containing one CSV per table type. From €59/month.

# Local binary → CSV in ZIP archive
flbcsv FORMAT=payroll.cbl INPUT=payroll.bin OUTPUT=payroll.zip

# German EBCDIC (IBM-1141), pipe-separated, no header
flbcsv FORMAT=payroll.cbl CCSID=IBM-1141 STYLE(SEPCHR=PIPE NOHDLN) \
   INPUT=payroll.bin OUTPUT=payroll.zip

# IMS DFSURGU0 LONG (IBM z/OS) → one CSV per segment type
flbcsv FORMAT='./ims_copybooks/*.cbl' UNLOAD=IMSL CCSID=IBM-1140 \
   INPUT=ims_unload.bin OUTPUT=ims_data.zip

# z/OS COMP-5 on Linux: force big-endian decoding of mainframe halfwords
flbcsv FORMAT=payroll.cbl CCSID=IBM-1140 ENDIAN=BIG \
   INPUT=payroll_from_zos.bin OUTPUT=payroll.zip

# IMS SHORT variant (4-byte prefix, no segment name — fixlen routing)
flbcsv FORMAT='./ims_copybooks/*.cbl' UNLOAD=IMSS CCSID=IBM-1140 \
   INPUT=ims_short.bin OUTPUT=ims_data.zip

# ISO/IEC 21320-1 conformant ZIP with unique member names via [midN]
flbcsv FORMAT=mixed.cbl INPUT=mixed.bin OUTPUT=out.zip \
   ZIP(MEMBER='[mid4].[name].[table].csv')

# PL/I DECLARE with German EBCDIC, big-endian
flbcsv FORMAT=customer.pli CCSID=IBM-1141 ENDIAN=BIG \
   INPUT=customer.bin OUTPUT=customer.zip
COBOL Copybook · COMP-3 · REDEFINES · IMS LONG + SHORT (IMSL/IMSS) · Db2 roadmap: DB2U / DB2S / DB2IX · PL/1 DECLARE (PLI2ROW) live · HLASM DSECT live · C in progress · XSD / DFDL planned · PIC N/G/U · 64+ CCSIDs · ISO/IEC 21320-1 ready · No Spark required · Linux · z/OS · Windows · AIX · Solaris

The Data Engineer's Nightmare

Extracting data from mainframe binary files is harder than it should be. Every existing approach forces painful compromises.

🐍

Custom Python breaks at COMP-3 and REDEFINES

Hand-rolled scripts handle simple cases fine — until the copybook has COMP-3 packed decimal fields, nested REDEFINES, or OCCURS DEPENDING ON clauses. Then they silently produce wrong output or crash outright. Every new copybook means weeks of rework.

💸

Informatica costs $24,500+/year

Informatica PowerExchange solves the problem — but it requires a full ETL platform license starting at $24,500 per year, dedicated infrastructure, and weeks of professional services to get a single pipeline running. Completely disproportionate for teams that just need CSV output.

Cobrix requires a Spark cluster

Cobrix is the most widely used open-source alternative, but it runs only inside Apache Spark. Spinning up a Spark cluster for a 500 MB VSAM extract adds infrastructure cost, operational overhead, and latency — with no support for IMS or complex REDEFINES structures.

Everything you need, nothing you don't

flbcsv handles the full spectrum of mainframe binary formats — from simple EBCDIC copybooks to complex IMS database unloads — in a single command-line tool. Output is always a ZIP archive containing one CSV file per logical table.

📋

COBOL Copybook parser (all COMP types)

Full parser for COBOL copybook layouts: COMP, COMP-1/2 (HFP), COMP-3 (packed decimal), COMP-4, COMP-5, PIC X/A (EBCDIC strings), PIC 9/S9 DISPLAY, REDEFINES, and OCCURS n TIMES (flat columns). Fixed and variable-length records.
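
A minimal sketch of the common case; the copybook layout and file names are illustrative:

# employee.cbl (illustrative):
#   01 EMPLOYEE-REC.
#      05 EMP-ID      PIC 9(6).
#      05 EMP-NAME    PIC X(30).
#      05 EMP-SALARY  PIC S9(7)V99 COMP-3.
flbcsv FORMAT=employee.cbl CCSID=IBM-037 INPUT=employee.bin OUTPUT=employee.zip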

📋

PL/I DECLARE compiler (PLI2ROW)

Pass any .pli or .pl1 file as FORMAT — flbcsv selects the PLI2ROW compiler automatically. Supported declarations: FIXED BIN(p) (1/2/4/8 byte two's-complement), FIXED DEC(p[,q]) (IBM packed BCD), FLOAT DEC(p) (HFP32/64/128), CHAR(n) (EBCDIC string), CHAR(n) VARYING, BIT(n), PICTURE (zoned and display), UNION (one CSV member per alternative), and DIMENSION(n) array expansion. ALIGNED padding is inserted automatically; FILLER and PTR fields are silently skipped. Combine with ENDIAN=BIG for z/OS-produced data on open-systems hosts (Linux, Windows).
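
For illustration, a small structure that exercises several of the supported declarations (file names and layout are hypothetical):

# customer.pli (illustrative):
#   DCL 1 CUSTOMER,
#         2 ID      FIXED BIN(31),
#         2 NAME    CHAR(30),
#         2 BALANCE FIXED DEC(9,2);
flbcsv FORMAT=customer.pli CCSID=IBM-1141 ENDIAN=BIG \
   INPUT=customer.bin OUTPUT=customer.zip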

🔁

COMP-5 on cross-platform hosts — ENDIAN parameter

COMP-5 is defined as native-endian binary, so z/OS emits big-endian halfwords while Linux/Windows emit little-endian. flbcsv takes an explicit ENDIAN={SYSTEM,BIG,LITTLE,LOCAL} parameter to force the byte order at schema-compile time. Set ENDIAN=BIG when reading a mainframe unload on Linux — the generated rowspec carries the endian qualifier into the COMP-5 decoder. Regular COMP / COMP-4 stays fixed big-endian (the IBM convention), independent of the parameter.

🌍

CHRSET — automatic CCSID resolution for string columns

The COMPILE sub-object now supports a CHRSET parameter that resolves the source character set for PIC X/A string columns when no explicit CCSID is given. CHRSET=EBCDIC (default) reads FL_DEFAULT_EBCDIC_CCSID from the environment (e.g. IBM-1141 on German z/OS); CHRSET=ASCII reads FL_DEFAULT_ASCII_CCSID. An explicit CCSID= always takes priority over CHRSET.
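
A sketch of the resolution order; the COMPILE(CHRSET=...) spelling below is inferred from the parameter names above:

# No explicit CCSID: PIC X/A columns fall back to the environment default
export FL_DEFAULT_EBCDIC_CCSID=IBM-1141
flbcsv FORMAT=payroll.cbl COMPILE(CHRSET=EBCDIC) \
   INPUT=payroll.bin OUTPUT=payroll.zip
# An explicit CCSID always wins over CHRSET:
flbcsv FORMAT=payroll.cbl CCSID=IBM-273 INPUT=payroll.bin OUTPUT=payroll.zip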

🛡️

Precise EOF sentinel matching

IMS_EOF catch-all rows validate their EOF sentinel bytes via type.binary(match(...)) instead of silently draining any leftover record. IMSL checks PCF=X'00' at offset 4 of the 20-byte prefix; IMSS checks the full 4-byte X'00040000' prefix. Format corruption that would otherwise be hidden by the catch-all is now surfaced as a hard error — exactly what you want when verifying a mainframe transfer.

🌐

Unicode: PIC N (National/UCS-2) + PIC G (DBCS)

PIC N fields (USAGE NATIONAL) are decoded as UTF-16 Big-Endian and emitted as UTF-8 CSV. PIC G fields (EBCDIC DBCS with SO/SI) are passed through as binary — preserving the raw encoding for downstream CCSID conversion.
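
Illustrative copybook fragment showing how the two field classes are treated:

#   05 CUST-NAME  PIC N(20).                  <- NATIONAL: UTF-16BE, emitted as UTF-8
#   05 CUST-NOTE  PIC G(10) USAGE DISPLAY-1.  <- DBCS: passed through as raw bytes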

🧭

UNLOAD parameter — mainframe unload formats

A single UNLOAD= keyword selects the unload wrapper around your copybooks. Available today: NONE (plain fixed-format, default), IMSL (IMS DFSURGU0 LONG, IBM z/OS — 20-byte prefix, SEGNAM auto-routing), IMSS (IMS DFSURGU0 SHORT — 4-byte prefix, fixlen routing) and HDU (IMS HD Unload / DFSURGL0 — physical database image with 36-byte SEG-CODE prefix, multi-segment routing via record-length match). Header/trailer records (PCF=X'00', IMS-EOF, HDU-UNKNOWN) are drained silently via catch-all row spec.
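
The HDU variant, by analogy with the IMSL/IMSS examples above (paths hypothetical):

# IMS HD Unload: multi-segment routing via record-length match
flbcsv FORMAT='./ims_copybooks/*.cbl' UNLOAD=HDU CCSID=IBM-1140 \
   INPUT=hd_unload.bin OUTPUT=ims_data.zip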

🗃️

Db2 unload formats — Wave 2 roadmap

The UNLOAD parameter already reserves keywords for three Db2 unload formats arriving in Wave 2: DB2U (DSNUTILB UNLOAD FORMAT UNLOAD — RECFM=VB with 2-byte OBID and inline X'FF' null indicator), DB2S (DSNTIAUL sample program — RECFM=FB with grouped X'6F' null indicators at row front), and DB2IX (PC/IXF self-describing format — no copybook needed, schema embedded in H/T/C/D records), plus IMSH for the IMS High Performance Unload variant. Today these keywords parse and return a clean "not yet supported" error — existing CI scripts will work unchanged once the Wave 2 implementation ships.

📥

Transparent decompression — no flag, no surprise

Compressed input is auto-detected by magic bytes and decompressed on the fly before the schema compiler sees the data. Supported today: gzip, bzip2, xz/lzma, zstd, lz4, lzip. TERSE/AMATERSE (the classic IBM z/OS transport format) arrives in Wave 2 via the bundled CNVTRS converter. No CLP flag to set — uncompressed input passes through without any overhead, compressed input just works. Base-encoded or encrypted inputs remain the domain of flcl conv (separate pre-processing step).
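
The same invocation works on compressed and uncompressed input alike; for example:

gzip -k payroll.bin    # produces payroll.bin.gz, keeps the original
flbcsv FORMAT=payroll.cbl INPUT=payroll.bin.gz OUTPUT=payroll.zip   # gzip detected by magic bytes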

🗂️

Read directly from ZIP or FLAM archive members

Point flbcsv at a single member of a ZIP or FLAM4 (FL4) archive using the URL-member syntax INPUT='archive.zip/?MEMBER_NAME' — no need to extract first. The archive type is auto-detected from magic bytes (ZIP or FL4), so the same URL works unchanged against either container. Ideal for bundled deliveries from mainframe shops: one transportable ZIP produced by a single JCL step on z/OS replaces dozens of individually-transferred binaries, while still honouring per-member compression (DEFLATE, BZIP2, LZMA, ZSTD, or STORED) and record-length hints embedded in the member filename (.F<lrecl>.bin).
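
A sketch of the member URL; the member name and its .F80 record-length hint are hypothetical:

# Read one member straight out of a delivery archive, no prior unzip
flbcsv FORMAT=payroll.cbl CCSID=IBM-1141 \
   INPUT='delivery.zip/?PAYROLL.F80.bin' OUTPUT=payroll.zip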

🧰

Multi-language schema compilers — COBOL · PL/1 · HLASM live · C in progress · XSD · DFDL planned

The FMT2TAB infrastructure (Open/Add/Fin/Close pattern, CLPSTR accumulator, compiler-probe array) is fully abstracted and prepared for additional source languages. COBOL copybooks (COB2ROW) and PL/I structure files (PLI2ROW) are in production today. PLI2ROW compiles DCL/DECLARE blocks with level numbers 1–15: FIXED BIN/DEC, FLOAT DEC (IBM HFP32/64/128), CHAR(n) [VARYING], BIT(n), PICTURE, UNION (one row per alternative), DIMENSION expansion, and ALIGNED padding. Pass any .pli or .pl1 file as FORMAT — the compiler is selected automatically by file extension. Also available now: ASM2ROW (z/OS HLASM DSECT/DS/DC for legacy messaging APIs). Roadmap: C992ROW (C99 struct/union with #pragma pack and typedef chains), XSD2ROW (XML Schema → tabular conversion for SEPA/XBRL/SWIFT/ISO 20022), and DFDL2ROW (IBM's Data Format Description Language).

📦

ZIP archive output (one CSV per table)

All output is written into a ZIP archive. When the input contains multiple segment or table types (IMS multi-segment, REDEFINES alternatives), each logical table becomes its own CSV member — cleanly separated, no interleaving.

🔤

EBCDIC to UTF-8 (64+ CCSIDs)

Converts all FL5-supported EBCDIC codepages to UTF-8 output, including IBM-037 (US), IBM-273 (German), IBM-1141 (German/Euro), and 60+ others. The CCSID is specified explicitly per run; flbcsv does not attempt auto-detection.

⚙️

RFC-4180 CSV with full formatting control

STYLE object: separator (comma, semicolon, pipe, TAB, SPACE), quote character (any Unicode codepoint), quoting policy (ALWAYS / IFREQ / NONE), record delimiter (CRLF / LF), header row on/off, escape handling, format checks. TRIM: NONE / TRAILING / LEADING / BOTH. COLPATH: FULL hierarchical or LEAF names. COLSEP: UNDERSCORE / DOT / HYPHEN / NONE. FILLER: COLUMN / ALIGN / SKIP. Fixed record-length override (RECLEN) for non-standard block sizes.
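
Combining a few of these knobs; only SEPCHR and NOHDLN appear verbatim in the examples above, so treat the other keyword spellings as illustrative:

# Semicolon-separated, LF line ends, trailing blanks trimmed
flbcsv FORMAT=payroll.cbl CCSID=IBM-1141 \
   STYLE(SEPCHR=SEMICOLON DELIM=LF TRIM=TRAILING) \
   INPUT=payroll.bin OUTPUT=payroll.zip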

Pre-compiled schema cache (ROWIN / ROWOUT)

Compile your COBOL copybook once with ROWOUT=schema.tab, then feed it back thousands of times with ROWIN=schema.tab — skips the parse + type-resolution stage and starts streaming immediately. Ideal for CI/CD pipelines, high-throughput batch jobs, and cloud functions with a strict cold-start budget.
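
The two-phase pattern in practice (that FORMAT can be dropped once ROWIN is given is an assumption here):

# Day 1: convert normally and persist the compiled schema
flbcsv FORMAT=payroll.cbl ROWOUT=schema.tab INPUT=day1.bin OUTPUT=day1.zip
# Day 2..n: reuse the cache, skip parse + type resolution
flbcsv ROWIN=schema.tab INPUT=day2.bin OUTPUT=day2.zip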

🎯

Per-field type override (COLTYPE)

Wildcard-pattern overrides without re-editing the copybook: COLTYPE='*-KEY/&binary' forces key fields to raw bytes; COLTYPE='*-TEXT-*/&IBM-1141' switches a column family to a different CCSID. Useful for legacy copybooks where PIC X really holds binary or mixed-encoding data.

🔁

Directory-walk, EXCLUDE, REMOVE, RENAME

Wildcard INPUT supports directory trees with per-pattern EXCLUDE. After successful conversion, source files can be REMOVEd or RENAMEd via pattern (e.g. moved to archive/ or given a timestamp suffix) — enabling idempotent pipeline runs without external glue code.
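
A sketch of an idempotent nightly run; the EXCLUDE and RENAME spellings below are assumptions based on the parameter names above:

# Walk ./incoming, skip temp files, move sources aside on success
flbcsv FORMAT=payroll.cbl INPUT='./incoming/*.bin' EXCLUDE='*.tmp' \
   OUTPUT=payroll.zip RENAME='archive/[name].bin'   # pattern syntax illustrative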

📦

Five compression algorithms in ZIP

ZIP sub-object ALGO: DEFLATE (default, universal compatibility), COPY (stored uncompressed — fastest, inline-ready), BZIP2 (best ratio on text), ZSTD (best speed/ratio trade-off, Zip64 only), LZMA. LEVEL NORMAL / FAST / BEST. FORMAT AUTO / Z32 / Z64 — Zip64 extension support for archives > 4 GiB or > 65k members.
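
For example, a Zip64 archive compressed with ZSTD (sub-object spelling inferred from the parameter names above):

flbcsv FORMAT=payroll.cbl INPUT=payroll.bin OUTPUT=payroll.zip \
   ZIP(ALGO=ZSTD LEVEL=BEST FORMAT=Z64)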

NORUN preflight + MAXOCCURS safety cap

NORUN parses all parameters and validates the schema + output layout without touching any data — ideal for CI gates and infra-as-code validation. MAXOCCURS caps OCCURS-expansion so a pathological copybook (OCCURS 100000) cannot blow up the column matrix.
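
A typical CI gate; the MAXOCCURS value syntax is assumed:

# Validates copybook, parameters and output layout, touches no input bytes
flbcsv FORMAT=payroll.cbl INPUT=payroll.bin OUTPUT=payroll.zip NORUN MAXOCCURS=1000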

🔗

SSH-URL input/output

Read source files or write the output ZIP archive via SSH URLs — e.g. OUTPUT='ssh://server/export/payroll.zip'. On z/OS: read DSN locally, push ZIP to remote Linux server via SSH.

🏎️

Single binary, no runtime dependencies

Ships as a statically linked executable. No JVM, no Python environment, no shared libraries to manage. Deploy by copying one file to any Linux, z/OS, Windows, AIX, or Solaris server.

🔡

PIC U — native UTF-8 strings

PIC U fields (USAGE UTF-8, IBM z/OS extension) are handled natively: the n-byte UTF-8 payload is passed through without re-encoding and emitted directly as UTF-8 CSV. No intermediate EBCDIC conversion required.

📂

COPY directives — copybook includes

COBOL COPY directives are resolved at parse time: the named member is located in the same directory (or a configured search path) and inlined into the record layout. COPY REPLACING is rejected with a clear error — use preprocessed copybooks for that case.
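
Illustrative copybook with an include (member and field names hypothetical):

# payroll.cbl:
#   01 PAYROLL-REC.
#      05 EMP-ID   PIC 9(6).
#      COPY ADDRBLK.              <- inlined from ADDRBLK in the search path
#      05 NET-PAY  PIC S9(7)V99 COMP-3.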

🗂️

Automatic ZIP member naming

Each CSV member inside the output ZIP is named [name].[table].csv by default — e.g. payroll.EMPLOYEE.csv. For IMS multi-segment unloads every segment type gets its own predictably named member, ready for direct COPY INTO in Snowflake or DuckDB.

🔐

ISO/IEC 21320-1 conformance via [midN] token

When the same row-spec can fire multiple times per input file (OCCURS DEPENDING, multiple REDEFINES alternatives, recurring segments), the optional [midN] pattern token adds a zero-padded archive-local member counter — yielding unique member names accepted by strict ISO-21320 readers such as Java ZipFile. Example: MEMBER='[mid4].[name].[table].csv' emits 0001.payroll.EMP.csv, 0002.payroll.DEPT.csv, …

📭

Empty-member policy + stream mode

Row-specs that never match (drain rows, sentinel records) would normally leave empty members in the archive. The EMPTY parameter (DELETE by default in flbcsv) silently removes them from the central directory. ZIP(STREAM) enables sequential-only writes required for z/OS sequential datasets — no seek back, no rewrite, MVS-DASD friendly.
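
For z/OS output the two combine naturally (the EMPTY=DELETE spelling is assumed):

# Sequential-only ZIP writes, never-matched members dropped from the directory
flbcsv FORMAT=payroll.cbl INPUT=payroll.bin OUTPUT=payroll.zip \
   ZIP(STREAM) EMPTY=DELETE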

How does flbcsv compare?

A direct comparison against the tools data engineers reach for when facing mainframe binary files.

Tool                       | Cost       | CLI / scriptable  | COBOL REDEFINES | IMS support  | No Spark needed
Informatica PowerExchange  | ~$2,000/mo | GUI / server only | yes             | yes          | yes
Cobrix (OSS)               | Free       | API only          | partial         | no           | Spark required
pycobol2csv                | Free       | CLI               | incomplete      | no           | yes
flbcsv                     | €59/mo     | CLI               | fully supported | IMS DFSURGU0 | no Spark

Why now?

Mainframe data integration is not a niche problem. The numbers show a market under pressure to modernize — and running out of easy options.

220 bn
Lines of COBOL code still running in production worldwide
25%
Of German enterprises actively use COBOL systems in core business processes
$8.4 bn
Projected mainframe modernization market size in 2025
Nov. 2025
AWS Mainframe Modernization stopped accepting new customers — closing a major migration pathway

Simple, transparent pricing

One tool, multiple platforms. Pay only for the platform you deploy on. Annual subscriptions available with a 15% discount.

Linux / Windows

x86-64, 32- and 64-bit

€59
per month, per server
  • Linux x86-64 and x86-32 binaries
  • Windows x64 and x86 binaries
  • Full COBOL copybook parser (FIX + VAR records)
  • COMP-3 / packed decimal; COMP, COMP-4, COMP-5; COMP-1/2 (HFP)
  • REDEFINES + OCCURS n TIMES
  • PIC N (National/UCS-2) + PIC G (DBCS) + PIC U (UTF-8)
  • IMS DFSURGU0 HD Unload support
  • COPY directives resolved at parse time
  • ZIP output; auto member naming; configurable delimiter + header
  • Email support
IBM z/OS

Native on UNIX System Services

€449
per month, per LPAR
  • Runs natively on z/OS USS (UNIX System Services)
  • Reads VSAM and sequential datasets directly
  • Full copybook and IMS DBD support
  • EBCDIC-to-UTF-8 / ASCII conversion included
  • No mainframe ISV middleware required
  • Priority email and phone support

AIX / SPARC

Power and Solaris platforms

€99
per month, per server
  • IBM AIX on Power (32- and 64-bit)
  • Solaris SPARC and Solaris x86
  • Same feature set as Linux edition
  • Ideal for midrange data pipelines
  • Email support

Annual subscription: 15% discount (equivalent to one free month)
Free trial: up to 1 MiB input, no registration required. Contact us to get your trial binary.

Frequently asked questions

Answers to the questions we hear most often from data engineers evaluating flbcsv.

Does flbcsv support REDEFINES and OCCURS?
REDEFINES is fully supported: each REDEFINES alternative generates its own row specification, and the correct variant is selected at runtime based on the data value of a discriminator field. OCCURS n TIMES (fixed-count arrays) is also fully supported — each occurrence becomes a separate column (e.g. FIELD_1, FIELD_2, ..., FIELD_n). OCCURS DEPENDING ON (variable-length arrays) is not supported in Wave 1 and requires a full FLAM license.
Do I need a mainframe or Spark cluster to use flbcsv?
No. flbcsv runs as a standard command-line binary on Linux x86-64, Windows, AIX, or Solaris — no mainframe connection, no Spark, no JVM, no Python environment. You only need the binary file exported from the mainframe (via FTP, NFS, or any file transfer method) and the COBOL copybook that describes its layout. The tool handles everything else locally.
What is COMP-3 / packed decimal?
COMP-3 (also called packed decimal or BCD) is a binary encoding used by COBOL for numeric fields. Each decimal digit is stored in 4 bits, with the sign in the last 4 bits of the final byte. A field defined as PIC S9(7)V99 COMP-3 occupies 5 bytes on disk but represents a signed 9-digit number with 2 decimal places. Misinterpreting COMP-3 fields as raw bytes is one of the most common causes of garbage output from home-grown parsers. flbcsv decodes all COMP variants (COMP through COMP-5) correctly.
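Worked example: PIC S9(7)V99 COMP-3 storing +1234567.89 packs the nine digits 123456789 two per byte, followed by the sign nibble (C for positive, D for negative), giving X'123456789C': exactly 5 bytes.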
Can flbcsv handle IMS database unload files?
Yes. flbcsv reads IMS DFSURGU0 unload files. In the LONG format (UNLOAD=IMSL) segment types are auto-detected from the standard unload prefix (bytes 6–13 = 8-character segment name); the SHORT format (UNLOAD=IMSS), which carries no segment name, is routed by fixed record length, and HD Unload images (UNLOAD=HDU) are routed by record-length match. Each segment type is matched to the COBOL copybook whose filename matches the segment name (e.g. EMPLOYEE.cbl for segment EMPLOYEE) and written as a separate CSV member inside the output ZIP. Wave-1 limitation: XSHORT and HALDB are deferred to Wave 2.
What is the difference between flbcsv and Cobrix?
Cobrix is an open-source Spark datasource library written in Scala. It reads COBOL copybook files inside a Spark job — which means you need a running Spark cluster and a JVM to use it at all. Cobrix does not support IMS DFSURGU0 unloads, and its REDEFINES support is incomplete for complex nested structures. flbcsv is a standalone command-line binary with no runtime dependencies. It runs on Linux, z/OS, Windows, AIX, and Solaris. It handles all COMP types, REDEFINES, OCCURS n TIMES, and IMS DFSURGU0 HD unloads without any infrastructure requirements.
What is the output format?
flbcsv always produces a ZIP archive as its output. Each CSV member is named [copybook].[table].csv by default (e.g. payroll.EMPLOYEE.csv). For a single-segment COBOL file the ZIP contains one CSV member; for IMS multi-segment unloads each segment type becomes its own predictably named CSV member — ready to extract individually or pass directly to DuckDB, pandas, or Snowflake's COPY INTO. The CSV separator (SEP / CSV / TAB / PSV), quoting, header row, trim mode, and EBCDIC-to-UTF-8 conversion are all configurable via the STYLE() parameter.

Start converting mainframe data today

Free trial up to 1 MiB. No registration, no credit card.

Request Free Trial