kobold-csv
Forensic CSV/delimited evidence for COBOL record migration, reconciliation, ETL, and analyst review.
Export raw COBOL records + their copybook into deterministic delimited text (CSV / | / TSV) that preserves
both the semantic value AND the storage truth — pic / offset / length / raw_hex /
copybook_hash / record_hash / findings — parse a Compact extract back into the exact record bytes,
prove the round trip, and compute the batch control totals banks reconcile on. Fail-closed.
[!IMPORTANT] kobold-csv is clean-room and zero-libcob. It links no COBOL runtime, contains no libcob-derived or GnuCOBOL-derived code, and depends on no crate at all (
std-only) — in particular not ongnucobol-rs,gnucobol-rs-json,kobold-json, orkobold-xml. It is NOT a generic CSV library and NOT a GnuCOBOL 3.2 parity claim. It answers "what should forensic delimited-file evidence for a COBOL record migration / reconciliation look like?" Evidence here is theKOBOLD.CSV.*court namespace.
Why
A batch migration must be auditable: when a COBOL file becomes CSV — for an analyst, an ETL pipeline, or
a target-system reload — you must be able to prove (a) what each field means, (b) that no byte was silently
lost, coerced, or truncated, and (c) that the money still balances. kobold-csv carries the raw bytes
alongside the decoded value and emits a Finding instead of silently coercing when a numeric field is
not valid digits, when a value overflows its PIC, when a quoted cell is malformed, or when a header does not
match the copybook.
Export modes (KOBOLD.CSV.EXPORT.1)
| Mode | Shape | Header |
|---|---|---|
Compact |
one row per record | field names — the classic ACCOUNT_NO,BALANCE,STATUS extract |
Audit |
tall, one row per (record, field) | field,value,pic,offset,length,raw_hex,findings |
Evidence |
tall + custody hashes | record_hash,copybook_hash,field,value,pic,offset,length,raw_hex,findings |
A wide row cannot carry per-field metadata, so custody data goes tall. Hashes are sha256:-prefixed.
Numeric rendering: leading zeros stripped, decimal point at the implied scale, leading - if negative
(zoned sign overpunch recognized); raw_hex is lowercase hex of the exact field bytes. A non-digit numeric
field emits a NUMERIC_NONDIGIT finding — never a silent coercion — and raw_hex preserves the truth.
Quoting (KOBOLD.CSV.ESCAPE)
RFC-4180 style: a field is quoted when it contains the delimiter, the quote char, CR, or LF; embedded quotes
are doubled (" → ""). The writer (write_field) and the fail-closed reader (parse_row) are exact
inverses — an unterminated quote or stray text after a closing quote is a Finding, never a guess. Dialects:
Dialect::csv() (comma), Dialect::pipe() (|), Dialect::tab() (TSV).
Courts
KOBOLD.CSV.EXPORT.1— records + copybook → delimited evidence (Compact / Audit / Evidence).KOBOLD.CSV.PARSE.1— a Compact extract + copybook → reconstructed record bytes, fail-closed (a value too long for its field, a non-numeric value into a numeric field, a header mismatch, or a wrong column count yields aFinding, not bytes).KOBOLD.CSV.ROUNDTRIP.1— records → Compact CSV →parse_into→ identical bytes (a value-only extract that cannot preserve a non-canonical stored form is reported honestly, not faked).KOBOLD.CSV.DIFF.1— row/field-wise differences between a source table and a target table (columns aligned by name); aDiffEntry { row, field, source, target }per differing cell.KOBOLD.CSV.CONTROLTOTAL.1— exact integer-scaled field sums (no float drift) + record count: the batch control totals a reconciliation balances on.
Dependency-free
No crate dependencies, std-only. SHA-256 (for copybook_hash / record_hash) is a small pure-Rust
implementation in src/sha256.rs, tested against published vectors. #![forbid(unsafe_code)].
License: Apache-2.0.