# kobold-csv
**Forensic CSV/delimited evidence for COBOL record migration, reconciliation, ETL, and analyst review.**
Export raw COBOL records + their copybook into deterministic delimited text (CSV / `|` / TSV) that preserves
**both the semantic value AND the storage truth** — `pic` / `offset` / `length` / `raw_hex` /
`copybook_hash` / `record_hash` / `findings` — parse a Compact extract back into the **exact** record bytes,
prove the round trip, and compute the batch **control totals** banks reconcile on. **Fail-closed.**
> [!IMPORTANT]
> **kobold-csv is clean-room and zero-libcob.** It links no COBOL runtime, contains no libcob-derived or
> GnuCOBOL-derived code, and depends on **no crate at all** (`std`-only) — in particular not on
> `gnucobol-rs`, `gnucobol-rs-json`, `kobold-json`, or `kobold-xml`. It is **NOT a generic CSV library** and
> **NOT a GnuCOBOL 3.2 parity claim.** It answers *"what should forensic delimited-file evidence for a COBOL
> record migration / reconciliation look like?"* Evidence here is the `KOBOLD.CSV.*` court namespace.
## Why
A batch migration must be **auditable**: when a COBOL file becomes CSV — for an analyst, an ETL pipeline, or
a target-system reload — you must be able to prove (a) what each field *means*, (b) that no byte was silently
lost, coerced, or truncated, and (c) that the money still balances. kobold-csv carries the raw bytes
alongside the decoded value and **emits a `Finding` instead of silently coercing** when a numeric field is
not valid digits, when a value overflows its PIC, when a quoted cell is malformed, or when a header does not
match the copybook.
## Export modes (`KOBOLD.CSV.EXPORT.1`)
| `Compact` | one row **per record** | field names — the classic `ACCOUNT_NO,BALANCE,STATUS` extract |
| `Audit` | **tall**, one row per (record, field) | `field,value,pic,offset,length,raw_hex,findings` |
| `Evidence` | tall + custody hashes | `record_hash,copybook_hash,field,value,pic,offset,length,raw_hex,findings` |
A wide row cannot carry per-field metadata, so custody data goes **tall**. Hashes are `sha256:`-prefixed.
Numeric rendering: leading zeros stripped, decimal point at the implied `scale`, leading `-` if negative
(zoned sign overpunch recognized); `raw_hex` is lowercase hex of the exact field bytes. A non-digit numeric
field emits a `NUMERIC_NONDIGIT` finding — never a silent coercion — and `raw_hex` preserves the truth.
## Quoting (`KOBOLD.CSV.ESCAPE`)
RFC-4180 style: a field is quoted when it contains the delimiter, the quote char, CR, or LF; embedded quotes
are doubled (`"` → `""`). The writer (`write_field`) and the **fail-closed** reader (`parse_row`) are exact
inverses — an unterminated quote or stray text after a closing quote is a `Finding`, never a guess. Dialects:
`Dialect::csv()` (comma), `Dialect::pipe()` (`|`), `Dialect::tab()` (TSV).
## Courts
- `KOBOLD.CSV.EXPORT.1` — records + copybook → delimited evidence (Compact / Audit / Evidence).
- `KOBOLD.CSV.PARSE.1` — a Compact extract + copybook → reconstructed record bytes, **fail-closed** (a value
too long for its field, a non-numeric value into a numeric field, a header mismatch, or a wrong column
count yields a `Finding`, not bytes).
- `KOBOLD.CSV.ROUNDTRIP.1` — records → Compact CSV → `parse_into` → **identical** bytes (a value-only extract
that cannot preserve a non-canonical stored form is reported honestly, not faked).
- `KOBOLD.CSV.DIFF.1` — row/field-wise differences between a source table and a target table (columns aligned
by name); a `DiffEntry { row, field, source, target }` per differing cell.
- `KOBOLD.CSV.CONTROLTOTAL.1` — exact **integer-scaled** field sums (no float drift) + record count: the
batch control totals a reconciliation balances on.
## Dependency-free
No crate dependencies, `std`-only. SHA-256 (for `copybook_hash` / `record_hash`) is a small pure-Rust
implementation in `src/sha256.rs`, tested against published vectors. `#![forbid(unsafe_code)]`.
License: Apache-2.0.