Iceberg translator: projects merutable’s native Manifest onto Apache
Iceberg v2 TableMetadata JSON.
§Why this module exists
merutable’s commit path writes a native JSON manifest (see Manifest)
rather than Iceberg’s four-file (metadata.json + manifest-list Avro +
manifest Avro + data Parquet) layout. The native format is chosen for
efficiency — one fsyncable JSON per commit, zero Avro dependency on the
hot path — but it is designed as a strict superset of Iceberg v2
TableMetadata so that any merutable snapshot can be projected onto a
spec-compliant Iceberg metadata.json with no loss of information.
This module is that projection.
§What the translator does NOT do (yet)
- No deletion-vector projection. merutable writes V3-style Puffin `deletion-vector-v1` blobs for partial compactions. V3 is not yet implemented by the `iceberg-rs` crate we depend on, so the emitted `metadata.json` is shaped as v2 and the DV files are listed under a merutable-specific property (`merutable.deletion-vectors`). The Puffin files themselves on disk are already Iceberg v3 spec-compliant.
§Field mapping
| merutable Manifest | Iceberg TableMetadata |
|---|---|
| format_version | format-version (pinned 2) |
| table_uuid | table-uuid |
| last_updated_ms | last-updated-ms |
| sequence_number | last-sequence-number |
| snapshot_id | current-snapshot-id |
| parent_snapshot_id | snapshot.parent-snapshot-id |
| schema (merutable) | schemas[0] (Iceberg types) |
| entries[] | referenced via manifest-list |
| properties | properties (passed through) |
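
To make the renaming in the table concrete, here is a minimal std-only sketch of the top-level scalar rows. `ManifestSketch`, `project_top_level`, and `to_json_object` are hypothetical names invented for this illustration; the real `Manifest` type and the real `to_iceberg_v2_table_metadata` signature are not shown on this page. The sketch also assumes the UUID string contains no characters that need JSON escaping.

```rust
/// Hypothetical stand-in for a few `Manifest` fields from the table above.
/// The real `Manifest` is not shown here and carries more than this.
struct ManifestSketch {
    format_version: u32,
    table_uuid: String,
    last_updated_ms: i64,
    sequence_number: i64,
    snapshot_id: i64,
}

/// Rename the top-level scalar fields to their Iceberg keys.
/// Returns (key, JSON-encoded value) pairs in table order.
fn project_top_level(m: &ManifestSketch) -> Vec<(&'static str, String)> {
    // Assumption: "pinned 2" means the emitted value is always 2,
    // whatever the native manifest's own format_version says.
    let _ = m.format_version;
    vec![
        ("format-version", "2".to_string()),
        // Assumes the UUID needs no JSON escaping.
        ("table-uuid", format!("\"{}\"", m.table_uuid)),
        ("last-updated-ms", m.last_updated_ms.to_string()),
        ("last-sequence-number", m.sequence_number.to_string()),
        ("current-snapshot-id", m.snapshot_id.to_string()),
    ]
}

/// Join the pairs into a flat JSON object string.
fn to_json_object(pairs: &[(&'static str, String)]) -> String {
    let body: Vec<String> = pairs
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", k, v))
        .collect();
    format!("{{{}}}", body.join(","))
}
```

Calling `to_json_object(&project_top_level(&m))` yields a flat object keyed by the Iceberg names. The nested rows (`parent_snapshot_id`, `schema`, `entries[]`, `properties`) are left out because their target shapes depend on types this page does not show.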
§Functions
- `to_iceberg_data_file_v2`: Project one merutable `ManifestEntry` onto an Iceberg v2 `DataFile` entry in the shape an Avro manifest writer expects.
- `to_iceberg_data_file_v2_with_schema`: Issue #20 Part 2b: schema-aware variant that projects `ParquetFileMeta::column_stats` (hoisted from the Parquet writer’s own row-group metadata at write time) onto Iceberg’s five per-column stat maps. When stats are present, emits per field id:
- `to_iceberg_schema_v2`: Project a merutable `TableSchema` onto an Iceberg v2 schema JSON.
- `to_iceberg_sort_order_v2`: Project merutable’s sort discipline (`_merutable_ikey` ASC, which encodes `(PK ASC, seq DESC)`) onto an Iceberg v2 sort-order entry.
- `to_iceberg_v2_table_metadata`: Project a merutable `Manifest` onto an Iceberg v2 `TableMetadata` JSON value.
- `to_iceberg_v2_table_metadata_bytes`: Convenience: serialize `to_iceberg_v2_table_metadata` to pretty JSON bytes ready to write to disk.
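
The sort discipline mentioned under `to_iceberg_sort_order_v2` can be illustrated with a small sketch showing how one ascending key can carry a `(PK ASC, seq DESC)` order. Everything here is hypothetical: `RowVersion`, `sort_pk_asc_seq_desc`, and `ikey_sketch` are invented for the example, the primary key is assumed to be a fixed-width `u64`, and the byte layout is one possible encoding rather than merutable's actual `_merutable_ikey` format.

```rust
use std::cmp::Reverse;

/// Hypothetical row version: a primary key plus the sequence number that wrote it.
#[derive(Debug, Clone, PartialEq)]
struct RowVersion {
    pk: u64,
    seq: u64,
}

/// Sort so that keys ascend and, within one key, the newest sequence comes first.
fn sort_pk_asc_seq_desc(rows: &mut [RowVersion]) {
    rows.sort_by_key(|r| (r.pk, Reverse(r.seq)));
}

/// One way a single ascending byte key could carry that order (an assumption,
/// not necessarily merutable's encoding): big-endian pk, then big-endian !seq.
/// Inverting the bits of seq makes larger sequence numbers compare smaller.
fn ikey_sketch(r: &RowVersion) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&r.pk.to_be_bytes());
    out[8..].copy_from_slice(&(!r.seq).to_be_bytes());
    out
}
```

Sorting by `ikey_sketch` in plain ascending byte order gives the same result as `sort_pk_asc_seq_desc`, which is why a single ASC sort field is enough to describe the order to an Iceberg reader.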