Module translate

Iceberg translator: projects merutable’s native Manifest onto Apache Iceberg v2 TableMetadata JSON.

§Why this module exists

merutable’s commit path writes a native JSON manifest (see Manifest) rather than Iceberg’s four-file layout (metadata.json + manifest-list Avro + manifest Avro + data Parquet). The native format is chosen for efficiency: one fsyncable JSON file per commit and no Avro dependency on the hot path. It is nonetheless designed as a strict superset of Iceberg v2 TableMetadata, so any merutable snapshot can be projected onto a spec-compliant Iceberg metadata.json without losing information.
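
A common way to get one durable JSON file per commit is write-to-temp, fsync, rename, then fsync the directory. The sketch below assumes that pattern; the commit_manifest helper and the manifest-<seq>.json naming are hypothetical and are not merutable’s actual commit code.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: durably publish one manifest JSON per commit.
/// ASSUMPTION: files are named `<dir>/manifest-<seq>.json`; merutable's real
/// naming and commit protocol may differ.
pub fn commit_manifest(dir: &Path, seq: u64, json: &[u8]) -> io::Result<PathBuf> {
    let final_path = dir.join(format!("manifest-{seq:020}.json"));
    let tmp_path = dir.join(format!(".manifest-{seq:020}.json.tmp"));

    // 1. Write the whole manifest to a temporary file.
    let mut f = File::create(&tmp_path)?;
    f.write_all(json)?;
    // 2. fsync the contents before the file becomes visible under its final name.
    f.sync_all()?;
    drop(f);

    // 3. Publish atomically: readers see either no file or the complete one.
    fs::rename(&tmp_path, &final_path)?;

    // 4. On Unix, fsync the directory so the rename itself survives a crash.
    if cfg!(unix) {
        File::open(dir)?.sync_all()?;
    }
    Ok(final_path)
}
```

The zero-padded sequence number keeps lexical and numeric order identical, which makes "latest manifest" a simple directory listing in this sketch.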

This module is that projection.

§What the translator does NOT do (yet)

  • No deletion-vector projection. merutable writes V3-style Puffin deletion-vector-v1 blobs for partial compactions. The iceberg-rs crate we depend on does not yet implement Iceberg v3, so the emitted metadata.json is shaped as v2 and the DV files are instead listed under a merutable-specific property (merutable.deletion-vectors). The Puffin files on disk are already compliant with the Iceberg v3 spec.
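
Iceberg table properties are a string-to-string map, so the list of DV files has to be flattened into a single string value. The encoding below (data_file=puffin_path pairs joined by ';') and the DeletionVectorRef type are hypothetical; only the property key comes from this page.

```rust
use std::collections::BTreeMap;

/// Property key named in this module's docs.
pub const DV_PROPERTY: &str = "merutable.deletion-vectors";

/// Hypothetical description of one Puffin deletion-vector blob.
pub struct DeletionVectorRef {
    pub puffin_path: String,
    /// Data file whose rows the vector masks.
    pub referenced_data_file: String,
}

/// ASSUMPTION: `data_file=puffin_path` pairs joined by ';'. merutable's real
/// value encoding is not documented here and may differ.
pub fn add_dv_property(props: &mut BTreeMap<String, String>, dvs: &[DeletionVectorRef]) {
    if dvs.is_empty() {
        // No deletion vectors: leave the property map untouched.
        return;
    }
    let value = dvs
        .iter()
        .map(|dv| format!("{}={}", dv.referenced_data_file, dv.puffin_path))
        .collect::<Vec<_>>()
        .join(";");
    props.insert(DV_PROPERTY.to_string(), value);
}
```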

§Field mapping

| merutable Manifest | Iceberg TableMetadata |
| --- | --- |
| format_version | format-version (pinned 2) |
| table_uuid | table-uuid |
| last_updated_ms | last-updated-ms |
| sequence_number | last-sequence-number |
| snapshot_id | current-snapshot-id |
| parent_snapshot_id | snapshot.parent-snapshot-id |
| schema (merutable) | schemas[0] (Iceberg types) |
| entries[] | referenced via manifest-list |
| properties | properties (passed through) |
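
The scalar rows of this table translate mechanically. The sketch below is std-only (hand-rolled JSON instead of serde_json) and uses a hypothetical ManifestSketch rather than merutable’s real Manifest. It covers only the scalar rows: schemas, partition specs and sort orders are omitted, so the output is not a complete metadata.json. format-version is always written as 2, matching the "pinned 2" row; using last_updated_ms as the snapshot’s timestamp-ms is also an assumption.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical subset of merutable's Manifest; field names follow the table.
pub struct ManifestSketch {
    pub table_uuid: String,
    pub last_updated_ms: i64,
    pub sequence_number: i64,
    pub snapshot_id: i64,
    pub parent_snapshot_id: Option<i64>,
    /// Assumed path of an already-written manifest-list covering entries[].
    pub manifest_list: String,
    pub properties: BTreeMap<String, String>,
}

/// Minimal JSON string escaper (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Projects the scalar rows of the mapping table onto a v2-shaped object.
pub fn project_scalars(m: &ManifestSketch) -> String {
    // The single current snapshot; parent-snapshot-id is omitted for the first one.
    let mut snap = format!(
        "{{\"snapshot-id\":{},\"sequence-number\":{},\"timestamp-ms\":{},\"manifest-list\":{}",
        m.snapshot_id, m.sequence_number, m.last_updated_ms, json_str(&m.manifest_list)
    );
    if let Some(parent) = m.parent_snapshot_id {
        write!(snap, ",\"parent-snapshot-id\":{parent}").unwrap();
    }
    snap.push('}');

    // Properties are passed through unchanged.
    let props = m
        .properties
        .iter()
        .map(|(k, v)| format!("{}:{}", json_str(k), json_str(v)))
        .collect::<Vec<_>>()
        .join(",");

    format!(
        "{{\"format-version\":2,\"table-uuid\":{},\"last-updated-ms\":{},\
         \"last-sequence-number\":{},\"current-snapshot-id\":{},\
         \"snapshots\":[{}],\"properties\":{{{}}}}}",
        json_str(&m.table_uuid), m.last_updated_ms, m.sequence_number,
        m.snapshot_id, snap, props
    )
}
```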

Functions§

to_iceberg_data_file_v2
Project one merutable ManifestEntry onto an Iceberg v2 DataFile entry in the shape an Avro manifest writer expects.
to_iceberg_data_file_v2_with_schema
Issue #20 Part 2b: schema-aware variant that projects ParquetFileMeta::column_stats (hoisted from the Parquet writer’s own row-group metadata at write time) onto Iceberg’s five per-column stat maps. When stats are present, emits per field id:
to_iceberg_schema_v2
Project a merutable TableSchema onto an Iceberg v2 schema JSON.
to_iceberg_sort_order_v2
Project merutable’s sort discipline (_merutable_ikey ASC, which encodes (PK ASC, seq DESC)) onto an Iceberg v2 sort-order entry.
to_iceberg_v2_table_metadata
Project a merutable Manifest onto an Iceberg v2 TableMetadata JSON value.
to_iceberg_v2_table_metadata_bytes
Convenience: serialize to_iceberg_v2_table_metadata to pretty JSON bytes ready to write to disk.
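
to_iceberg_data_file_v2 projects one ManifestEntry onto an Iceberg v2 DataFile. As a rough illustration, the sketch below fills the core data_file fields named in the Iceberg spec (content, file_path, file_format, record_count, file_size_in_bytes) from a hypothetical EntrySketch. It assumes an unpartitioned table, so the partition struct is left out, and DataFileV2 is illustrative rather than the iceberg-rs type.

```rust
/// Hypothetical stand-in for merutable's ManifestEntry.
pub struct EntrySketch {
    pub path: String,
    pub file_size: u64,
    pub num_rows: u64,
}

/// Core Iceberg v2 `data_file` fields; field names follow the spec.
#[derive(Debug, PartialEq)]
pub struct DataFileV2 {
    pub content: i32,              // 0 = DATA (1 and 2 are delete files)
    pub file_path: String,
    pub file_format: &'static str, // file format name
    pub record_count: i64,
    pub file_size_in_bytes: i64,
}

/// Iceberg counts are signed 64-bit `long`s; reject values that do not fit.
fn to_long(v: u64) -> Option<i64> {
    if v > i64::MAX as u64 { None } else { Some(v as i64) }
}

pub fn project_entry(e: &EntrySketch) -> Option<DataFileV2> {
    Some(DataFileV2 {
        content: 0,
        file_path: e.path.clone(),
        file_format: "PARQUET",
        record_count: to_long(e.num_rows)?,
        file_size_in_bytes: to_long(e.file_size)?,
    })
}
```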
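
to_iceberg_data_file_v2_with_schema fills Iceberg’s per-column stat maps, which are keyed by field id. The sketch below covers value_counts, null_value_counts, lower_bounds and upper_bounds for a single int64 column type and leaves the other maps out. ColumnStatsSketch and StatMaps are hypothetical. The bound encoding follows Iceberg’s single-value binary serialization, which stores a long as 8 bytes little-endian.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-column stats hoisted from Parquet row-group metadata.
/// Only an int64 min/max is modelled here.
pub struct ColumnStatsSketch {
    pub field_id: i32,
    pub value_count: i64,
    pub null_count: i64,
    pub min_i64: Option<i64>,
    pub max_i64: Option<i64>,
}

#[derive(Default, Debug)]
pub struct StatMaps {
    pub value_counts: BTreeMap<i32, i64>,
    pub null_value_counts: BTreeMap<i32, i64>,
    pub lower_bounds: BTreeMap<i32, Vec<u8>>,
    pub upper_bounds: BTreeMap<i32, Vec<u8>>,
}

/// Iceberg single-value serialization of a `long`: 8 bytes, little-endian.
fn long_bound(v: i64) -> Vec<u8> {
    v.to_le_bytes().to_vec()
}

pub fn project_stats(stats: &[ColumnStatsSketch]) -> StatMaps {
    let mut maps = StatMaps::default();
    for s in stats {
        maps.value_counts.insert(s.field_id, s.value_count);
        maps.null_value_counts.insert(s.field_id, s.null_count);
        // An all-null column has no min/max, so no bound is emitted for it.
        if let Some(min) = s.min_i64 {
            maps.lower_bounds.insert(s.field_id, long_bound(min));
        }
        if let Some(max) = s.max_i64 {
            maps.upper_bounds.insert(s.field_id, long_bound(max));
        }
    }
    maps
}
```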
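
to_iceberg_schema_v2 has to map merutable column types onto Iceberg primitive type names and emit a struct schema. The ColumnType enum and ColumnSketch below are hypothetical stand-ins for TableSchema; the Iceberg names (boolean, int, long, float, double, string, binary) and the field shape (id, name, required, type) come from the Iceberg spec.

```rust
/// Hypothetical merutable column types; the real TableSchema may differ.
#[derive(Clone, Copy)]
pub enum ColumnType {
    Bool,
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
    Binary,
}

pub struct ColumnSketch {
    pub field_id: i32,
    pub name: String,
    pub ty: ColumnType,
    pub nullable: bool,
}

/// Iceberg v2 primitive type names.
fn iceberg_type(t: ColumnType) -> &'static str {
    match t {
        ColumnType::Bool => "boolean",
        ColumnType::Int32 => "int",
        ColumnType::Int64 => "long",
        ColumnType::Float32 => "float",
        ColumnType::Float64 => "double",
        ColumnType::Utf8 => "string",
        ColumnType::Binary => "binary",
    }
}

/// Emits {"type":"struct","schema-id":0,"fields":[...]}.
/// Column names are assumed to need no JSON escaping in this sketch.
pub fn schema_json(cols: &[ColumnSketch]) -> String {
    let fields: Vec<String> = cols
        .iter()
        .map(|c| {
            format!(
                "{{\"id\":{},\"name\":\"{}\",\"required\":{},\"type\":\"{}\"}}",
                c.field_id, c.name, !c.nullable, iceberg_type(c.ty)
            )
        })
        .collect();
    format!("{{\"type\":\"struct\",\"schema-id\":0,\"fields\":[{}]}}", fields.join(","))
}
```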
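
to_iceberg_sort_order_v2 relies on _merutable_ikey already encoding (PK ASC, seq DESC), so a single ascending sort on that column is enough. The sketch below shows one way such a key can work, assuming a fixed-width u64 primary key (the real ikey layout is not documented here), together with the Iceberg sort-order shape. The ikey field id is a hypothetical parameter; order id 0 is reserved by the spec for "unsorted".

```rust
/// Internal key whose plain byte order is (PK ASC, seq DESC).
/// ASSUMPTION: fixed-width u64 PK; merutable's real layout may differ.
pub fn ikey(pk: u64, seq: u64) -> [u8; 16] {
    let mut k = [0u8; 16];
    // Big-endian makes byte order match numeric order.
    k[..8].copy_from_slice(&pk.to_be_bytes());
    // Bitwise NOT reverses the order, so a higher seq sorts first.
    k[8..].copy_from_slice(&(!seq).to_be_bytes());
    k
}

/// One ascending identity sort on the ikey column.
pub fn sort_order_json(ikey_field_id: i32) -> String {
    format!(
        "{{\"order-id\":1,\"fields\":[{{\"transform\":\"identity\",\
         \"source-id\":{ikey_field_id},\"direction\":\"asc\",\"null-order\":\"nulls-first\"}}]}}"
    )
}
```

Because seq is stored inverted, the newest version of each key is the first one a forward scan encounters.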
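
The bytes from to_iceberg_v2_table_metadata_bytes still need a file name that Iceberg readers will find. One option, sketched below under the assumption that the table is exported with the Hadoop-catalog layout (metadata/v<N>.metadata.json plus metadata/version-hint.text), is a hypothetical publish_metadata helper; other catalogs track the current metadata file differently.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes already-serialized metadata bytes in the Hadoop-catalog layout.
/// ASSUMPTION: this export layout; merutable may publish metadata differently.
pub fn publish_metadata(table_dir: &Path, version: u64, bytes: &[u8]) -> io::Result<PathBuf> {
    let meta_dir = table_dir.join("metadata");
    fs::create_dir_all(&meta_dir)?;
    let path = meta_dir.join(format!("v{version}.metadata.json"));
    // create_new fails if this version already exists, so a second writer
    // cannot silently overwrite an existing metadata file.
    let mut f = OpenOptions::new().write(true).create_new(true).open(&path)?;
    f.write_all(bytes)?;
    f.sync_all()?;
    // Readers of this layout look up the current version number here.
    fs::write(meta_dir.join("version-hint.text"), version.to_string())?;
    Ok(path)
}
```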