# Prestige CLI
Command-line interface for S3 Parquet file operations.
## Installation
Build from source:
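Assuming a standard Cargo layout (consistent with the `target/release` path below):

```shell
cargo build --release
```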
The binary will be available at `target/release/prestige`.
## Commands
### compact
Consolidate small Parquet files in S3 into larger files.
#### Basic Usage
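A representative run (the bucket name, prefix, and timestamps are placeholders; all flags are documented under Options):

```shell
prestige compact \
  --bucket my-bucket \
  --prefix events \
  --start 1700000000 \
  --end 1700003600
```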
#### Options
- `--bucket <BUCKET>` - S3 bucket name (required)
- `--prefix <PREFIX>` - File prefix to compact (required)
- `--start <START>` - Unix timestamp in seconds, exclusive lower bound (required)
- `--end <END>` - Unix timestamp in seconds, inclusive upper bound (required)
- `--target-bytes <BYTES>` - Target size per output file in bytes (default: 104857600 = 100 MB)
- `--delete-originals` - Delete original files after successful compaction (default: true)
- `--compression <TYPE>` - Compression algorithm: snappy, gzip, lzo, brotli, lz4, zstd, uncompressed (default: snappy)
- `--row-group-size <SIZE>` - Parquet row group size (default: 10000)
- `--deduplicate` - Enable row-level deduplication (default: false)
- `--plan` - Dry-run mode: show statistics without modifying files
#### Authentication
AWS credentials can be provided via:
1. Command-line arguments
2. Environment variables
3. AWS credentials file (`~/.aws/credentials`)
#### Plan Mode
Use `--plan` to estimate compaction results without modifying files:
Output:
#### Deduplication
Enable row-level deduplication to eliminate duplicate records:
Output:
**Note:** Deduplication uses row hashing and may increase processing time; memory usage grows in proportion to the number of unique records.
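As an illustration (not the tool's actual implementation), row-hash deduplication can be sketched as follows; the set of seen hashes is what makes memory grow with the number of unique rows:

```python
import hashlib

def dedup_rows(rows):
    """Drop duplicate rows, keeping the first occurrence of each.

    Memory grows with the number of unique rows because every
    distinct row hash must be retained in `seen`.
    """
    seen = set()
    unique = []
    for row in rows:
        # Hash a stable serialization of the row's values.
        digest = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique

rows = [
    {"id": 1, "v": "a"},
    {"id": 2, "v": "b"},
    {"id": 1, "v": "a"},  # duplicate of the first row
]
deduped = dedup_rows(rows)  # keeps only the two distinct rows
```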
#### LocalStack Testing
For local S3 testing with LocalStack:
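A typical environment for this, assuming LocalStack's default edge port (4566) and its conventional `test`/`test` credentials:

```shell
export AWS_ENDPOINT_URL=http://localhost:4566
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
```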
### iceberg-compact
Compact an Iceberg table by rewriting small files into larger, sorted files. Requires the `iceberg` feature flag.
#### Basic Usage
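A representative invocation (the catalog URI, warehouse, namespace, and table names are placeholders; see the Options below):

```shell
prestige iceberg-compact \
  --catalog-uri http://localhost:8181 \
  --warehouse my-warehouse \
  --namespace analytics \
  --table events
```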
#### Options
- `--catalog-uri <URI>` - REST catalog URI (env: `ICEBERG_CATALOG_URI`) (required)
- `--catalog-name <NAME>` - Catalog name (default: "default")
- `--warehouse <WAREHOUSE>` - Warehouse identifier (env: `ICEBERG_WAREHOUSE`) (required)
- `--namespace <NAMESPACE>` - Iceberg namespace, dot-separated (required)
- `--table <TABLE>` - Table name (required)
- `--target-bytes <BYTES>` - Target file size in bytes (default: 104857600 = 100 MB)
- `--deduplicate` - Enable row-level deduplication by identifier fields (default: false)
- `--min-files <N>` - Minimum number of files before compaction triggers (default: 5)
- `--compression <TYPE>` - Compression algorithm: snappy, gzip, lzo, brotli, lz4, zstd, uncompressed
- `--s3-endpoint <URL>` - S3 endpoint override (env: `AWS_ENDPOINT_URL`)
- `--s3-region <REGION>` - S3 region (env: `AWS_REGION`)
- `--s3-access-key <KEY>` - S3 access key (env: `AWS_ACCESS_KEY_ID`)
- `--s3-secret-key <KEY>` - S3 secret key (env: `AWS_SECRET_ACCESS_KEY`)
### iceberg-scan
Scan and display records from an Iceberg table.
#### Basic Usage
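A representative invocation (connection values and table names are placeholders):

```shell
prestige iceberg-scan \
  --catalog-uri http://localhost:8181 \
  --warehouse my-warehouse \
  --namespace analytics \
  --table events \
  --limit 10
```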
#### Options
- `--catalog-uri <URI>` - REST catalog URI (env: `ICEBERG_CATALOG_URI`) (required)
- `--catalog-name <NAME>` - Catalog name (default: "default")
- `--warehouse <WAREHOUSE>` - Warehouse identifier (env: `ICEBERG_WAREHOUSE`) (required)
- `--namespace <NAMESPACE>` - Iceberg namespace, dot-separated (required)
- `--table <TABLE>` - Table name (required)
- `--limit <N>` - Maximum number of records to display (default: 20)
- `--snapshot-id <ID>` - Scan a specific snapshot (time travel)
- `--filter <EXPR>` - Row filter expression (repeatable, ANDed together). Format: `"column op value"` where op is `=`, `!=`, `>`, `>=`, `<`, or `<=`
- S3/catalog connection options (same as iceberg-compact)
#### Filter Examples
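Hypothetical filters using the documented `"column op value"` format (column names and values here are placeholders):

```shell
# Single condition (connection options omitted for brevity):
prestige iceberg-scan --namespace analytics --table events \
  --filter "status = active"

# Repeated --filter flags are ANDed together:
prestige iceberg-scan --namespace analytics --table events \
  --filter "amount > 100" \
  --filter "region != eu-west-1"
```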
### iceberg-info
Display Iceberg table metadata including schema, partition spec, snapshots, and properties.
Connection options are the same as iceberg-compact and iceberg-scan.
## Examples
### Compact last hour of data
START=
END=
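For example, the window bounds can be derived from the current time (a sketch using POSIX `date +%s`):

```shell
# End of the window: now; start: one hour earlier (bounds in whole seconds).
END=$(date +%s)
START=$(( END - 3600 ))
echo "compacting window ($START, $END]"
```

These values would then be passed as `--start "$START" --end "$END"`.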
### Compact with maximum compression
### Compact without deleting originals
## File Naming Convention
The compactor follows these naming conventions:
- Original files: `{prefix}.{timestamp_millis}.parquet`
- Compacted files: `{prefix}.{timestamp_millis}.c.parquet`
- Processed markers: `{prefix}.{timestamp_millis}.parquet.processed`
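The convention can be expressed as a few small helpers; the prefix and timestamp below are illustrative:

```python
def original_key(prefix: str, ts_millis: int) -> str:
    # Source file as written by the producer.
    return f"{prefix}.{ts_millis}.parquet"

def compacted_key(prefix: str, ts_millis: int) -> str:
    # Output of compaction, distinguished by the ".c" infix.
    return f"{prefix}.{ts_millis}.c.parquet"

def marker_key(prefix: str, ts_millis: int) -> str:
    # Marker written next to a source file once it has been compacted.
    return f"{prefix}.{ts_millis}.parquet.processed"

key = original_key("events", 1700000000000)
```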
## Idempotency
The compactor is idempotent:
- After successful compaction, `.processed` markers are created for source files
- Subsequent runs skip files that have `.processed` markers
- If deletion fails, the marker prevents reprocessing
- Use `last_processed_timestamp` for checkpoint-based processing
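A sketch of the skip rule (not the tool's actual code): a source file is only eligible for compaction if no `.processed` marker exists for it in the listing:

```python
def eligible_for_compaction(keys):
    """Return source .parquet keys that have no .processed marker.

    `keys` is the full S3 listing for the prefix; markers make
    reruns skip inputs that were already compacted.
    """
    key_set = set(keys)
    return [
        k for k in keys
        if k.endswith(".parquet")
        and not k.endswith(".c.parquet")          # skip compacted outputs
        and f"{k}.processed" not in key_set       # skip already-processed inputs
    ]

listing = [
    "events.100.parquet",
    "events.100.parquet.processed",   # marker: already compacted
    "events.200.parquet",
    "events.150.c.parquet",           # compacted output, never a source
]
todo = eligible_for_compaction(listing)  # only events.200.parquet remains
```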
## Error Handling
- Deletion failures are tracked in `deletion_failures` but don't fail the operation
- Schema mismatches cause files to be skipped
- Empty or invalid Parquet files are skipped
- All errors are logged to stderr