# Prestige CLI

Command-line interface for S3 Parquet file operations.
## Installation

Build from source:
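A standard Cargo release build (assuming a stable Rust toolchain is installed):

```sh
cargo build --release
```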
The binary will be available at `target/release/prestige`.
## Commands

### compact

Consolidate small Parquet files in S3 into larger files.
#### Basic Usage
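A minimal invocation might look like this (bucket name, prefix, and timestamps are placeholders):

```sh
prestige compact \
  --bucket my-bucket \
  --prefix events \
  --start 1700000000 \
  --end 1700003600
```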
#### Options

- `--bucket <BUCKET>` - S3 bucket name (required)
- `--prefix <PREFIX>` - File prefix to compact (required)
- `--start <START>` - Unix timestamp in seconds, exclusive lower bound (required)
- `--end <END>` - Unix timestamp in seconds, inclusive upper bound (required)
- `--target-bytes <BYTES>` - Target size per output file in bytes (default: 104857600 = 100 MB)
- `--delete-originals` - Delete original files after successful compaction (default: true)
- `--compression <TYPE>` - Compression algorithm: snappy, gzip, lzo, brotli, lz4, zstd, uncompressed (default: snappy)
- `--row-group-size <SIZE>` - Parquet row group size (default: 10000)
- `--deduplicate` - Enable row-level deduplication (default: false)
- `--plan` - Dry-run mode: show statistics without modifying files
## Authentication

AWS credentials can be provided via:

- Command-line arguments
- Environment variables
- AWS credentials file (`~/.aws/credentials`)
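For the environment-variable route, the standard AWS SDK variables are a reasonable assumption (the values below are AWS's documented dummy credentials, not real ones):

```sh
# Standard AWS SDK environment variables
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_REGION=us-east-1
```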
## Plan Mode

Use `--plan` to estimate compaction results without modifying any files:
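For example (placeholder bucket, prefix, and window; the statistics printed will depend on your data):

```sh
prestige compact \
  --bucket my-bucket \
  --prefix events \
  --start 1700000000 \
  --end 1700003600 \
  --plan
```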
## Deduplication

Enable row-level deduplication to eliminate duplicate records:
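For example, adding `--deduplicate` to an otherwise normal run (placeholder values):

```sh
prestige compact \
  --bucket my-bucket \
  --prefix events \
  --start 1700000000 \
  --end 1700003600 \
  --deduplicate
```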
Note: Deduplication uses row hashing; processing time and memory usage grow in proportion to the number of unique records.
## LocalStack Testing

For local S3 testing with LocalStack:
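One possible setup, assuming Docker is available and the tool's AWS SDK honors the standard `AWS_ENDPOINT_URL` variable:

```sh
# Start LocalStack, which exposes an S3-compatible API on port 4566
docker run -d -p 4566:4566 localstack/localstack

# Dummy credentials; LocalStack accepts any values
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_REGION=us-east-1

# Honored by recent AWS SDKs; check whether this build exposes its own endpoint option
export AWS_ENDPOINT_URL=http://localhost:4566
```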
## Examples
### Compact the last hour of data

```sh
# Compute the time window (GNU date shown; on macOS/BSD use: date -v-1H +%s)
START=$(date -d '1 hour ago' +%s)
END=$(date +%s)

# Bucket and prefix are placeholders
prestige compact --bucket my-bucket --prefix events --start "$START" --end "$END"
```
### Compact with maximum compression
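Of the supported codecs, `zstd` and `brotli` typically achieve the best compression ratios; a sketch using `zstd` (placeholder bucket and window):

```sh
prestige compact \
  --bucket my-bucket \
  --prefix events \
  --start 1700000000 \
  --end 1700003600 \
  --compression zstd
```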
### Compact without deleting originals
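Since `--delete-originals` defaults to true, it presumably must be disabled explicitly; the exact syntax depends on the CLI parser, so verify with `prestige compact --help`:

```sh
# Assumes the flag accepts an explicit boolean value
prestige compact \
  --bucket my-bucket \
  --prefix events \
  --start 1700000000 \
  --end 1700003600 \
  --delete-originals false
```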
## File Naming Convention

The compactor follows these naming conventions:

- Original files: `{prefix}.{timestamp_millis}.parquet`
- Compacted files: `{prefix}.{timestamp_millis}.c.parquet`
- Processed markers: `{prefix}.{timestamp_millis}.parquet.processed`
## Idempotency

The compactor is idempotent:

- After successful compaction, `.processed` markers are created for the source files
- Subsequent runs skip files that have `.processed` markers
- If deletion fails, the marker prevents reprocessing
- Use `last_processed_timestamp` for checkpoint-based processing
## Error Handling

- Deletion failures are tracked in `deletion_failures` but do not fail the operation
- Schema mismatches cause files to be skipped
- Empty or invalid Parquet files are skipped
- All errors are logged to stderr