icepick
A CLI tool and wasm-compatible library for managing Apache Iceberg tables in AWS S3 Tables and Cloudflare R2 Data Catalog.
What it does
icepick provides a simple command-line interface and wasm-friendly library for working with Apache Iceberg tables:
- List and inspect namespaces and tables
- Scan tables with partition pruning and column statistics
- Commit Parquet files to tables (with auto-detection of Hive-style partitions)
- Compact small files using bin-pack compaction
- Clean up snapshots based on retention policies
Why?
The official iceberg-rust library doesn't yet support WASM compilation, and most Iceberg tools are built for JVM environments. icepick fills the gap for:
- Serverless environments like Cloudflare Workers
- CLI-first workflows without spinning up Spark or Flink
- Lightweight table maintenance (compaction, snapshot cleanup)
- Quick data exploration without complex query engines
Quickstart
Install
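Assuming icepick is distributed as a Rust crate with a binary target (the crate name and crates.io availability are assumptions), installation would typically look like:

```shell
# Hypothetical: if the crate is published to crates.io
cargo install icepick

# Or build from a local checkout of the repository
cargo install --path .
```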
Configure
Set your catalog credentials:
- For Cloudflare R2: set your R2 Data Catalog credentials as environment variables
- For AWS S3 Tables: the standard AWS credential chain is used (environment variables, `~/.aws/credentials`, or an IAM role)
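As an illustration, R2 credentials might be supplied like this (the variable names below are assumptions, not icepick's documented names; the AWS variables are the standard SDK ones):

```shell
# Hypothetical R2 variable names -- check icepick's docs for the real ones
export R2_ACCOUNT_ID="your-account-id"
export R2_TOKEN="your-r2-api-token"

# For AWS S3 Tables, standard AWS credentials work, e.g.:
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
```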
Verify Connection
- List namespaces
- List tables in a namespace
- Get table info
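A hypothetical sketch of these checks (the subcommand syntax shown is an assumption; icepick's actual CLI may differ):

```shell
# List namespaces
icepick namespace list

# List tables in a namespace
icepick table list analytics

# Get table info
icepick table info analytics.events
```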
CLI Reference
Namespaces
- List all namespaces
- Create a namespace
- Delete a namespace
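A hypothetical sketch of these operations (subcommand names are assumptions, not verified syntax):

```shell
# List all namespaces
icepick namespace list

# Create a namespace
icepick namespace create analytics

# Delete a namespace
icepick namespace delete analytics
```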
Tables
- List tables in a namespace
- Get detailed table info (schema, partitioning, snapshots)
- Scan table data (shows pruning stats with filters)
- Scan with a filter
- Limit output rows
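A hypothetical sketch of these commands (the subcommand and flag names here are assumptions; only the behavior is taken from the descriptions above):

```shell
# List tables in a namespace
icepick table list analytics

# Get detailed table info (schema, partitioning, snapshots)
icepick table info analytics.events

# Scan table data
icepick scan analytics.events

# Scan with a filter (shows partition pruning stats)
icepick scan analytics.events --filter "day >= '2024-01-01'"

# Limit output rows
icepick scan analytics.events --limit 100
```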
Commit Files
Commit existing Parquet files to an Iceberg table:
- Preview what would be committed (dry run)
- Commit files to an existing table
- Create a new table with a partition spec
- For non-Hive paths, specify partition values explicitly
- Use a specific file as the schema exemplar
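A hypothetical sketch of the commit workflow; `--dry-run`, `--exemplar`, and `--partition-values` are flags documented below, while the subcommand shape, paths, and other flags are assumptions:

```shell
# Preview what would be committed (dry run)
icepick commit analytics.events s3://bucket/data/*.parquet --dry-run

# Commit files to an existing table
icepick commit analytics.events s3://bucket/data/*.parquet

# Create a new table with a partition spec (flag names assumed)
icepick commit analytics.events s3://bucket/data/*.parquet --create --partition-spec "day"

# For non-Hive paths, specify partition values explicitly
icepick commit analytics.events s3://bucket/flat/*.parquet --partition-values "day=2024-01-01"

# Use a specific file as the schema exemplar
icepick commit analytics.events s3://bucket/data/*.parquet --exemplar s3://bucket/data/part-0.parquet
```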
The commit command:
- Uses the first file's schema (or `--exemplar`) as the reference
- Validates that all files match the schema
- Extracts partition values from Hive-style paths automatically
- Supports `--partition-values` for flat directory structures
- Shows a detailed plan with `--dry-run` before committing
Compaction
Merge small files into larger ones for better query performance:
- Preview the compaction plan (dry run)
- Execute compaction with default settings
- Set a custom target file size (e.g. 256 MB)
- Only compact files smaller than 128 MB
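A hypothetical sketch of these commands (the subcommand and size-flag names are assumptions; only the behavior is taken from the descriptions above):

```shell
# Preview the compaction plan (dry run)
icepick compact analytics.events --dry-run

# Execute compaction with default settings
icepick compact analytics.events

# Custom target file size (256 MB)
icepick compact analytics.events --target-size 256MB

# Only compact files smaller than 128 MB
icepick compact analytics.events --small-file-threshold 128MB
```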
Snapshots
Manage table snapshots and clean up old versions:
- List all snapshots with age and status
- Preview cleanup (dry run)
- Execute cleanup with a retention policy
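A hypothetical sketch of these commands (the subcommand and retention-flag names are assumptions, chosen to match the retention rules described below):

```shell
# List all snapshots with age and status
icepick snapshot list analytics.events

# Preview cleanup (dry run)
icepick snapshot cleanup analytics.events --dry-run

# Execute cleanup: keep the 5 most recent, expire anything older than 7 days
icepick snapshot cleanup analytics.events --keep 5 --older-than 7d
```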
Snapshot cleanup respects:
- Current snapshot: never expired (it's the current table state)
- Referenced snapshots: never expired while referenced by branches or tags
- Retention count: keeps the N most recent snapshots regardless of age
- Age threshold: only expires snapshots older than the threshold
Cloudflare R2
Authentication
- Log into the Cloudflare dashboard
- Navigate to My Profile → API Tokens
- Create a token with R2 read/write permissions
- Set environment variables:
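For example (the variable names below are assumptions, not icepick's documented names; substitute your own account ID and token):

```shell
# Hypothetical R2 variable names -- check icepick's docs for the real ones
export R2_ACCOUNT_ID="your-account-id"
export R2_TOKEN="your-r2-api-token"
```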
WASM Compatibility
The R2 catalog is fully WASM-compatible, making it suitable for:
- Cloudflare Workers
- Browser applications (if your catalog REST API supports CORS)
AWS S3 Tables
Authentication
Uses the AWS default credential provider chain:
- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- AWS credentials file (`~/.aws/credentials`)
- IAM instance profile (EC2)
- ECS task role
Important: Ensure your credentials have S3 Tables permissions.
Platform Support
S3 Tables requires the AWS SDK and is only available on native platforms (Linux, macOS, Windows). It does not compile to WASM.
Library Usage
icepick can also be used as a Rust library for programmatic access to Iceberg tables. See DEVELOPER.md for:
- Rust API examples
- Direct Parquet writes
- Registering existing files
- WASM considerations
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
Acknowledgments
Built on the official iceberg-rust library from the Apache Iceberg project.