rustberg 0.0.5

A production-grade, cross-platform, single-binary Apache Iceberg REST Catalog

Rustberg


Why Rustberg?

Rustberg is a production-grade Apache Iceberg REST Catalog designed for simplicity and performance:

Core Capabilities

  • 🚀 Instant Startup - Sub-10ms cold start, ready immediately
  • 📦 Single Binary - No JVM, no PostgreSQL, no external services required
  • 🔐 Security First - TLS 1.3, API keys, JWT/OIDC, Cedar policies, AES-256-GCM encryption
  • ☸️ Kubernetes Native - SlateDB on S3/GCS/Azure for horizontal scaling
  • 🌍 Cross-Platform - Linux, macOS, Windows with first-class support
  • 📋 Full Iceberg REST API - Tables, views, namespaces, transactions, credential vending

Quick Start

Option 1: Pre-built Binaries

# Linux (x86_64)
curl -L https://github.com/hupe1980/rustberg/releases/latest/download/rustberg-linux-x86_64 -o rustberg

# Linux (ARM64)
curl -L https://github.com/hupe1980/rustberg/releases/latest/download/rustberg-linux-aarch64 -o rustberg

# macOS (Apple Silicon)
curl -L https://github.com/hupe1980/rustberg/releases/latest/download/rustberg-darwin-aarch64 -o rustberg

# Make executable and run
chmod +x rustberg
./rustberg

Option 2: Docker

# Start Rustberg
docker run -d -p 8181:8181 --name rustberg \
  -e RUSTBERG_INSECURE_HTTP=true \
  ghcr.io/hupe1980/rustberg:latest

# Verify it's running
curl http://localhost:8181/health

# Create a namespace
curl -X POST http://localhost:8181/v1/namespaces \
  -H "Content-Type: application/json" \
  -d '{"namespace": ["my_namespace"]}'

Option 3: Helm Chart (Kubernetes)

# Clone repository
git clone https://github.com/hupe1980/rustberg
cd rustberg

# Install with Helm
helm install rustberg charts/rustberg \
  --set rustberg.storage.type=s3 \
  --set rustberg.storage.s3.bucket=my-catalog-bucket

# Or with custom values
helm install rustberg charts/rustberg -f my-values.yaml

Option 4: Build from Source

Requires Rust 1.89+.

# Clone and build
git clone https://github.com/hupe1980/rustberg
cd rustberg
cargo build --release --all-features

# Generate TLS certificate (development)
./target/release/rustberg generate-cert

# Start server
./target/release/rustberg --tls-cert server.crt --tls-key server.key

Features

Core Iceberg API

  • ✅ Namespace CRUD - Create, list, update, delete namespaces
  • ✅ Table CRUD - Full table lifecycle management with optional data purge
  • ✅ Table Commits - Optimistic concurrency with version-based CAS (409 on conflict)
  • ✅ Atomic Rename - Crash-safe table rename via WriteBatch
  • ✅ Register Table - Import existing tables from metadata location
  • ✅ Multi-table Transactions - Atomic commit with WriteBatch
  • ✅ Metrics Reporting - Client telemetry collection
  • ✅ Credential Vending - AWS STS + GCS + Azure temporary credentials
  • ✅ Pagination - Cursor-based with configurable page size
  • ✅ Views - Full CRUD with persistent storage (SlateDB)
  • ✅ Idempotency - Request deduplication with persistent storage (SlateDB)
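
The "version-based CAS (409 on conflict)" behaviour listed above can be pictured with a small in-memory model. This is only an illustrative sketch: `TableStore`, `CommitConflict`, and the version counter are hypothetical names, not Rustberg's actual types, and the real catalog persists state in SlateDB rather than a Python dict.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when the expected version is stale; a server would answer HTTP 409."""


@dataclass
class TableStore:
    # Maps table name -> (version, metadata_location). In-memory for illustration only.
    tables: dict = field(default_factory=dict)

    def commit(self, name, expected_version, new_location):
        current_version, _ = self.tables.get(name, (0, None))
        # Compare-and-swap: write only if nobody committed since the caller last read.
        if current_version != expected_version:
            raise CommitConflict(
                f"expected v{expected_version}, found v{current_version}"
            )
        self.tables[name] = (current_version + 1, new_location)
        return current_version + 1


store = TableStore()
v1 = store.commit("orders", expected_version=0, new_location="s3://bucket/orders/v1.json")
try:
    # A second writer that still believes the table is at version 0 loses the race.
    store.commit("orders", expected_version=0, new_location="s3://bucket/orders/v1b.json")
    conflicted = False
except CommitConflict:
    conflicted = True
```

A client that receives the conflict would typically reload the table, reapply its changes on top of the new version, and retry the commit.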

Security

  • ✅ API Key Authentication - Argon2id hashed with in-memory caching (moka)
  • ✅ JWT/OIDC Authentication - JWKS validation with auto-purge on rotation, configurable claims
  • ✅ Cedar Policy Authorization - Fine-grained ABAC beyond simple RBAC
  • ✅ Multi-Tenancy - Hard isolation between tenants
  • ✅ Rate Limiting - Token bucket per IP/tenant with non-consuming header peek
  • ✅ Encryption at Rest - AES-256-GCM with envelope encryption + AWS KMS/Vault/Azure KV
  • ✅ TLS/HTTPS - TLS 1.2/1.3 via rustls
  • ✅ Secret Redaction - Sensitive data redacted in debug output and logs
  • ✅ Security Headers - CSP, X-Frame-Options, X-Content-Type-Options
  • ✅ CORS Support - Configurable cross-origin resource sharing
  • ✅ Audit Logging - Structured JSON for SIEM
  • ✅ Idempotency Guard - RAII-based in-flight deduplication with auto-cleanup
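
To make the "token bucket ... with non-consuming header peek" item concrete, here is a minimal sketch of the general technique. It is not Rustberg's implementation: the `TokenBucket` class and its method names are hypothetical, and the clock is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Refills `rate` tokens per second up to `burst`; each request spends one token."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = now

    def _refill(self, now):
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now

    def peek(self, now):
        """Report remaining tokens without spending one (e.g. for a response header)."""
        self._refill(now)
        return int(self.tokens)

    def allow(self, now):
        """Spend one token if available; return False when the caller should be throttled."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# 100 requests/second with a burst of 200, matching the sample [rate_limit] values.
bucket = TokenBucket(rate=100, burst=200)
accepted = sum(bucket.allow(now=0.0) for _ in range(250))
remaining_after_burst = bucket.peek(now=0.0)
remaining_after_half_second = bucket.peek(now=0.5)
```

In a per-IP or per-tenant setup, one such bucket would be kept per key; `peek` lets the server report the remaining quota without charging the client for asking.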

Operations

  • ✅ Health Checks - /health and /ready endpoints with storage backend verification
  • ✅ Metrics - Prometheus-compatible /metrics with KMS operations, cache stats, and 30+ counters
  • ✅ Request Tracing - X-Request-Id propagation for distributed tracing
  • ✅ Response Compression - Gzip/deflate/brotli automatic compression
  • ✅ Graceful Shutdown - SIGTERM handling with connection drain
  • ✅ Backup/Restore - CLI commands for disaster recovery
  • ✅ TOML Configuration - File-based config with env override

Security

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                           SECURITY LAYERS                           │
├─────────────────────────────────────────────────────────────────────┤
│  TLS 1.2/1.3 (rustls)                           Transport Security  │
│  ├── Rate Limiting (token bucket)               DoS Protection      │
│  ├── Request Size Limits (10MB)                 Resource Protection │
│  ├── Request Timeouts (30s)                     Hang Protection     │
│  ├── Security Headers (CSP, X-Frame-Options)    Browser Security    │
│  ├── X-Request-Id Tracing                       Distributed Tracing │
│  ├── CORS Middleware                            Cross-Origin Policy │
│  ├── API Key / JWT Authentication               Identity            │
│  ├── Cedar Policy Authorization                 Access Control      │
│  ├── Input Validation                           Injection Defense   │
│  ├── Audit Logging                              Forensics           │
│  └── AES-256-GCM Encryption                     Data at Rest        │
│      └── KMS (env/AWS/Vault) + Circuit Breaker  Key Management      │
└─────────────────────────────────────────────────────────────────────┘

Authentication

# Generate an API key
rustberg generate-key --name admin --roles admin,writer

# Use the key
curl -H "X-API-Key: rb_abc123..." https://localhost:8000/v1/namespaces

Authorization (Cedar Policies)

// Allow readers to list namespaces
permit(
  principal,
  action == Action::"ListNamespaces",
  resource
) when {
  principal.roles.contains("reader")
};

// Deny cross-tenant access
forbid(
  principal,
  action,
  resource
) when {
  principal.tenant_id != resource.tenant_id
};
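
Cedar denies by default and lets any matching forbid override a permit. The sketch below restates the two policies above as plain Python to show how they combine. It is a hand-written model of these two policies only, not the Cedar engine; the dictionary shapes for `principal` and `resource` are assumptions.

```python
def is_authorized(principal, action, resource):
    """Model the two policies above: forbid wins, and no matching permit means deny."""
    # forbid: cross-tenant access is rejected regardless of roles.
    if principal["tenant_id"] != resource["tenant_id"]:
        return False
    # permit: readers may list namespaces.
    if action == "ListNamespaces" and "reader" in principal["roles"]:
        return True
    # No permit matched, so the request is denied by default.
    return False


reader = {"tenant_id": "acme", "roles": ["reader"]}
same_tenant = is_authorized(reader, "ListNamespaces", {"tenant_id": "acme"})
cross_tenant = is_authorized(reader, "ListNamespaces", {"tenant_id": "other"})
unlisted_action = is_authorized(reader, "CreateTable", {"tenant_id": "acme"})
```

With only these two policies loaded, a reader can list namespaces in their own tenant and nothing else; every other action needs its own permit.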

API

Endpoints

Method  Path                                        Description
GET     /health                                     Liveness check
GET     /ready                                      Readiness check with dependencies
GET     /metrics                                    Prometheus metrics
GET     /v1/config                                  Catalog configuration
GET     /v1/namespaces                              List namespaces
POST    /v1/namespaces                              Create namespace
GET     /v1/namespaces/{ns}                         Get namespace
POST    /v1/namespaces/{ns}                         Update namespace
DELETE  /v1/namespaces/{ns}                         Delete namespace
GET     /v1/namespaces/{ns}/tables                  List tables
POST    /v1/namespaces/{ns}/tables                  Create table
POST    /v1/namespaces/{ns}/register                Register existing table
GET     /v1/namespaces/{ns}/tables/{table}          Load table
DELETE  /v1/namespaces/{ns}/tables/{table}          Drop table
POST    /v1/namespaces/{ns}/tables/{table}          Commit table update
HEAD    /v1/namespaces/{ns}/tables/{table}          Check table exists
POST    /v1/namespaces/{ns}/tables/{table}/metrics  Report metrics
POST    /v1/tables/rename                           Rename table
POST    /v1/transactions/commit                     Multi-table transaction
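
The `{ns}` segment in the paths above needs care for multi-level namespaces. The Iceberg REST specification joins the levels with the unit separator character (0x1F), which appears as `%1F` in the URL. The helpers below are a sketch under the assumption that Rustberg uses that default separator; `namespace_path` and `table_path` are hypothetical names.

```python
from urllib.parse import quote

# Default multi-level namespace separator in the Iceberg REST specification.
UNIT_SEPARATOR = "\x1f"


def namespace_path(parts):
    """Build the path for a namespace given as a list of levels."""
    return "/v1/namespaces/" + quote(UNIT_SEPARATOR.join(parts), safe="")


def table_path(parts, table):
    """Build the path for a table inside a (possibly multi-level) namespace."""
    return f"{namespace_path(parts)}/tables/{quote(table, safe='')}"


single = namespace_path(["my_namespace"])
nested = table_path(["analytics", "eu"], "orders")
```

Percent-encoding each piece with `safe=""` also keeps characters such as `/` in a name from being read as path separators.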

Example: Create a Table

curl -X POST https://localhost:8000/v1/namespaces/my_ns/tables \
  -H "Content-Type: application/json" \
  -H "X-API-Key: rb_abc123..." \
  -d '{
    "name": "my_table",
    "schema": {
      "type": "struct",
      "fields": [
        {"id": 1, "name": "id", "type": "long", "required": true},
        {"id": 2, "name": "data", "type": "string", "required": false}
      ]
    }
  }'
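
The same request can be assembled from a script. The sketch below builds it with Python's standard library and stops short of sending it, so it runs without a server; `build_create_table_request` is a hypothetical helper, and the host, port, and key are the placeholders from the curl example.

```python
import json
import urllib.request


def build_create_table_request(base_url, namespace, api_key, name, fields):
    """Prepare (but do not send) a create-table request like the curl call above."""
    body = {"name": name, "schema": {"type": "struct", "fields": fields}}
    return urllib.request.Request(
        url=f"{base_url}/v1/namespaces/{namespace}/tables",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_create_table_request(
    base_url="https://localhost:8000",
    namespace="my_ns",
    api_key="rb_abc123...",  # placeholder, as in the curl example
    name="my_table",
    fields=[
        {"id": 1, "name": "id", "type": "long", "required": True},
        {"id": 2, "name": "data", "type": "string", "required": False},
    ],
)
# urllib.request.urlopen(request) would send it. With a development certificate from
# generate-cert, the client also needs to trust that certificate.
```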

Configuration

TOML Configuration File

# rustberg.toml

[server]
host = "0.0.0.0"
port = 8000

[server.auth]
api_key_enabled = true
jwt_enabled = false

[tls]
enabled = true
cert_path = "/etc/rustberg/tls/cert.pem"
key_path = "/etc/rustberg/tls/key.pem"

[storage]
# Single-node (local storage)
backend = "file:///var/lib/rustberg/data"

# K8s HA (S3-compatible)
# backend = "s3://rustberg-bucket/catalog?region=us-east-1"

[kms]
provider = "env"  # or "aws-kms", "vault"
cache_ttl_seconds = 300
circuit_breaker_enabled = true

[rate_limit]
enabled = true
requests_per_second = 100
burst_size = 200

[logging]
level = "info"
json_format = false

Environment Variables

Variable                Default  Description
RUSTBERG_HOST           0.0.0.0  Bind address
RUSTBERG_PORT           8000     Bind port
RUSTBERG_WAREHOUSE      -        Warehouse location
RUSTBERG_TENANT_ID      default  Default tenant
RUSTBERG_NO_AUTH        false    Disable authentication (dev only)
RUSTBERG_TLS_CERT       -        TLS certificate path
RUSTBERG_TLS_KEY        -        TLS key path
RUSTBERG_INSECURE_HTTP  false    Allow HTTP (dev only)
RUSTBERG_MASTER_KEY     -        Encryption master key (hex)
RUST_LOG                info     Log level
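
The feature list describes the configuration as file-based with environment overrides. A rough sketch of that precedence for the bind address follows. The mapping of `RUSTBERG_HOST` and `RUSTBERG_PORT` onto the `[server]` keys is an assumption made for illustration, and `effective_bind` is a hypothetical helper, not part of Rustberg.

```python
def effective_bind(file_config, env):
    """Return (host, port): environment first, then the TOML file, then the defaults."""
    server = file_config.get("server", {})
    host = env.get("RUSTBERG_HOST", server.get("host", "0.0.0.0"))
    port = int(env.get("RUSTBERG_PORT", server.get("port", 8000)))
    return host, port


# Stand-in for a parsed rustberg.toml with non-default values.
file_config = {"server": {"host": "127.0.0.1", "port": 9000}}

from_file = effective_bind(file_config, env={})
overridden = effective_bind(file_config, env={"RUSTBERG_PORT": "8181"})
defaults = effective_bind({}, env={})
```

This kind of layering lets one configuration file be shared across environments while a container or orchestrator overrides individual values.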

Deployment

Production Checklist

  • TLS enabled with valid certificates
  • Authentication enabled (default - ensure RUSTBERG_NO_AUTH is NOT set)
  • Master key stored securely (KMS recommended)
  • Rate limiting configured appropriately
  • Audit logging to persistent storage
  • Health checks configured in orchestrator
  • Backup schedule established
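
Part of this checklist can be checked mechanically before a rollout. The sketch below inspects only the environment variables documented above, so it does not cover settings supplied through the TOML file or a KMS provider. `checklist_problems` is a hypothetical helper, and the accepted boolean spellings are an assumption about how Rustberg parses these variables.

```python
# Assumed boolean spellings; Rustberg's own parser may accept a different set.
TRUTHY = {"1", "true", "yes"}


def checklist_problems(env):
    """Return human-readable findings for an environment that is not production-ready."""
    problems = []
    if env.get("RUSTBERG_NO_AUTH", "false").lower() in TRUTHY:
        problems.append("authentication is disabled (RUSTBERG_NO_AUTH)")
    if env.get("RUSTBERG_INSECURE_HTTP", "false").lower() in TRUTHY:
        problems.append("plain HTTP is allowed (RUSTBERG_INSECURE_HTTP)")
    for var in ("RUSTBERG_TLS_CERT", "RUSTBERG_TLS_KEY", "RUSTBERG_MASTER_KEY"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    return problems


dev_env = {"RUSTBERG_INSECURE_HTTP": "true"}
prod_env = {
    "RUSTBERG_TLS_CERT": "/etc/rustberg/tls/cert.pem",
    "RUSTBERG_TLS_KEY": "/etc/rustberg/tls/key.pem",
    "RUSTBERG_MASTER_KEY": "<hex key from a secret store>",
}
dev_findings = checklist_problems(dev_env)
prod_findings = checklist_problems(prod_env)
```

Calling `checklist_problems(dict(os.environ))` in a pre-deploy step would flag a development setting that leaked into production.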

Kubernetes

Rustberg supports both single-node (with PVC) and highly-available (with S3) deployments:

Single-Node (PVC Storage)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rustberg
spec:
  replicas: 1  # Single node only with file:// storage
  template:
    spec:
      containers:
        - name: rustberg
          image: ghcr.io/hupe1980/rustberg:latest
          ports:
            - containerPort: 8000
          env:
            - name: STORAGE_BACKEND
              value: "file:///var/lib/rustberg/data"
            - name: RUSTBERG_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: rustberg-secrets
                  key: master-key
          volumeMounts:
            - name: data
              mountPath: /var/lib/rustberg/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: rustberg-data

High-Availability (S3/GCS/MinIO)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rustberg
spec:
  replicas: 3  # Multiple replicas with shared S3 storage
  template:
    spec:
      containers:
        - name: rustberg
          image: ghcr.io/hupe1980/rustberg:latest
          ports:
            - containerPort: 8000
          env:
            - name: STORAGE_BACKEND
              value: "s3://rustberg-bucket/catalog?region=us-east-1"
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: rustberg-secrets
                  key: aws-access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: rustberg-secrets
                  key: aws-secret-key
            - name: RUSTBERG_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: rustberg-secrets
                  key: master-key
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5

Backup & Restore

# Create backup
rustberg backup --output /backup/rustberg-$(date +%Y%m%d).tar.gz

# Validate backup
rustberg validate-backup --input /backup/rustberg-20260123.tar.gz

# Restore (stop the server first!)
rustberg restore --input /backup/rustberg-20260123.tar.gz --force

CLI Reference

# Start server
rustberg [OPTIONS]

# Generate API key
rustberg generate-key --name <NAME> --tenant <TENANT> --roles <ROLES>

# Generate TLS certificate (development)
rustberg generate-cert --common-name localhost --output-dir /tmp/tls

# Generate sample configuration file
rustberg generate-config --output config.toml

# Generate OpenAPI specification
rustberg open-api --format yaml --output openapi.yaml

# Backup catalog
rustberg backup --output <FILE> --data-dir <DIR>

# Restore catalog
rustberg restore --input <FILE> --data-dir <DIR> [--force]

# Validate backup
rustberg validate-backup --input <FILE>

# Show status
rustberg status --data-dir <DIR>

# Run performance benchmark
rustberg benchmark --iterations 10

Engine Compatibility

Engine     Read  Write  Notes
PyIceberg  ✅    ✅     Full support
Trino      ✅    ✅     Full support
DuckDB     ✅    -      Read-only
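
For PyIceberg, a REST catalog is described by a set of properties. The sketch below only assembles them, so it runs without PyIceberg installed or a server available. `rustberg_catalog_properties` is a hypothetical helper, and passing the API key through PyIceberg's `header.` property prefix is an assumption about how to supply the `X-API-Key` header shown earlier.

```python
def rustberg_catalog_properties(uri, api_key):
    """Properties for a PyIceberg REST catalog pointing at a Rustberg server."""
    return {
        "type": "rest",
        "uri": uri,
        # PyIceberg forwards properties prefixed with "header." as HTTP headers.
        "header.X-API-Key": api_key,
    }


props = rustberg_catalog_properties("https://localhost:8000", "rb_abc123...")

# With PyIceberg installed and a server running, the catalog could then be opened with:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("rustberg", **props)
#   catalog.list_namespaces()
```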

Development

# Run tests
cargo test --all-features

# Run with debug logging
RUST_LOG=debug cargo run --all-features -- --insecure-http

# Format code
cargo fmt

# Lint
cargo clippy --all-features

License

Apache License 2.0