hexz-cli 0.4.5

CLI tool for managing Hexz snapshots and datasets
hexz-cli

Command-line tool for managing Hexz snapshots, datasets, and virtual machines.

Overview

The hexz CLI creates, inspects, and manages Hexz snapshots. It supports dataset packing for AI/ML workflows, VM snapshot management, and diagnostic tools for inspecting snapshot internals.

This is the primary tool for developers and data engineers working with the Hexz format.

Installation

From Source

# Clone the repository
git clone https://github.com/Alethic-Systems/hexz.git
cd hexz

# Install the CLI
make install

# Or install directly with cargo
cargo install --path crates/cli

After installation, the hexz command will be available in your PATH.

Quick Examples

Pack a Dataset

Convert raw files into a compressed, deduplicated Hexz snapshot:

# Pack a directory of images for ML training
hexz data pack --disk ./raw_images --output dataset.hxz --cdc

# Pack with custom compression and encryption
hexz data pack \
  --disk ./data \
  --output encrypted.hxz \
  --compression zstd \
  --encrypt

Inspect a Snapshot

# Show snapshot metadata
hexz data info dataset.hxz

# Get JSON output for programmatic access
hexz data info dataset.hxz --json
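
For scripted use, the --json output can be captured once per snapshot. The following is a minimal sketch, assuming hexz is on PATH. The function name dump_snapshot_info is hypothetical, and because the JSON schema is not described here, the output is stored verbatim rather than parsed.

```shell
# Hypothetical helper: save `hexz data info --json` output for every
# snapshot in a directory to a sibling .json file.
# Assumption: `hexz` is on PATH. The JSON schema is not specified here,
# so the output is written as-is and never parsed.
dump_snapshot_info() {
  local dir="${1:-.}"
  local snap
  local status=0
  for snap in "$dir"/*.hxz; do
    # Skip the unexpanded pattern when the directory has no snapshots.
    [ -e "$snap" ] || continue
    if hexz data info "$snap" --json > "${snap%.hxz}.json"; then
      echo "wrote ${snap%.hxz}.json"
    else
      # Do not leave a partial or empty file behind on failure.
      rm -f "${snap%.hxz}.json"
      echo "failed: $snap" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: dump_snapshot_info ./snapshots
```

The function returns non-zero if any snapshot could not be inspected, so it can gate later steps in a pipeline.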

Boot a VM from Snapshot

# Boot a VM with 4 GB of RAM (requires the fuse feature)
hexz vm boot ubuntu-22.04.hxz --ram 4G

# Boot without KVM acceleration
hexz vm boot snapshot.hxz --ram 2G --no-kvm

Command Reference

The CLI is organized into three main command groups:

Data Commands (hexz data)

Work with datasets and snapshots for ML/AI workflows:

Command   Description
pack      Create a snapshot from raw files/directories
build     Build a snapshot with specific profiles (optimized for different workloads)
info      Display snapshot metadata and statistics
diff      Compare two snapshots (diagnostics feature)
analyze   Analyze snapshot structure and compression efficiency (diagnostics feature)

Example:

# Pack with deduplication
hexz data pack --disk ./images --output train.hxz --cdc

# View detailed info
hexz data info train.hxz --json

VM Commands (hexz vm)

Manage virtual machines using Hexz snapshots (requires fuse feature):

Command    Description
boot       Boot a VM from a snapshot
install    Install an OS from ISO to create a new snapshot
snapshot   Capture running VM state (disk + memory)
mount      Mount a snapshot as a FUSE filesystem

Example:

# Install Ubuntu from ISO
hexz vm install ubuntu-22.04.iso --disk-size 20G --output ubuntu.hxz

# Boot the installed system
hexz vm boot ubuntu.hxz --ram 4G

# Snapshot a running VM
hexz vm snapshot --socket /tmp/vm.sock --output checkpoint.hxz
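
When checkpoints are taken repeatedly, a timestamped output name keeps earlier ones from being overwritten. This is a sketch under stated assumptions: checkpoint_vm is a hypothetical wrapper, the naming scheme is only a suggestion, and only the --socket and --output flags shown above are used.

```shell
# Hypothetical wrapper: snapshot a running VM into a timestamped file.
# Assumption: `hexz` is on PATH and the VM exposes the given socket.
checkpoint_vm() {
  local socket="$1"
  local out_dir="${2:-.}"
  local stamp
  local out
  # UTC timestamp such as 20240131T235959Z keeps names sortable.
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  out="$out_dir/checkpoint-$stamp.hxz"
  hexz vm snapshot --socket "$socket" --output "$out" || return 1
  # Print the path so callers can record or post-process it.
  echo "$out"
}

# Usage: checkpoint_vm /tmp/vm.sock ./checkpoints
```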

System Commands (hexz sys)

Server and infrastructure operations:

Command   Description
serve     Start an HTTP server for streaming snapshots

Example:

# Serve snapshots over HTTP
hexz sys serve --port 8080 --path /snapshots

Common Options

Compression

Choose compression algorithm with --compression:

  • lz4 - Fast (~2 GB/s), lower compression ratio (default)
  • zstd - Slower (~500 MB/s), higher compression ratio

hexz data pack --disk ./data --output data.hxz --compression zstd
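
Which algorithm is the better fit depends on the data, so it can help to pack the same input with both and compare the resulting sizes. A minimal sketch, assuming hexz is on PATH; compare_compression and the compare-*.hxz output names are hypothetical.

```shell
# Hypothetical helper: pack one source directory with each documented
# algorithm and print the size of the resulting snapshot.
# Assumption: `hexz` is on PATH; output file names are illustrative.
compare_compression() {
  local src="$1"
  local out_dir="${2:-.}"
  local alg
  local out
  for alg in lz4 zstd; do
    out="$out_dir/compare-$alg.hxz"
    hexz data pack --disk "$src" --output "$out" --compression "$alg" >/dev/null || return 1
    # wc -c reports the size in bytes; tr strips padding some platforms add.
    printf '%s\t%s bytes\n' "$alg" "$(wc -c < "$out" | tr -d ' ')"
  done
}

# Usage: compare_compression ./data /tmp
```

Size is only half of the trade-off. Wrapping each pack call in `time` would show the throughput difference as well.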

Content-Defined Chunking (CDC)

Enable deduplication with --cdc:

# Use default CDC settings (FastCDC)
hexz data pack --disk ./data --output data.hxz --cdc

# Custom chunk sizes
hexz data pack \
  --disk ./data \
  --output data.hxz \
  --cdc \
  --min-chunk 16384 \
  --avg-chunk 65536 \
  --max-chunk 262144
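
The values above correspond to 16 KiB, 64 KiB, and 256 KiB if the flags take byte counts, which is an assumption based on the numbers shown. A small hypothetical helper, cdc_chunk_flags, can build the flags from KiB values and reject orderings that make no sense for content-defined chunking (minimum larger than average, or average larger than maximum):

```shell
# Hypothetical helper: turn KiB sizes into --min/--avg/--max-chunk flags.
# Assumption: the flags take byte counts, as the example values suggest.
cdc_chunk_flags() {
  local min_kib="$1"
  local avg_kib="$2"
  local max_kib="$3"
  # Sanity check: chunk bounds are normally ordered min <= avg <= max.
  if [ "$min_kib" -gt "$avg_kib" ] || [ "$avg_kib" -gt "$max_kib" ]; then
    echo "expected min <= avg <= max, got $min_kib $avg_kib $max_kib" >&2
    return 1
  fi
  echo "--min-chunk $((min_kib * 1024)) --avg-chunk $((avg_kib * 1024)) --max-chunk $((max_kib * 1024))"
}

# Usage (word splitting of the output is intentional):
#   hexz data pack --disk ./data --output data.hxz --cdc $(cdc_chunk_flags 16 64 256)
```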

Encryption

Encrypt snapshots with --encrypt:

# You'll be prompted for a password
hexz data pack --disk ./data --output secure.hxz --encrypt

Architecture

hexz-cli/
├── src/
│   ├── main.rs        # Entry point
│   ├── args.rs        # CLI argument parsing (clap)
│   ├── cmd/           # Command implementations
│   │   ├── data/      # Dataset commands (pack, info, etc.)
│   │   ├── vm/        # VM commands (boot, snapshot, etc.)
│   │   └── sys/       # System commands (serve)
│   └── ui/            # User interface (progress bars, formatters)
└── benches/           # Performance benchmarks
    ├── macro/         # Macro benchmarks (end-to-end)
    ├── micro/         # Micro benchmarks (component-level)
    └── ai/            # AI/ML-focused benchmarks

Development

All development commands use the project Makefile from the repository root.

Building

# Build CLI (release mode)
make rust

# Build and install locally
make install

# Run CLI directly
make run data info dataset.hxz

Testing

# Run all tests
make test

# Run only Rust tests
make test-rust

# Run tests with filter
make test-rust pack

Benchmarks

The CLI crate includes comprehensive benchmarks:

# Run all benchmarks
make bench

# Run specific benchmark category
make bench ai           # AI/ML workload benchmarks
make bench cache        # Cache performance
make bench http         # HTTP streaming

# Compare against baseline
make bench-compare baseline-v1

Benchmark categories:

  • macro: End-to-end workflows (read throughput, sparse access, concurrency)
  • micro: Component-level (cache, decompression, API comparison)
  • ai: ML-specific (dataloader, shuffle, prefetch, multi-worker)

Linting & Formatting

# Format all code
make fmt

# Check formatting + clippy
make lint

# Run clippy with strict lints
make clippy

See make help for all available commands.

Features

The CLI supports compile-time feature flags:

  • default: ["fuse", "server", "compression-zstd", "encryption", "diagnostics", "signing"]
  • fuse: VM mounting and FUSE filesystem support
  • server: HTTP server for snapshot streaming
  • compression-zstd: Zstandard compression
  • encryption: AES-256-GCM encryption
  • diagnostics: Advanced analysis commands (diff, analyze)
  • signing: Cryptographic signing for snapshots
  • firecracker: Firecracker microVM support (experimental)

Build without optional features:

# Minimal build (no VM support)
cargo build -p hexz-cli --no-default-features --features encryption,compression-zstd

Performance

The CLI is optimized for high-throughput operations:

  • Pack throughput: ~2 GB/s (LZ4), ~500 MB/s (Zstd)
  • Deduplication: FastCDC with parallel processing
  • Progress tracking: Real-time progress bars with indicatif
  • Zero-copy: Direct memory mapping where possible

See Also