# hexz-cli
Command-line tool for managing Hexz snapshots, datasets, and virtual machines.
## Overview
The `hexz` CLI provides a comprehensive interface for creating, analyzing, and managing Hexz snapshots. It supports dataset packing for AI/ML workflows, VM snapshot management, and diagnostic tools for inspecting snapshot internals.
This is the primary tool for developers and data engineers working with the Hexz format.
## Installation
### From Source
```bash
# Clone the repository
git clone https://github.com/Alethic-Systems/hexz.git
cd hexz
# Install the CLI
make install
# Or install directly with cargo
cargo install --path crates/cli
```
After installation, the `hexz` command will be available in your PATH.
## Quick Examples
### Pack a Dataset
Convert raw files into a compressed, deduplicated Hexz snapshot:
```bash
# Pack a directory of images for ML training
hexz data pack --disk ./raw_images --output dataset.hxz --cdc
# Pack with custom compression and encryption
hexz data pack \
--disk ./data \
--output encrypted.hxz \
--compression zstd \
--encrypt
```
### Inspect a Snapshot
```bash
# Show snapshot metadata
hexz data info dataset.hxz
# Get JSON output for programmatic access
hexz data info dataset.hxz --json
```
### Boot a VM from Snapshot
```bash
# Boot a VM with 4GB RAM (requires FUSE feature)
hexz vm boot ubuntu-22.04.hxz --ram 4G
# Boot without KVM acceleration
hexz vm boot snapshot.hxz --ram 2G --no-kvm
```
## Command Reference
The CLI is organized into three main command groups:
### Data Commands (`hexz data`)
Work with datasets and snapshots for ML/AI workflows:
| `pack` | Create a snapshot from raw files/directories |
| `build` | Build a snapshot with specific profiles (optimized for different workloads) |
| `info` | Display snapshot metadata and statistics |
| `diff` | Compare two snapshots (diagnostics feature) |
| `analyze` | Analyze snapshot structure and compression efficiency (diagnostics feature) |
**Example:**
```bash
# Pack with deduplication
hexz data pack --disk ./images --output train.hxz --cdc
# View detailed info
hexz data info train.hxz --json
```
### VM Commands (`hexz vm`)
Manage virtual machines using Hexz snapshots (requires `fuse` feature):
| `boot` | Boot a VM from a snapshot |
| `install` | Install an OS from ISO to create a new snapshot |
| `snapshot` | Capture running VM state (disk + memory) |
| `mount` | Mount a snapshot as a FUSE filesystem |
**Example:**
```bash
# Install Ubuntu from ISO
hexz vm install ubuntu-22.04.iso --disk-size 20G --output ubuntu.hxz
# Boot the installed system
hexz vm boot ubuntu.hxz --ram 4G
# Snapshot a running VM
hexz vm snapshot --socket /tmp/vm.sock --output checkpoint.hxz
```
### System Commands (`hexz sys`)
Server and infrastructure operations:
| `serve` | Start an HTTP server for streaming snapshots |
**Example:**
```bash
# Serve snapshots over HTTP
hexz sys serve --port 8080 --path /snapshots
```
## Common Options
### Compression
Choose compression algorithm with `--compression`:
- `lz4` - Fast compression (~2GB/s), lower ratio (default)
- `zstd` - Better compression (~500MB/s), higher ratio
```bash
hexz data pack --disk ./data --output data.hxz --compression zstd
```
### Content-Defined Chunking (CDC)
Enable deduplication with `--cdc`:
```bash
# Use default CDC settings (FastCDC)
hexz data pack --disk ./data --output data.hxz --cdc
# Custom chunk sizes
hexz data pack \
--disk ./data \
--output data.hxz \
--cdc \
--min-chunk 16384 \
--avg-chunk 65536 \
--max-chunk 262144
```
### Encryption
Encrypt snapshots with `--encrypt`:
```bash
# You'll be prompted for a password
hexz data pack --disk ./data --output secure.hxz --encrypt
```
## Architecture
```
hexz-cli/
├── src/
│ ├── main.rs # Entry point
│ ├── args.rs # CLI argument parsing (clap)
│ ├── cmd/ # Command implementations
│ │ ├── data/ # Dataset commands (pack, info, etc.)
│ │ ├── vm/ # VM commands (boot, snapshot, etc.)
│ │ └── sys/ # System commands (serve)
│ └── ui/ # User interface (progress bars, formatters)
└── benches/ # Performance benchmarks
├── macro/ # Macro benchmarks (end-to-end)
├── micro/ # Micro benchmarks (component-level)
└── ai/ # AI/ML-focused benchmarks
```
## Development
All development commands use the project Makefile from the repository root.
### Building
```bash
# Build CLI (release mode)
make rust
# Build and install locally
make install
# Run CLI directly
make run info dataset.hxz
```
### Testing
```bash
# Run all tests
make test
# Run only Rust tests
make test-rust
# Run tests with filter
make test-rust pack
```
### Benchmarks
The CLI crate includes comprehensive benchmarks:
```bash
# Run all benchmarks
make bench
# Run specific benchmark category
make bench ai # AI/ML workload benchmarks
make bench cache # Cache performance
make bench http # HTTP streaming
# Compare against baseline
make bench-compare baseline-v1
```
Benchmark categories:
- **macro**: End-to-end workflows (read throughput, sparse access, concurrency)
- **micro**: Component-level (cache, decompression, API comparison)
- **ai**: ML-specific (dataloader, shuffle, prefetch, multi-worker)
### Linting & Formatting
```bash
# Format all code
make fmt
# Check formatting + clippy
make lint
# Run clippy with strict lints
make clippy
```
See `make help` for all available commands.
## Features
The CLI supports compile-time feature flags:
- `default`: `["fuse", "server", "compression-zstd", "encryption", "diagnostics", "signing"]`
- `fuse`: VM mounting and FUSE filesystem support
- `server`: HTTP server for snapshot streaming
- `compression-zstd`: Zstandard compression
- `encryption`: AES-256-GCM encryption
- `diagnostics`: Advanced analysis commands (diff, analyze)
- `signing`: Cryptographic signing for snapshots
- `firecracker`: Firecracker microVM support (experimental)
Build without optional features:
```bash
# Minimal build (no VM support)
cargo build -p hexz --no-default-features --features encryption,compression-zstd
```
## Performance
The CLI is optimized for high-throughput operations:
- **Pack throughput**: ~2GB/s (LZ4), ~500MB/s (Zstd)
- **Deduplication**: FastCDC with parallel processing
- **Progress tracking**: Real-time progress bars with `indicatif`
- **Zero-copy**: Direct memory mapping where possible
## See Also
- **[User Documentation](../../docs/)** - Tutorials and how-to guides
- **[CLI Reference](../../docs/reference/cli-reference.md)** - Complete command documentation
- **[hexz-core](../core/)** - Core engine library
- **[Python Bindings](../loader/)** - PyTorch integration
- **[Project README](../../README.md)** - Main project overview