Table of Contents
- Overview
- Architecture
- Installation
- Usage
- CLI
- Features
- Benchmarks
- Testing
- Security
- Contributing
- License
Overview
Pacha is a unified registry for machine learning artifacts (models, datasets, and training recipes) with full lineage tracking, semantic versioning, and cryptographic integrity verification. It provides content-addressed storage with BLAKE3 hashing for deduplication, tamper detection, and efficient delta storage.
Key Capabilities
- Model Registry - Register, version, and stage ML models with metadata and metrics
- Data Registry - Track datasets with schema validation and provenance
- Recipe Registry - Store training configurations with hyperparameters and environment specs
- Lineage Tracking - Full dependency graph from data to deployed model
- Content-Addressed Storage - BLAKE3-based deduplication and integrity verification
- Cryptographic Signing - Ed25519 signatures for artifact authenticity
- Experiment Tracking - Record training runs with metrics, parameters, and artifacts
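As a rough illustration of the lineage idea in the list above, the dependency graph can be pictured as a map from each artifact to the artifacts it was derived from. The sketch below is not Pacha's API: the `LineageGraph` type, its methods, and the artifact id strings are hypothetical names chosen for this example, and it uses only the standard library.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical lineage graph (not Pacha's real type): each artifact id
/// maps to the ids of the artifacts it was derived from.
#[derive(Default)]
pub struct LineageGraph {
    parents: HashMap<String, Vec<String>>,
}

impl LineageGraph {
    /// Record that `child` was produced from `parent`.
    pub fn add_edge(&mut self, child: &str, parent: &str) {
        self.parents
            .entry(child.to_string())
            .or_default()
            .push(parent.to_string());
    }

    /// Collect every transitive ancestor of `artifact`, sorted and deduplicated.
    pub fn ancestors(&self, artifact: &str) -> Vec<String> {
        let mut seen = BTreeSet::new();
        let mut stack = vec![artifact.to_string()];
        while let Some(current) = stack.pop() {
            if let Some(parents) = self.parents.get(&current) {
                for parent in parents {
                    // Visit each ancestor once so shared inputs and cycles terminate.
                    if seen.insert(parent.clone()) {
                        stack.push(parent.clone());
                    }
                }
            }
        }
        seen.into_iter().collect()
    }
}
```

Asking for the ancestors of a model that was trained from a recipe and a dataset, where the dataset itself came from a raw dataset, would return all three upstream ids. A persistent registry would presumably keep these edges in its metadata database rather than in memory.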
Architecture
+------------------+     +------------------+     +------------------+
|  Model Registry  |     |  Data Registry   |     | Recipe Registry  |
|   (.apr files)   |     |   (.ald files)   |     |  (TOML configs)  |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         +------------------------+------------------------+
                                  |
                      +-----------+-----------+
                      |   Content-Addressed   |
                      |   Storage (BLAKE3)    |
                      +-----------+-----------+
                                  |
                      +-----------+-----------+
                      |  SQLite Metadata DB   |
                      |  (~/.pacha/registry)  |
                      +-----------------------+
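To make the content-addressed layer in the diagram more concrete, here is a minimal in-memory sketch of the technique: blobs are keyed by a digest of their bytes, so identical content is stored once and a read can be re-hashed to detect tampering. Everything in it is an assumption for illustration. `ContentStore`, `put`, `get_verified`, and `blob_count` are hypothetical names, and the standard library's `DefaultHasher` stands in for BLAKE3 only to avoid external crates; it is not a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest. Pacha uses BLAKE3; DefaultHasher is NOT cryptographic
/// and appears here only so the sketch needs no external crates.
fn digest(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical in-memory store keyed by content digest.
#[derive(Default)]
pub struct ContentStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl ContentStore {
    /// Store `bytes` and return its address; identical content is stored once.
    pub fn put(&mut self, bytes: &[u8]) -> String {
        let address = digest(bytes);
        self.blobs
            .entry(address.clone())
            .or_insert_with(|| bytes.to_vec());
        address
    }

    /// Return the blob only if it still hashes to the requested address.
    pub fn get_verified(&self, address: &str) -> Option<&[u8]> {
        let bytes = self.blobs.get(address)?;
        (digest(bytes) == address).then_some(bytes.as_slice())
    }

    /// Number of distinct blobs currently stored.
    pub fn blob_count(&self) -> usize {
        self.blobs.len()
    }
}
```

Storing the same bytes twice yields the same address and leaves the blob count unchanged, which is the deduplication property the Overview describes.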
Installation
Add to your Cargo.toml:
[dependencies]
pacha = "0.2.5"
Or install the CLI:
Usage
use *;
Data Registry
use *;
// Register a dataset with schema
let schema = new;
registry.register_data?;
Experiment Tracking
use *;
let experiment = builder
.name
.model
.dataset
.hyperparams
.build;
registry.log_experiment?;
CLI
# Initialize a registry
# Model operations
# Data operations
# Registry statistics
Features
| Feature | Description | Default |
|---|---|---|
| compression | Zstd compression for stored artifacts | Yes |
| cli | Command-line interface | Yes |
| signing | Ed25519 cryptographic signing | Yes |
| encryption | ChaCha20-Poly1305 encryption at rest | No |
| remote | HTTP remote registry support | No |
| lineage-graph | Graph-based lineage visualization | No |
| aprender-integration | Integration with aprender ML library | No |
| alimentar-integration | Integration with alimentar data library | No |
Enable all features:
[dependencies]
pacha = { version = "0.2.5", features = ["full"] }
Benchmarks
Run benchmarks with:
Content-addressing operations (BLAKE3 hashing, storage, retrieval) are benchmarked
using Criterion. See benches/content_address.rs for benchmark definitions.
Testing
468 tests passing with zero warnings.
# Unit tests
# All tests (unit + integration)
# All features
# With nextest (faster)
# Quality gates
# Coverage
# Mutation testing
Recent Fixes (v0.2.5)
- Non-atomic manifest write fixed: uses temp file + rename for crash safety
- find_best_run handles empty input gracefully instead of panicking
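The two fixes above follow common Rust patterns, sketched here with the standard library only. This is not the code from the Pacha source tree: `write_atomic`, the `Run` struct, its `accuracy` field, and the `find_best_run` signature shown are assumptions made for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to `path` via a temp file and rename, so a crash
/// leaves either the old file or the new one, not a partial write.
pub fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    // The temp file sits next to the target so the rename stays on one filesystem.
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(contents)?;
    // Flush data to disk before the rename publishes it.
    file.sync_all()?;
    fs::rename(&tmp, path)
}

/// Hypothetical run record; the real Pacha type is not shown in this README.
pub struct Run {
    pub name: String,
    pub accuracy: f64,
}

/// Return the run with the highest accuracy, or `None` for an empty slice
/// (assumed signature; returning `Option` avoids a panic on empty input).
pub fn find_best_run(runs: &[Run]) -> Option<&Run> {
    runs.iter().max_by(|a, b| a.accuracy.total_cmp(&b.accuracy))
}
```

The rename step is atomic on POSIX filesystems when source and target are on the same volume; behaviour on other platforms should be checked before relying on it.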
Security
- Cryptographic Integrity: All artifacts are content-addressed with BLAKE3
- Ed25519 Signing: Optional artifact signing for authenticity verification
- Encryption at Rest: Optional ChaCha20-Poly1305 encryption
- Dependency Auditing: cargo-deny and cargo-audit in CI pipeline
- No Unsafe Code: #![deny(unsafe_code)] enforced project-wide
To report a security vulnerability, please email security@paiml.com.
Contributing
Contributions welcome! Please follow the PAIML quality standards:
- Fork the repository
- Ensure all tests pass: cargo test
- Run quality checks: cargo clippy -- -D warnings && cargo fmt --check
- Submit a pull request
MSRV
Minimum Supported Rust Version: 1.75
See Also
- Cookbook — 7 runnable examples
License
MIT - see LICENSE for details.
Part of the Aprender monorepo — 70 workspace crates.