# Case Study: Model Serialization (.apr Format)
Save and load ML models with built-in quality: checksums, signatures, encryption, WASM compatibility.
## Quick Start
```rust
use aprender::format::{save, load, ModelType, SaveOptions};
use aprender::linear_model::LinearRegression;
// Train model
let mut model = LinearRegression::new();
model.fit(&x, &y)?;
// Save
save(&model, ModelType::LinearRegression, "model.apr", SaveOptions::default())?;
// Load
let loaded: LinearRegression = load("model.apr", ModelType::LinearRegression)?;
```
## WASM Compatibility (Hard Requirement)
The `.apr` format is designed for **universal deployment**. Every feature works in:
- Native (Linux, macOS, Windows)
- WASM (browsers, Cloudflare Workers, Vercel Edge)
- Embedded (no_std with alloc)
```rust
// Same model works everywhere
#[cfg(target_arch = "wasm32")]
async fn load_in_browser() -> Result<LinearRegression> {
let bytes = fetch("https://models.example.com/house-prices.apr").await?;
load_from_bytes(&bytes, ModelType::LinearRegression)
}
#[cfg(not(target_arch = "wasm32"))]
fn load_native() -> Result<LinearRegression> {
load("house-prices.apr", ModelType::LinearRegression)
}
```
**Why this matters:**
- Train once, deploy anywhere
- Browser-based ML demos
- Edge inference (low latency)
- Serverless functions
## Format Structure
```text
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Header (32 bytes, fixed) โ โ Magic, version, type, sizes
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Metadata (variable, MessagePack) โ โ Hyperparameters, metrics
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Salt + Nonce (if ENCRYPTED) โ โ Security parameters
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Payload (variable, compressed) โ โ Model weights (bincode)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Signature (if SIGNED) โ โ Ed25519 signature
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ License (if LICENSED) โ โ Commercial protection
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Checksum (4 bytes, CRC32) โ โ Integrity verification
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
```
## Built-in Quality (Jidoka)
### CRC32 Checksum
Every `.apr` file has a CRC32 checksum. Corruption is detected immediately:
```rust
// Automatic verification on load
let model: LinearRegression = load("model.apr", ModelType::LinearRegression)?;
// If checksum fails: AprenderError::ChecksumMismatch { expected, actual }
```
### Type Safety
Model type is encoded in header. Loading wrong type fails fast:
```rust
// Saved as LinearRegression
save(&lr_model, ModelType::LinearRegression, "lr.apr", opts)?;
// Attempt to load as KMeans - fails immediately
let result: Result<KMeans> = load("lr.apr", ModelType::KMeans);
// Error: "Model type mismatch: file contains LinearRegression, expected KMeans"
```
## Metadata
Store hyperparameters, metrics, and custom data:
```rust
let options = SaveOptions::default()
.with_name("house-price-predictor")
.with_description("Trained on Boston Housing dataset");
// Add hyperparameters
options.metadata.hyperparameters.insert(
"learning_rate".to_string(),
serde_json::json!(0.01)
);
// Add metrics
options.metadata.metrics.insert(
"r2_score".to_string(),
serde_json::json!(0.95)
);
save(&model, ModelType::LinearRegression, "model.apr", options)?;
```
## Inspection Without Loading
Check model info without deserializing weights:
```rust
use aprender::format::inspect;
let info = inspect("model.apr")?;
println!("Model type: {:?}", info.model_type);
println!("Format version: {}.{}", info.format_version.0, info.format_version.1);
println!("Payload size: {} bytes", info.payload_size);
println!("Created: {}", info.metadata.created_at);
println!("Encrypted: {}", info.encrypted);
println!("Signed: {}", info.signed);
```
## Model Types
| 0x0001 | LinearRegression | Regression |
| 0x0002 | LogisticRegression | Binary classification |
| 0x0003 | DecisionTree | Interpretable classification |
| 0x0004 | RandomForest | Ensemble classification |
| 0x0005 | GradientBoosting | High-performance ensemble |
| 0x0006 | KMeans | Clustering |
| 0x0007 | Pca | Dimensionality reduction |
| 0x0008 | NaiveBayes | Probabilistic classification |
| 0x0009 | Knn | Distance-based classification |
| 0x000A | Svm | Support vector machine |
| 0x0010 | NgramLm | Language modeling |
| 0x0011 | TfIdf | Text vectorization |
| 0x0012 | CountVectorizer | Bag of words |
| 0x0020 | NeuralSequential | Deep learning |
| 0x0021 | NeuralCustom | Custom architectures |
| 0x0030 | ContentRecommender | Recommendations |
| 0x0040 | MixtureOfExperts | Sparse/dense MoE ensembles |
| 0x00FF | Custom | User-defined |
## Encryption (Feature: `format-encryption`)
### Password-Based (Personal/Team)
```rust
use aprender::format::{save_encrypted, load_encrypted};
// Save with password (Argon2id + AES-256-GCM)
save_encrypted(&model, ModelType::LinearRegression, "secure.apr",
SaveOptions::default(), "my-strong-password")?;
// Load with password
let model: LinearRegression = load_encrypted("secure.apr",
ModelType::LinearRegression, "my-strong-password")?;
```
**Security properties:**
- Argon2id: Memory-hard, GPU-resistant key derivation
- AES-256-GCM: Authenticated encryption (detects tampering)
- Random salt: Same password produces different ciphertexts
### Recipient-Based (Commercial Distribution)
```rust
use aprender::format::{save_for_recipient, load_as_recipient};
use x25519_dalek::{PublicKey, StaticSecret};
// Generate buyer's keypair (done once by buyer)
let buyer_secret = StaticSecret::random_from_rng(&mut rng);
let buyer_public = PublicKey::from(&buyer_secret);
// Seller encrypts for buyer's public key (no password sharing!)
save_for_recipient(&model, ModelType::LinearRegression, "commercial.apr",
SaveOptions::default(), &buyer_public)?;
// Only buyer's secret key can decrypt
let model: LinearRegression = load_as_recipient("commercial.apr",
ModelType::LinearRegression, &buyer_secret)?;
```
**Benefits:**
- No password sharing required
- Cryptographically bound to buyer (non-transferable)
- Forward secrecy via ephemeral sender keys
- Perfect for model marketplaces
## Digital Signatures (Feature: `format-signing`)
Verify model provenance:
```rust
use aprender::format::{save_signed, load_verified};
use ed25519_dalek::{SigningKey, VerifyingKey};
// Generate seller's keypair (done once)
let signing_key = SigningKey::generate(&mut rng);
let verifying_key = VerifyingKey::from(&signing_key);
// Sign model with private key
save_signed(&model, ModelType::LinearRegression, "signed.apr",
SaveOptions::default(), &signing_key)?;
// Verify signature before loading (reject tampering)
let model: LinearRegression = load_verified("signed.apr",
ModelType::LinearRegression, Some(&verifying_key))?;
```
**Use cases:**
- Model marketplaces (verify seller identity)
- Compliance (audit trail)
- Supply chain security
## Compression (Feature: `format-compression`)
```rust
use aprender::format::{Compression, SaveOptions};
let options = SaveOptions::default()
.with_compression(Compression::ZstdDefault); // Level 3, good balance
// Or maximum compression for archival
let archival = SaveOptions::default()
.with_compression(Compression::ZstdMax); // Level 19
```
| None | 1:1 | Instant | Debugging |
| ZstdDefault | ~3:1 | Fast | Distribution |
| ZstdMax | ~4:1 | Slow | Archival |
| LZ4 | ~2:1 | Very fast | Streaming |
## WASM Loading Patterns
### Browser (Fetch API)
```rust
#[cfg(target_arch = "wasm32")]
pub async fn load_from_url<M: DeserializeOwned>(
url: &str,
model_type: ModelType,
) -> Result<M> {
let response = fetch(url).await?;
let bytes = response.bytes().await?;
load_from_bytes(&bytes, model_type)
}
// Usage
let model = load_from_url::<LinearRegression>(
"https://models.example.com/house-prices.apr",
ModelType::LinearRegression
).await?;
```
### IndexedDB Cache
```rust
#[cfg(target_arch = "wasm32")]
pub async fn load_cached<M: DeserializeOwned>(
cache_key: &str,
url: &str,
model_type: ModelType,
) -> Result<M> {
// Try cache first
if let Some(bytes) = idb_get(cache_key).await? {
return load_from_bytes(&bytes, model_type);
}
// Fetch and cache
let bytes = fetch(url).await?.bytes().await?;
idb_set(cache_key, &bytes).await?;
load_from_bytes(&bytes, model_type)
}
```
### Graceful Degradation
Some features are native-only (STREAMING, TRUENO_NATIVE). In WASM, they're silently ignored:
```rust
// This works in both native and WASM
let options = SaveOptions::default()
.with_compression(Compression::ZstdDefault) // Works everywhere
.with_streaming(true); // Ignored in WASM, no error
// WASM: loads via in-memory path
// Native: uses mmap for large models
let model: LinearRegression = load("model.apr", ModelType::LinearRegression)?;
```
## Ecosystem Integration
The `.apr` format coordinates with alimentar's `.ald` dataset format:
```text
Training Pipeline (Native):
โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ
โ dataset.ald โ โ โ aprender โ โ โ model.apr โ
โ (alimentar) โ โ training โ โ (aprender) โ
โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ
Inference Pipeline (WASM):
โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ
โ Fetch .apr โ โ โ aprender โ โ โ Prediction โ
โ from CDN โ โ inference โ โ in browser โ
โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ
```
**Shared properties:**
- Same crypto stack (aes-gcm, ed25519-dalek, x25519-dalek)
- Same WASM compatibility requirements
- Same Toyota Way principles (Jidoka, checksums, signatures)
## Private Inference (HIPAA/GDPR)
For sensitive data, use bidirectional encryption:
```rust
// Model publishes public key in metadata
let info = inspect("medical-model.apr")?;
let model_pub_key = info.metadata.custom.get("inference_pub_key");
// User encrypts input with model's public key
let encrypted_input = encrypt_for_model(&patient_data, model_pub_key)?;
// Send encrypted_input to model owner
// Model owner decrypts, runs inference, encrypts response with user's public key
// Only user can decrypt the prediction
```
**Use cases:**
- HIPAA-compliant medical inference
- GDPR-compliant EU data processing
- Financial data analysis
- Zero-trust ML APIs
## Toyota Way Principles
| **Jidoka** | CRC32 checksum stops on corruption |
| **Jidoka** | Type verification stops on mismatch |
| **Jidoka** | Signature verification stops on tampering |
| **Jidoka** | Decryption fails on wrong key (authenticated) |
| **Genchi Genbutsu** | `inspect()` to see actual file contents |
| **Kaizen** | Semantic versioning for format evolution |
| **Heijunka** | Graceful degradation (WASM ignores native-only flags) |
## Error Handling
```rust
use aprender::error::AprenderError;
match load::<LinearRegression>("model.apr", ModelType::LinearRegression) {
Ok(model) => { /* use model */ },
Err(AprenderError::ChecksumMismatch { expected, actual }) => {
eprintln!("File corrupted: expected {:08X}, got {:08X}", expected, actual);
},
Err(AprenderError::ModelTypeMismatch { expected, found }) => {
eprintln!("Wrong model type: expected {:?}, found {:?}", expected, found);
},
Err(AprenderError::SignatureInvalid) => {
eprintln!("Signature verification failed - model may be tampered");
},
Err(AprenderError::DecryptionFailed) => {
eprintln!("Decryption failed - wrong password or key");
},
Err(AprenderError::UnsupportedVersion { found, supported }) => {
eprintln!("Version {}.{} not supported (max {}.{})",
found.0, found.1, supported.0, supported.1);
},
Err(e) => eprintln!("Error: {}", e),
}
```
## Feature Flags
| (core) | bincode, rmp-serde | ~60KB | โ |
| `format-compression` | zstd | +250KB | โ |
| `format-signing` | ed25519-dalek | +150KB | โ |
| `format-encryption` | aes-gcm, argon2, x25519-dalek, hkdf, sha2 | +180KB | โ |
```toml
# Cargo.toml
[dependencies]
aprender = { version = "0.9", features = ["format-encryption", "format-signing"] }
```
## Single Binary Deployment
The `.apr` format's killer feature: embed models directly in your executable.
### The Pattern
```rust
// Embed model at compile time - zero runtime dependencies
const MODEL: &[u8] = include_bytes!("sentiment.apr");
fn main() -> Result<()> {
let model: LogisticRegression = load_from_bytes(MODEL, ModelType::LogisticRegression)?;
// SIMD inference immediately available
let prediction = model.predict(&features)?;
}
```
**Build and deploy:**
```bash
cargo build --release --target aarch64-unknown-linux-gnu
# Output: single 5MB binary with model embedded
./app # Runs anywhere, NEON SIMD active on ARM
```
### Why This Matters
| Cold start | 5-30 seconds | <100ms |
| Memory | 500MB - 2GB | 10-50MB |
| Dependencies | Python, PyTorch, etc. | None |
| Artifacts | 5-20 files | 1 file |
### AWS Lambda ARM (Graviton)
Based on [ruchy-lambda research](https://github.com/paiml/ruchy-lambda): blocking I/O achieves 7.69ms cold start.
```rust
const MODEL: &[u8] = include_bytes!("classifier.apr");
fn main() {
let model: LogisticRegression = load_from_bytes(MODEL, ModelType::LogisticRegression)
.expect("embedded model valid");
// Lambda Runtime API loop (blocking, no tokio)
loop {
let event = get_next_event(); // blocking GET
let pred = model.predict(&event.data); // NEON SIMD
send_response(pred); // blocking POST
}
}
```
**Performance:** 128MB ARM64, <10ms cold start, ~$0.0000002/request.
### Deployment Targets
| `x86_64-unknown-linux-gnu` | ~5MB | AVX2/512 | Lambda x86, servers |
| `aarch64-unknown-linux-gnu` | ~4MB | NEON | Lambda ARM, RPi |
| `wasm32-unknown-unknown` | ~500KB | - | Browser, Workers |
## Quantization
Reduce model size 4-8x with integer weights (GGUF-compatible).
### Quick Start
```bash
# Quantize existing model
apr quantize model.apr --type q4_0 --output model-q4.apr
# Inspect
apr inspect model-q4.apr --quantization
# Type: Q4_0, Block size: 32, Bits/weight: 4.5
```
### Types (GGUF Standard)
| Q8_0 | 8 | 32 | High accuracy |
| Q4_0 | 4 | 32 | Balanced |
| Q4_1 | 4 | 32 | Better accuracy |
### API
```rust
use aprender::format::{QuantType, save_quantized};
// Quantize and save
let quantized = model.quantize(QuantType::Q4_0)?;
save(&quantized, ModelType::NeuralSequential, "model-q4.apr", opts)?;
```
### Export
```bash
# To GGUF (llama.cpp compatible)
apr export model-q4.apr --format gguf --output model.gguf
# To SafeTensors (HuggingFace)
apr export model-q4.apr --format safetensors --output model/
```
## Knowledge Distillation
Train smaller models from larger teachers with full provenance tracking.
### The Pipeline
```bash
# 1. Distill 7B โ 1B
apr distill teacher-7b.apr --output student-1b.apr \
--temperature 3.0 --alpha 0.7
# 2. Quantize
apr quantize student-1b.apr --type q4_0 --output student-q4.apr
# 3. Embed in binary
# include_bytes!("student-q4.apr")
```
**Size reduction:**
| Teacher (7B, FP32) | 28 GB | baseline |
| Student (1B, FP32) | 4 GB | 7x |
| Student (Q4_0) | 500 MB | 56x |
| + Zstd | 400 MB | **70x** |
### Provenance
Every distilled model stores teacher information:
```rust
let info = inspect("student.apr")?;
let distill = info.distillation.unwrap();
println!("Teacher: {}", distill.teacher.hash); // SHA256
println!("Method: {:?}", distill.method); // Standard/Progressive/Ensemble
println!("Temperature: {}", distill.params.temperature);
println!("Final loss: {}", distill.params.final_loss);
```
### Methods
| Standard | KL divergence on final logits |
| Progressive | Layer-wise intermediate matching |
| Ensemble | Multiple teachers averaged |
```bash
# Progressive distillation with layer mapping
apr distill teacher.apr --output student.apr \
--method progressive --layer-map "0:0,1:2,2:4"
# Ensemble from multiple teachers
apr distill teacher1.apr teacher2.apr teacher3.apr \
--output student.apr --method ensemble
```
## Complete SLM Pipeline
End-to-end: large model โ edge deployment.
```text
โโโโโโโโโโโโโโโโโโโโ
โ LLaMA 7B (28GB) โ Teacher model
โโโโโโโโโโฌโโโโโโโโโโ
โ distill (entrenar)
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Student 1B (4GB) โ Knowledge transferred
โโโโโโโโโโฌโโโโโโโโโโ
โ quantize (Q4_0)
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Quantized (500MB)โ 4-bit weights
โโโโโโโโโโฌโโโโโโโโโโ
โ compress (zstd)
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Compressed (400MB)โ 70x smaller
โโโโโโโโโโฌโโโโโโโโโโ
โ embed (include_bytes!)
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Single Binary โ Deploy anywhere
โ ARM NEON SIMD โ <10ms cold start
โ 2GB RAM device โ $0.0000002/req
โโโโโโโโโโโโโโโโโโโโ
```
**Cargo.toml for minimal binary:**
```toml
[profile.release]
lto = true
codegen-units = 1
panic = "abort"
strip = true
opt-level = "z"
```
## Mixture of Experts (MoE)
MoE models use **bundled persistence** - a single `.apr` file contains the gating network and all experts:
```text
model.apr
โโโ Header (ModelType::MixtureOfExperts = 0x0040)
โโโ Metadata (MoeConfig)
โโโ Payload
โโโ Gating Network
โโโ Experts[0..n]
```
```rust
use aprender::ensemble::{MixtureOfExperts, MoeConfig, SoftmaxGating};
// Build MoE
let moe = MixtureOfExperts::builder()
.gating(SoftmaxGating::new(n_features, n_experts))
.expert(expert_0)
.expert(expert_1)
.expert(expert_2)
.config(MoeConfig::default().with_top_k(2))
.build()?;
// Save bundled (single file)
moe.save_apr("model.apr")?;
// Load
let loaded = MixtureOfExperts::<MyExpert, SoftmaxGating>::load("model.apr")?;
```
**Benefits:**
- Atomic save/load (no partial states)
- Single file deployment
- Checksummed integrity
See [Case Study: Mixture of Experts](./mixture-of-experts.md) for full API documentation.
## Specification
Full specification: [docs/specifications/model-format-spec.md](https://github.com/paiml/aprender/blob/main/docs/specifications/model-format-spec.md)
**Key properties:**
- Pure Rust (Sovereign AI, zero C/C++ dependencies)
- WASM compatibility (hard requirement, spec ยง1.0)
- Single binary deployment (spec ยง1.1)
- GGUF-compatible quantization (spec ยง6.2)
- Knowledge distillation provenance (spec ยง6.3)
- MoE bundled architecture (spec ยง6.4)
- 32-byte fixed header for fast scanning
- MessagePack metadata (compact, fast)
- bincode payload (zero-copy potential)
- CRC32 integrity, Ed25519 signatures, AES-256-GCM encryption
- trueno-native mode for zero-copy SIMD inference (native only)