aprender-train 0.29.0

Training & Optimization library with autograd, LoRA, quantization, and model merging

A pure Rust training framework providing autograd, LoRA/QLoRA fine-tuning, quantization (Int4/Int8), model merging, knowledge distillation, and Compiler-in-the-Loop (CITL) training. Built on trueno for SIMD-accelerated compute and aprender for ML algorithms.


Features | Installation | Usage | Architecture | Quality | Sovereign Stack | Documentation | License


What is entrenar?

Entrenar (Spanish: "to train") is a production-grade neural network training library in pure Rust. It provides everything needed to train, fine-tune, quantize, merge, and distill models -- with no Python dependency.

Core capabilities:

  • Autograd Engine -- Tape-based reverse-mode automatic differentiation
  • Optimizers -- SGD, Adam, AdamW with cosine scheduling and gradient clipping
  • LoRA / QLoRA -- Parameter-efficient fine-tuning with 4-bit quantized base weights
  • Quantization -- QAT, PTQ, GGUF-compatible Q4_0/Q8_0, NF4 training
  • Model Merging -- TIES, DARE, SLERP algorithms
  • Knowledge Distillation -- Multi-teacher, progressive layer-wise
  • CITL -- Compiler-in-the-Loop training for transpiler optimization
  • GPU Training -- WGPU backend (AMD/Intel/cross-platform), CUDA/cuBLAS (NVIDIA)
  • Monitoring -- Real-time metrics, drift detection, Andon alerts

Part of the PAIML Sovereign AI Stack.

Installation

Library

Add to your Cargo.toml:

[dependencies]
entrenar = "0.7"

CLI

cargo install entrenar

From source

git clone https://github.com/paiml/entrenar
cd entrenar
cargo install --path .

Usage

Basic Training

use entrenar::train::{Trainer, TrainConfig, MSELoss, EarlyStopping};
use entrenar::optim::Adam;
use entrenar::Tensor;

let params = vec![Tensor::zeros(784 * 128, true)];
let optimizer = Adam::new(0.001, 0.9, 0.999, 1e-8);

let mut trainer = Trainer::new(params, Box::new(optimizer), TrainConfig::default());
trainer.set_loss(Box::new(MSELoss));
trainer.add_callback(EarlyStopping::new(5, 0.001));

let result = trainer.train(100, || batches.clone(), |x| model.forward(x));
println!("Final loss: {:.4}", result.final_loss);

Autograd

use entrenar::autograd::{matmul, softmax, layer_norm, attention};

let y = matmul(&x, &w);
let s = softmax(&logits);
let n = layer_norm(&x, &gamma, &beta);
let a = attention(&q, &k, &v);

LoRA / QLoRA Fine-Tuning

use entrenar::lora::{LoRALayer, QLoRALayer};

// Standard LoRA
let lora = LoRALayer::new(4096, 4096, 16, 32.0);

// QLoRA: 4-bit base + FP16 adapters (7B model: 28GB -> 3.5GB)
let qlora = QLoRALayer::new(base_weights, 16, 32.0);

Quantization

use entrenar::quant::{FakeQuantize, PTQCalibrator, GGUFQuantizer};

let fq = FakeQuantize::new(8, true);            // QAT with STE
let calibrator = PTQCalibrator::percentile(0.999); // Post-training
let quantizer = GGUFQuantizer::q4_0();           // GGUF export

Model Merging

use entrenar::merge::{TiesMerge, DareMerge, SlerpMerge};

let merged = TiesMerge::new(0.2).merge(&models, &weights);
let merged = DareMerge::new(0.9).merge(&base, &finetuned);
let merged = SlerpMerge::new().merge(&a, &b, 0.5);

Declarative Configuration

# train.yaml
model:
  path: base-model.gguf
data:
  train: train.parquet
  batch_size: 8
optimizer:
  name: adamw
  lr: 0.0001
lora:
  rank: 64
  alpha: 16
training:
  epochs: 10
  grad_clip: 1.0
Run the configuration with:

entrenar train train.yaml

CLI Commands

entrenar train config.yaml --epochs 10
entrenar quantize model.safetensors --bits 4 --output model_q4.json
entrenar merge model1.safetensors model2.safetensors --method ties
entrenar bench config.yaml --warmup 5 --iterations 100
entrenar inspect model.safetensors -v
entrenar audit predictions.parquet --type bias --threshold 0.8
entrenar monitor data.parquet --threshold 0.2

Features

Autograd Engine

Tape-based reverse-mode automatic differentiation with verified gradients. Supports matmul, softmax, layer normalization, and scaled dot-product attention. All gradients validated against finite-difference reference implementations.
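The finite-difference validation mentioned above can be sketched in a few lines of plain Rust (this is the general technique, not the entrenar API): compare the analytic gradient of a function against a central difference and require them to agree within a small tolerance.

```rust
// Finite-difference gradient check for f(x) = x^2 (analytic gradient: 2x).
// Plain-Rust sketch of the validation idea, not the entrenar API.
fn f(x: f64) -> f64 {
    x * x
}

fn analytic_grad(x: f64) -> f64 {
    2.0 * x
}

fn numeric_grad(x: f64, eps: f64) -> f64 {
    // Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    (f(x + eps) - f(x - eps)) / (2.0 * eps)
}

fn main() {
    let x = 3.0;
    let diff = (analytic_grad(x) - numeric_grad(x, 1e-5)).abs();
    assert!(diff < 1e-6, "gradient mismatch: {diff}");
    println!("gradient check passed (|diff| = {diff:.2e})");
}
```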

LoRA / QLoRA Fine-Tuning

Parameter-efficient fine-tuning with up to 99.75% parameter reduction. QLoRA combines 4-bit NF4 quantized base weights with FP16 low-rank adapters, reducing 7B model memory from 28GB to 3.5GB. PEFT-compatible adapter export for interoperability with HuggingFace tooling.
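The parameter-reduction arithmetic behind that claim is straightforward: LoRA replaces updates to a full d_out x d_in matrix with two rank-r factors. A small sketch (illustrative numbers, not entrenar code), using a 4096 x 4096 layer at rank 4:

```rust
// LoRA trains two low-rank factors A (r x d_in) and B (d_out x r)
// instead of the full d_out x d_in weight update.
fn full_params(d_in: usize, d_out: usize) -> usize {
    d_in * d_out
}

fn lora_params(d_in: usize, d_out: usize, rank: usize) -> usize {
    rank * (d_in + d_out)
}

fn main() {
    let (d_in, d_out, rank) = (4096, 4096, 4);
    let full = full_params(d_in, d_out);
    let lora = lora_params(d_in, d_out, rank);
    let reduction = 100.0 * (1.0 - lora as f64 / full as f64);
    // 16_777_216 full parameters vs 32_768 trainable LoRA parameters,
    // i.e. roughly a 99.8% reduction at this rank.
    println!("trainable params: {lora} vs {full} ({reduction:.2}% reduction)");
}
```

The exact percentage depends on the chosen rank and layer shapes; higher ranks trade memory for adapter capacity.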

Quantization

Three quantization strategies: Quantization-Aware Training (QAT) with straight-through estimator, Post-Training Quantization (PTQ) with percentile calibration, and GGUF-compatible Q4_0/Q8_0 export for llama.cpp interoperability. NF4 training with cuBLAS backward pass support.
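The block-wise idea behind Q4_0-style formats can be sketched as follows; this is a simplified illustration of per-block scale plus 4-bit codes, not the exact GGUF bit layout:

```rust
// Simplified sketch of block-wise 4-bit quantization: each block of 32
// values stores one f32 scale and one signed 4-bit code per value.
const BLOCK: usize = 32;

fn quantize_block(xs: &[f32]) -> (f32, Vec<i8>) {
    // Scale so the largest-magnitude value maps into the int4 range [-8, 7].
    let amax = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 7.0 };
    let qs = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-8.0, 7.0) as i8)
        .collect();
    (scale, qs)
}

fn dequantize_block(scale: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let xs: Vec<f32> = (0..BLOCK).map(|i| (i as f32 - 16.0) / 4.0).collect();
    let (scale, qs) = quantize_block(&xs);
    let ys = dequantize_block(scale, &qs);
    let max_err = xs
        .iter()
        .zip(&ys)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    // Round-to-nearest bounds the reconstruction error by half a step.
    assert!(max_err <= scale * 0.5 + 1e-6);
    println!("scale = {scale:.4}, max reconstruction error = {max_err:.4}");
}
```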

Model Merging

Three model merging algorithms for combining fine-tuned checkpoints: TIES (Trim, Elect Sign, Merge) for multi-model consolidation, DARE (Dropout And Rescale) for parameter-efficient merging, and SLERP (Spherical Linear Interpolation) for smooth two-model blending.
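SLERP is the easiest of the three to show concretely: interpolate along the great circle between two weight vectors rather than along the chord, so the merged vector's magnitude is preserved. A plain-Rust sketch (not the entrenar API):

```rust
// Spherical linear interpolation between two weight vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn norm(a: &[f32]) -> f32 {
    dot(a, a).sqrt()
}

fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    // Angle between the vectors from the normalized dot product.
    let cos_theta = (dot(a, b) / (norm(a) * norm(b))).clamp(-1.0, 1.0);
    let theta = cos_theta.acos();
    if theta.abs() < 1e-6 {
        // Nearly parallel vectors: fall back to linear interpolation.
        return a.iter().zip(b).map(|(x, y)| x * (1.0 - t) + y * t).collect();
    }
    let wa = ((1.0 - t) * theta).sin() / theta.sin();
    let wb = (t * theta).sin() / theta.sin();
    a.iter().zip(b).map(|(x, y)| x * wa + y * wb).collect()
}

fn main() {
    // Midpoint of two orthogonal unit vectors stays on the unit sphere,
    // where plain averaging would shrink it to norm ~0.707.
    let m = slerp(&[1.0, 0.0], &[0.0, 1.0], 0.5);
    assert!((norm(&m) - 1.0).abs() < 1e-5);
    println!("slerp midpoint = {m:?}, norm = {:.4}", norm(&m));
}
```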

Knowledge Distillation

Temperature-scaled KD loss with configurable alpha weighting between hard and soft targets. Multi-teacher ensemble distillation with weighted aggregation. Progressive layer-wise distillation for large-to-small model transfer.
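The temperature-scaled soft-target term can be sketched as follows; this shows the standard Hinton-style formulation (KL between temperature-softened distributions, scaled by T^2), not the entrenar API:

```rust
// Temperature-scaled softmax: equivalent to softmax(logits / T),
// with the max subtracted for numerical stability.
fn softmax_t(logits: &[f32], t: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| ((z - max) / t).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

// KL(softmax(teacher/T) || softmax(student/T)) * T^2; the T^2 factor keeps
// gradient magnitudes comparable as the temperature grows. In full KD this
// term is blended with the hard-label cross-entropy by an alpha weight.
fn kd_soft_loss(teacher: &[f32], student: &[f32], t: f32) -> f32 {
    let p = softmax_t(teacher, t);
    let q = softmax_t(student, t);
    let kl: f32 = p.iter().zip(&q).map(|(&pi, &qi)| pi * (pi / qi).ln()).sum();
    kl * t * t
}

fn main() {
    let teacher = [2.0, 1.0, 0.1];
    // Identical logits give zero distillation loss.
    assert!(kd_soft_loss(&teacher, &teacher, 2.0).abs() < 1e-6);
    // A mismatched student is penalized.
    let loss = kd_soft_loss(&teacher, &[0.1, 1.0, 2.0], 2.0);
    assert!(loss > 0.0);
    println!("soft KD loss = {loss:.4}");
}
```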

CITL (Compiler-in-the-Loop)

Training loop that incorporates compiler feedback for transpiler optimization. Uses RAG-based fix suggestions via trueno-rag to guide training toward compilable outputs. Designed for the depyler/bashrs/decy transpilation stack.

GPU Training

WGPU backend for cross-platform GPU training (AMD, Intel, Apple Silicon). NVIDIA CUDA/cuBLAS backend for dedicated GPU acceleration. NVML integration for real-time GPU monitoring. VRAM ledger with file-based locking for multi-process coordination.

Monitoring

Toyota Way-inspired quality monitoring with real-time metrics collection, drift detection (z-score based), and Andon alert system for automatic anomaly notification. NaN/Inf detection, gradient explosion guards, and loss divergence tracking.
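The z-score drift idea is simple enough to show directly; a plain-Rust sketch of the general technique (not the entrenar monitor API), flagging an observation more than k standard deviations from a baseline:

```rust
// Z-score drift detection: alert when a batch statistic deviates from the
// baseline mean by more than k standard deviations.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    (xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64).sqrt()
}

fn is_drift(baseline: &[f64], observed: f64, k: f64) -> bool {
    let z = (observed - mean(baseline)) / std_dev(baseline);
    z.abs() > k
}

fn main() {
    let baseline = [1.0, 1.1, 0.9, 1.05, 0.95];
    assert!(!is_drift(&baseline, 1.02, 3.0)); // within 3 sigma: no alert
    assert!(is_drift(&baseline, 2.0, 3.0));   // far outside: raise an alert
    println!("drift checks passed");
}
```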

Feature Flags

Flag      Purpose
gpu       GPU-accelerated training via wgpu
cuda      NVIDIA CUDA/cuBLAS training
citl      Compiler-in-the-Loop with trueno-rag
monitor   Training monitoring with trueno-db persistence
server    REST/HTTP API server via axum
parquet   Parquet batch loading via alimentar
hub       HuggingFace Hub model fetching
wasm      Browser-compatible WASM build
tracing   Renacer distributed tracing integration
nvml      Real GPU monitoring via NVIDIA NVML

Architecture

entrenar/
  autograd/     Tape-based automatic differentiation
  optim/        SGD, Adam, AdamW, schedulers
  lora/         LoRA, QLoRA fine-tuning
  quant/        QAT, PTQ, GGUF quantization
  merge/        TIES, DARE, SLERP merging
  distill/      Knowledge distillation
  finetune/     ClassifyPipeline, ClassifyTrainer, evaluation
  eval/         Classification metrics, drift detection, Andon
  train/        Trainer, callbacks, metrics, WGPU transformer trainer
  monitor/      Real-time monitoring, Andon alerts
  config/       Declarative YAML configuration
  io/           Model persistence (SafeTensors, APR)

Quality

Metric             Value
Tests              7,527+ passing
Coverage           96%
TDG Score          A+ (96.8/100)
Critical Defects   0
Property Tests     200K+ iterations
Gradient Checking  Finite-difference validated
Mutation Testing   >80% kill rate
MSRV               1.87

Sovereign AI Stack

Crate        Purpose                                   Version
trueno       SIMD/GPU compute primitives               0.16.x
aprender     ML algorithms, APR v2 format              0.27.x
entrenar     Training and optimization                 0.7.x
realizar     Inference engine (APR/GGUF/SafeTensors)   0.8.x
repartir     Distributed compute (CPU/GPU/Remote)      2.0.x
whisper-apr  Pure Rust Whisper ASR                     0.2.x
simular      Simulation engine                         0.3.x
batuta       Stack orchestration                       0.7.x

Documentation

  • API Reference -- Generated from source
  • Book -- Comprehensive guide with examples
  • Examples -- Runnable training, merging, and monitoring examples

Contributing

  1. Fork the repository
  2. Make your changes on the master branch
  3. Run quality gates: make lint && make test
  4. Run coverage: make coverage
  5. Submit a pull request

Cookbook

See entrenar-cookbook for examples and recipes.

License

MIT


Part of the Aprender monorepo — 70 workspace crates.