Overview
Aprender is a lightweight, pure Rust machine learning library designed for efficiency and ease of use. Built with EXTREME TDD methodology, it provides reliable implementations of core ML algorithms with comprehensive test coverage.
Sovereign AI Stack
aprender is part of the Paiml Sovereign AI Stack - a complete pure Rust ML/AI ecosystem.
┌─────────────────────────────────────────────────────────────┐
│ SOVEREIGN AI STACK │
├─────────────────────────────────────────────────────────────┤
│ APPLICATION ruchy · depyler · decy · batuta │
├─────────────────────────────────────────────────────────────┤
│ ML/AI realizar · entrenar · ★ aprender ★ │
├─────────────────────────────────────────────────────────────┤
│ DATA alimentar · trueno-db · trueno-graph │
├─────────────────────────────────────────────────────────────┤
│ COMPUTE trueno (SIMD/GPU/WASM) · repartir │
├─────────────────────────────────────────────────────────────┤
│ QUALITY pmat · certeza · renacer · verificar │
└─────────────────────────────────────────────────────────────┘
Key Properties:
- Pure Rust - No Python, no FFI, WASM-compatible
- Sovereign - Runs on-premises, EU clouds, or air-gapped
- GPU/SIMD - Hardware acceleration via trueno
- Zero US dependency - S3-compatible (MinIO, Scaleway, OVH)
Features
Core Primitives
- Vector - 1D numerical array with statistical operations (mean, sum, dot, norm, variance)
  - Powered by trueno v0.4.1 for SIMD acceleration
- Matrix - 2D numerical array with linear algebra (matmul, transpose, Cholesky decomposition)
  - SIMD-optimized operations via trueno backend
- DataFrame - Named column container for ML data preparation workflows
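To make the Vector operations listed above concrete, here is a minimal sketch on plain f64 slices. It does not use aprender's Vector type or trueno; the function names are hypothetical and chosen only for illustration, and the variance shown is the population variance, which may differ from the library's convention.

```rust
/// Arithmetic mean of a slice; returns None for empty input.
fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    Some(xs.iter().sum::<f64>() / xs.len() as f64)
}

/// Population variance (divides by n, not n - 1). Assumption: the
/// library's own convention is not stated in this document.
fn variance(xs: &[f64]) -> Option<f64> {
    let m = mean(xs)?;
    Some(xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64)
}

/// Dot product of two equal-length slices; panics on length mismatch.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "length mismatch");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) norm, computed as sqrt(x · x).
fn norm(xs: &[f64]) -> f64 {
    dot(xs, xs).sqrt()
}
```

A SIMD backend would compute the same quantities; only the inner loops change.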
Supervised Learning (TOP 10 ML Algorithms ✅)
- LinearRegression - Ordinary Least Squares via normal equations
- LogisticRegression - Binary/multi-class classification with gradient descent
- DecisionTreeClassifier - GINI-based decision tree with configurable max depth
- RandomForestClassifier - Bootstrap aggregating ensemble with majority voting
- GradientBoostingClassifier - Gradient boosting ensemble with residual learning
- NaiveBayes (GaussianNB) - Probabilistic classification with Bayes' theorem
- KNeighborsClassifier - Distance-based classification (k-NN)
- LinearSVM - Support Vector Machine with hinge loss and subgradient descent
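As an illustration of "Ordinary Least Squares via normal equations" from the list above, the following sketch solves the single-feature case by hand. It is a standalone example, not aprender's LinearRegression API; `fit_simple_ols` is a hypothetical name.

```rust
/// Fit y ≈ intercept + slope * x by solving the 2x2 normal equations
/// (XᵀX) β = Xᵀy, where X has a column of ones and a column of x values.
/// Returns (intercept, slope), or None if the system is singular.
fn fit_simple_ols(x: &[f64], y: &[f64]) -> Option<(f64, f64)> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let sx: f64 = x.iter().sum();
    let sy: f64 = y.iter().sum();
    let sxx: f64 = x.iter().map(|v| v * v).sum();
    let sxy: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();

    // XᵀX = [[n, sx], [sx, sxx]] and Xᵀy = [sy, sxy].
    let det = n * sxx - sx * sx;
    if det.abs() < 1e-12 {
        return None; // all x values identical: XᵀX is not invertible
    }
    // Cramer's rule on the 2x2 system.
    let slope = (n * sxy - sx * sy) / det;
    let intercept = (sy * sxx - sx * sxy) / det;
    Some((intercept, slope))
}
```

For points lying exactly on y = 1 + 2x, this recovers an intercept of 1 and a slope of 2. With more features, the same equations are typically solved with a matrix factorization such as Cholesky rather than an explicit inverse.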
Unsupervised Learning
- KMeans - K-means++ initialization with Lloyd's algorithm
- DBSCAN - Density-based clustering with eps and min_samples
- HierarchicalClustering - Agglomerative clustering with linkage methods
- GaussianMixture - EM algorithm for soft clustering
- SpectralClustering - Graph Laplacian eigendecomposition clustering
- IsolationForest - Ensemble-based anomaly detection
- LocalOutlierFactor - Density-based outlier detection
- PCA - Principal Component Analysis for dimensionality reduction (TOP 10 ✅)
- TSNE - t-SNE for non-linear visualization
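The KMeans entry above mentions Lloyd's algorithm. A minimal sketch of its two alternating steps on 2D points follows. It assumes the caller supplies non-empty initial centroids (the k-means++ initialization is not shown), and all names here are hypothetical rather than aprender's API.

```rust
type Point = [f64; 2];

fn sq_dist(a: &Point, b: &Point) -> f64 {
    (a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)
}

/// Assignment step: index of the nearest centroid for each point.
fn assign(points: &[Point], centroids: &[Point]) -> Vec<usize> {
    points
        .iter()
        .map(|p| {
            let mut best = 0;
            for (i, c) in centroids.iter().enumerate() {
                if sq_dist(p, c) < sq_dist(p, &centroids[best]) {
                    best = i;
                }
            }
            best
        })
        .collect()
}

/// Update step: move each centroid to the mean of its assigned points.
/// A centroid with no assigned points keeps its previous position.
fn update(points: &[Point], labels: &[usize], centroids: &[Point]) -> Vec<Point> {
    let mut sums = vec![[0.0; 2]; centroids.len()];
    let mut counts = vec![0usize; centroids.len()];
    for (p, &l) in points.iter().zip(labels) {
        sums[l][0] += p[0];
        sums[l][1] += p[1];
        counts[l] += 1;
    }
    sums.iter()
        .zip(&counts)
        .zip(centroids)
        .map(|((s, &n), c)| if n == 0 { *c } else { [s[0] / n as f64, s[1] / n as f64] })
        .collect()
}

/// Alternate update and assignment until labels stop changing or
/// `max_iter` is reached. `init` must be non-empty.
fn lloyd(points: &[Point], init: &[Point], max_iter: usize) -> (Vec<Point>, Vec<usize>) {
    let mut centroids = init.to_vec();
    let mut labels = assign(points, &centroids);
    for _ in 0..max_iter {
        centroids = update(points, &labels, &centroids);
        let new_labels = assign(points, &centroids);
        if new_labels == labels {
            break;
        }
        labels = new_labels;
    }
    (centroids, labels)
}
```

Each iteration cannot increase the within-cluster sum of squared distances (the inertia), which is why the loop terminates once assignments stabilize.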
Graph Algorithms
- Graph - Adjacency list representation with weighted/unweighted edges
- Betweenness Centrality - Shortest path-based node importance
- PageRank - Iterative power method for node ranking
- Louvain - Community detection via modularity optimization
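To illustrate the "iterative power method" behind the PageRank entry above, here is a standalone sketch on a directed adjacency list. It is not aprender's Graph API; the function name, the fixed iteration count, and the uniform handling of dangling nodes are assumptions made for this example.

```rust
/// PageRank by power iteration. `out_edges[u]` lists the nodes that u links to.
/// Nodes with no outgoing edges (dangling nodes) spread their rank uniformly.
fn pagerank(out_edges: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = out_edges.len();
    if n == 0 {
        return Vec::new();
    }
    // Start from the uniform distribution.
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Total rank currently held by dangling nodes.
        let dangling: f64 = out_edges
            .iter()
            .zip(&rank)
            .filter(|(edges, _)| edges.is_empty())
            .map(|(_, r)| *r)
            .sum();
        // Teleport term plus the uniformly redistributed dangling mass.
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        for (u, targets) in out_edges.iter().enumerate() {
            if targets.is_empty() {
                continue;
            }
            let share = damping * rank[u] / targets.len() as f64;
            for &v in targets {
                next[v] += share;
            }
        }
        rank = next;
    }
    rank
}
```

A damping factor of 0.85 is a common choice. Because every step redistributes all rank mass, the scores keep summing to 1.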
Association Rule Mining
- Apriori - Frequent itemset mining for market basket analysis
- Support, confidence, and lift metrics
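The three rule metrics named above can be sketched directly from their definitions. This example represents items as u32 identifiers and is independent of aprender's Apriori API; all function names are hypothetical.

```rust
/// Support: fraction of transactions that contain every item in `itemset`.
fn support(transactions: &[Vec<u32>], itemset: &[u32]) -> f64 {
    if transactions.is_empty() {
        return 0.0;
    }
    let hits = transactions
        .iter()
        .filter(|t| itemset.iter().all(|i| t.contains(i)))
        .count();
    hits as f64 / transactions.len() as f64
}

/// Confidence of the rule A -> B: support(A ∪ B) / support(A).
/// Returns None when the antecedent never occurs.
fn confidence(transactions: &[Vec<u32>], a: &[u32], b: &[u32]) -> Option<f64> {
    let sa = support(transactions, a);
    if sa == 0.0 {
        return None;
    }
    let mut both = a.to_vec();
    both.extend_from_slice(b);
    Some(support(transactions, &both) / sa)
}

/// Lift of the rule A -> B: confidence(A -> B) / support(B).
/// Values above 1 suggest positive association, below 1 negative.
fn lift(transactions: &[Vec<u32>], a: &[u32], b: &[u32]) -> Option<f64> {
    let sb = support(transactions, b);
    if sb == 0.0 {
        return None;
    }
    Some(confidence(transactions, a, b)? / sb)
}
```

Apriori itself uses support to prune candidates: any superset of an infrequent itemset is also infrequent, so it never needs to be counted.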
Descriptive Statistics
- Mean, Median, Mode, Variance, Standard Deviation
- Quartiles (Q1, Q2, Q3), Interquartile Range (IQR)
- Histograms with multiple binning strategies (Freedman-Diaconis, Sturges, Scott, Square Root)
- Five-number summary (min, Q1, median, Q3, max)
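As a sketch of how the five-number summary and the Freedman-Diaconis rule listed above fit together, the example below computes quartiles by linear interpolation between closest ranks. That interpolation choice is an assumption (several quartile conventions exist, and the document does not say which one aprender uses); the function names are hypothetical.

```rust
/// Quantile of sorted, non-empty data using linear interpolation
/// between the two closest ranks. `q` must be in [0, 1].
fn quantile_sorted(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Five-number summary: [min, Q1, median, Q3, max].
/// Returns None for empty input or input containing NaN.
fn five_number_summary(data: &[f64]) -> Option<[f64; 5]> {
    if data.is_empty() || data.iter().any(|x| x.is_nan()) {
        return None;
    }
    let mut s = data.to_vec();
    s.sort_by(|a, b| a.total_cmp(b));
    Some([
        s[0],
        quantile_sorted(&s, 0.25),
        quantile_sorted(&s, 0.5),
        quantile_sorted(&s, 0.75),
        s[s.len() - 1],
    ])
}

/// Freedman-Diaconis bin width: 2 * IQR / n^(1/3).
fn freedman_diaconis_width(data: &[f64]) -> Option<f64> {
    let [_, q1, _, q3, _] = five_number_summary(data)?;
    Some(2.0 * (q3 - q1) / (data.len() as f64).cbrt())
}
```

Because the bin width depends on the IQR rather than the standard deviation, the Freedman-Diaconis rule is less sensitive to outliers than Scott's rule.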
Model Selection & Evaluation
- train_test_split - Random train/test splitting with reproducible seeds
- KFold - K-fold cross-validator with optional shuffling
- cross_validate - Automated cross-validation with statistics (mean, std, min, max)
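To show what a K-fold cross-validator produces, here is a minimal sketch that generates train/test index sets without shuffling. It is a standalone illustration, not aprender's KFold type; `k_fold_indices` is a hypothetical name.

```rust
/// Split indices 0..n into k contiguous folds and return one
/// (train_indices, test_indices) pair per fold.
/// The first n % k folds receive one extra sample.
fn k_fold_indices(n: usize, k: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    assert!(k >= 2 && k <= n, "need 2 <= k <= n");
    let (base, extra) = (n / k, n % k);
    let mut folds = Vec::with_capacity(k);
    let mut start = 0;
    for i in 0..k {
        let size = base + usize::from(i < extra);
        let end = start + size;
        // The current fold is held out for testing...
        let test: Vec<usize> = (start..end).collect();
        // ...and everything before and after it is used for training.
        let train: Vec<usize> = (0..start).chain(end..n).collect();
        folds.push((train, test));
        start = end;
    }
    folds
}
```

Every sample appears in exactly one test set. Shuffling with a fixed seed would permute the indices before this split, which keeps the folds reproducible.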
Model Format (.apr)
Native binary format with built-in quality (Jidoka):
use ;
// Save model with metadata
save?;
// Load with automatic verification
let model: LinearRegression = load?;
Features:
- Security: AES-256-GCM encryption, Ed25519 signatures, X25519 key exchange
- Integrity: CRC32 checksums, type verification (Jidoka - stop on corruption)
- Performance: trueno-native mode for 600x faster loading via zero-copy mmap
- Commercial: License blocks, watermarking, buyer-specific encryption
- Interop: Export to SafeTensors (HuggingFace), GGUF (Ollama)
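The integrity check described above (stop on corruption) can be sketched with a plain CRC32. This example assumes the common IEEE 802.3 polynomial in its reflected form; the document does not state which CRC32 variant the .apr format uses, and `crc32` and `verify_payload` are hypothetical names, not the format's real API.

```rust
/// Bitwise CRC-32 (assumed IEEE 802.3, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Refuse to continue when the stored checksum does not match the payload.
fn verify_payload(payload: &[u8], stored: u32) -> Result<(), String> {
    let actual = crc32(payload);
    if actual == stored {
        Ok(())
    } else {
        Err(format!(
            "checksum mismatch: stored {stored:#010x}, computed {actual:#010x}"
        ))
    }
}
```

A CRC detects accidental corruption only. Protection against deliberate tampering relies on the signature and encryption features listed above.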
Metrics
- Regression: r_squared, mse, rmse, mae
- Classification: accuracy, precision, recall, f1_score, confusion_matrix
- Clustering: silhouette_score, inertia
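For reference, several of the metrics named above follow directly from their textbook definitions. The sketch below is standalone and uses hypothetical signatures; aprender's own functions may take different argument types.

```rust
/// Mean squared error; panics on empty or mismatched input.
fn mse(y_true: &[f64], y_pred: &[f64]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len(), "length mismatch");
    assert!(!y_true.is_empty(), "empty input");
    y_true
        .iter()
        .zip(y_pred)
        .map(|(t, p)| (t - p).powi(2))
        .sum::<f64>()
        / y_true.len() as f64
}

/// Coefficient of determination: 1 - SS_res / SS_tot.
/// Not finite when y_true is constant (SS_tot = 0).
fn r_squared(y_true: &[f64], y_pred: &[f64]) -> f64 {
    let mean = y_true.iter().sum::<f64>() / y_true.len() as f64;
    let ss_tot: f64 = y_true.iter().map(|t| (t - mean).powi(2)).sum();
    let ss_res = mse(y_true, y_pred) * y_true.len() as f64;
    1.0 - ss_res / ss_tot
}

/// Binary precision, recall, and F1, with `true` as the positive class.
/// A ratio with a zero denominator is reported as 0.0.
fn precision_recall_f1(y_true: &[bool], y_pred: &[bool]) -> (f64, f64, f64) {
    assert_eq!(y_true.len(), y_pred.len(), "length mismatch");
    let (mut tp, mut fp, mut false_neg) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (&t, &p) in y_true.iter().zip(y_pred) {
        match (t, p) {
            (true, true) => tp += 1.0,
            (false, true) => fp += 1.0,
            (true, false) => false_neg += 1.0,
            (false, false) => {}
        }
    }
    let ratio = |num: f64, den: f64| if den == 0.0 { 0.0 } else { num / den };
    let precision = ratio(tp, tp + fp);
    let recall = ratio(tp, tp + false_neg);
    let f1 = ratio(2.0 * precision * recall, precision + recall);
    (precision, recall, f1)
}
```

rmse is the square root of mse, and F1 is the harmonic mean of precision and recall.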
Installation
Add to your Cargo.toml:
[]
= "0.4.1"
Quick Start
Linear Regression
use *;
K-Means Clustering
use *;
Random Forest Classification
use *;
use RandomForestClassifier;
Cross-Validation
use *;
use ;
Examples
Run any of the 26+ included examples:
Supervised Learning
Unsupervised Learning
Anomaly Detection
Graph & Association Rules
Model Selection & Utilities
Quality Metrics
- TDG Score: 93.3/100 (A grade)
- Total Tests: 683 passing
- Property Tests: 32 (proptest)
- Doc Tests: 49
- Coverage: ~95%
- Max Cyclomatic Complexity: ≤10
- Clippy Warnings: 0
- SATD Violations: 0 critical (1 low-priority TODO)
Documentation
- EXTREME TDD Book: https://paiml.github.io/aprender/
- API Reference: Run cargo doc --open or visit docs.rs/aprender
Roadmap
See ROADMAP.md for planned features and version roadmap.
Citation
If you use aprender in your research, please cite it:
Or in APA format:
Gift, N., & Contributors. (2024). aprender: Next Generation Machine Learning in Pure Rust (Version 0.10.0) [Computer software]. GitHub. https://github.com/paiml/aprender
See CITATION.cff for machine-readable citation metadata.
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! Please ensure:
- All tests pass: cargo test --all
- No clippy warnings: cargo clippy --all-targets
- Code is formatted: cargo fmt