Crate aprender

Aprender: Next-generation machine learning library in pure Rust.

Aprender provides production-grade ML algorithms with a focus on ergonomic APIs, comprehensive testing, and backend-agnostic compute.

Quick Start

use aprender::prelude::*;

// Create training data (y = 2*x + 1)
let x = Matrix::from_vec(4, 1, vec![
    1.0,
    2.0,
    3.0,
    4.0,
]).unwrap();
let y = Vector::from_slice(&[3.0, 5.0, 7.0, 9.0]);

// Train linear regression
let mut model = LinearRegression::new();
model.fit(&x, &y).unwrap();

// Make predictions
let predictions = model.predict(&x);
let r2 = model.score(&x, &y);
assert!(r2 > 0.99);
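
To make the numbers in the Quick Start concrete, the sketch below reproduces the same fit with the standard library only. It is not Aprender's implementation: `fit_line` and `r_squared` are hypothetical helper names introduced here, and it assumes that `score` reports the coefficient of determination (R²), as the variable name `r2` suggests.

```rust
/// Fit `y = slope * x + intercept` by ordinary least squares on one feature.
/// Returns `None` for empty or mismatched inputs, or when `x` has no variance.
/// (Hypothetical helper for illustration; not part of Aprender.)
fn fit_line(x: &[f64], y: &[f64]) -> Option<(f64, f64)> {
    if x.is_empty() || x.len() != y.len() {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    // Covariance and variance sums (unnormalized; the 1/n factors cancel).
    let sxy: f64 = x
        .iter()
        .zip(y)
        .map(|(xi, yi)| (xi - mean_x) * (yi - mean_y))
        .sum();
    let sxx: f64 = x.iter().map(|xi| (xi - mean_x).powi(2)).sum();
    if sxx == 0.0 {
        return None;
    }

    let slope = sxy / sxx;
    Some((slope, mean_y - slope * mean_x))
}

/// Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
/// Assumes `y_true` is non-empty and not constant.
/// (Hypothetical helper for illustration; not part of Aprender.)
fn r_squared(y_true: &[f64], y_pred: &[f64]) -> f64 {
    let n = y_true.len() as f64;
    let mean = y_true.iter().sum::<f64>() / n;
    let ss_res: f64 = y_true
        .iter()
        .zip(y_pred)
        .map(|(t, p)| (t - p).powi(2))
        .sum();
    let ss_tot: f64 = y_true.iter().map(|t| (t - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}
```

Calling `fit_line(&[1.0, 2.0, 3.0, 4.0], &[3.0, 5.0, 7.0, 9.0])` recovers a slope of 2 and an intercept of 1. Because the training data lies exactly on that line, the residual sum of squares is zero and R² is 1, which is why the Quick Start can assert `r2 > 0.99`.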

Modules

  • primitives: Core Vector and Matrix types
  • data: DataFrame for named columns
  • linear_model: Linear regression algorithms
  • cluster: Clustering algorithms (K-Means)
  • code: Code analysis and code2vec embeddings
  • classification: Classification algorithms (Logistic Regression)
  • tree: Decision tree classifiers
  • metrics: Evaluation metrics
  • mining: Pattern mining algorithms (Apriori for association rules)
  • model_selection: Cross-validation and train/test splitting
  • preprocessing: Data transformers (scalers, encoders)
  • optim: Optimization algorithms (SGD, Adam)
  • loss: Loss functions for training (MSE, MAE, Huber)
  • serialization: Model serialization (SafeTensors format)
  • stats: Traditional descriptive statistics (quantiles, histograms)
  • graph: Graph construction and analysis (centrality, community detection)
  • bayesian: Bayesian inference (conjugate priors, MCMC, variational inference)
  • glm: Generalized Linear Models (Poisson, Gamma, Binomial families)
  • decomposition: Matrix decomposition (ICA, PCA)
  • text: Text processing and NLP (tokenization, stop words, stemming)
  • time_series: Time series analysis and forecasting (ARIMA)
  • index: Approximate nearest neighbor search (HNSW)
  • recommend: Recommendation systems (content-based, collaborative filtering)
  • synthetic: Synthetic data generation for AutoML (EDA, back-translation, MixUp)
  • bundle: Model bundling and memory paging for large models
  • cache: Cache hierarchy and model registry for large model management
  • chaos: Chaos engineering configuration (from renacer)
  • inspect: Model inspection tooling (header analysis, diff, quality scoring)
  • loading: Model loading subsystem with WCET and cryptographic agility
  • scoring: 100-point model quality scoring system
  • zoo: Model zoo protocol for sharing and discovery
  • embed: Data embedding with test data and tiny model representations
  • native: SIMD-native model format for zero-copy inference
  • stack: Sovereign AI Stack integration types
  • online: Online learning and dynamic retraining infrastructure

Re-exports

pub use error::AprenderError;
pub use error::Result;
pub use primitives::Matrix;
pub use primitives::Vector;
pub use traits::Estimator;
pub use traits::Transformer;
pub use traits::UnsupervisedEstimator;

Modules

active_learning
Active Learning strategies for label-efficient training.
autograd
Reverse-mode automatic differentiation engine for neural network training.
automl
Automated Machine Learning (AutoML) module.
bayesian
Bayesian inference and probability methods.
bench
Model evaluation and benchmarking framework (spec §7.10).
bundle
Model Bundling and Memory Paging
cache
Model Cache and Registry
calibration
Model calibration for confidence estimation.
chaos
Chaos Engineering Configuration
citl
Compiler-in-the-Loop Learning (CITL) for transpiler support.
classification
Classification algorithms.
cluster
Clustering algorithms.
code
Code Analysis and Code2Vec Embeddings
data
DataFrame module for named column containers.
decomposition
Dimensionality reduction and matrix decomposition algorithms.
embed
Data embedding with test data and tiny model representations (spec §4).
ensemble
Mixture of Experts (MoE) ensemble learning (GH-101)
error
Error types for Aprender operations.
format
Aprender Model Format (.apr)
glm
Generalized Linear Models (GLM)
gnn
Graph Neural Network layers for learning on graph-structured data.
graph
Graph construction and analysis with cache-optimized CSR representation.
index
Indexing data structures for efficient nearest neighbor search.
inspect
Model inspection tooling (spec §7.2).
interpret
Model Interpretability and Explainability.
linear_model
Linear models for regression.
loading
APR model loading subsystem with WCET and cryptographic agility (spec §7.1).
loss
Loss functions for training machine learning models.
metaheuristics
Derivative-free global optimization (metaheuristics).
metrics
Evaluation metrics for ML models.
mining
Pattern mining algorithms for association rule discovery.
model_selection
Model selection utilities for cross-validation and train/test splitting.
monte_carlo
Monte Carlo Simulation Framework
native
SIMD-native model format for zero-copy Trueno inference (spec §5).
nn
Neural network modules for deep learning.
online
Online learning infrastructure for dynamic model retraining.
optim
Optimization algorithms for gradient-based learning.
prelude
Convenience re-exports for common usage.
preprocessing
Preprocessing transformers for data standardization and normalization.
primitives
Core compute primitives (Vector, Matrix).
qa
Model quality assurance module (spec §7.9).
recommend
Recommendation systems.
regularization
Regularization techniques for neural network training.
scoring
100-point model quality scoring system (spec §7).
serialization
Model Serialization Module
stack
Sovereign AI Stack integration types (spec §9).
stats
Traditional descriptive statistics for vector data.
synthetic
Synthetic Data Generation for AutoML.
text
Text processing and NLP utilities.
time_series
Time series analysis and forecasting.
traits
Core traits for ML estimators and transformers.
transfer
Transfer Learning module for cross-project knowledge sharing.
tree
Decision tree algorithms and ensemble methods.
weak_supervision
Weak Supervision and Label Model.
zoo
Model zoo protocol for sharing and discovery (spec §8).