§vsa-optim-rs
Deterministic training optimization using Vector Symbolic Architecture (VSA), ternary quantization, and closed-form gradient prediction.
This crate enables efficient fine-tuning of large models on consumer hardware through mathematically principled gradient compression and prediction, with guaranteed reproducibility.
§Key Properties
- Deterministic: Identical inputs produce identical outputs
- Closed-form: Weighted least squares with Cramer’s rule—no iterative optimization
- Memory-efficient: ~90% gradient storage reduction via VSA compression (see the sketch after this list)
- Compute-efficient: ~80% backward pass reduction via gradient prediction
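The VSA machinery behind the memory figure fits in a few lines. A minimal sketch, assuming dense bipolar (±1) hypervectors; `bind`, `bundle`, and `unbind` below are illustrative stand-ins for the `vsa` module's operations, not the actual `VSAGradientCompressor` internals:

```rust
/// Bind two hypervectors by elementwise multiplication (illustrative sketch;
/// assumes ±1 components, not the crate's internal representation).
fn bind(a: &[i8], b: &[i8]) -> Vec<i8> {
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

/// Bundle several hypervectors into one by a per-component majority vote.
fn bundle(vectors: &[Vec<i8>]) -> Vec<i8> {
    let dim = vectors[0].len();
    (0..dim)
        .map(|i| {
            let sum: i32 = vectors.iter().map(|v| v[i] as i32).sum();
            if sum >= 0 { 1 } else { -1 }
        })
        .collect()
}

/// Unbind with the same key: binding is its own inverse on ±1 components
/// (x * x = 1), so this recovers a noisy copy of the bound operand.
fn unbind(bound: &[i8], key: &[i8]) -> Vec<i8> {
    bind(bound, key)
}
```

Binding each parameter's gradient to a role vector and bundling the results stores many gradients in one fixed-width hypervector, which is where the storage reduction comes from.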
§Quick Start
The recommended entry point is DeterministicPhaseTrainer, which orchestrates
training through four phases: WARMUP → FULL → PREDICT → CORRECT.
```rust
use vsa_optim_rs::{DeterministicPhaseTrainer, DeterministicPhaseConfig, DeterministicPhase};
use candle_core::Device;

fn train() -> vsa_optim_rs::Result<()> {
    let shapes = vec![
        ("layer1.weight".into(), vec![768, 768]),
        ("layer2.weight".into(), vec![768, 3072]),
    ];
    let config = DeterministicPhaseConfig::default();
    let mut trainer = DeterministicPhaseTrainer::new(&shapes, config, &Device::Cpu)?;

    for _step in 0..100 {
        let _info = trainer.begin_step()?;
        if trainer.should_compute_full() {
            // Compute `gradients` via backpropagation here, then record them
            // so the predictor can refit its linear model.
            trainer.record_full_gradients(&gradients)?;
        } else {
            // Skip the backward pass and use deterministically predicted gradients.
            let predicted = trainer.get_predicted_gradients()?;
        }
        // `loss` is the scalar loss from this step's forward pass.
        trainer.end_step(loss)?;
    }
    Ok(())
}
```
§Modules
- config: Configuration types for all components
- error: Error types and result aliases
- phase: Phase-based training orchestration (deterministic and legacy)
- prediction: Gradient prediction (deterministic least squares and momentum)
- ternary: Ternary {-1, 0, +1} gradient accumulation (sketched below)
- vsa: VSA gradient compression with bind/bundle/unbind operations
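To make the ternary module concrete, here is a hedged sketch of threshold ternarization with a per-tensor scale. The 0.7 × mean|g| threshold is a common heuristic from the ternary-weight-network literature and is an assumption here, not necessarily the rule `TernaryGradientAccumulator` applies:

```rust
/// Map each gradient component to {-1, 0, +1} with a per-tensor scale.
/// Illustrative sketch (assumes a non-empty slice); the threshold rule is
/// an assumption, not necessarily the crate's.
fn ternarize(grad: &[f32]) -> (Vec<i8>, f32) {
    let mean_abs = grad.iter().map(|g| g.abs()).sum::<f32>() / grad.len() as f32;
    let threshold = 0.7 * mean_abs; // heuristic threshold (assumed)
    let quantized = grad
        .iter()
        .map(|&g| {
            if g > threshold {
                1
            } else if g < -threshold {
                -1
            } else {
                0
            }
        })
        .collect();
    (quantized, mean_abs) // the scale restores magnitude on dequantization
}
```

Each component then needs only two bits, and accumulating many ternary gradients reduces to small-integer addition, which is the kind of operation the module accelerates.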
§Deterministic Gradient Prediction
The core algorithm fits a linear gradient model using weighted least squares:
g(t) = baseline + velocity × t + residual

- baseline: Weighted mean of historical gradients
- velocity: Gradient change rate (fitted via Cramer’s rule)
- residual: Exponentially-averaged prediction error for drift correction
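For a single weight, the fit reduces to a 2×2 normal-equation system, and Cramer's rule solves it in closed form; that is what makes the predictor deterministic and iteration-free. A minimal sketch, assuming the per-sample weights w_i are supplied by the caller (an exponential decay over sample age is one natural choice, assumed here); `fit_linear_weighted` is an illustrative helper, not the crate's `DeterministicPredictor` API:

```rust
/// Fit g(t) ≈ baseline + velocity * t over samples (t_i, g_i) with weights w_i
/// by solving the 2x2 weighted least-squares normal equations via Cramer's rule.
/// Illustrative sketch only; not the crate's internal implementation.
fn fit_linear_weighted(ts: &[f64], gs: &[f64], ws: &[f64]) -> Option<(f64, f64)> {
    let (mut sw, mut swt, mut swtt, mut swg, mut swtg) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for ((&t, &g), &w) in ts.iter().zip(gs).zip(ws) {
        sw += w;           // Σ w
        swt += w * t;      // Σ w·t
        swtt += w * t * t; // Σ w·t²
        swg += w * g;      // Σ w·g
        swtg += w * t * g; // Σ w·t·g
    }
    // Normal equations:
    //   [ sw   swt  ] [ baseline ]   [ swg  ]
    //   [ swt  swtt ] [ velocity ] = [ swtg ]
    let det = sw * swtt - swt * swt;
    if det.abs() < 1e-12 {
        return None; // degenerate history, e.g. all samples at one time step
    }
    let baseline = (swg * swtt - swt * swtg) / det;
    let velocity = (sw * swtg - swt * swg) / det;
    Some((baseline, velocity))
}
```

The residual term is then maintained outside the fit, as an exponential moving average of the observed errors g_i − (baseline + velocity × t_i), which corrects slow drift between full-gradient steps.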
§References
- Kanerva, P. (2009). Hyperdimensional Computing
- Johnson, W. & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings
- Ma, S. et al. (2024). The Era of 1-bit LLMs
§Re-exports
pub use config::PhaseConfig;
pub use config::PredictionConfig;
pub use config::TernaryConfig;
pub use config::VSAConfig;
pub use error::OptimError;
pub use error::Result;
pub use phase::PhaseTrainer;
pub use phase::TrainingPhase;
pub use prediction::GradientPredictor;
pub use ternary::TernaryGradientAccumulator;
pub use ternary::TernaryOptimizerWrapper;
pub use vsa::VSAGradientCompressor;
pub use phase::DeterministicPhase;
pub use phase::DeterministicPhaseConfig;
pub use phase::DeterministicPhaseTrainer;
pub use phase::DeterministicStepInfo;
pub use phase::DeterministicTrainerStats;
pub use prediction::DeterministicPredictionConfig;
pub use prediction::DeterministicPredictor;
pub use prediction::PredictorStatistics;
§Modules
- config: Configuration types for VSA training optimization.
- error: Error types for VSA training optimization.
- phase: Phase-based training with prediction and correction cycles.
- prediction: Gradient prediction for training acceleration.
- ternary: Ternary math acceleration for gradient operations.
- vsa: VSA (Vector Symbolic Architecture) gradient compression.