//! # vsa-optim-rs
//!
//! Deterministic training optimization using Vector Symbolic Architecture (VSA),
//! ternary quantization, and closed-form gradient prediction.
//!
//! This crate enables efficient large model fine-tuning on consumer hardware through
//! mathematically principled gradient compression and prediction with guaranteed
//! reproducibility.
//!
//! ## Key Properties
//!
//! - **Deterministic**: Identical inputs produce identical outputs
//! - **Closed-form**: Weighted least squares solved with Cramer's rule, with no iterative optimization
//! - **Memory-efficient**: ~90% gradient storage reduction via VSA compression
//! - **Compute-efficient**: ~80% backward pass reduction via gradient prediction
//!
//! ## Quick Start
//!
//! The recommended entry point is [`DeterministicPhaseTrainer`], which orchestrates
//! training through four phases: WARMUP → FULL → PREDICT → CORRECT.
//!
//! ```ignore
//! use vsa_optim_rs::{DeterministicPhaseTrainer, DeterministicPhaseConfig, DeterministicPhase};
//! use candle_core::Device;
//!
//! let shapes = vec![
//!     ("layer1.weight".into(), vec![768, 768]),
//!     ("layer2.weight".into(), vec![768, 3072]),
//! ];
//!
//! let config = DeterministicPhaseConfig::default();
//! let mut trainer = DeterministicPhaseTrainer::new(&shapes, config, &Device::Cpu)?;
//!
//! for step in 0..100 {
//!     let info = trainer.begin_step()?;
//!
//!     if trainer.should_compute_full() {
//!         // Compute gradients via backpropagation
//!         trainer.record_full_gradients(&gradients)?;
//!     } else {
//!         // Use deterministically predicted gradients
//!         let predicted = trainer.get_predicted_gradients()?;
//!     }
//!
//!     trainer.end_step(loss)?;
//! }
//! # Ok::<(), vsa_optim_rs::error::OptimError>(())
//! ```
//!
//! ## Modules
//!
//! - [`config`]: Configuration types for all components
//! - [`error`]: Error types and result aliases
//! - [`phase`]: Phase-based training orchestration (deterministic and legacy)
//! - [`prediction`]: Gradient prediction (deterministic least squares and momentum)
//! - [`ternary`]: Ternary `{-1, 0, +1}` gradient accumulation
//! - [`vsa`]: VSA gradient compression with bind/bundle/unbind operations
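//!
//! As a rough illustration of the bind/bundle/unbind vocabulary used by the
//! [`vsa`] module, the sketch below uses bipolar (`+1`/`-1`) hypervectors, one
//! common VSA formulation. It is not this crate's implementation: the function
//! names, the `i8` representation, and the tie-breaking rule are assumptions
//! made for the example only.
//!
//! ```rust
//! /// Bind two bipolar hypervectors by elementwise multiplication.
//! fn bind(a: &[i8], b: &[i8]) -> Vec<i8> {
//!     a.iter().zip(b).map(|(x, y)| x * y).collect()
//! }
//!
//! /// Bundle hypervectors by elementwise majority vote (sign of the sum).
//! /// Ties are mapped to +1 (an arbitrary choice for this sketch).
//! fn bundle(vs: &[Vec<i8>]) -> Vec<i8> {
//!     let dim = vs.first().map_or(0, |v| v.len());
//!     (0..dim)
//!         .map(|i| {
//!             let sum: i32 = vs.iter().map(|v| i32::from(v[i])).sum();
//!             if sum >= 0 { 1 } else { -1 }
//!         })
//!         .collect()
//! }
//!
//! /// Unbind by binding again with the same key: every key element squares
//! /// to 1, so the key cancels out.
//! fn unbind(bound: &[i8], key: &[i8]) -> Vec<i8> {
//!     bind(bound, key)
//! }
//! ```
//!
//! In this formulation binding is its own inverse, so `unbind(&bind(a, k), k)`
//! recovers `a` exactly, while bundling is lossy and only preserves the
//! majority sign at each position.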
//!
//! ## Deterministic Gradient Prediction
//!
//! The core algorithm fits a linear gradient model using weighted least squares:
//!
//! ```text
//! g(t) = baseline + velocity × t + residual
//! ```
//!
//! - **baseline**: Weighted mean of historical gradients
//! - **velocity**: Gradient change rate (fitted via Cramer's rule)
//! - **residual**: Exponentially-averaged prediction error for drift correction
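//!
//! The linear fit can be sketched for a single scalar gradient component as
//! follows. This is a minimal illustration under stated assumptions, not the
//! crate's implementation: the name `fit_linear`, the `f64` slices, and the
//! singularity tolerance are hypothetical, and the sketch solves for intercept
//! and slope jointly from the 2x2 weighted normal equations rather than taking
//! the baseline as a weighted mean.
//!
//! ```rust
//! /// Fit `g(t) ≈ baseline + velocity * t` by weighted least squares.
//! ///
//! /// The normal equations are
//! ///   [ Σw   Σwt  ] [baseline]   [ Σwg  ]
//! ///   [ Σwt  Σwt² ] [velocity] = [ Σwtg ]
//! /// and are solved with Cramer's rule. Returns `None` if the inputs have
//! /// mismatched lengths or the system is (numerically) singular.
//! fn fit_linear(ts: &[f64], gs: &[f64], ws: &[f64]) -> Option<(f64, f64)> {
//!     if ts.len() != gs.len() || ts.len() != ws.len() {
//!         return None;
//!     }
//!     let (mut sw, mut swt, mut swtt) = (0.0_f64, 0.0_f64, 0.0_f64);
//!     let (mut swg, mut swtg) = (0.0_f64, 0.0_f64);
//!     for ((&t, &g), &w) in ts.iter().zip(gs).zip(ws) {
//!         sw += w;
//!         swt += w * t;
//!         swtt += w * t * t;
//!         swg += w * g;
//!         swtg += w * t * g;
//!     }
//!     // Determinant of the 2x2 coefficient matrix.
//!     let det = sw * swtt - swt * swt;
//!     if det.abs() < 1e-12 {
//!         return None; // e.g. fewer than two distinct time steps
//!     }
//!     // Cramer's rule: replace one column with the right-hand side at a time.
//!     let baseline = (swg * swtt - swt * swtg) / det;
//!     let velocity = (sw * swtg - swt * swg) / det;
//!     Some((baseline, velocity))
//! }
//! ```
//!
//! With weights that favor recent steps, a prediction for step `t` would then
//! be `baseline + velocity * t`, plus whatever residual correction is applied.
//! Because the solution is a fixed sequence of sums and two divisions, the same
//! inputs always yield the same coefficients.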
//!
//! ## References
//!
//! - Kanerva, P. (2009). Hyperdimensional Computing
//! - Johnson, W. & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings
//! - Ma, S. et al. (2024). The Era of 1-bit LLMs
// Re-export main types at crate root for convenience
pub use ;
pub use ;
pub use ;
pub use GradientPredictor;
pub use ;
pub use VSAGradientCompressor;
// Re-export deterministic training types (recommended for production)
pub use ;
pub use ;
pub use *;