1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
//! Real-time Optimization System for RuvLLM
//!
//! This module provides the optimization infrastructure for LLM inference,
//! integrating SONA learning with MicroLoRA and custom kernels.
//!
//! ## Quick Start
//!
//! ```rust,ignore
//! use ruvllm::optimization::{
//! SonaLlm, SonaLlmConfig, RealtimeOptimizer, RealtimeConfig,
//! MetricsCollector, ConsolidationStrategy,
//! };
//!
//! // Create SONA integration for three-tier learning
//! let sona_config = SonaLlmConfig {
//! instant_lr: 0.01,
//! background_interval_ms: 100,
//! background_min_samples: 10,
//! consolidation_strategy: ConsolidationStrategy::EwcMerge,
//! ..Default::default()
//! };
//! let sona = SonaLlm::new(sona_config);
//!
//! // During inference: instant adaptation
//! let result = sona.instant_adapt(&query_embedding, &response_embedding, 0.85);
//! println!("Adapt latency: {}us", result.latency_us);
//!
//! // Periodic: background consolidation
//! if let Some(bg_result) = sona.maybe_background() {
//! println!("Consolidated {} samples", bg_result.samples_used);
//! }
//!
//! // Triggered: deep optimization
//! if sona.should_trigger_deep() {
//! let deep_result = sona.deep_optimize(&samples);
//! println!("Quality delta: {:.3}", deep_result.quality_delta);
//! }
//! ```
//!
//! ## Architecture
//!
//! ```text
//! +-------------------+ +-------------------+
//! | Inference Request |---->| RealtimeOptimizer |
//! | (tokens, params) | | - Batch sizing |
//! +-------------------+ | - KV management |
//! | - Token budgets |
//! +--------+----------+
//! |
//! v (metrics)
//! +--------+----------+
//! | InferenceMetrics |
//! | - TTFT tracking |
//! | - TPS monitoring |
//! | - Memory usage |
//! +--------+----------+
//! |
//! v (feedback)
//! +--------+----------+
//! | SonaLlm |
//! | - Instant adapt |
//! | - Background loop |
//! | - Deep optimize |
//! +-------------------+
//! ```
//!
//! ## SONA Learning Tiers
//!
//! | Tier | Latency | Trigger | Action |
//! |------|---------|---------|--------|
//! | Instant | <1ms | Every request | MicroLoRA gradient update |
//! | Background | ~100ms | Timer/threshold | Pattern consolidation |
//! | Deep | Minutes | Manual/scheduled | Full training pipeline |
//!
//! ## Features
//!
//! - **Real-time Optimization**: Dynamic batch sizing and KV cache management
//! - **SONA Integration**: Three-tier learning loops for continuous improvement
//! - **Metrics Collection**: Comprehensive inference telemetry
//! - **Speculative Decoding**: Draft model integration for faster generation
//!
//! ## Consolidation Strategies
//!
//! ```rust,ignore
//! use ruvllm::optimization::ConsolidationStrategy;
//!
//! // EWC++ merge (default) - preserves important weights
//! let strategy = ConsolidationStrategy::EwcMerge;
//!
//! // Quality-weighted - higher quality samples have more influence
//! let strategy = ConsolidationStrategy::QualityWeighted;
//!
//! // Best only - keep top 20% by quality
//! let strategy = ConsolidationStrategy::BestOnly;
//! ```
// Re-exports
pub use ;
pub use ;
pub use ;