//! # Phi-3 (Microsoft's Small Language Model)
//!
//! Phi-3 is a family of small language models developed by Microsoft that deliver
//! strong performance for their size while remaining compact enough to run on mobile devices.
//!
//! ## Architecture Innovations
//!
//! Phi-3 incorporates several key improvements:
//! - **RMSNorm**: Root Mean Square Layer Normalization for efficiency
//! - **SwiGLU activation**: Gated linear units in feed-forward network
//! - **Rotary Position Embeddings (RoPE)**: Relative position encoding applied to queries and keys
//! - **LongRoPE scaling**: Extended context support up to 128K tokens
//! - **Grouped Query Attention (GQA)**: Used in the larger variants to shrink the KV cache
//! - **Sliding Window Attention**: Optional local attention patterns
//!
//! ## Model Variants
//!
//! Available configurations:
//! - **Phi-3 Mini (3.8B)**: Compact model for mobile and edge devices
//! - **Phi-3 Small (7B)**: Balanced performance and efficiency
//! - **Phi-3 Medium (14B)**: Highest capability model in the family
//!
//! Each variant comes in multiple context lengths:
//! - **4K**: Standard context length for most applications
//! - **8K**: Default context length for Phi-3 Small
//! - **128K**: Very long context via LongRoPE scaling
//!
//! ## Usage Examples
//!
//! ### Text Generation with Phi-3 Mini
//! ```rust,no_run
//! use trustformers_models::phi3::{Phi3ForCausalLM, Phi3Config};
//! use trustformers_core::generation::GenerationConfig;
//!
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let config = Phi3Config::phi3_mini_4k_instruct();
//! let mut model = Phi3ForCausalLM::new(config)?;
//! model.load_from_hub("microsoft/Phi-3-mini-4k-instruct")?;
//!
//! // Placeholder token IDs; in practice these come from the Phi-3 tokenizer
//! let input_ids = vec![1u32, 4093, 29901];
//!
//! // Generate with nucleus sampling
//! let gen_config = GenerationConfig {
//!     max_new_tokens: 150,
//!     temperature: 0.7,
//!     top_p: 0.9,
//!     do_sample: true,
//!     ..Default::default()
//! };
//!
//! let generated = model.generate(input_ids, gen_config)?;
//! # Ok(())
//! # }
//! ```
//!
//! ### Instruction Following
//! ```rust,no_run
//! use trustformers_models::phi3::{Phi3ForCausalLM, Phi3Config};
//! use trustformers_core::generation::GenerationConfig;
//!
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let config = Phi3Config::phi3_small_8k_instruct();
//! let mut model = Phi3ForCausalLM::new(config)?;
//! model.load_from_hub("microsoft/Phi-3-small-8k-instruct")?;
//!
//! // Format the instruction with the Phi-3 chat template
//! let instruction = "<|user|>\nExplain machine learning in simple terms.<|end|>\n<|assistant|>\n";
//! // Assumes a loaded Phi-3 tokenizer is in scope
//! let input_ids = tokenizer.encode(instruction)?;
//!
//! let gen_config = GenerationConfig {
//!     max_new_tokens: 400,
//!     ..Default::default()
//! };
//! let response = model.generate(input_ids, gen_config)?;
//! # Ok(())
//! # }
//! ```
//!
//! ### Long Context Processing
//! ```rust,no_run
//! use trustformers_models::phi3::{Phi3ForCausalLM, Phi3Config};
//! use trustformers_core::generation::GenerationConfig;
//!
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! // Use the 128K-context model for long documents
//! let config = Phi3Config::phi3_mini_128k_instruct();
//! let mut model = Phi3ForCausalLM::new(config)?;
//! model.load_from_hub("microsoft/Phi-3-mini-128k-instruct")?;
//!
//! // Encode a long document (up to 128K tokens); assumes a loaded tokenizer in scope
//! let long_input = tokenizer.encode(&very_long_document)?;
//!
//! let gen_config = GenerationConfig {
//!     max_new_tokens: 1000,
//!     ..Default::default()
//! };
//! let summary = model.generate(long_input, gen_config)?;
//! # Ok(())
//! # }
//! ```
//!
//! ### Efficient Mobile Deployment
//! ```rust,no_run
//! use trustformers_models::phi3::{Phi3ForCausalLM, Phi3Config};
//!
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let config = Phi3Config {
//!     use_flash_attention: true, // Enable memory-efficient attention
//!     attention_dropout: 0.0,    // Disable dropout for inference
//!     ..Phi3Config::phi3_mini_4k_instruct()
//! };
//!
//! let mut model = Phi3ForCausalLM::new(config)?;
//! model.load_quantized("phi3-mini-4bit.gguf")?; // Load quantized weights
//!
//! // Placeholder token IDs; in practice these come from the tokenizer
//! let input_ids = vec![1u32, 4093, 29901];
//!
//! // Single optimized forward pass for on-device inference
//! let result = model.forward(input_ids)?;
//! # Ok(())
//! # }
//! ```
//!
//! ## Key Components
//!
//! ### RMSNorm
//! A cheaper alternative to LayerNorm that skips mean-centering and the bias term:
//! ```text
//! RMSNorm(x) = x * g / sqrt(mean(x²) + ε)
//! ```
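//!
//! A minimal, self-contained sketch of this computation over a single `f32`
//! vector, where `g` is the learned gain and `eps` the stability constant:
//! ```rust
//! fn rms_norm(x: &[f32], g: &[f32], eps: f32) -> Vec<f32> {
//!     // mean(x²)
//!     let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
//!     // 1 / sqrt(mean(x²) + ε)
//!     let inv_rms = 1.0 / (mean_sq + eps).sqrt();
//!     // x * g / sqrt(mean(x²) + ε)
//!     x.iter().zip(g).map(|(xi, gi)| xi * inv_rms * gi).collect()
//! }
//!
//! let y = rms_norm(&[1.0, 2.0, 3.0], &[1.0, 1.0, 1.0], 1e-5);
//! assert_eq!(y.len(), 3);
//! ```
//! Unlike LayerNorm, there is no mean subtraction and no bias, which removes
//! a reduction and an add per token.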
//!
//! ### LongRoPE Scaling
//! Enables extended context through position-embedding rescaling:
//! - Handles context lengths up to 128K tokens
//! - Uses separate short and long scaling factors (see the sketch below)
//! - Maintains performance on shorter sequences
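//!
//! A minimal sketch of the rescaling idea, assuming hypothetical per-dimension
//! `factors` (LongRoPE actually searches for separate short- and long-context
//! factor sets; this only shows how the rotary frequencies are stretched):
//! ```rust
//! /// Standard RoPE frequencies, divided by per-dimension rescale factors
//! /// to stretch the rotation wavelengths for longer contexts.
//! fn scaled_rope_freqs(head_dim: usize, base: f32, factors: &[f32]) -> Vec<f32> {
//!     (0..head_dim / 2)
//!         .map(|i| {
//!             // θᵢ = base^(−2i / head_dim)
//!             let theta = base.powf(-2.0 * i as f32 / head_dim as f32);
//!             theta / factors[i]
//!         })
//!         .collect()
//! }
//! ```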
//!
//! ### SwiGLU Activation
//! Gated linear unit used in the feed-forward network:
//! ```text
//! SwiGLU(x) = (σ(xW₁) ⊙ xW₃) W₂
//! ```
//! where σ is the SiLU activation, ⊙ is element-wise multiplication, W₁ is the
//! gate projection, W₃ the up projection, and W₂ the down projection.
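//!
//! A minimal dense-matrix sketch of this formula (weight names and the
//! `Vec<Vec<f32>>` representation are illustrative, not the crate's API):
//! ```rust
//! fn silu(x: f32) -> f32 {
//!     x / (1.0 + (-x).exp()) // SiLU(x) = x · sigmoid(x)
//! }
//!
//! /// Row-vector × matrix, with `w` stored as [out][in].
//! fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
//!     w.iter()
//!         .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum())
//!         .collect()
//! }
//!
//! fn swiglu(x: &[f32], w_gate: &[Vec<f32>], w_up: &[Vec<f32>], w_down: &[Vec<f32>]) -> Vec<f32> {
//!     let gate: Vec<f32> = matvec(w_gate, x).into_iter().map(silu).collect();
//!     let up = matvec(w_up, x);
//!     // Element-wise gate ⊙ up, then project back down
//!     let hidden: Vec<f32> = gate.iter().zip(&up).map(|(g, u)| g * u).collect();
//!     matvec(w_down, &hidden)
//! }
//! ```
//! In practice the gate and up projections are typically fused into a single
//! matrix multiply, which does not change the math above.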
//!
//! ### Grouped Query Attention
//! Used in the larger Phi-3 variants for efficiency:
//! - Reduces KV-cache memory during inference (see the sketch below)
//! - Maintains quality while improving speed
//! - Balances between multi-head (MHA) and multi-query (MQA) attention
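//!
//! A back-of-the-envelope KV-cache comparison, using hypothetical layer and
//! head counts with fp16 storage:
//! ```rust
//! /// 2 tensors (K and V) × layers × kv_heads × head_dim × seq_len × 2 bytes (fp16).
//! fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64, seq_len: u64) -> u64 {
//!     2 * layers * kv_heads * head_dim * seq_len * 2
//! }
//!
//! let mha = kv_cache_bytes(32, 32, 128, 4096); // every query head has its own K/V
//! let gqa = kv_cache_bytes(32, 8, 128, 4096);  // 4 query heads share each K/V head
//! assert_eq!(mha / gqa, 4); // 4× smaller cache for the same sequence length
//! ```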
//!
//! ## Training Details
//!
//! - Trained on high-quality filtered data
//! - Uses curriculum learning approach
//! - Incorporates safety training and alignment
//! - Optimized for instruction following
//!
//! ## Performance Characteristics
//!
//! - **Phi-3 Mini**: Best efficiency, mobile-friendly
//! - **Phi-3 Small**: Balanced performance/size
//! - **Phi-3 Medium**: Highest capability in family
//!
//! All models excel at:
//! - Instruction following
//! - Code generation
//! - Mathematical reasoning
//! - Common sense reasoning
//!
//! ## Mobile and Edge Optimization
//!
//! Phi-3 is specifically designed for deployment on resource-constrained devices:
//! - Efficient architecture reduces memory usage
//! - Supports 4-bit and 8-bit quantization (see the sketch after this list)
//! - Optimized for CPU and mobile GPU inference
//! - Fast initialization and small model files
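//!
//! A minimal sketch of symmetric 8-bit quantization with a single per-tensor
//! scale (real GGUF formats quantize in small blocks with per-block scales;
//! this only illustrates the core idea):
//! ```rust
//! /// Quantize weights to i8 with one shared scale.
//! fn quantize_i8(w: &[f32]) -> (Vec<i8>, f32) {
//!     let max = w.iter().fold(0.0f32, |m, v| m.max(v.abs()));
//!     let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
//!     let q = w.iter().map(|v| (v / scale).round() as i8).collect();
//!     (q, scale)
//! }
//!
//! /// Recover an approximation: wᵢ ≈ qᵢ · scale.
//! fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
//!     q.iter().map(|&v| v as f32 * scale).collect()
//! }
//! ```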
//!
//! ## Safety and Alignment
//!
//! Phi-3 models include safety features:
//! - Trained with safety datasets
//! - Reduced harmful output generation
//! - Built-in content filtering
//! - Aligned for helpful, harmless responses
// Re-export the public API used in the examples above.
// NOTE: the `config` and `model` module paths are assumed; adjust to the crate's actual layout.
pub use config::Phi3Config;
pub use model::Phi3ForCausalLM;