1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
//! # Phi-2
//!
//! Microsoft Phi-2 is a 2.7B-parameter small language model (Li et al., 2023)
//! notable for its strong performance relative to model size. Key architectural
//! properties:
//!
//! * **Parallel transformer blocks**: attention and MLP operate in parallel on
//! the *same* layer-normalised input (unlike sequential residual stacking).
//! * **Rotary Position Embeddings (RoPE)** applied to Q and K projections.
//! * **GELU-activated MLP**: `fc1 (hidden→4×hidden) → GELU → fc2 (4×hidden→hidden)`.
//! * Standard Layer Normalisation (mean-variance, not RMS).
//! * No GQA — full Multi-Head Attention.
//!
//! ## Usage
//!
//! ```rust,no_run
//! use trustformers_models::phi2::{Phi2Config, Phi2ForCausalLM};
//!
//! let config = Phi2Config::small_test();
//! let model = Phi2ForCausalLM::new(config)?;
//! let logits = model.forward(vec![1u32, 2, 3])?;
//! # Ok::<(), trustformers_core::errors::TrustformersError>(())
//! ```
pub use Phi2Config;
pub use ;
pub use ;