//! # Gemma-2 (Google's Second-Generation Open LLM)
//!
//! Gemma-2 is a family of open language models from Google DeepMind. It makes several
//! architectural changes relative to the original Gemma.
//!
//! ## Key Architectural Innovations
//!
//! - **Alternating attention pattern**: Even layers use *local* (sliding window) attention;
//! odd layers use *global* (full causal) attention. This balances efficiency and long-range
//! capability.
//! - **Logit soft-capping**: Both attention scores and final LM logits are passed through
//! `tanh(x / cap) * cap`, preventing extremely large values from destabilising training.
//! - **Grouped Query Attention (GQA)**: Fewer KV heads reduce memory bandwidth during inference.
//! - **Post-normalization**: RMSNorm is applied to both the *input* and the *output* of each
//! sub-layer (attention and MLP), before the residual add (pre-norm + post-norm), improving
//! training stability.
//! - **GEGLU activation**: `gelu(gate) * up` in the MLP block.
//! - **Decoupled head size**: `head_dim` is set independently of `hidden_size / num_heads`.
//! The 2B and 9B variants use `head_dim = 256`; the published 27B configuration uses
//! `head_dim = 128`.
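//!
//! The soft-capping, layer-alternation, and GEGLU rules above can be sketched in a few lines.
//! This is an illustrative sketch only: `soft_cap`, `uses_sliding_window`, and `geglu` are
//! hypothetical helper names, not part of this crate's API. Layers are assumed to be
//! 0-indexed, and the tanh approximation of GELU is an assumption about which GELU variant
//! is meant.
//!
//! ```rust
//! /// Soft-cap a logit: `tanh(x / cap) * cap`. The result always lies in `[-cap, cap]`
//! /// and is close to `x` when `|x|` is much smaller than `cap`.
//! fn soft_cap(x: f32, cap: f32) -> f32 {
//!     (x / cap).tanh() * cap
//! }
//!
//! /// Even layers use local (sliding window) attention, odd layers use global attention.
//! /// Assumes 0-indexed layers (hypothetical helper).
//! fn uses_sliding_window(layer_idx: usize) -> bool {
//!     layer_idx % 2 == 0
//! }
//!
//! /// Tanh approximation of GELU (assumed variant).
//! fn gelu_tanh(x: f32) -> f32 {
//!     let k = (2.0 / std::f32::consts::PI).sqrt();
//!     0.5 * x * (1.0 + (k * (x + 0.044715 * x.powi(3))).tanh())
//! }
//!
//! /// GEGLU for a single element: `gelu(gate) * up`.
//! fn geglu(gate: f32, up: f32) -> f32 {
//!     gelu_tanh(gate) * up
//! }
//! ```
//!
//! In a real model these operate on whole tensors; the scalar versions only show the
//! arithmetic.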
//!
//! ## Model Variants
//!
//! | Variant | Params | Layers | Hidden | Heads (Q/KV) |
//! |----------|--------|--------|--------|--------------|
//! | 2B | 2.6 B | 26 | 2304 | 8 / 4 |
//! | 9B | 9 B | 42 | 3584 | 16 / 8 |
//! | 27B | 27 B | 46 | 4608 | 32 / 16 |
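//!
//! Every row of the table has twice as many query heads as KV heads, so each KV head is
//! shared by two query heads. A minimal sketch of that mapping follows. `gqa_group_size`
//! and `kv_head_for` are hypothetical helper names, not part of this crate's API, and the
//! consecutive grouping of query heads is an assumed convention.
//!
//! ```rust
//! /// Number of query heads that share one KV head.
//! /// Panics if the query head count is not a multiple of the KV head count.
//! fn gqa_group_size(num_q_heads: usize, num_kv_heads: usize) -> usize {
//!     assert!(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
//!     num_q_heads / num_kv_heads
//! }
//!
//! /// Index of the KV head used by query head `q_head`, assuming consecutive query
//! /// heads are grouped together (heads 0 and 1 share KV head 0, and so on).
//! fn kv_head_for(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
//!     assert!(q_head < num_q_heads);
//!     q_head / gqa_group_size(num_q_heads, num_kv_heads)
//! }
//! ```
//!
//! For the 2B row (8 / 4), `gqa_group_size(8, 4)` is 2 and query head 5 reads KV head 2.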
//!
//! ## Usage
//!
//! ```rust,no_run
//! use trustformers_models::gemma2::{Gemma2Config, Gemma2ForCausalLM};
//!
//! let config = Gemma2Config::gemma2_9b();
//! // For tests, use a tiny config instead:
//! // let config = Gemma2Config { hidden_size: 16, ..Default::default() };
//! let model = Gemma2ForCausalLM::new(config).expect("model creation");
//!
//! // Format a chat prompt
//! let prompt = trustformers_models::gemma2::format_chat_prompt("Hello!");
//! println!("{}", prompt);
//! ```
pub use Gemma2Config;
pub use ;
pub use ;