//! # Hermes LLM
//!
//! A Rust library for training and running Large Language Models from scratch.
//!
//! ## Features
//!
//! - **Model Architecture Language (MAL)**: Define any transformer architecture using a composable DSL
//! - **Training**: Distributed training with NCCL, gradient accumulation, checkpointing
//! - **Generation**: Text generation with temperature, top-k sampling
//! - **Tokenization**: BPE tokenizer training and inference
//! - **DPO**: Direct Preference Optimization for RLHF
//!
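//! Temperature and top-k sampling rescale the model's logits and keep only the
//! `k` most likely tokens before normalizing into a distribution. A minimal,
//! self-contained sketch of the idea (illustrative only; `top_k_temperature_probs`
//! is not part of the Hermes API):
//!
//! ```rust
//! /// Scale logits by 1/temp, zero out everything outside the top-k,
//! /// and return a normalized probability distribution.
//! fn top_k_temperature_probs(logits: &[f32], k: usize, temp: f32) -> Vec<f32> {
//!     let scaled: Vec<f32> = logits.iter().map(|&l| l / temp).collect();
//!     // Find the k-th largest scaled logit to use as a cutoff.
//!     let mut sorted = scaled.clone();
//!     sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
//!     let cutoff = sorted[k.min(sorted.len()) - 1];
//!     // Softmax over the surviving logits; masked entries get probability 0.
//!     let exp: Vec<f32> = scaled
//!         .iter()
//!         .map(|&s| if s >= cutoff { s.exp() } else { 0.0 })
//!         .collect();
//!     let sum: f32 = exp.iter().sum();
//!     exp.iter().map(|&e| e / sum).collect()
//! }
//! ```
//!
//! Generation then draws the next token index from the returned distribution;
//! lower temperatures sharpen it, smaller `k` restricts the candidate set.
//!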
//! ## Quick Start
//!
//! ```ignore
//! use hermes_llm::{Transformer, Trainer, get_builtin_model};
//!
//! // Load a predefined model architecture
//! let model_def = get_builtin_model("tiny").unwrap();
//!
//! // Or parse from MAL file
//! let model_def = hermes_llm::parse_mal_file("model.mal").unwrap();
//! ```
//!
//! ## Model Architecture Language (MAL)
//!
//! MAL allows defining transformer architectures in a readable, composable format:
//!
//! ```text
//! attention my_attn {
//!     num_heads: 32
//!     num_kv_heads: 8
//! }
//!
//! ffn my_ffn {
//!     hidden_dim: 4096
//!     activation: swiglu
//! }
//!
//! block my_block {
//!     attention: my_attn
//!     ffn: my_ffn
//!     norm: rmsnorm { eps: 1e-5 }
//!     norm_position: pre
//! }
//!
//! model my_model {
//!     vocab_size: 32000
//!     hidden_size: 1024
//!     num_layers: 32
//!     block: my_block
//! }
//! ```
// Re-exports. NOTE: the original paths were lost; the module names below are
// reconstructed from the section labels and may not match the crate layout.
// Core types
pub use config::TrainingConfig;
pub use model::Transformer;
// Training
pub use training::*;
// Generation
pub use generation::*;
// Distributed
pub use distributed::*;
// Model Architecture Language (MAL)
pub use mal::*;
// Data loading
pub use data::*;
// Tokenization
pub use tokenizer::*;