//! # BERT (Bidirectional Encoder Representations from Transformers)
//!
//! BERT is a transformer-based model that uses bidirectional attention to create
//! deep bidirectional representations. It's pre-trained on masked language modeling
//! and next sentence prediction tasks.
//!
//! ## Architecture
//!
//! BERT consists of:
//! - Multi-layer bidirectional Transformer encoder
//! - WordPiece embeddings with positional and segment embeddings
//! - Layer normalization and dropout for regularization
//! - GELU activation functions
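//!
//! As a self-contained sketch, GELU is commonly computed with the tanh
//! approximation shown below (illustrative only; the crate's own kernel may
//! use the exact erf-based form):
//!
//! ```rust
//! // GELU, tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
//! fn gelu(x: f64) -> f64 {
//!     let c = (2.0 / std::f64::consts::PI).sqrt();
//!     0.5 * x * (1.0 + (c * (x + 0.044715 * x.powi(3))).tanh())
//! }
//!
//! // GELU is ~identity for large positive x and ~0 for large negative x.
//! assert!(gelu(0.0).abs() < 1e-12);
//! assert!(gelu(3.0) > 2.99);
//! assert!(gelu(-3.0).abs() < 0.01);
//! ```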
//!
//! ## Model Variants
//!
//! This implementation supports:
//! - **BERT-Base**: 12 layers, 768 hidden, 12 heads, 110M parameters
//! - **BERT-Large**: 24 layers, 1024 hidden, 16 heads, 340M parameters
//! - **Custom configurations**: Create your own BERT variant
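//!
//! As a sketch of what a variant's configuration carries (the field names here
//! are illustrative, not necessarily the crate's exact `BertConfig` fields):
//!
//! ```rust
//! // Hypothetical, simplified view of a BERT configuration.
//! struct Config { layers: usize, hidden: usize, heads: usize }
//!
//! let base = Config { layers: 12, hidden: 768, heads: 12 };
//! let large = Config { layers: 24, hidden: 1024, heads: 16 };
//!
//! // BERT-Large doubles the depth, and both standard variants keep a
//! // per-head dimension of hidden / heads = 64.
//! assert_eq!(large.layers, 2 * base.layers);
//! assert_eq!(base.hidden / base.heads, 64);
//! assert_eq!(large.hidden / large.heads, 64);
//! ```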
//!
//! ## Usage Examples
//!
//! ### Text Classification
//! ```rust,ignore
//! use trustformers_models::bert::{BertForSequenceClassification, BertConfig};
//!
//! let config = BertConfig::bert_base_uncased();
//! let num_labels = 2; // e.g., binary sentiment classification
//! let mut model = BertForSequenceClassification::new(config, num_labels)?;
//! model.load_from_hub("bert-base-uncased")?;
//!
//! // Perform classification (`input_ids` and `attention_mask` come from a tokenizer)
//! let outputs = model.forward(input_ids, attention_mask)?;
//! let predictions = outputs.logits.argmax(-1)?;
//! ```
//!
//! ### Masked Language Modeling
//! ```rust,ignore
//! use trustformers_models::bert::{BertForMaskedLM, BertConfig};
//!
//! let config = BertConfig::bert_base_uncased();
//! let mut model = BertForMaskedLM::new(config)?;
//! model.load_from_hub("bert-base-uncased")?;
//!
//! // Predict masked tokens (`masked_input_ids` contains [MASK] token ids)
//! let outputs = model.forward(masked_input_ids, attention_mask)?;
//! let predictions = outputs.logits.argmax(-1)?;
//! ```
//!
//! ### Feature Extraction
//! ```rust,ignore
//! use trustformers_models::bert::{BertModel, BertConfig};
//!
//! let config = BertConfig::bert_base_uncased();
//! let mut model = BertModel::new(config)?;
//! model.load_from_hub("bert-base-uncased")?;
//!
//! // Extract features
//! let outputs = model.forward(input_ids, attention_mask)?;
//! let pooled_output = outputs.pooler_output; // [CLS] token representation
//! let sequence_output = outputs.last_hidden_state; // All token representations
//! ```
//!
//! ## Pre-training Tasks
//!
//! BERT is pre-trained on two tasks:
//!
//! 1. **Masked Language Modeling (MLM)**: Randomly mask 15% of tokens and predict them
//! 2. **Next Sentence Prediction (NSP)**: Predict if sentence B follows sentence A
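//!
//! For MLM, the standard corruption rule is: of the 15% of positions selected
//! for prediction, 80% become `[MASK]`, 10% become a random token, and 10% are
//! left unchanged. A self-contained sketch (a tiny deterministic generator
//! stands in for a real random source; the mask token id 103 is
//! `bert-base-uncased`'s):
//!
//! ```rust
//! const MASK_ID: u32 = 103;
//!
//! // Minimal LCG, for illustration only.
//! fn next(state: &mut u64) -> u32 {
//!     *state = state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
//!     (*state >> 33) as u32
//! }
//!
//! fn corrupt(tokens: &[u32], vocab_size: u32, seed: u64) -> Vec<u32> {
//!     let mut state = seed;
//!     tokens.iter().map(|&t| {
//!         if next(&mut state) % 100 < 15 {      // select ~15% of positions
//!             match next(&mut state) % 10 {
//!                 0..=7 => MASK_ID,             // 80%: replace with [MASK]
//!                 8 => next(&mut state) % vocab_size, // 10%: random token
//!                 _ => t,                       // 10%: keep the original
//!             }
//!         } else {
//!             t
//!         }
//!     }).collect()
//! }
//!
//! let out = corrupt(&[5, 6, 7, 8], 30522, 42);
//! // Same length; each token is unchanged, [MASK], or a valid random id.
//! assert_eq!(out.len(), 4);
//! assert!(out.iter().all(|&o| o < 30522));
//! ```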
//!
//! ## Fine-tuning
//!
//! BERT can be fine-tuned for various downstream tasks:
//! - Text classification (sentiment analysis, spam detection)
//! - Named Entity Recognition (NER)
//! - Question Answering
//! - Text similarity
//! - Token classification
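//!
//! For classification-style fine-tuning, a task head is typically just a linear
//! layer over the pooled `[CLS]` representation. A numeric sketch with shrunken
//! dimensions (plain arithmetic, not the crate's API):
//!
//! ```rust
//! // Linear head over a pooled vector: logits = W * pooled + b, then argmax.
//! fn classify(pooled: &[f64], weights: &[Vec<f64>], bias: &[f64]) -> usize {
//!     let logits: Vec<f64> = weights.iter().zip(bias)
//!         .map(|(w, b)| w.iter().zip(pooled).map(|(wi, p)| wi * p).sum::<f64>() + b)
//!         .collect();
//!     logits.iter().enumerate()
//!         .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
//!         .unwrap().0
//! }
//!
//! let pooled = vec![1.0, -0.5];                    // pretend hidden size 2
//! let w = vec![vec![0.1, 0.2], vec![0.3, -0.4]];   // 2 labels x hidden 2
//! let b = vec![0.0, 0.0];
//! // label 0 score: 0.1 - 0.1 = 0.0; label 1 score: 0.3 + 0.2 = 0.5
//! assert_eq!(classify(&pooled, &w, &b), 1);
//! ```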
//!
//! ## Performance Tips
//!
//! - Use `bert-base` for most tasks (good balance of speed and accuracy)
//! - Enable mixed precision training for faster fine-tuning
//! - Adjust max sequence length based on your data
//! - Use gradient accumulation for larger effective batch sizes
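//!
//! Gradient accumulation trades memory for effective batch size: run `k` small
//! forward/backward passes, average the gradients, and take one optimizer step.
//! A minimal numeric sketch (plain arithmetic, independent of any training API):
//!
//! ```rust
//! // 4 micro-batches of size 8 -> one step with effective batch size 32.
//! let micro_batch_grads = [0.4_f64, 0.2, -0.1, 0.5]; // per-micro-batch mean gradients
//! let accumulated = micro_batch_grads.iter().sum::<f64>() / micro_batch_grads.len() as f64;
//! // The optimizer steps once with the averaged gradient.
//! assert!((accumulated - 0.25).abs() < 1e-12);
//! ```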
// Re-export the public API. NOTE: the module paths below are assumed; adjust
// them to match this crate's actual layout. (The original third re-export had
// an empty path and has been dropped.)
pub use self::config::BertConfig;
pub use self::model::BertModel;