//! # Hierarchical Transformers
//!
//! This module implements hierarchical transformer architectures that process
//! sequences at multiple scales, enabling efficient handling of long sequences
//! and hierarchical pattern recognition.
//!
//! ## Key Concepts
//!
//! Hierarchical transformers operate on the principle of multi-scale processing:
//! - **Local Processing**: Fine-grained attention at token level
//! - **Regional Processing**: Medium-scale attention over token groups
//! - **Global Processing**: Coarse-grained attention over entire sequence
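A minimal sketch of the coarsening step that links these scales, assuming each higher level is built by mean-pooling fixed-size groups of the level below (the `coarsen` helper and the pooling choice are illustrative assumptions, not this crate's API):

```rust
/// Coarsen a sequence of scalar features by mean-pooling fixed-size groups,
/// producing the next (regional or global) level. Illustrative only.
fn coarsen(tokens: &[f32], group: usize) -> Vec<f32> {
    tokens
        .chunks(group)
        .map(|c| c.iter().sum::<f32>() / c.len() as f32)
        .collect()
}

fn main() {
    // Local level: one value per token.
    let local = vec![1.0, 3.0, 2.0, 4.0, 5.0, 7.0];
    // Regional level: one value per group of 2 tokens.
    let regional = coarsen(&local, 2); // [2.0, 3.0, 6.0]
    // Global level: one value summarizing the remaining sequence.
    let global = coarsen(&regional, 3);
    println!("{:?} {:?}", regional, global);
}
```

In a real model each element would be a hidden-state vector rather than a scalar, and the pooling might be strided attention or a learned projection, but the level-to-level reduction works the same way.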
//!
//! ## Architecture Variants
//!
//! This module provides several hierarchical transformer variants:
//! - **Hierarchical Attention**: Multi-level attention with different scales
//! - **Pyramid Transformer**: Progressively coarsening representations
//! - **Nested Transformer**: Hierarchical encoder-decoder structures
//! - **Tree Transformer**: Tree-structured attention patterns
//! - **Hierarchical Memory**: Multi-scale memory mechanisms
//!
//! ## Applications
//!
//! Hierarchical transformers are particularly useful for:
//! - **Long Document Processing**: Efficient attention over very long sequences
//! - **Image Processing**: Multi-scale visual feature extraction
//! - **Speech Recognition**: Hierarchical audio pattern recognition
//! - **Code Understanding**: Multi-level program structure analysis
//! - **Scientific Documents**: Hierarchical text structure processing
//!
//! ## Performance Benefits
//!
//! - **Reduced Complexity**: O(n log n) instead of O(n²) for attention
//! - **Better Inductive Biases**: Natural hierarchical structure modeling
//! - **Improved Generalization**: Multi-scale feature learning
//! - **Memory Efficiency**: Hierarchical memory usage patterns
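The O(n log n) attention claim can be sanity-checked with a back-of-envelope count, assuming windowed attention of fixed width at each of roughly log2(n) levels (the function and the window size below are illustrative assumptions, not part of this module):

```rust
/// Approximate number of attention pairs for windowed attention repeated
/// across ~log2(n) hierarchy levels. Illustrative back-of-envelope only.
fn hierarchical_pairs(n: u64, window: u64) -> u64 {
    // Number of levels is roughly log2(n) (floor(log2(n)) + 1 here).
    let levels = 64 - n.leading_zeros() as u64;
    levels * n * window
}

fn main() {
    let n: u64 = 4096;
    // Full self-attention scores every pair: O(n^2).
    println!("full:         {}", n * n);
    // Hierarchical windowed attention: O(n log n) for a fixed window.
    println!("hierarchical: {}", hierarchical_pairs(n, 64));
}
```

For n = 4096 and a window of 64, the hierarchical count is roughly 3.4M pairs versus about 16.8M for full quadratic attention, and the gap widens as n grows.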
//!
//! ## Example Usage
//!
//! ```rust,no_run
//! use trustformers_models::hierarchical::{HierarchicalTransformer, HierarchicalConfig};
//!
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let config = HierarchicalConfig {
//!     hidden_size: 768,
//!     num_levels: 4,
//!     reduction_factor: 2,
//!     num_heads: 12,
//!     ..Default::default()
//! };
//!
//! let model = HierarchicalTransformer::new(config)?;
//! // `input_ids` is assumed to be a tokenized input sequence prepared elsewhere.
//! # let input_ids = todo!();
//! let output = model.forward(input_ids)?;
//! # Ok(())
//! # }
//! ```
// NOTE: the original re-export paths were missing. The submodule layout below
// is an assumption; the item names are inferred from the doc example above.
mod config;
mod transformer;

pub use config::HierarchicalConfig;
pub use transformer::HierarchicalTransformer;