Transformer module for LLM support
This module provides:
- Autograd engine for automatic differentiation
- Transformer layer implementations (Attention, FFN, Norm)
- Model loading from Safetensors format
- Graph-structured Transformer inference
- KV Cache and batch inference optimizations
- Sparse attention optimizations
- Quantization support
- Performance optimizations (SIMD, memory pool, optimized kernels)
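To make the quantization bullet more concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, one common scheme. The names `Int8Tensor`, `quantize_int8`, and `dequantize_int8` are hypothetical. They are not this module's `QuantizedTensor` or `QuantizationConfig` API, and the crate may use a different scheme, such as per-channel scales or lower bit widths.

```rust
/// Hypothetical illustration of symmetric per-tensor int8 quantization.
/// This is not the crate's `QuantizedTensor` API.
struct Int8Tensor {
    data: Vec<i8>,
    scale: f32, // real value ≈ data[i] as f32 * scale
}

fn quantize_int8(values: &[f32]) -> Int8Tensor {
    // Choose the scale so the largest absolute value maps to 127.
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let data = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    Int8Tensor { data, scale }
}

fn dequantize_int8(t: &Int8Tensor) -> Vec<f32> {
    // Map each stored integer back to an approximate real value.
    t.data.iter().map(|&q| q as f32 * t.scale).collect()
}

fn main() {
    let weights = [0.5f32, -1.27, 0.0, 1.27];
    let q = quantize_int8(&weights);
    let restored = dequantize_int8(&q);
    println!("{:?} scale={} -> {:?}", q.data, q.scale, restored);
}
```

Rounding keeps each restored value within half a scale step of the original. The drawback of a single per-tensor scale is that one outlier enlarges the step size for every element, which is why per-channel or per-block scales are often preferred.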
Re-exports
pub use autograd::ComputeGraph;
pub use autograd::DifferentiableTensor;
pub use autograd::Op;
pub use autograd::Optimizer;
pub use layers::MultiHeadAttention;
pub use layers::RMSNorm;
pub use layers::LayerNorm;
pub use layers::RoPE;
pub use layers::FeedForward;
pub use loader::SafetensorsLoader;
pub use loader::ModelConfig;
pub use model::LlamaModel;
pub use model::LlamaConfig;
pub use generation::GenerationConfig;
pub use generation::TextGenerator;
pub use graph_transformer::GraphExecutor;
pub use graph_transformer::GraphTransformer;
pub use graph_transformer::GraphNode;
pub use graph_transformer::GraphEdge;
pub use kv_cache::KVCache;
pub use sparse_attention::SparseAttention;
pub use batch::BatchInference;
pub use quantization::QuantizedTensor;
pub use quantization::QuantizationConfig;
pub use perf::TransformerMemoryPool;
pub use perf::softmax_inplace_simd;
pub use perf::matmul_with_buffer;
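`softmax_inplace_simd` is re-exported from `perf`, but its exact signature is not documented on this page. As a hedged scalar reference for what an in-place softmax computes, the sketch below uses a hypothetical `softmax_inplace` that subtracts the row maximum before exponentiating. This standard trick keeps `exp` from overflowing on large attention logits. A SIMD version would typically vectorize these same passes rather than change the math.

```rust
/// Hypothetical scalar reference for an in-place, numerically stable softmax.
/// The crate's `softmax_inplace_simd` may differ in signature and vectorization.
fn softmax_inplace(x: &mut [f32]) {
    if x.is_empty() {
        return;
    }
    // Pass 1: find the maximum so that every exponent is <= 0.
    let max = x.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Pass 2: exponentiate the shifted values and accumulate their sum.
    let mut sum = 0.0f32;
    for v in x.iter_mut() {
        *v = (*v - max).exp();
        sum += *v;
    }
    // Pass 3: normalize so the outputs sum to 1.
    for v in x.iter_mut() {
        *v /= sum;
    }
}

fn main() {
    // A naive exp(1000.0) overflows to infinity; the max shift avoids that.
    let mut logits = [1000.0f32, 1000.0];
    softmax_inplace(&mut logits);
    println!("{:?}", logits); // [0.5, 0.5]

    let mut probs = [1.0f32, 2.0, 3.0];
    softmax_inplace(&mut probs);
    let total: f32 = probs.iter().sum();
    println!("{:?} sum={}", probs, total);
}
```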
Modules
- autograd - Autograd engine for automatic differentiation
- batch - Batch inference module for efficient throughput
- generation - Text generation utilities
- graph_transformer - Graph-structured Transformer core module
- kv_cache - KV Cache module for efficient autoregressive generation
- layers - Transformer layer implementations
- loader - Model loader for loading pre-trained weights
- model - LLaMA model implementation
- optimization - CAD-LLM Topology Optimization Module
- perf - Performance optimization utilities for Transformer inference
- quantization - Quantization module for efficient inference
- sparse_attention - Sparse Attention module for efficient attention computation
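The `kv_cache` module is described as supporting efficient autoregressive generation. The general idea can be sketched as follows, assuming a single attention head and flattened row-major storage. Keys and values for past tokens are stored once, and each new token's query attends over all of them instead of recomputing earlier projections. `KvCache`, `append`, and `attend` are hypothetical names and do not reflect the crate's `KVCache` type.

```rust
/// Hypothetical single-head KV cache; not the crate's `KVCache` API.
struct KvCache {
    dim: usize,
    keys: Vec<f32>,   // flattened [seq_len, dim]
    values: Vec<f32>, // flattened [seq_len, dim]
}

impl KvCache {
    fn new(dim: usize) -> Self {
        assert!(dim > 0, "head dimension must be positive");
        Self { dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Number of cached positions.
    fn len(&self) -> usize {
        self.keys.len() / self.dim
    }

    /// Append the key and value vectors of the newest token.
    fn append(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.dim);
        assert_eq!(v.len(), self.dim);
        self.keys.extend_from_slice(k);
        self.values.extend_from_slice(v);
    }

    /// Scaled dot-product attention of one query over every cached position.
    fn attend(&self, q: &[f32]) -> Vec<f32> {
        let scale = 1.0 / (self.dim as f32).sqrt();
        // Score each cached key against the query: (q . k) / sqrt(dim).
        let mut scores: Vec<f32> = self
            .keys
            .chunks(self.dim)
            .map(|k| k.iter().zip(q).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        // Numerically stable softmax over the scores.
        let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0f32;
        for s in scores.iter_mut() {
            *s = (*s - max).exp();
            sum += *s;
        }
        // Output is the attention-weighted sum of the cached values.
        let mut out = vec![0.0f32; self.dim];
        for (w, v) in scores.iter().zip(self.values.chunks(self.dim)) {
            for (o, x) in out.iter_mut().zip(v) {
                *o += w / sum * x;
            }
        }
        out
    }
}

fn main() {
    let mut cache = KvCache::new(2);
    cache.append(&[1.0, 0.0], &[1.0, 2.0]);
    println!("{:?}", cache.attend(&[1.0, 0.0])); // [1.0, 2.0]
    cache.append(&[0.0, 1.0], &[3.0, 4.0]);
    println!("len={} out={:?}", cache.len(), cache.attend(&[0.0, 0.0])); // len=2 out=[2.0, 3.0]
}
```

Each generation step adds one key/value row and does work linear in the cached length. Without a cache, the projections for the whole prefix would be recomputed at every step.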