§Tensorlogic-Trustformers
Version: 0.1.0-beta.1 | Status: Production Ready
Transform transformer architectures into TensorLogic IR using einsum operations.
This crate provides implementations of transformer components (self-attention, multi-head attention, feed-forward networks) as einsum graphs that can be compiled and executed on various TensorLogic backends.
§Features
- Self-Attention: Scaled dot-product attention as einsum operations
- Multi-Head Attention: Parallel attention heads with head splitting
- Feed-Forward Networks: Position-wise FFN with configurable activations
- Gated FFN: GLU-style gated feed-forward networks
- Einsum-Native: All operations expressed as einsum for maximum flexibility
§Architecture
Transformer components are decomposed into einsum operations:
§Self-Attention
scores = einsum("bqd,bkd->bqk", Q, K) / sqrt(d_k)
attn = softmax(scores, dim=-1)
output = einsum("bqk,bkv->bqv", attn, V)§Multi-Head Attention
§Multi-Head Attention
Q, K, V = [batch, seq, d_model] -> [batch, n_heads, seq, d_k]
scores = einsum("bhqd,bhkd->bhqk", Q, K) / sqrt(d_k)
attn = softmax(scores, dim=-1)
output = einsum("bhqk,bhkv->bhqv", attn, V)
output = reshape([batch, seq, d_model])
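The head split that opens this block is a pure reshape. As an illustration (a sketch with stand-in types, not a statement about the crate's internal layout), head h of a single token's d_model-length vector is the contiguous slice of width d_k = d_model / n_heads:

// Sketch: one token's [d_model] vector viewed as n_heads slices of width d_k.
// Assumes d_model is divisible by n_heads.
fn split_heads(token: &[f32], n_heads: usize) -> Vec<&[f32]> {
    let d_k = token.len() / n_heads;
    (0..n_heads).map(|h| &token[h * d_k..(h + 1) * d_k]).collect()
}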
§Feed-Forward Network
h1 = einsum("bsd,df->bsf", x, W1) + b1
h2 = activation(h1)
output = einsum("bsf,fd->bsd", h2, W2) + b2§Example Usage
§Example Usage
use tensorlogic_trustformers::{
AttentionConfig, SelfAttention, MultiHeadAttention,
FeedForwardConfig, FeedForward,
};
use tensorlogic_ir::EinsumGraph;
// Configure self-attention
let attn_config = AttentionConfig::new(512, 8).unwrap();
let self_attn = SelfAttention::new(attn_config.clone()).unwrap();
// Build einsum graph
let mut graph = EinsumGraph::new();
graph.add_tensor("Q");
graph.add_tensor("K");
graph.add_tensor("V");
let outputs = self_attn.build_attention_graph(&mut graph).unwrap();
// Configure multi-head attention
let mha = MultiHeadAttention::new(attn_config).unwrap();
let mut mha_graph = EinsumGraph::new();
mha_graph.add_tensor("Q");
mha_graph.add_tensor("K");
mha_graph.add_tensor("V");
let mha_outputs = mha.build_mha_graph(&mut mha_graph).unwrap();
// Configure feed-forward network
let ffn_config = FeedForwardConfig::new(512, 2048)
.with_activation("gelu");
let ffn = FeedForward::new(ffn_config).unwrap();
let mut ffn_graph = EinsumGraph::new();
ffn_graph.add_tensor("x");
ffn_graph.add_tensor("W1");
ffn_graph.add_tensor("b1");
ffn_graph.add_tensor("W2");
ffn_graph.add_tensor("b2");
let ffn_outputs = ffn.build_ffn_graph(&mut ffn_graph).unwrap();
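The unwrap() calls keep the example compact. In application code, the Result alias re-exported from the error module lets the same calls propagate failures with ?. A minimal sketch, assuming Result is a single-parameter alias over TrustformerError and that the constructors and graph builders above return it (as their use of unwrap() suggests):

use tensorlogic_ir::EinsumGraph;
use tensorlogic_trustformers::{AttentionConfig, Result, SelfAttention};

// Same graph construction as the example above, but propagating errors with `?`.
fn self_attention_graph(d_model: usize, n_heads: usize) -> Result<EinsumGraph> {
    let config = AttentionConfig::new(d_model, n_heads)?;
    let self_attn = SelfAttention::new(config)?;
    let mut graph = EinsumGraph::new();
    graph.add_tensor("Q");
    graph.add_tensor("K");
    graph.add_tensor("V");
    let _outputs = self_attn.build_attention_graph(&mut graph)?;
    Ok(graph)
}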
§Integration with TensorLogic
The einsum graphs produced by this crate can be:
- Compiled with tensorlogic-compiler
- Executed on tensorlogic-scirs-backend or other backends
- Optimized using graph optimization passes
- Combined with logical rules for interpretable transformers
§Design Philosophy
This crate follows the TensorLogic principle of expressing neural operations as tensor contractions (einsum), enabling:
- Backend Independence: Same graph works on CPU, GPU, TPU
- Optimization Opportunities: Graph-level optimizations like fusion
- Interpretability: Clear mathematical semantics
- Composability: Mix transformer layers with logical rules
Re-exports§
pub use attention::{MultiHeadAttention, SelfAttention};
pub use checkpointing::{CheckpointConfig, CheckpointStrategy};
pub use config::{AttentionConfig, FeedForwardConfig, TransformerLayerConfig};
pub use decoder::{Decoder, DecoderConfig};
pub use encoder::{Encoder, EncoderConfig};
pub use error::{Result, TrustformerError};
pub use ffn::{FeedForward, GatedFeedForward};
pub use flash_attention::{FlashAttention, FlashAttentionConfig, FlashAttentionPreset, FlashAttentionStats, FlashAttentionV2Config};
pub use gqa::{GQAConfig, GQAPreset, GQAStats, GroupedQueryAttention};
pub use kv_cache::{CacheStats, KVCache, KVCacheConfig};
pub use layers::{DecoderLayer, DecoderLayerConfig, EncoderLayer, EncoderLayerConfig};
pub use lora::{LoRAAttention, LoRAConfig, LoRALinear, LoRAPreset, LoRAStats};
pub use moe::{MoeConfig, MoeLayer, MoePreset, MoeStats, RouterType};
pub use normalization::{LayerNorm, LayerNormConfig, RMSNorm};
pub use patterns::{AttentionMask, BlockSparseMask, CausalMask, GlobalLocalMask, LocalMask, RuleBasedMask, RulePattern, StridedMask};
pub use position::{AlibiPositionEncoding, LearnedPositionEncoding, PositionEncodingConfig, PositionEncodingType, RelativePositionEncoding, RotaryPositionEncoding, SinusoidalPositionEncoding};
pub use presets::ModelPreset;
pub use rule_attention::{RuleAttentionConfig, RuleAttentionType, RuleBasedAttention, StructuredAttention};
pub use sliding_window::{SlidingWindowAttention, SlidingWindowConfig, SlidingWindowPreset, SlidingWindowStats};
pub use sparse_attention::{LocalAttention, SparseAttention, SparseAttentionConfig, SparsePatternType};
pub use stacks::{DecoderStack, DecoderStackConfig, EncoderStack, EncoderStackConfig};
pub use trustformers_integration::{CheckpointData, IntegrationConfig, ModelConfig, TensorLogicModel, TrustformersConverter, TrustformersWeightLoader};
pub use utils::{decoder_stack_stats, encoder_stack_stats, ModelStats};
pub use vision::{PatchEmbedding, PatchEmbeddingConfig, ViTPreset, VisionTransformer, VisionTransformerConfig};
Modules§
- attention
- Self-attention and multi-head attention as einsum operations.
- checkpointing
- Gradient checkpointing for memory-efficient training.
- config
- Configuration structures for transformer components.
- decoder
- Transformer decoder layers.
- encoder
- Transformer encoder layers.
- error
- Error types for tensorlogic-trustformers.
- ffn
- Feed-forward network layers as einsum operations.
- flash_attention
- Flash Attention
- gqa
- Grouped-Query Attention (GQA)
- kv_cache
- Key-Value cache for efficient autoregressive inference.
- layers
- Complete transformer encoder and decoder layers.
- lora
- LoRA (Low-Rank Adaptation)
- moe
- Mixture-of-Experts (MoE) layers for sparse transformer models
- normalization
- Layer normalization for transformer models.
- patterns
- Rule-based and sparse attention patterns.
- position
- Position encoding implementations for transformer models.
- presets
- Model presets for common transformer architectures.
- rule_attention
- Rule-based attention patterns for interpretable transformers.
- sliding_window
- Sliding Window Attention
- sparse_attention
- Sparse attention patterns for efficient long-sequence processing.
- stacks
- Transformer encoder and decoder stacks.
- trustformers_integration
- Integration layer between TensorLogic and TrustformeRS.
- utils
- Utility functions for transformer models.
- vision
- Vision Transformer (ViT) components for image processing
Functions§
- self_attention_as_rules (Deprecated)
Type Aliases§
- AttnSpec (Deprecated)