§Tensorlogic-Trustformers
Version: 0.1.0-beta.1 | Status: Production Ready
Transform transformer architectures into TensorLogic IR using einsum operations.
This crate provides implementations of transformer components (self-attention, multi-head attention, feed-forward networks) as einsum graphs that can be compiled and executed on various TensorLogic backends.
§Features
- Self-Attention: Scaled dot-product attention as einsum operations
- Multi-Head Attention: Parallel attention heads with head splitting
- Feed-Forward Networks: Position-wise FFN with configurable activations
- Gated FFN: GLU-style gated feed-forward networks
- Einsum-Native: All operations expressed as einsum for maximum flexibility
§Architecture
Transformer components are decomposed into einsum operations:
§Self-Attention
scores = einsum("bqd,bkd->bqk", Q, K) / sqrt(d_k)
attn = softmax(scores, dim=-1)
output = einsum("bqk,bkv->bqv", attn, V)§Multi-Head Attention
§Multi-Head Attention
Q, K, V = [batch, seq, d_model] -> [batch, n_heads, seq, d_k]
scores = einsum("bhqd,bhkd->bhqk", Q, K) / sqrt(d_k)
attn = softmax(scores, dim=-1)
output = einsum("bhqk,bhkv->bhqv", attn, V)
output = reshape([batch, seq, d_model])
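The head split that opens this block is a pure reshape. As an illustration (a sketch with stand-in types, not a statement about the crate's internal layout), head h of a single token's d_model-length vector is the contiguous slice of width d_k = d_model / n_heads:

// Sketch: one token's [d_model] vector viewed as n_heads slices of width d_k.
// Assumes d_model is divisible by n_heads.
fn split_heads(token: &[f32], n_heads: usize) -> Vec<&[f32]> {
    let d_k = token.len() / n_heads;
    (0..n_heads).map(|h| &token[h * d_k..(h + 1) * d_k]).collect()
}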
§Feed-Forward Network
h1 = einsum("bsd,df->bsf", x, W1) + b1
h2 = activation(h1)
output = einsum("bsf,fd->bsd", h2, W2) + b2§Example Usage
§Example Usage
use tensorlogic_trustformers::{
AttentionConfig, SelfAttention, MultiHeadAttention,
FeedForwardConfig, FeedForward,
};
use tensorlogic_ir::EinsumGraph;
// Configure self-attention
let attn_config = AttentionConfig::new(512, 8).unwrap();
let self_attn = SelfAttention::new(attn_config.clone()).unwrap();
// Build einsum graph
let mut graph = EinsumGraph::new();
graph.add_tensor("Q");
graph.add_tensor("K");
graph.add_tensor("V");
let outputs = self_attn.build_attention_graph(&mut graph).unwrap();
// Configure multi-head attention
let mha = MultiHeadAttention::new(attn_config).unwrap();
let mut mha_graph = EinsumGraph::new();
mha_graph.add_tensor("Q");
mha_graph.add_tensor("K");
mha_graph.add_tensor("V");
let mha_outputs = mha.build_mha_graph(&mut mha_graph).unwrap();
// Configure feed-forward network
let ffn_config = FeedForwardConfig::new(512, 2048)
.with_activation("gelu");
let ffn = FeedForward::new(ffn_config).unwrap();
let mut ffn_graph = EinsumGraph::new();
ffn_graph.add_tensor("x");
ffn_graph.add_tensor("W1");
ffn_graph.add_tensor("b1");
ffn_graph.add_tensor("W2");
ffn_graph.add_tensor("b2");
let ffn_outputs = ffn.build_ffn_graph(&mut ffn_graph).unwrap();
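The unwrap() calls keep the example compact. In application code, the Result alias re-exported from the error module lets the same calls propagate failures with ?. A minimal sketch, assuming Result is a single-parameter alias over TrustformerError and that the constructors and graph builders above return it (as their use of unwrap() suggests):

use tensorlogic_ir::EinsumGraph;
use tensorlogic_trustformers::{AttentionConfig, Result, SelfAttention};

// Same graph construction as the example above, but propagating errors with `?`.
fn self_attention_graph(d_model: usize, n_heads: usize) -> Result<EinsumGraph> {
    let config = AttentionConfig::new(d_model, n_heads)?;
    let self_attn = SelfAttention::new(config)?;
    let mut graph = EinsumGraph::new();
    graph.add_tensor("Q");
    graph.add_tensor("K");
    graph.add_tensor("V");
    let _outputs = self_attn.build_attention_graph(&mut graph)?;
    Ok(graph)
}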
§Integration with TensorLogic
The einsum graphs produced by this crate can be:
- Compiled with tensorlogic-compiler
- Executed on tensorlogic-scirs-backend or other backends
- Optimized using graph optimization passes
- Combined with logical rules for interpretable transformers
§Design Philosophy
This crate follows the TensorLogic principle of expressing neural operations as tensor contractions (einsum), enabling:
- Backend Independence: Same graph works on CPU, GPU, TPU
- Optimization Opportunities: Graph-level optimizations like fusion
- Interpretability: Clear mathematical semantics
- Composability: Mix transformer layers with logical rules
Re-exports§
pub use attention::{MultiHeadAttention, SelfAttention};
pub use checkpointing::{CheckpointConfig, CheckpointStrategy};
pub use config::{AttentionConfig, FeedForwardConfig, TransformerLayerConfig};
pub use decoder::{Decoder, DecoderConfig};
pub use encoder::{Encoder, EncoderConfig};
pub use error::{Result, TrustformerError};
pub use ffn::{FeedForward, GatedFeedForward};
pub use flash_attention::{FlashAttention, FlashAttentionConfig, FlashAttentionPreset, FlashAttentionStats, FlashAttentionV2Config};
pub use gqa::{GQAConfig, GQAPreset, GQAStats, GroupedQueryAttention};
pub use kv_cache::{CacheStats, KVCache, KVCacheConfig};
pub use layers::{DecoderLayer, DecoderLayerConfig, EncoderLayer, EncoderLayerConfig};
pub use lora::{LoRAAttention, LoRAConfig, LoRALinear, LoRAPreset, LoRAStats};
pub use moe::{MoeConfig, MoeLayer, MoePreset, MoeStats, RouterType};
pub use normalization::{LayerNorm, LayerNormConfig, RMSNorm};
pub use patterns::{AttentionMask, BlockSparseMask, CausalMask, GlobalLocalMask, LocalMask, RuleBasedMask, RulePattern, StridedMask};
pub use position::{AlibiPositionEncoding, LearnedPositionEncoding, PositionEncodingConfig, PositionEncodingType, RelativePositionEncoding, RotaryPositionEncoding, SinusoidalPositionEncoding};
pub use presets::ModelPreset;
pub use rule_attention::{RuleAttentionConfig, RuleAttentionType, RuleBasedAttention, StructuredAttention};
pub use sliding_window::{SlidingWindowAttention, SlidingWindowConfig, SlidingWindowPreset, SlidingWindowStats};
pub use sparse_attention::{LocalAttention, SparseAttention, SparseAttentionConfig, SparsePatternType};
pub use stacks::{DecoderStack, DecoderStackConfig, EncoderStack, EncoderStackConfig};
pub use trustformers_integration::{CheckpointData, IntegrationConfig, ModelConfig, TensorLogicModel, TrustformersConverter, TrustformersWeightLoader};
pub use utils::{decoder_stack_stats, encoder_stack_stats, ModelStats};
pub use vision::{PatchEmbedding, PatchEmbeddingConfig, ViTPreset, VisionTransformer, VisionTransformerConfig};
Modules§
- attention
- Self-attention and multi-head attention as einsum operations.
- checkpointing
- Gradient checkpointing for memory-efficient training.
- config
- Configuration structures for transformer components.
- decoder
- Transformer decoder layers.
- encoder
- Transformer encoder layers.
- error
- Error types for tensorlogic-trustformers.
- ffn
- Feed-forward network layers as einsum operations.
- flash_attention
- Flash Attention
- gqa
- Grouped-Query Attention (GQA)
- kv_cache
- Key-Value cache for efficient autoregressive inference.
- layers
- Complete transformer encoder and decoder layers.
- lora
- LoRA (Low-Rank Adaptation)
- moe
- Mixture-of-Experts (MoE) layers for sparse transformer models
- normalization
- Layer normalization for transformer models.
- patterns
- Rule-based and sparse attention patterns.
- position
- Position encoding implementations for transformer models.
- presets
- Model presets for common transformer architectures.
- rule_attention
- Rule-based attention patterns for interpretable transformers.
- sliding_window
- Sliding Window Attention
- sparse_attention
- Sparse attention patterns for efficient long-sequence processing.
- stacks
- Transformer encoder and decoder stacks.
- trustformers_integration
- Integration layer between TensorLogic and TrustformeRS.
- utils
- Utility functions for transformer models.
- vision
- Vision Transformer (ViT) components for image processing
Functions§
- self_attention_as_rules (Deprecated)
Type Aliases§
- AttnSpec (Deprecated)