Crate tensorlogic_trustformers

§Tensorlogic-Trustformers

Version: 0.1.0-beta.1 | Status: Production Ready

Transform transformer architectures into TensorLogic IR using einsum operations.

This crate provides implementations of transformer components (self-attention, multi-head attention, feed-forward networks) as einsum graphs that can be compiled and executed on various TensorLogic backends.

§Features

  • Self-Attention: Scaled dot-product attention as einsum operations
  • Multi-Head Attention: Parallel attention heads with head splitting
  • Feed-Forward Networks: Position-wise FFN with configurable activations
  • Gated FFN: GLU-style gated feed-forward networks
  • Einsum-Native: All operations expressed as einsum for maximum flexibility

§Architecture

Transformer components are decomposed into einsum operations:

§Self-Attention

scores = einsum("bqd,bkd->bqk", Q, K) / sqrt(d_k)
attn = softmax(scores, dim=-1)
output = einsum("bqk,bkv->bqv", attn, V)

§Multi-Head Attention

Q, K, V: reshape [batch, seq, d_model] -> [batch, n_heads, seq, d_k]
scores = einsum("bhqd,bhkd->bhqk", Q, K) / sqrt(d_k)
attn = softmax(scores, dim=-1)
output = einsum("bhqk,bhkv->bhqv", attn, V)
output = reshape(output, [batch, seq, d_model])
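
For example, with d_model = 512 and n_heads = 8 (the values used in the example below), each head attends over d_k = 512 / 8 = 64 dimensions:

[batch, seq, 512] -> [batch, 8, seq, 64]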

§Feed-Forward Network

h1 = einsum("bsd,df->bsf", x, W1) + b1
h2 = activation(h1)
output = einsum("bsf,fd->bsd", h2, W2) + b2

§Example Usage

use tensorlogic_trustformers::{
    AttentionConfig, SelfAttention, MultiHeadAttention,
    FeedForwardConfig, FeedForward,
};
use tensorlogic_ir::EinsumGraph;

// Configure self-attention
let attn_config = AttentionConfig::new(512, 8).unwrap();
let self_attn = SelfAttention::new(attn_config.clone()).unwrap();

// Build einsum graph
let mut graph = EinsumGraph::new();
graph.add_tensor("Q");
graph.add_tensor("K");
graph.add_tensor("V");

let outputs = self_attn.build_attention_graph(&mut graph).unwrap();

// Configure multi-head attention
let mha = MultiHeadAttention::new(attn_config).unwrap();
let mut mha_graph = EinsumGraph::new();
mha_graph.add_tensor("Q");
mha_graph.add_tensor("K");
mha_graph.add_tensor("V");

let mha_outputs = mha.build_mha_graph(&mut mha_graph).unwrap();

// Configure feed-forward network
let ffn_config = FeedForwardConfig::new(512, 2048)
    .with_activation("gelu");
let ffn = FeedForward::new(ffn_config).unwrap();

let mut ffn_graph = EinsumGraph::new();
ffn_graph.add_tensor("x");
ffn_graph.add_tensor("W1");
ffn_graph.add_tensor("b1");
ffn_graph.add_tensor("W2");
ffn_graph.add_tensor("b2");

let ffn_outputs = ffn.build_ffn_graph(&mut ffn_graph).unwrap();
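
// The gated variant is re-exported as well. A minimal sketch, assuming
// GatedFeedForward::new accepts the same FeedForwardConfig as FeedForward
// (the constructor signature is an assumption; see the ffn module docs):
use tensorlogic_trustformers::GatedFeedForward;

let gated_config = FeedForwardConfig::new(512, 2048).with_activation("gelu");
let gated_ffn = GatedFeedForward::new(gated_config).unwrap();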

§Integration with TensorLogic

The einsum graphs produced by this crate can be:

  • Compiled with tensorlogic-compiler
  • Executed on tensorlogic-scirs-backend or other backends
  • Optimized using graph optimization passes
  • Combined with logical rules for interpretable transformers

§Design Philosophy

This crate follows the TensorLogic principle of expressing neural operations as tensor contractions (einsum), enabling:

  1. Backend Independence: Same graph works on CPU, GPU, TPU
  2. Optimization Opportunities: Graph-level optimizations like fusion
  3. Interpretability: Clear mathematical semantics
  4. Composability: Mix transformer layers with logical rules

§Re-exports

pub use attention::MultiHeadAttention;
pub use attention::SelfAttention;
pub use checkpointing::CheckpointConfig;
pub use checkpointing::CheckpointStrategy;
pub use config::AttentionConfig;
pub use config::FeedForwardConfig;
pub use config::TransformerLayerConfig;
pub use decoder::Decoder;
pub use decoder::DecoderConfig;
pub use encoder::Encoder;
pub use encoder::EncoderConfig;
pub use error::Result;
pub use error::TrustformerError;
pub use ffn::FeedForward;
pub use ffn::GatedFeedForward;
pub use flash_attention::FlashAttention;
pub use flash_attention::FlashAttentionConfig;
pub use flash_attention::FlashAttentionPreset;
pub use flash_attention::FlashAttentionStats;
pub use flash_attention::FlashAttentionV2Config;
pub use gqa::GQAConfig;
pub use gqa::GQAPreset;
pub use gqa::GQAStats;
pub use gqa::GroupedQueryAttention;
pub use kv_cache::CacheStats;
pub use kv_cache::KVCache;
pub use kv_cache::KVCacheConfig;
pub use layers::DecoderLayer;
pub use layers::DecoderLayerConfig;
pub use layers::EncoderLayer;
pub use layers::EncoderLayerConfig;
pub use lora::LoRAAttention;
pub use lora::LoRAConfig;
pub use lora::LoRALinear;
pub use lora::LoRAPreset;
pub use lora::LoRAStats;
pub use moe::MoeConfig;
pub use moe::MoeLayer;
pub use moe::MoePreset;
pub use moe::MoeStats;
pub use moe::RouterType;
pub use normalization::LayerNorm;
pub use normalization::LayerNormConfig;
pub use normalization::RMSNorm;
pub use patterns::AttentionMask;
pub use patterns::BlockSparseMask;
pub use patterns::CausalMask;
pub use patterns::GlobalLocalMask;
pub use patterns::LocalMask;
pub use patterns::RuleBasedMask;
pub use patterns::RulePattern;
pub use patterns::StridedMask;
pub use position::AlibiPositionEncoding;
pub use position::LearnedPositionEncoding;
pub use position::PositionEncodingConfig;
pub use position::PositionEncodingType;
pub use position::RelativePositionEncoding;
pub use position::RotaryPositionEncoding;
pub use position::SinusoidalPositionEncoding;
pub use presets::ModelPreset;
pub use rule_attention::RuleAttentionConfig;
pub use rule_attention::RuleAttentionType;
pub use rule_attention::RuleBasedAttention;
pub use rule_attention::StructuredAttention;
pub use sliding_window::SlidingWindowAttention;
pub use sliding_window::SlidingWindowConfig;
pub use sliding_window::SlidingWindowPreset;
pub use sliding_window::SlidingWindowStats;
pub use sparse_attention::LocalAttention;
pub use sparse_attention::SparseAttention;
pub use sparse_attention::SparseAttentionConfig;
pub use sparse_attention::SparsePatternType;
pub use stacks::DecoderStack;
pub use stacks::DecoderStackConfig;
pub use stacks::EncoderStack;
pub use stacks::EncoderStackConfig;
pub use trustformers_integration::CheckpointData;
pub use trustformers_integration::IntegrationConfig;
pub use trustformers_integration::ModelConfig;
pub use trustformers_integration::TensorLogicModel;
pub use trustformers_integration::TrustformersConverter;
pub use trustformers_integration::TrustformersWeightLoader;
pub use utils::decoder_stack_stats;
pub use utils::encoder_stack_stats;
pub use utils::ModelStats;
pub use vision::PatchEmbedding;
pub use vision::PatchEmbeddingConfig;
pub use vision::ViTPreset;
pub use vision::VisionTransformer;
pub use vision::VisionTransformerConfig;

§Modules

attention
Self-attention and multi-head attention as einsum operations.
checkpointing
Gradient checkpointing for memory-efficient training.
config
Configuration structures for transformer components.
decoder
Transformer decoder layers.
encoder
Transformer encoder layers.
error
Error types for tensorlogic-trustformers.
ffn
Feed-forward network layers as einsum operations.
flash_attention
Flash Attention.
gqa
Grouped-Query Attention (GQA).
kv_cache
Key-Value cache for efficient autoregressive inference.
layers
Complete transformer encoder and decoder layers.
lora
LoRA (Low-Rank Adaptation).
moe
Mixture-of-Experts (MoE) layers for sparse transformer models.
normalization
Layer normalization for transformer models.
patterns
Rule-based and sparse attention patterns.
position
Position encoding implementations for transformer models.
presets
Model presets for common transformer architectures.
rule_attention
Rule-based attention patterns for interpretable transformers.
sliding_window
Sliding Window Attention.
sparse_attention
Sparse attention patterns for efficient long-sequence processing.
stacks
Transformer encoder and decoder stacks.
trustformers_integration
Integration layer between TensorLogic and TrustformeRS.
utils
Utility functions for transformer models.
vision
Vision Transformer (ViT) components for image processing.

§Functions

self_attention_as_rules (Deprecated)

§Type Aliases

AttnSpec (Deprecated)