§TrustformeRS Models
This crate provides pre-trained transformer model implementations written in Rust, offering high-performance alternatives to Python-based implementations.
§Overview
TrustformeRS Models includes implementations of popular transformer architectures:
- BERT Family: BERT, RoBERTa, DistilBERT, ALBERT, ELECTRA, DeBERTa
- GPT Family: GPT-2, GPT-Neo, GPT-J
- Modern LLMs: LLaMA, Mistral, Gemma, Qwen, Falcon, StableLM, Command R, Claude
- Encoder-Decoder: T5
- Vision Models: Vision Transformer (ViT), CLIP
- Multimodal Models: BLIP-2, LLaVA, DALL-E, Flamingo
- Efficient Models: Mamba, RWKV, S4
§Features
Each model implementation includes:
- Pre-trained weight loading from Hugging Face Hub
- Task-specific heads (classification, generation, etc.)
- Efficient inference with SciRS2 backend
- Support for quantization and optimization
- Performance optimization utilities for production deployment
- Model serving infrastructure with load balancing and health monitoring
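To make the serving bullet concrete, here is a minimal hedged sketch using the re-exported `model_serving` types. The type names (`ServingConfig`, `ModelServingManager`, `LoadBalancingStrategy`) come from this crate's re-exports, but the constructor, field, and variant names used below are assumptions, not confirmed API:

```rust
use trustformers_models::{LoadBalancingStrategy, ModelServingManager, ServingConfig};
use trustformers_core::Result;

fn start_serving() -> Result<()> {
    // Assumed: ServingConfig implements Default and exposes a balancing field.
    let mut config = ServingConfig::default();
    config.load_balancing = LoadBalancingStrategy::RoundRobin; // hypothetical field/variant

    // Assumed constructor; register model instances with the manager
    // before accepting inference requests.
    let _manager = ModelServingManager::new(config)?;
    Ok(())
}
```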
§Usage Example
```rust
use trustformers_models::{BertModel, BertConfig};
use trustformers_core::Result;

fn main() -> Result<()> {
    // Load a pre-trained BERT model
    let config = BertConfig::bert_base_uncased();
    let mut model = BertModel::new(config)?;

    // Load weights from the Hugging Face Hub
    model.load_from_hub("bert-base-uncased")?;

    // Use the model for inference
    // ... tokenization and forward pass

    Ok(())
}
```
§Feature Flags
Models are gated behind feature flags to reduce compilation time:
```toml
[dependencies]
trustformers-models = { version = "*", features = ["bert", "gpt2"] }
```
Available features:
- bert: BERT model family
- roberta: RoBERTa models
- distilbert: DistilBERT models
- gpt2: GPT-2 models
- gpt_neo: GPT-Neo models
- gpt_j: GPT-J models
- t5: T5 encoder-decoder models
- albert: ALBERT models
- electra: ELECTRA models
- deberta: DeBERTa models
- vit: Vision Transformer models
- llama: LLaMA models
- mistral: Mistral models
- clip: CLIP multimodal models
- llava: LLaVA visual instruction tuning models
- dalle: DALL-E text-to-image generation models
- flamingo: Flamingo few-shot vision-language models
- gemma: Gemma models
- qwen: Qwen models
- phi3: Phi-3 small language models
- mamba: Mamba state-space models
- rwkv: RWKV linear complexity models
- falcon: Falcon high-performance language models
- claude: Claude constitutional AI models
§Model Categories
§Encoder Models (BERT Family)
These models use bidirectional attention and are best for:
- Text classification
- Named entity recognition
- Question answering
- Feature extraction
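As a hedged sketch of the classification case: `BertForSequenceClassification` is re-exported by this crate, but the constructor and loading calls below simply mirror the usage example above and are illustrative rather than confirmed API:

```rust
use trustformers_models::{BertConfig, BertForSequenceClassification};
use trustformers_core::Result;

fn classify() -> Result<()> {
    // Build a sequence-classification head on top of a BERT encoder.
    let config = BertConfig::bert_base_uncased();
    let mut model = BertForSequenceClassification::new(config)?;
    model.load_from_hub("bert-base-uncased")?;

    // ... tokenize input and run a forward pass to obtain per-class logits
    Ok(())
}
```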
§Decoder Models (GPT Family)
These models use causal (left-to-right) attention and excel at:
- Text generation
- Code completion
- Creative writing
- Conversational AI
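The generation-oriented traits re-exported from `common_patterns` give a rough shape for decoder use. `GenerationConfig` and `GenerativeModel` are real re-exports; the `Default` impl and the `generate` method signature below are assumptions:

```rust
use trustformers_models::{GenerationConfig, GenerativeModel};
use trustformers_core::Result;

// A hypothetical helper: any model implementing `GenerativeModel` can be
// driven with a `GenerationConfig` (sampling parameters, max length, etc.).
fn generate<M: GenerativeModel>(model: &mut M, prompt_ids: &[u32]) -> Result<Vec<u32>> {
    let config = GenerationConfig::default();  // assumed Default impl
    model.generate(prompt_ids, &config)        // assumed trait method
}
```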
§Encoder-Decoder Models (T5)
These models combine both architectures and are ideal for:
- Translation
- Summarization
- Question answering
- Text-to-text generation
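A hedged sketch of text-to-text use with the `t5` feature enabled; the type and method names (`T5Config`, `T5Model`, and the commented-out `generate_text`) are illustrative, not confirmed API:

```rust
use trustformers_models::t5::{T5Config, T5Model}; // hypothetical type names
use trustformers_core::Result;

fn summarize() -> Result<()> {
    let config = T5Config::default();   // assumed Default impl
    let mut model = T5Model::new(config)?;
    model.load_from_hub("t5-small")?;   // mirrors the BERT example above

    // T5 frames every task as text-to-text: prefix the input with the task.
    // let summary = model.generate_text("summarize: <document>")?; // hypothetical
    Ok(())
}
```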
§Vision Models
- ViT: Image classification and feature extraction
- CLIP: Multimodal understanding (text + image)
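A hedged sketch of multimodal use with the `clip` feature; `ClipConfig`, `ClipModel`, and the encode methods are illustrative names, not confirmed API:

```rust
use trustformers_models::clip::{ClipConfig, ClipModel}; // hypothetical type names
use trustformers_core::Result;

fn embed() -> Result<()> {
    let config = ClipConfig::default();  // assumed Default impl
    let mut model = ClipModel::new(config)?;
    model.load_from_hub("openai/clip-vit-base-patch32")?; // mirrors the BERT example

    // Encode text and image into a shared embedding space, then compare the
    // embeddings (e.g. cosine similarity) for zero-shot classification.
    // let text_emb = model.encode_text(...)?;    // hypothetical
    // let image_emb = model.encode_image(...)?;  // hypothetical
    Ok(())
}
```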
Re-exports§
pub use bert::{BertConfig, BertForMaskedLM, BertForSequenceClassification, BertModel};
pub use automated_model_design::{ArchitectureTemplate, ConstraintSolver, DeploymentEnvironment, DesignPatternLibrary, DesignRequirements, DesignRequirementsBuilder, Modality, ModelDesign, ModelDesignMetadata, ModelDesigner, ModelMetrics, PerformanceTarget, ResourceConstraints, TaskType as DesignTaskType, TemplateMetadata};
pub use claude::{ClaudeConfig, ClaudeForCausalLM, ClaudeModel};
pub use command_r::{CommandRConfig, CommandRForCausalLM, CommandRModel};
pub use common_patterns::{components, get_global_registry, ArchitectureType, ComputeRequirements, EvaluableModel, EvaluationData, EvaluationMetric, EvaluationResults, GenerationConfig, GenerationStrategy, GenerativeModel, InitializationStrategy, MemoryEstimate, ModelFamily, ModelFamilyMetadata, ModelRegistry, ModelUtils, TaskType as CommonTaskType};
pub use comprehensive_testing::{reporting, BiasMetric, BiasmitigationStrategy, FairnessAssessment, FairnessConfig, FairnessMetricType, FairnessResult, FairnessTestData, FairnessViolation, GroupData, LayerPerformance, MemoryAnalysis, ModelTestSuite, NumericalDifferences, NumericalParityResults, OverallPerformance, PerformanceProfiler, PerformanceResults, ReferenceComparator, StatisticalTest, TestDataType, TestInputConfig, TestResult, TestStatistics, ThroughputMeasurements, TimingInfo, ValidationConfig};
pub use continual_learning::{utils as continual_learning_utils, ContinualLearningConfig, ContinualLearningMetrics, ContinualLearningOutput, ContinualLearningTrainer, ContinualStrategy, LearningRateSchedule, MemoryBuffer, MemorySelectionStrategy, TaskEvaluation, TaskInfo};
pub use creative_writing_specialized::{CreativeWritingConfig, CreativeWritingForCausalLM, CreativeWritingModel, CreativeWritingSpecialTokens, EmotionalTone, ImprovementType, LiteraryDevice, NarrativePerspective, PoetryStyle, StyleAnalysis, WritingGenre, WritingImprovement, WritingStyle};
pub use cross_attention::{AdaptiveCrossAttention, CrossAttention, CrossAttentionConfig, GatedCrossAttention, HierarchicalCrossAttention, MultiHeadCrossAttention, SparseCrossAttention};
pub use curriculum_learning::{utils as curriculum_learning_utils, CurriculumAnalysis, CurriculumConfig, CurriculumEpochOutput, CurriculumExample, CurriculumLearningOutput, CurriculumLearningTrainer, CurriculumStats, CurriculumStrategy, DifficultyMeasure, PacingFunction};
pub use error_recovery::{ErrorCategory, ErrorRecoveryManager, ErrorTrends, ModelCheckpoint, RecoverableOperation, RecoveryAttempt, RecoveryConfig, RecoveryMetrics, RecoveryReport, RecoveryStrategy};
pub use falcon::{FalconConfig, FalconForCausalLM, FalconModel};
pub use fnet::{FNetConfig, FNetForMaskedLM, FNetForSequenceClassification, FNetModel};
pub use hierarchical::{HierarchicalConfig, HierarchicalForLanguageModeling, HierarchicalForSequenceClassification, HierarchicalTransformer, NestedTransformer, PyramidTransformer, TreeTransformer};
pub use hybrid_architectures::{AdaptiveConfig, ArchitecturalComponent, ArchitectureSummary, AttentionType, CNNArchitecture, CrossModalConfig, EnsembleMethod, FusionStrategy, GlobalParams, HierarchyType, HybridArchitecture, HybridConfig, HybridConfigBuilder, MemoryType, ParallelFusionMethod, RNNCellType, StateSpaceType, SwitchingCriteria, TransformerVariant};
pub use hyena::{HyenaConfig, HyenaForLanguageModeling, HyenaForSequenceClassification, HyenaModel};
pub use knowledge_distillation::{utils as knowledge_distillation_utils, DistillationConfig, DistillationOutput, DistillationStrategy, KnowledgeDistillationTrainer, ProgressiveStage, StudentOutputs, TeacherOutputs};
pub use legal_medical_specialized::{Citation, CitationType, ComplianceReport, ComplianceViolation, DocumentAnalysis, LegalMedicalConfig, LegalMedicalDomain, LegalMedicalForCausalLM, LegalMedicalModel, LegalMedicalSpecialTokens, LegalSystem, MedicalStandard, PrivacyRequirement};
pub use linformer::{LinformerConfig, LinformerForMaskedLM, LinformerForSequenceClassification, LinformerModel};
pub use mamba::{MambaConfig, MambaModel};
pub use meta_learning::{utils as meta_learning_utils, ConvergenceMetrics, EpisodeResult, EvaluationResult, Example, ExampleSet, MetaAlgorithm, MetaLearner, MetaLearningConfig, MetaLearningModel, MetaOptimizer, MetaStatistics, PerformanceMetrics, Task, TaskBatch, TaskResult, TaskSampler, TaskType as MetaTaskType};
pub use mixed_bit_quantization::{BitAllocationStrategy, CalibrationConfig, CalibrationMethod, HardwareConstraints as QuantizationHardwareConstraints, HardwarePlatform as QuantizationHardwarePlatform, LayerQuantizationConstraints, MixedBitQuantizationConfig, MixedBitQuantizer, ProgressiveQuantizationConfig, QuantizationFormat, QuantizationParams, QuantizationQualityMetrics, QuantizationResults, QuantizedLayerInfo, SensitivityAnalysisResults};
pub use model_compression::{utils as model_compression_utils, ClusteringMethod, CompressedModel, CompressionAnalysis, CompressionConfig, CompressionPipeline, CompressionStrategy, CompressionSummary, DecompositionType, LayerCompressionStats, OptimizationObjective, PruningStrategy, StructuredPruningGranularity};
pub use model_serving::{InferenceRequest, InferenceResponse, LoadBalancer, LoadBalancingStrategy, ModelInstance, ModelServingManager, RequestPriority, RequestQueue, ServingConfig, ServingMetrics};
pub use moe::{glam_config, switch_config, Expert, ExpertParallel, MLPExpert, MoEConfig, RouterOutput, RoutingStats, SparseMoE, SwitchMoE, TopKRouter};
pub use multi_task_learning::{utils as multi_task_learning_utils, LossBalancingStrategy, MTLAnalysis, MTLArchitecture, MTLConfig, MTLStats, MultiTaskEvaluation, MultiTaskLearningTrainer, MultiTaskOutput, TaskConfig, TaskEvaluation as MTLTaskEvaluation, TaskPriority, TaskType as MTLTaskType};
pub use neural_architecture_search::{Architecture, ArchitectureConstraint, ArchitectureEvaluation, ArchitectureMetadata, DimensionRange, HardwareConstraints, HardwarePlatform, NASConfig, NeuralArchitectureSearcher, OptimizationObjective as NASOptimizationObjective, SearchSpace, SearchStatistics, SearchStrategy};
pub use performance_optimization::{BatchProcessor, BatchingStrategy, CachedTensor, DynamicBatchManager, GpuCacheStatistics, GpuMemoryChunk, GpuMemoryOptimizer, GpuMemoryPool, GpuMemoryStats, GpuOptimizationRecommendations, GpuTensorCache, MemoryOptimizer, PerformanceConfig, PerformanceMonitor, PerformanceStatistics};
pub use performer::{PerformerConfig, PerformerForMaskedLM, PerformerForSequenceClassification, PerformerModel};
pub use progressive_training::{utils as progressive_training_utils, GrowthDimension, GrowthEvent, GrowthInfo, GrowthResult, GrowthSchedule, GrowthStrategy, LearningProgress, ProgressiveConfig, ProgressiveModel, ProgressiveTrainer};
pub use retnet::{RetNetConfig, RetNetForLanguageModeling, RetNetForSequenceClassification, RetNetModel};
pub use rwkv::{RwkvConfig, RwkvModel};
pub use s4::{S4Config, S4ForLanguageModeling, S4Model};
pub use scientific_specialized::{CitationStyle, ScientificAnalysis, ScientificConfig, ScientificDomain, ScientificForCausalLM, ScientificModel, ScientificSpecialTokens};
pub use sparse_attention::{utils as sparse_attention_utils, SparseAttention, SparseAttentionConfig, SparseAttentionMask, SparsePattern};
pub use stablelm::{StableLMConfig, StableLMForCausalLM, StableLMModel};
pub use weight_loading::{auto_create_loader, create_distributed_loader, create_gguf_loader, create_huggingface_loader, create_memory_mapped_loader, DistributedStats, DistributedWeightLoader, GGMLType, GGUFLoader, HuggingFaceLoader, LazyTensor, MemoryMappedLoader, QuantizationConfig, StreamingLoader, TensorMetadata, WeightDataType, WeightFormat, WeightLoader, WeightLoadingConfig};
pub use xlstm::{ExponentialGatingConfig, FeedForward, MLstmBlock, MLstmConfig, SLstmBlock, SLstmConfig, XLSTMBlockConfig, XLSTMBlockType, XLSTMConfig, XLSTMForCausalLM, XLSTMForSequenceClassification, XLSTMLayer, XLSTMModel, XLSTMState};
pub use dynamic_pruning::*;
Modules§
- advanced_quantization
- automated_model_design - Automated Model Design Framework
- batch_inference - Batch Inference Utilities for Trustformers Models
- benchmarking - Simplified Model Benchmarking Suite
- bert - BERT (Bidirectional Encoder Representations from Transformers)
- biologically_inspired
- claude - Claude (Anthropic’s Constitutional AI)
- cogvlm - CogVLM: Visual Expert for Pretrained Language Models
- command_r
- common_patterns - Common Model Architecture Patterns and Traits
- comprehensive_testing - Comprehensive Model Testing and Validation Framework
- continual_learning - Continual Learning Framework
- creative_writing_specialized - Creative Writing Domain-Specialized Models
- cross_attention - Cross-Attention Variants
- curriculum_learning - Curriculum Learning Framework
- developer_tools - Developer Tools and Code Generation
- dynamic_pruning - Dynamic token pruning for efficient transformer inference
- error_recovery - Comprehensive Error Recovery Framework for TrustformeRS Models
- falcon - Falcon - Technology Innovation Institute Language Models
- fnet
- generation_utils - Generation Utilities for Trustformers Models
- hierarchical - Hierarchical Transformers
- hybrid_architectures - Hybrid Architectures Framework
- hyena
- knowledge_distillation - Knowledge Distillation Framework
- legal_medical_specialized - Legal and Medical Domain-Specialized Models
- linformer
- mamba
- memory_profiling - Memory Profiling Module for TrustformeRS Models
- meta_learning - Meta-Learning Module
- mixed_bit_quantization - Mixed-Bit Quantization Framework
- model_cards - Model Cards for TrustformeRS
- model_compression - Model Compression Toolkit
- model_serving - Model Serving Utilities
- moe
- multi_task_learning - Multi-Task Learning Framework
- neural_architecture_search - Neural Architecture Search (NAS) Framework
- numerical_parity_tests - Numerical parity tests to ensure our implementations match reference outputs
- performance_optimization - Performance Optimization Utilities
- performer
- progressive_training - Progressive Training Module
- quantum_classical_hybrids
- recursive - Recursive Transformers for Long Sequences
- retnet
- ring_attention
- rwkv
- s4 - S4 (Structured State Space) Model Implementation
- scientific_specialized - Scientific Domain-Specialized Models
- sparse_attention - Sparse Attention Patterns Library
- stablelm - StableLM Model Implementation
- weight_loading
- xlstm - Extended LSTM (xLSTM) Implementation