
Crate trustformers_models


§TrustformeRS Models

This crate provides pre-trained transformer model implementations optimized for Rust, offering high-performance alternatives to Python-based implementations.

§Overview

TrustformeRS Models includes implementations of popular transformer architectures:

  • BERT Family: BERT, RoBERTa, DistilBERT, ALBERT, ELECTRA, DeBERTa
  • GPT Family: GPT-2, GPT-Neo, GPT-J
  • Modern LLMs: LLaMA, Mistral, Gemma, Qwen, Falcon, StableLM, Command R, Claude
  • Encoder-Decoder: T5
  • Vision Models: Vision Transformer (ViT), CLIP
  • Multimodal Models: BLIP-2, LLaVA, DALL-E, Flamingo
  • Efficient Models: Mamba, RWKV, S4

§Features

Each model implementation includes:

  • Pre-trained weight loading from Hugging Face Hub
  • Task-specific heads (classification, generation, etc.)
  • Efficient inference with SciRS2 backend
  • Support for quantization and optimization
  • Performance optimization utilities for production deployment
  • Model serving infrastructure with load balancing and health monitoring

§Usage Example

use trustformers_models::{BertModel, BertConfig};
use trustformers_core::Result;

fn main() -> Result<()> {
    // Load a pre-trained BERT model
    let config = BertConfig::bert_base_uncased();
    let mut model = BertModel::new(config)?;

    // Load weights from Hugging Face
    model.load_from_hub("bert-base-uncased")?;

    // Use the model for inference
    // ... tokenization and forward pass

    Ok(())
}

§Feature Flags

Models are gated behind feature flags to reduce compilation time:

[dependencies]
trustformers-models = { version = "*", features = ["bert", "gpt2"] }

Available features:

  • bert: BERT model family
  • roberta: RoBERTa models
  • distilbert: DistilBERT models
  • gpt2: GPT-2 models
  • gpt_neo: GPT-Neo models
  • gpt_j: GPT-J models
  • t5: T5 encoder-decoder models
  • albert: ALBERT models
  • electra: ELECTRA models
  • deberta: DeBERTa models
  • vit: Vision Transformer models
  • llama: LLaMA models
  • mistral: Mistral models
  • clip: CLIP multimodal models
  • llava: LLaVA visual instruction tuning models
  • dalle: DALL-E text-to-image generation models
  • flamingo: Flamingo few-shot vision-language models
  • gemma: Gemma models
  • qwen: Qwen models
  • phi3: Phi-3 small language models
  • mamba: Mamba state-space models
  • rwkv: RWKV linear complexity models
  • falcon: Falcon high-performance language models
  • claude: Claude constitutional AI models
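
Because models are feature-gated, downstream code that must build with or without a given model can guard its imports behind `cfg` attributes. A minimal sketch using the documented `bert` feature and the `BertConfig`/`BertModel` API from the usage example above:

```rust
use trustformers_core::Result;

// Only compile this module when the `bert` feature is enabled.
#[cfg(feature = "bert")]
mod bert_pipeline {
    use trustformers_core::Result;
    use trustformers_models::{BertConfig, BertModel};

    pub fn build() -> Result<BertModel> {
        let config = BertConfig::bert_base_uncased();
        BertModel::new(config)
    }
}

fn main() -> Result<()> {
    #[cfg(feature = "bert")]
    {
        let _model = bert_pipeline::build()?;
    }
    #[cfg(not(feature = "bert"))]
    eprintln!("rebuild with `--features bert` to enable the BERT pipeline");
    Ok(())
}
```

This keeps the heavy model code out of the build entirely when the feature is disabled, which is the point of the feature gating described above.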

§Model Categories

§Encoder Models (BERT Family)

These models use bidirectional attention and are best for:

  • Text classification
  • Named entity recognition
  • Question answering
  • Feature extraction
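
For example, text classification follows the same loading pattern as the usage example above, swapping in the re-exported `BertForSequenceClassification` head. A minimal sketch; the assumption that its `new` constructor mirrors `BertModel::new` is not confirmed by this page:

```rust
use trustformers_core::Result;
use trustformers_models::{BertConfig, BertForSequenceClassification};

fn main() -> Result<()> {
    // Assumption: the classification head is constructed like BertModel.
    let config = BertConfig::bert_base_uncased();
    let mut model = BertForSequenceClassification::new(config)?;
    model.load_from_hub("bert-base-uncased")?;

    // ... tokenize input, run the forward pass, argmax over the logits
    Ok(())
}
```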

§Decoder Models (GPT Family)

These models use causal (left-to-right) attention and excel at:

  • Text generation
  • Code completion
  • Creative writing
  • Conversational AI
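
A hedged sketch of decoder-style generation: only `GenerationConfig` (re-exported from `common_patterns`) appears in this page's re-exports, so the `gpt2` type names and the config fields below are assumptions that follow the loading pattern of the BERT example above:

```rust
use trustformers_core::Result;
use trustformers_models::GenerationConfig;

fn main() -> Result<()> {
    // Hypothetical GPT-2 types, assumed to mirror the BERT loading pattern.
    let config = trustformers_models::gpt2::Gpt2Config::gpt2();
    let mut model = trustformers_models::gpt2::Gpt2Model::new(config)?;
    model.load_from_hub("gpt2")?;

    // Assumption: GenerationConfig exposes fields like these for sampling.
    let gen = GenerationConfig {
        max_new_tokens: 64,
        temperature: 0.8,
        ..Default::default()
    };

    // ... tokenize a prompt, then sample autoregressively using `gen`
    let _ = gen;
    Ok(())
}
```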

§Encoder-Decoder Models (T5)

These models combine both architectures and are ideal for:

  • Translation
  • Summarization
  • Question answering
  • Text-to-text generation
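
T5 frames every task as text-to-text, selecting the task through a prefix on the input string. A hedged sketch; the `t5` type names below are assumptions modeled on the BERT usage example:

```rust
use trustformers_core::Result;

fn main() -> Result<()> {
    // Hypothetical T5 types, assumed to follow the BERT loading pattern.
    let config = trustformers_models::t5::T5Config::t5_small();
    let mut model = trustformers_models::t5::T5Model::new(config)?;
    model.load_from_hub("t5-small")?;

    // The task prefix selects the behavior (translation here).
    let _input = "translate English to German: The house is wonderful.";
    // ... tokenize, run the encoder-decoder forward pass, decode the output

    Ok(())
}
```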

§Vision Models

  • ViT: Image classification and feature extraction
  • CLIP: Multimodal understanding (text + image)
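
CLIP scores image-text pairs by embedding both modalities into a shared space and comparing the embeddings (typically by cosine similarity), which enables zero-shot classification. A hedged sketch; the `clip` type names and `encode_*` methods are assumptions, not confirmed by this page:

```rust
use trustformers_core::Result;

fn main() -> Result<()> {
    // Hypothetical CLIP types following the BERT loading pattern.
    let config = trustformers_models::clip::ClipConfig::default();
    let mut model = trustformers_models::clip::ClipModel::new(config)?;
    model.load_from_hub("openai/clip-vit-base-patch32")?;

    // Zero-shot classification: embed the image and each candidate
    // caption, then rank the captions by cosine similarity.
    // let image_emb = model.encode_image(&pixels)?;        // assumption
    // let text_emb = model.encode_text("a photo of a cat")?; // assumption

    Ok(())
}
```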

Re-exports§

pub use bert::BertConfig;
pub use bert::BertForMaskedLM;
pub use bert::BertForSequenceClassification;
pub use bert::BertModel;
pub use automated_model_design::ArchitectureTemplate;
pub use automated_model_design::ConstraintSolver;
pub use automated_model_design::DeploymentEnvironment;
pub use automated_model_design::DesignPatternLibrary;
pub use automated_model_design::DesignRequirements;
pub use automated_model_design::DesignRequirementsBuilder;
pub use automated_model_design::Modality;
pub use automated_model_design::ModelDesign;
pub use automated_model_design::ModelDesignMetadata;
pub use automated_model_design::ModelDesigner;
pub use automated_model_design::ModelMetrics;
pub use automated_model_design::PerformanceTarget;
pub use automated_model_design::ResourceConstraints;
pub use automated_model_design::TaskType as DesignTaskType;
pub use automated_model_design::TemplateMetadata;
pub use claude::ClaudeConfig;
pub use claude::ClaudeForCausalLM;
pub use claude::ClaudeModel;
pub use command_r::CommandRConfig;
pub use command_r::CommandRForCausalLM;
pub use command_r::CommandRModel;
pub use common_patterns::components;
pub use common_patterns::get_global_registry;
pub use common_patterns::ArchitectureType;
pub use common_patterns::ComputeRequirements;
pub use common_patterns::EvaluableModel;
pub use common_patterns::EvaluationData;
pub use common_patterns::EvaluationMetric;
pub use common_patterns::EvaluationResults;
pub use common_patterns::GenerationConfig;
pub use common_patterns::GenerationStrategy;
pub use common_patterns::GenerativeModel;
pub use common_patterns::InitializationStrategy;
pub use common_patterns::MemoryEstimate;
pub use common_patterns::ModelFamily;
pub use common_patterns::ModelFamilyMetadata;
pub use common_patterns::ModelRegistry;
pub use common_patterns::ModelUtils;
pub use common_patterns::TaskType as CommonTaskType;
pub use comprehensive_testing::reporting;
pub use comprehensive_testing::BiasMetric;
pub use comprehensive_testing::BiasmitigationStrategy;
pub use comprehensive_testing::FairnessAssessment;
pub use comprehensive_testing::FairnessConfig;
pub use comprehensive_testing::FairnessMetricType;
pub use comprehensive_testing::FairnessResult;
pub use comprehensive_testing::FairnessTestData;
pub use comprehensive_testing::FairnessViolation;
pub use comprehensive_testing::GroupData;
pub use comprehensive_testing::LayerPerformance;
pub use comprehensive_testing::MemoryAnalysis;
pub use comprehensive_testing::ModelTestSuite;
pub use comprehensive_testing::NumericalDifferences;
pub use comprehensive_testing::NumericalParityResults;
pub use comprehensive_testing::OverallPerformance;
pub use comprehensive_testing::PerformanceProfiler;
pub use comprehensive_testing::PerformanceResults;
pub use comprehensive_testing::ReferenceComparator;
pub use comprehensive_testing::StatisticalTest;
pub use comprehensive_testing::TestDataType;
pub use comprehensive_testing::TestInputConfig;
pub use comprehensive_testing::TestResult;
pub use comprehensive_testing::TestStatistics;
pub use comprehensive_testing::ThroughputMeasurements;
pub use comprehensive_testing::TimingInfo;
pub use comprehensive_testing::ValidationConfig;
pub use continual_learning::utils as continual_learning_utils;
pub use continual_learning::ContinualLearningConfig;
pub use continual_learning::ContinualLearningMetrics;
pub use continual_learning::ContinualLearningOutput;
pub use continual_learning::ContinualLearningTrainer;
pub use continual_learning::ContinualStrategy;
pub use continual_learning::LearningRateSchedule;
pub use continual_learning::MemoryBuffer;
pub use continual_learning::MemorySelectionStrategy;
pub use continual_learning::TaskEvaluation;
pub use continual_learning::TaskInfo;
pub use creative_writing_specialized::CreativeWritingConfig;
pub use creative_writing_specialized::CreativeWritingForCausalLM;
pub use creative_writing_specialized::CreativeWritingModel;
pub use creative_writing_specialized::CreativeWritingSpecialTokens;
pub use creative_writing_specialized::EmotionalTone;
pub use creative_writing_specialized::ImprovementType;
pub use creative_writing_specialized::LiteraryDevice;
pub use creative_writing_specialized::NarrativePerspective;
pub use creative_writing_specialized::PoetryStyle;
pub use creative_writing_specialized::StyleAnalysis;
pub use creative_writing_specialized::WritingGenre;
pub use creative_writing_specialized::WritingImprovement;
pub use creative_writing_specialized::WritingStyle;
pub use cross_attention::AdaptiveCrossAttention;
pub use cross_attention::CrossAttention;
pub use cross_attention::CrossAttentionConfig;
pub use cross_attention::GatedCrossAttention;
pub use cross_attention::HierarchicalCrossAttention;
pub use cross_attention::MultiHeadCrossAttention;
pub use cross_attention::SparseCrossAttention;
pub use curriculum_learning::utils as curriculum_learning_utils;
pub use curriculum_learning::CurriculumAnalysis;
pub use curriculum_learning::CurriculumConfig;
pub use curriculum_learning::CurriculumEpochOutput;
pub use curriculum_learning::CurriculumExample;
pub use curriculum_learning::CurriculumLearningOutput;
pub use curriculum_learning::CurriculumLearningTrainer;
pub use curriculum_learning::CurriculumStats;
pub use curriculum_learning::CurriculumStrategy;
pub use curriculum_learning::DifficultyMeasure;
pub use curriculum_learning::PacingFunction;
pub use error_recovery::ErrorCategory;
pub use error_recovery::ErrorRecoveryManager;
pub use error_recovery::ErrorTrends;
pub use error_recovery::ModelCheckpoint;
pub use error_recovery::RecoverableOperation;
pub use error_recovery::RecoveryAttempt;
pub use error_recovery::RecoveryConfig;
pub use error_recovery::RecoveryMetrics;
pub use error_recovery::RecoveryReport;
pub use error_recovery::RecoveryStrategy;
pub use falcon::FalconConfig;
pub use falcon::FalconForCausalLM;
pub use falcon::FalconModel;
pub use fnet::FNetConfig;
pub use fnet::FNetForMaskedLM;
pub use fnet::FNetForSequenceClassification;
pub use fnet::FNetModel;
pub use hierarchical::HierarchicalConfig;
pub use hierarchical::HierarchicalForLanguageModeling;
pub use hierarchical::HierarchicalForSequenceClassification;
pub use hierarchical::HierarchicalTransformer;
pub use hierarchical::NestedTransformer;
pub use hierarchical::PyramidTransformer;
pub use hierarchical::TreeTransformer;
pub use hybrid_architectures::AdaptiveConfig;
pub use hybrid_architectures::ArchitecturalComponent;
pub use hybrid_architectures::ArchitectureSummary;
pub use hybrid_architectures::AttentionType;
pub use hybrid_architectures::CNNArchitecture;
pub use hybrid_architectures::CrossModalConfig;
pub use hybrid_architectures::EnsembleMethod;
pub use hybrid_architectures::FusionStrategy;
pub use hybrid_architectures::GlobalParams;
pub use hybrid_architectures::HierarchyType;
pub use hybrid_architectures::HybridArchitecture;
pub use hybrid_architectures::HybridConfig;
pub use hybrid_architectures::HybridConfigBuilder;
pub use hybrid_architectures::MemoryType;
pub use hybrid_architectures::ParallelFusionMethod;
pub use hybrid_architectures::RNNCellType;
pub use hybrid_architectures::StateSpaceType;
pub use hybrid_architectures::SwitchingCriteria;
pub use hybrid_architectures::TransformerVariant;
pub use hyena::HyenaConfig;
pub use hyena::HyenaForLanguageModeling;
pub use hyena::HyenaForSequenceClassification;
pub use hyena::HyenaModel;
pub use knowledge_distillation::utils as knowledge_distillation_utils;
pub use knowledge_distillation::DistillationConfig;
pub use knowledge_distillation::DistillationOutput;
pub use knowledge_distillation::DistillationStrategy;
pub use knowledge_distillation::KnowledgeDistillationTrainer;
pub use knowledge_distillation::ProgressiveStage;
pub use knowledge_distillation::StudentOutputs;
pub use knowledge_distillation::TeacherOutputs;
pub use legal_medical_specialized::Citation;
pub use legal_medical_specialized::CitationType;
pub use legal_medical_specialized::ComplianceReport;
pub use legal_medical_specialized::ComplianceViolation;
pub use legal_medical_specialized::DocumentAnalysis;
pub use legal_medical_specialized::LegalMedicalConfig;
pub use legal_medical_specialized::LegalMedicalDomain;
pub use legal_medical_specialized::LegalMedicalForCausalLM;
pub use legal_medical_specialized::LegalMedicalModel;
pub use legal_medical_specialized::LegalMedicalSpecialTokens;
pub use legal_medical_specialized::LegalSystem;
pub use legal_medical_specialized::MedicalStandard;
pub use legal_medical_specialized::PrivacyRequirement;
pub use linformer::LinformerConfig;
pub use linformer::LinformerForMaskedLM;
pub use linformer::LinformerForSequenceClassification;
pub use linformer::LinformerModel;
pub use mamba::MambaConfig;
pub use mamba::MambaModel;
pub use meta_learning::utils as meta_learning_utils;
pub use meta_learning::ConvergenceMetrics;
pub use meta_learning::EpisodeResult;
pub use meta_learning::EvaluationResult;
pub use meta_learning::Example;
pub use meta_learning::ExampleSet;
pub use meta_learning::MetaAlgorithm;
pub use meta_learning::MetaLearner;
pub use meta_learning::MetaLearningConfig;
pub use meta_learning::MetaLearningModel;
pub use meta_learning::MetaOptimizer;
pub use meta_learning::MetaStatistics;
pub use meta_learning::PerformanceMetrics;
pub use meta_learning::Task;
pub use meta_learning::TaskBatch;
pub use meta_learning::TaskResult;
pub use meta_learning::TaskSampler;
pub use meta_learning::TaskType as MetaTaskType;
pub use mixed_bit_quantization::BitAllocationStrategy;
pub use mixed_bit_quantization::CalibrationConfig;
pub use mixed_bit_quantization::CalibrationMethod;
pub use mixed_bit_quantization::HardwareConstraints as QuantizationHardwareConstraints;
pub use mixed_bit_quantization::HardwarePlatform as QuantizationHardwarePlatform;
pub use mixed_bit_quantization::LayerQuantizationConstraints;
pub use mixed_bit_quantization::MixedBitQuantizationConfig;
pub use mixed_bit_quantization::MixedBitQuantizer;
pub use mixed_bit_quantization::ProgressiveQuantizationConfig;
pub use mixed_bit_quantization::QuantizationFormat;
pub use mixed_bit_quantization::QuantizationParams;
pub use mixed_bit_quantization::QuantizationQualityMetrics;
pub use mixed_bit_quantization::QuantizationResults;
pub use mixed_bit_quantization::QuantizedLayerInfo;
pub use mixed_bit_quantization::SensitivityAnalysisResults;
pub use model_compression::utils as model_compression_utils;
pub use model_compression::ClusteringMethod;
pub use model_compression::CompressedModel;
pub use model_compression::CompressionAnalysis;
pub use model_compression::CompressionConfig;
pub use model_compression::CompressionPipeline;
pub use model_compression::CompressionStrategy;
pub use model_compression::CompressionSummary;
pub use model_compression::DecompositionType;
pub use model_compression::LayerCompressionStats;
pub use model_compression::OptimizationObjective;
pub use model_compression::PruningStrategy;
pub use model_compression::StructuredPruningGranularity;
pub use model_serving::InferenceRequest;
pub use model_serving::InferenceResponse;
pub use model_serving::LoadBalancer;
pub use model_serving::LoadBalancingStrategy;
pub use model_serving::ModelInstance;
pub use model_serving::ModelServingManager;
pub use model_serving::RequestPriority;
pub use model_serving::RequestQueue;
pub use model_serving::ServingConfig;
pub use model_serving::ServingMetrics;
pub use moe::glam_config;
pub use moe::switch_config;
pub use moe::Expert;
pub use moe::ExpertParallel;
pub use moe::MLPExpert;
pub use moe::MoEConfig;
pub use moe::RouterOutput;
pub use moe::RoutingStats;
pub use moe::SparseMoE;
pub use moe::SwitchMoE;
pub use moe::TopKRouter;
pub use multi_task_learning::utils as multi_task_learning_utils;
pub use multi_task_learning::LossBalancingStrategy;
pub use multi_task_learning::MTLAnalysis;
pub use multi_task_learning::MTLArchitecture;
pub use multi_task_learning::MTLConfig;
pub use multi_task_learning::MTLStats;
pub use multi_task_learning::MultiTaskEvaluation;
pub use multi_task_learning::MultiTaskLearningTrainer;
pub use multi_task_learning::MultiTaskOutput;
pub use multi_task_learning::TaskConfig;
pub use multi_task_learning::TaskEvaluation as MTLTaskEvaluation;
pub use multi_task_learning::TaskPriority;
pub use multi_task_learning::TaskType as MTLTaskType;
pub use neural_architecture_search::Architecture;
pub use neural_architecture_search::ArchitectureConstraint;
pub use neural_architecture_search::ArchitectureEvaluation;
pub use neural_architecture_search::ArchitectureMetadata;
pub use neural_architecture_search::DimensionRange;
pub use neural_architecture_search::HardwareConstraints;
pub use neural_architecture_search::HardwarePlatform;
pub use neural_architecture_search::NASConfig;
pub use neural_architecture_search::NeuralArchitectureSearcher;
pub use neural_architecture_search::OptimizationObjective as NASOptimizationObjective;
pub use neural_architecture_search::SearchSpace;
pub use neural_architecture_search::SearchStatistics;
pub use neural_architecture_search::SearchStrategy;
pub use performance_optimization::BatchProcessor;
pub use performance_optimization::BatchingStrategy;
pub use performance_optimization::CachedTensor;
pub use performance_optimization::DynamicBatchManager;
pub use performance_optimization::GpuCacheStatistics;
pub use performance_optimization::GpuMemoryChunk;
pub use performance_optimization::GpuMemoryOptimizer;
pub use performance_optimization::GpuMemoryPool;
pub use performance_optimization::GpuMemoryStats;
pub use performance_optimization::GpuOptimizationRecommendations;
pub use performance_optimization::GpuTensorCache;
pub use performance_optimization::MemoryOptimizer;
pub use performance_optimization::PerformanceConfig;
pub use performance_optimization::PerformanceMonitor;
pub use performance_optimization::PerformanceStatistics;
pub use performer::PerformerConfig;
pub use performer::PerformerForMaskedLM;
pub use performer::PerformerForSequenceClassification;
pub use performer::PerformerModel;
pub use progressive_training::utils as progressive_training_utils;
pub use progressive_training::GrowthDimension;
pub use progressive_training::GrowthEvent;
pub use progressive_training::GrowthInfo;
pub use progressive_training::GrowthResult;
pub use progressive_training::GrowthSchedule;
pub use progressive_training::GrowthStrategy;
pub use progressive_training::LearningProgress;
pub use progressive_training::ProgressiveConfig;
pub use progressive_training::ProgressiveModel;
pub use progressive_training::ProgressiveTrainer;
pub use retnet::RetNetConfig;
pub use retnet::RetNetForLanguageModeling;
pub use retnet::RetNetForSequenceClassification;
pub use retnet::RetNetModel;
pub use rwkv::RwkvConfig;
pub use rwkv::RwkvModel;
pub use s4::S4Config;
pub use s4::S4ForLanguageModeling;
pub use s4::S4Model;
pub use scientific_specialized::CitationStyle;
pub use scientific_specialized::ScientificAnalysis;
pub use scientific_specialized::ScientificConfig;
pub use scientific_specialized::ScientificDomain;
pub use scientific_specialized::ScientificForCausalLM;
pub use scientific_specialized::ScientificModel;
pub use scientific_specialized::ScientificSpecialTokens;
pub use sparse_attention::utils as sparse_attention_utils;
pub use sparse_attention::SparseAttention;
pub use sparse_attention::SparseAttentionConfig;
pub use sparse_attention::SparseAttentionMask;
pub use sparse_attention::SparsePattern;
pub use stablelm::StableLMConfig;
pub use stablelm::StableLMForCausalLM;
pub use stablelm::StableLMModel;
pub use weight_loading::auto_create_loader;
pub use weight_loading::create_distributed_loader;
pub use weight_loading::create_gguf_loader;
pub use weight_loading::create_huggingface_loader;
pub use weight_loading::create_memory_mapped_loader;
pub use weight_loading::DistributedStats;
pub use weight_loading::DistributedWeightLoader;
pub use weight_loading::GGMLType;
pub use weight_loading::GGUFLoader;
pub use weight_loading::HuggingFaceLoader;
pub use weight_loading::LazyTensor;
pub use weight_loading::MemoryMappedLoader;
pub use weight_loading::QuantizationConfig;
pub use weight_loading::StreamingLoader;
pub use weight_loading::TensorMetadata;
pub use weight_loading::WeightDataType;
pub use weight_loading::WeightFormat;
pub use weight_loading::WeightLoader;
pub use weight_loading::WeightLoadingConfig;
pub use xlstm::ExponentialGatingConfig;
pub use xlstm::FeedForward;
pub use xlstm::MLstmBlock;
pub use xlstm::MLstmConfig;
pub use xlstm::SLstmBlock;
pub use xlstm::SLstmConfig;
pub use xlstm::XLSTMBlockConfig;
pub use xlstm::XLSTMBlockType;
pub use xlstm::XLSTMConfig;
pub use xlstm::XLSTMForCausalLM;
pub use xlstm::XLSTMForSequenceClassification;
pub use xlstm::XLSTMLayer;
pub use xlstm::XLSTMModel;
pub use xlstm::XLSTMState;
pub use dynamic_pruning::*;

Modules§

advanced_quantization
automated_model_design
Automated Model Design Framework
batch_inference
Batch Inference Utilities for Trustformers Models
benchmarking
Simplified Model Benchmarking Suite
bert
BERT (Bidirectional Encoder Representations from Transformers)
biologically_inspired
claude
Claude (Anthropic’s Constitutional AI)
cogvlm
CogVLM: Visual Expert for Pretrained Language Models
command_r
common_patterns
Common Model Architecture Patterns and Traits
comprehensive_testing
Comprehensive Model Testing and Validation Framework
continual_learning
Continual Learning Framework
creative_writing_specialized
Creative Writing Domain-Specialized Models
cross_attention
Cross-Attention Variants
curriculum_learning
Curriculum Learning Framework
developer_tools
Developer Tools and Code Generation
dynamic_pruning
Dynamic token pruning for efficient transformer inference.
error_recovery
Comprehensive Error Recovery Framework for TrustformeRS Models
falcon
Falcon - Technology Innovation Institute Language Models
fnet
generation_utils
Generation Utilities for Trustformers Models
hierarchical
Hierarchical Transformers
hybrid_architectures
Hybrid Architectures Framework
hyena
knowledge_distillation
Knowledge Distillation Framework
legal_medical_specialized
Legal and Medical Domain-Specialized Models
linformer
mamba
memory_profiling
Memory Profiling Module for TrustformeRS Models
meta_learning
Meta-Learning Module
mixed_bit_quantization
Mixed-Bit Quantization Framework
model_cards
Model Cards for TrustformeRS
model_compression
Model Compression Toolkit
model_serving
Model Serving Utilities
moe
multi_task_learning
Multi-Task Learning Framework
neural_architecture_search
Neural Architecture Search (NAS) Framework
numerical_parity_tests
Numerical parity tests to ensure our implementations match reference outputs
performance_optimization
Performance Optimization Utilities
performer
progressive_training
Progressive Training Module
quantum_classical_hybrids
recursive
Recursive Transformers for Long Sequences
retnet
ring_attention
rwkv
s4
S4 (Structured State Space) Model Implementation
scientific_specialized
Scientific Domain-Specialized Models
sparse_attention
Sparse Attention Patterns Library
stablelm
StableLM Model Implementation
weight_loading
xlstm
Extended LSTM (xLSTM) Implementation