# trustformers-core





Core infrastructure crate providing fundamental abstractions and utilities for the TrustformeRS ecosystem.
## Current State
**Version 0.1.0 — Stable (2026-03-21)**
This crate is **stable and production-ready**, serving as the foundation for all other TrustformeRS components. It provides high-performance tensor operations, layer implementations, and advanced optimization techniques. All 1,140 tests pass with zero stubs or unimplemented items.
## Features
### Tensor Operations
- **Comprehensive tensor abstraction** supporting multiple backends
- **SciRS2 integration** for SIMD-optimized operations
- **GPU support** through multiple backends (CUDA, Metal, Vulkan, WebGPU, OpenCL, ROCm, Vulkan, TPU, OneAPI, XLA, RISC-V)
- **Automatic differentiation** with reverse-mode and forward-mode autodiff
- **Memory-efficient operations** with zero-copy views
### Layer Implementations
- **Core Layers**: Linear, Embedding, LayerNorm, Dropout, RMSNorm
- **Attention Mechanisms**:
- Multi-head attention (MHA) with causal masking
- FlashAttention and FlashAttention-2 for memory efficiency
- Multi-Query Attention (MQA) and Grouped-Query Attention (GQA)
- PagedAttention for KV cache management
- Optimized SDPA kernels with adaptive strategies
- **Advanced Layers**: FeedForward (SwiGLU, GeGLU), PositionalEncoding, RoPE, ALiBi
### Performance Optimizations
- **SIMD Operations**: Optimized LayerNorm, Softmax, and RoPE implementations
- **Quantization Support**: INT8, INT4, FP16, FP8, GPTQ, AWQ, GGUF/K-quants with calibration
- **Custom Kernels**: Fused operations for reduced memory bandwidth
- **Kernel Tuning**: Automatic hardware-aware kernel parameter optimization
- **Memory Management**: Adaptive pooling with LRU, LFU, ARC, and Hybrid eviction policies
- **Conv2D**: Full im2col+matmul implementation with groups and dilation
- **GPU Attention**: Scaled dot-product and flash attention (tiled online-softmax)
### Export and Interoperability
- **ONNX Export**: Complete graph construction and runtime support
- **GGML/GGUF**: Advanced quantization formats including K-quants for edge deployment
- **CoreML**: iOS/macOS deployment support
### Advanced Features
- **Evaluation Framework**: GLUE, SuperGLUE, MMLU, HellaSwag, HumanEval benchmarks
- **Monitoring**: TensorBoard integration, gradient flow analysis, activation statistics
- **Caching System**: Multiple eviction policies (LRU, LFU, ARC, Hybrid)
- **A/B Testing**: Infrastructure for model comparison
- **Model Compression**: Pruning and distillation support
- **Plugin System**: Extensible architecture for custom kernels and layers
- **Tensor Debugger**: Interactive watchpoints, NaN/Inf detection, operation tracing
### Distributed and Parallel Computing
- **Tensor Parallelism**: Column/row parallel linear layers
- **Pipeline Parallelism**: Stage-based model partitioning
- **Data Parallelism**: Multi-GPU training infrastructure
- **Communication Backends**: NCCL, MPI, Gloo support
- **Process Groups**: All-reduce, broadcast, all-gather, reduce-scatter operations
### PEFT (Parameter-Efficient Fine-Tuning)
- **LoRA**: Low-rank adaptation with weight merging
- **QLoRA**: Quantized LoRA for memory efficiency
- **Adapters**: Bottleneck adapter layers
- **Prefix Tuning**: Trainable prefix embeddings
- **Prompt Tuning**: Virtual token optimization
## Architecture
```
trustformers-core/
├── src/
│ ├── tensor/ # Tensor abstractions and operations
│ ├── layers/ # Neural network layers
│ ├── attention/ # Attention mechanisms
│ ├── optimization/ # Performance optimizations
│ ├── quantization/ # Quantization infrastructure
│ ├── export/ # Model export formats
│ ├── evaluation/ # Benchmark implementations
│ ├── monitoring/ # Profiling and analysis
│ ├── parallel/ # Distributed computing
│ └── peft/ # Parameter-efficient fine-tuning
```
## Usage Example
```rust
use trustformers_core::{
tensor::Tensor,
layers::{Linear, Layer},
attention::FlashAttention,
};
// Create tensors
let input = Tensor::randn(&[32, 512, 768])?;
// Create layers
let linear = Linear::new(768, 768, true)?;
let attention = FlashAttention::new(768, 12)?;
// Forward pass
let output = linear.forward(&input)?;
let attended = attention.forward(&output, None)?;
```
## Performance
- **FlashAttention**: O(N) memory complexity vs O(N²) standard
- **Quantization**: 50-75% memory reduction with INT8/INT4
- **SIMD**: 2-3x speedup on supported operations
- **PagedAttention**: Eliminates KV cache fragmentation
## Testing
The crate includes comprehensive test coverage:
- **1,140 unit and integration tests, all passing**
- Property-based testing with proptest
- Memory leak detection
- Performance benchmarks
- Cross-backend compatibility tests
- Numerical stability tests with adaptive tolerance
## Dependencies
- `scirs2-core`: SIMD operations and parallelism
- `half`: FP16/BF16 support
- `rayon`: Parallel iteration (via SciRS2)
- Various serialization and utility crates
## Public API
The crate exposes **1,596 public API items** covering tensors, layers, attention, quantization, export, evaluation, monitoring, distributed computing, PEFT, kernel tuning, and memory management.
## License
Apache-2.0