§TrustformeRS WebAssembly Bindings
Run transformer models directly in the browser with WebAssembly and WebGPU acceleration.
This crate provides WebAssembly bindings for TrustformeRS, enabling transformer model inference in web browsers with near-native performance. It leverages WebGPU for GPU acceleration and Web Workers for parallel processing.
§Features
- WebGPU acceleration: GPU compute in the browser via WebGPU API
- Web Workers: Multi-threaded inference using Web Workers
- Streaming inference: Progressive token generation for chat applications
- No server-side inference: Models run entirely in the browser; once the weights are downloaded (and cached), inference makes no server calls
- Privacy-preserving: All computation happens client-side
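Streaming inference means yielding each token as soon as it is generated instead of waiting for the whole sequence. The sketch below shows that pattern as a JavaScript async generator. It is an illustration under assumptions, not this crate's API: `step` is a hypothetical async function that returns the next token id given the ids so far, and `eosId` is an assumed end-of-sequence id.

```javascript
// Yield generated token ids one at a time so a UI can render them progressively.
// `step(ids)` is a hypothetical async function returning the next token id.
async function* streamTokens(promptIds, step, { maxNewTokens = 32, eosId = null } = {}) {
  const ids = [...promptIds];
  for (let i = 0; i < maxNewTokens; i++) {
    const next = await step(ids);
    // Stop without yielding when the model emits the end-of-sequence token.
    if (next === eosId) return;
    ids.push(next);
    yield next;
  }
}

// Usage sketch (render is a hypothetical UI callback):
// for await (const id of streamTokens(promptIds, step, { eosId })) render(id);
```

A generator keeps the decoding loop and the UI decoupled: the consumer decides what to do with each token, and breaking out of the `for await` loop stops generation.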
§Quick Start
```javascript
import init, { Model, Tokenizer } from './trustformers_wasm.js';

async function main() {
  // Initialize the WASM module
  await init();

  // Load model and tokenizer
  const model = await Model.from_pretrained("bert-base-uncased");
  const tokenizer = await Tokenizer.from_pretrained("bert-base-uncased");

  // Run inference
  const text = "Hello, world!";
  const tokens = tokenizer.encode(text);
  const output = await model.forward(tokens);
  console.log(output);
}

// Run the example and surface any loading or inference errors
main().catch(console.error);
```
§Architecture
- WASM Core: Compiled Rust code for tensor operations
- WebGPU Backend: GPU compute shaders for matrix operations
- Web Workers: Parallel processing for batched inference
- Shared Memory: Zero-copy data transfer between workers
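Zero-copy transfer between workers usually relies on a `SharedArrayBuffer`: posting one to a worker shares the underlying memory rather than cloning it. The sketch below is an assumption about how such a scheme could look, not this crate's implementation; the helper names `createSharedBatch` and `workerSlices` are hypothetical. It splits one shared batch buffer into contiguous row ranges, and each worker would build its own `Float32Array` view over its range. Note that browsers only expose `SharedArrayBuffer` on cross-origin-isolated pages (served with COOP/COEP headers).

```javascript
const BYTES = Float32Array.BYTES_PER_ELEMENT;

// Allocate one shared buffer holding `batchSize` rows of `hiddenSize` floats.
function createSharedBatch(batchSize, hiddenSize) {
  return new SharedArrayBuffer(batchSize * hiddenSize * BYTES);
}

// Assign each worker a contiguous chunk of ceil(rows / numWorkers) rows.
// Each entry describes a view (byte offset + element count) into the same memory.
function workerSlices(sab, hiddenSize, numWorkers) {
  const rows = sab.byteLength / (hiddenSize * BYTES);
  const rowsPerWorker = Math.ceil(rows / numWorkers);
  const slices = [];
  for (let w = 0; w < numWorkers; w++) {
    const startRow = w * rowsPerWorker;
    const endRow = Math.min(rows, startRow + rowsPerWorker);
    if (startRow >= endRow) break; // no rows left for the remaining workers
    slices.push({
      byteOffset: startRow * hiddenSize * BYTES,
      length: (endRow - startRow) * hiddenSize,
    });
  }
  return slices;
}

// In the page:   worker.postMessage({ sab, ...slice })  // shares sab, no copy
// In the worker: const view = new Float32Array(sab, byteOffset, length);
```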
§Performance
- WebGPU: ~50-100x faster than CPU-only WASM (the actual speedup depends on model size, batch size, and GPU)
- SIMD: Vectorized operations via WASM SIMD
- Streaming: Progressive inference for lower latency
- Caching: Model weights cached in IndexedDB
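Caching weights in IndexedDB means a model is fetched over the network once and read locally afterwards. The cache-aside sketch below uses only the standard `fetch` and IndexedDB APIs; it is not the crate's own caching code, and the `openWeightCache`/`loadWeights` helpers and the database and store names are hypothetical. `loadWeights` accepts any object with async `get`/`put`, so it can be exercised without a browser.

```javascript
// Open (or create) an IndexedDB object store and wrap get/put in promises.
// Database and store names are illustrative defaults.
function openWeightCache(dbName = "model-weights", storeName = "weights") {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(dbName, 1);
    // On first open, create a store with out-of-line keys (we key by URL).
    req.onupgradeneeded = () => req.result.createObjectStore(storeName);
    req.onerror = () => reject(req.error);
    req.onsuccess = () => {
      const db = req.result;
      resolve({
        get(key) {
          return new Promise((res, rej) => {
            const r = db.transaction(storeName, "readonly").objectStore(storeName).get(key);
            r.onsuccess = () => res(r.result); // undefined when the key is absent
            r.onerror = () => rej(r.error);
          });
        },
        put(key, value) {
          return new Promise((res, rej) => {
            const tx = db.transaction(storeName, "readwrite");
            tx.objectStore(storeName).put(value, key);
            tx.oncomplete = () => res();
            tx.onerror = () => rej(tx.error);
          });
        },
      });
    };
  });
}

// Cache-aside load: return cached bytes if present, otherwise fetch and store them.
async function loadWeights(url, cache, fetchFn = fetch) {
  const cached = await cache.get(url);
  if (cached !== undefined) return cached;
  const resp = await fetchFn(url);
  if (!resp.ok) throw new Error(`HTTP ${resp.status} while fetching ${url}`);
  const bytes = await resp.arrayBuffer();
  await cache.put(url, bytes);
  return bytes;
}
```

In a page this would be used as `const weights = await loadWeights(weightsUrl, await openWeightCache());`, where `weightsUrl` is whatever location the model is served from.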
§Browser Support
- Chrome/Edge 113+ (WebGPU)
- Firefox 121+ (WebGPU experimental)
- Safari 18+ (WebGPU preview)
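Because WebGPU availability varies across these browsers, an application typically probes for it at runtime and falls back to CPU WASM. The sketch below is a hypothetical helper, not part of this crate: it checks `navigator.gpu.requestAdapter()` (which resolves to `null` when no suitable adapter exists) and probes WASM SIMD by validating a tiny module that uses SIMD instructions, the detection technique popularized by the wasm-feature-detect library.

```javascript
// Minimal module with one function of type () -> v128 whose body runs
// i32.const 0; i8x16.splat; i8x16.popcnt. It only validates with WASM SIMD support.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,                   // "\0asm", version 1
  1, 5, 1, 96, 0, 1, 123,                        // type section: () -> v128
  3, 2, 1, 0,                                    // function section: one function, type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,  // code section
]);

// Report which acceleration paths are available.
// `nav` defaults to the global navigator; pass a stub to test outside a browser.
async function detectCapabilities(nav = globalThis.navigator) {
  let webgpu = false;
  if (nav && nav.gpu) {
    try {
      webgpu = Boolean(await nav.gpu.requestAdapter());
    } catch {
      webgpu = false; // treat adapter errors as "no WebGPU"
    }
  }
  const simd = typeof WebAssembly === "object" && WebAssembly.validate(SIMD_PROBE);
  return { webgpu, simd };
}
```

A caller could then choose a backend with `const { webgpu } = await detectCapabilities();` and use the CPU WASM path when `webgpu` is `false`.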
§Build
```sh
wasm-pack build --target web -- --features webgpu
```
Re-exports§
pub use core::model;
pub use core::pipeline;
pub use core::tensor;
pub use core::tokenizer;
pub use core::utils;
pub use optimization::batch_processing;
pub use optimization::memory_pool;
pub use optimization::quantization;
pub use optimization::simd_tensor_ops;
pub use optimization::weight_compression;
pub use tensor::WasmTensor;
pub use auto_docs::create_default_doc_generator;
pub use auto_docs::create_html_doc_generator;
pub use auto_docs::create_markdown_doc_generator;
pub use auto_docs::get_version_info;
pub use auto_docs::AutoDocGenerator;
pub use auto_docs::DocConfig;
pub use auto_docs::DocFormat;
pub use auto_docs::DocTheme;
pub use auto_docs::VersionInfo;
pub use batch_processing::BatchConfig;
pub use batch_processing::BatchProcessor;
pub use batch_processing::BatchResponse;
pub use batch_processing::BatchingStrategy;
pub use batch_processing::Priority as BatchPriority;
pub use debug::DebugConfig;
pub use debug::DebugLogger;
pub use debug::LogLevel;
pub use debug::PerformanceMetrics;
pub use error::ErrorBuilder;
pub use error::ErrorCode;
pub use error::ErrorCollection;
pub use error::ErrorContext;
pub use error::ErrorHandler;
pub use error::ErrorSeverity;
pub use error::TrustformersError;
pub use error::TrustformersResult;
pub use events::EventData;
pub use events::EventEmittable;
pub use events::EventManager;
pub use events::EventPriority;
pub use events::EventType;
pub use multi_model_manager::create_development_multi_model_manager;
pub use multi_model_manager::create_production_multi_model_manager;
pub use multi_model_manager::DeploymentEnvironment;
pub use multi_model_manager::ModelPriority;
pub use multi_model_manager::ModelStatus;
pub use multi_model_manager::MultiModelConfig;
pub use multi_model_manager::MultiModelManager;
pub use performance::BottleneckType;
pub use performance::OperationType as ProfilerOperationType;
pub use performance::ProfilerConfig;
pub use performance::ResourceType;
pub use performance_profiler::create_development_profiler;
pub use performance_profiler::create_production_profiler;
pub use performance_profiler::PerformanceProfiler;
pub use plugin_framework::create_default_plugin_config;
pub use plugin_framework::create_plugin_context;
pub use plugin_framework::ExecutionMetrics;
pub use plugin_framework::ExecutionPriority;
pub use plugin_framework::ModelMetadata as PluginModelMetadata;
pub use plugin_framework::PerformanceBudget;
pub use plugin_framework::Plugin;
pub use plugin_framework::PluginConfig;
pub use plugin_framework::PluginContext;
pub use plugin_framework::PluginError;
pub use plugin_framework::PluginErrorCode;
pub use plugin_framework::PluginManager;
pub use plugin_framework::PluginMetadata;
pub use plugin_framework::PluginPermission;
pub use plugin_framework::PluginRegistry;
pub use plugin_framework::PluginResult;
pub use plugin_framework::PluginType;
pub use plugin_framework::ResourceLimits;
pub use plugins::ModelOptimizerPlugin;
pub use plugins::TextProcessorPlugin;
pub use plugins::VisualizationPlugin;
pub use quantization::QuantizationConfig;
pub use quantization::QuantizationPrecision;
pub use quantization::QuantizationStrategy;
pub use quantization::QuantizedModelData;
pub use quantization::WebQuantizer;
pub use weight_compression::CompressedModelData;
pub use weight_compression::CompressionConfig;
pub use weight_compression::CompressionLevel;
pub use weight_compression::CompressionStrategy;
pub use weight_compression::SparsityPattern;
pub use weight_compression::WeightCompressor;
Modules§
- auto_docs - Automatic documentation generator from TypeScript definitions
- compute
- core
- debug - Debug mode with comprehensive logging and performance monitoring
- error - Comprehensive error handling for TrustformeRS WASM
- events - Event system for lifecycle hooks and notifications
- export - Model export module
- layers
- models
- multi_model_manager - Multi-model management system for efficient model loading and switching
- optimization - Optimization modules for TrustformeRS WASM
- performance - Performance profiler modules
- performance_profiler - Advanced performance profiler for ML inference optimization
- plugin_framework
- plugins
Macros§
- debug_log - Macro for easy logging with automatic category detection
- error - Utility macros for creating errors
- error_builder
Structs§
- InferenceSession
- MemoryStats - Memory usage statistics
- TrustformersWasm
Functions§
- enable_simd
- get_gpu_memory_usage - Get current GPU memory usage
- get_memory_stats - Get comprehensive memory statistics
- get_peak_gpu_memory_usage - Get peak GPU memory usage
- get_wasm_memory_usage
- init_panic_hook
- reset_peak_gpu_memory - Reset peak GPU memory usage tracking
- track_gpu_allocation - Track GPU memory allocation (called by WebGPU backend)
- track_gpu_deallocation - Track GPU memory deallocation (called by WebGPU backend)
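The GPU memory functions describe simple bookkeeping: allocations and deallocations reported by the WebGPU backend update a running total, and a high-water mark records the peak. The JavaScript class below models that bookkeeping for illustration only. `GpuMemoryTracker` and its methods are hypothetical, and resetting the peak to the current usage is an assumption about what `reset_peak_gpu_memory` does.

```javascript
// Hypothetical model of current/peak GPU memory accounting (byte counts).
class GpuMemoryTracker {
  #current = 0;
  #peak = 0;

  // Record an allocation and raise the high-water mark if needed.
  trackAllocation(bytes) {
    this.#current += bytes;
    this.#peak = Math.max(this.#peak, this.#current);
  }

  // Record a deallocation; clamp at zero so an over-reported free
  // cannot make usage negative.
  trackDeallocation(bytes) {
    this.#current = Math.max(0, this.#current - bytes);
  }

  get usage() { return this.#current; }
  get peak() { return this.#peak; }

  // Assumed semantics: restart peak tracking from the current usage.
  resetPeak() { this.#peak = this.#current; }
}
```

Resetting the peak between runs lets each inference report its own high-water mark instead of the maximum since page load.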