Crate trustformers_wasm

§TrustformeRS WebAssembly Bindings

Run transformer models directly in the browser with WebAssembly and WebGPU acceleration.

This crate provides WebAssembly bindings for TrustformeRS, enabling transformer model inference in web browsers with near-native performance. It leverages WebGPU for GPU acceleration and Web Workers for parallel processing.

§Features

  • WebGPU acceleration: GPU compute in the browser via the WebGPU API
  • Web Workers: Multi-threaded inference using Web Workers
  • Streaming inference: Progressive token generation for chat applications
  • No server round-trips: Inference runs entirely in-browser (no server calls once the model weights are loaded)
  • Privacy-preserving: All computation happens client-side
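
Streaming inference, as listed above, means the caller receives each token as soon as it is produced instead of waiting for the whole sequence. The following is a minimal sketch of that pattern using an async generator. The names `streamTokens`, `collect`, `nextToken`, and the options object are hypothetical and chosen for illustration; they are not part of the trustformers_wasm API, and the crate's own streaming interface may look different.

```javascript
// Sketch of progressive token generation with an async generator.
// `nextToken` is a hypothetical callback standing in for one decoding step;
// it is NOT a trustformers_wasm function.
async function* streamTokens(nextToken, promptTokens, { maxTokens = 32, eosToken = null } = {}) {
  const context = [...promptTokens];
  for (let i = 0; i < maxTokens; i++) {
    const token = await nextToken(context); // run one decoding step
    if (token === eosToken) return;         // stop at the end-of-sequence marker
    context.push(token);                    // feed the token back as context
    yield token;                            // hand the token to the caller immediately
  }
}

// Helper that drains a stream into an array (a UI would render each token instead).
async function collect(stream) {
  const out = [];
  for await (const token of stream) out.push(token);
  return out;
}
```

In a chat application the `for await` loop would append each token to the page as it arrives, which is where the lower perceived latency comes from.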

§Quick Start

import init, { Model, Tokenizer } from './trustformers_wasm.js';

async function main() {
  // Initialize the WASM module
  await init();

  // Load model and tokenizer
  const model = await Model.from_pretrained("bert-base-uncased");
  const tokenizer = await Tokenizer.from_pretrained("bert-base-uncased");

  // Run inference
  const text = "Hello, world!";
  const tokens = tokenizer.encode(text);
  const output = await model.forward(tokens);

  console.log(output);
}

main().catch(console.error);

§Architecture

  • WASM Core: Compiled Rust code for tensor operations
  • WebGPU Backend: GPU compute shaders for matrix operations
  • Web Workers: Parallel processing for batched inference
  • Shared Memory: Zero-copy data transfer between workers
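
The zero-copy idea behind the Shared Memory component can be illustrated with the standard `SharedArrayBuffer` type. This is a sketch under the assumption that tensors are exposed as typed-array views over shared memory; how the crate actually shares data between workers is not described on this page, and the variable names below are hypothetical.

```javascript
// Two typed-array views over one SharedArrayBuffer: no bytes are copied.
const shared = new SharedArrayBuffer(4 * Float32Array.BYTES_PER_ELEMENT);
const producerView = new Float32Array(shared);
const consumerView = new Float32Array(shared);

// Values written through one view ...
producerView.set([1, 2, 3, 4]);

// ... are visible through the other, because both views alias the same memory.
const sum = consumerView.reduce((a, b) => a + b, 0);

// In a browser, `worker.postMessage(shared)` shares this memory with a worker
// instead of cloning it. SharedArrayBuffer is only available there when the
// page is cross-origin isolated.
```

Ordinary `ArrayBuffer` values are structured-cloned (or transferred) when posted to a worker, so sharing is what avoids the per-message copy.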

§Performance

  • WebGPU: roughly 50-100x faster than CPU-only WASM (varies with model and hardware)
  • SIMD: Vectorized operations via WASM SIMD
  • Streaming: Progressive inference for lower latency
  • Caching: Model weights cached in IndexedDB
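
Whether the SIMD path is usable depends on the engine. One way to check, sketched below, is to validate a tiny hand-assembled module whose body uses `v128` instructions; `WebAssembly.validate` is a standard API, while `SIMD_PROBE` and `supportsWasmSimd` are hypothetical names. How this relates to the crate's own `enable_simd` function is not documented here.

```javascript
// Minimal module with one function that uses SIMD instructions.
// WebAssembly.validate accepts it only if the engine understands WASM SIMD.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,                  // "\0asm" magic + version 1
  1, 5, 1, 96, 0, 1, 123,                       // type section: () -> v128
  3, 2, 1, 0,                                   // function section: one function of type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11, // code: i32.const 0; i8x16.splat; i8x16.popcnt; end
]);

// Returns true when WebAssembly is present and the SIMD probe validates.
function supportsWasmSimd() {
  return typeof WebAssembly === 'object' && WebAssembly.validate(SIMD_PROBE);
}
```

A loader could call `supportsWasmSimd()` once at startup and fall back to a scalar build when it returns `false`.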

§Browser Support

  • Chrome/Edge 113+ (WebGPU)
  • Firefox 121+ (WebGPU experimental)
  • Safari 18+ (WebGPU preview)
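
Because support differs across these browsers, it is safer to probe for WebGPU at runtime than to rely on version numbers. The sketch below uses the standard `navigator.gpu.requestAdapter()` call; the function name `hasWebGpu` and its `nav` parameter are hypothetical, and the parameter exists only so the check can be exercised with stub objects.

```javascript
// Returns true when a WebGPU adapter can be obtained, false otherwise.
// `nav` defaults to the global navigator so stubs can be passed in tests.
async function hasWebGpu(nav = globalThis.navigator) {
  if (!nav || !('gpu' in nav) || !nav.gpu) return false; // API not exposed
  try {
    const adapter = await nav.gpu.requestAdapter(); // resolves to null if no adapter is available
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat any failure as "no WebGPU"
  }
}
```

An application could await `hasWebGpu()` before loading a model and choose a CPU-only code path when it resolves to `false`.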

§Build

wasm-pack build --target web --features webgpu

Re-exports§

pub use core::model;
pub use core::pipeline;
pub use core::tensor;
pub use core::tokenizer;
pub use core::utils;
pub use optimization::batch_processing;
pub use optimization::memory_pool;
pub use optimization::quantization;
pub use optimization::simd_tensor_ops;
pub use optimization::weight_compression;
pub use tensor::WasmTensor;
pub use auto_docs::create_default_doc_generator;
pub use auto_docs::create_html_doc_generator;
pub use auto_docs::create_markdown_doc_generator;
pub use auto_docs::get_version_info;
pub use auto_docs::AutoDocGenerator;
pub use auto_docs::DocConfig;
pub use auto_docs::DocFormat;
pub use auto_docs::DocTheme;
pub use auto_docs::VersionInfo;
pub use batch_processing::BatchConfig;
pub use batch_processing::BatchProcessor;
pub use batch_processing::BatchResponse;
pub use batch_processing::BatchingStrategy;
pub use batch_processing::Priority as BatchPriority;
pub use debug::DebugConfig;
pub use debug::DebugLogger;
pub use debug::LogLevel;
pub use debug::PerformanceMetrics;
pub use error::ErrorBuilder;
pub use error::ErrorCode;
pub use error::ErrorCollection;
pub use error::ErrorContext;
pub use error::ErrorHandler;
pub use error::ErrorSeverity;
pub use error::TrustformersError;
pub use error::TrustformersResult;
pub use events::EventData;
pub use events::EventEmittable;
pub use events::EventManager;
pub use events::EventPriority;
pub use events::EventType;
pub use multi_model_manager::create_development_multi_model_manager;
pub use multi_model_manager::create_production_multi_model_manager;
pub use multi_model_manager::DeploymentEnvironment;
pub use multi_model_manager::ModelPriority;
pub use multi_model_manager::ModelStatus;
pub use multi_model_manager::MultiModelConfig;
pub use multi_model_manager::MultiModelManager;
pub use performance::BottleneckType;
pub use performance::OperationType as ProfilerOperationType;
pub use performance::ProfilerConfig;
pub use performance::ResourceType;
pub use performance_profiler::create_development_profiler;
pub use performance_profiler::create_production_profiler;
pub use performance_profiler::PerformanceProfiler;
pub use plugin_framework::create_default_plugin_config;
pub use plugin_framework::create_plugin_context;
pub use plugin_framework::ExecutionMetrics;
pub use plugin_framework::ExecutionPriority;
pub use plugin_framework::ModelMetadata as PluginModelMetadata;
pub use plugin_framework::PerformanceBudget;
pub use plugin_framework::Plugin;
pub use plugin_framework::PluginConfig;
pub use plugin_framework::PluginContext;
pub use plugin_framework::PluginError;
pub use plugin_framework::PluginErrorCode;
pub use plugin_framework::PluginManager;
pub use plugin_framework::PluginMetadata;
pub use plugin_framework::PluginPermission;
pub use plugin_framework::PluginRegistry;
pub use plugin_framework::PluginResult;
pub use plugin_framework::PluginType;
pub use plugin_framework::ResourceLimits;
pub use plugins::ModelOptimizerPlugin;
pub use plugins::TextProcessorPlugin;
pub use plugins::VisualizationPlugin;
pub use quantization::QuantizationConfig;
pub use quantization::QuantizationPrecision;
pub use quantization::QuantizationStrategy;
pub use quantization::QuantizedModelData;
pub use quantization::WebQuantizer;
pub use weight_compression::CompressedModelData;
pub use weight_compression::CompressionConfig;
pub use weight_compression::CompressionLevel;
pub use weight_compression::CompressionStrategy;
pub use weight_compression::SparsityPattern;
pub use weight_compression::WeightCompressor;

Modules§

  • auto_docs: Automatic documentation generator from TypeScript definitions
  • compute
  • core
  • debug: Debug mode with comprehensive logging and performance monitoring
  • error: Comprehensive error handling for TrustformeRS WASM
  • events: Event system for lifecycle hooks and notifications
  • export: Model Export Module
  • layers
  • models
  • multi_model_manager: Multi-model management system for efficient model loading and switching
  • optimization: Optimization modules for TrustformeRS WASM
  • performance: Performance profiler modules
  • performance_profiler: Advanced performance profiler for ML inference optimization
  • plugin_framework
  • plugins

Macros§

  • debug_log: Macro for easy logging with automatic category detection
  • error: Utility macros for creating errors
  • error_builder

Structs§

  • InferenceSession
  • MemoryStats: Memory usage statistics
  • TrustformersWasm

Functions§

  • enable_simd
  • get_gpu_memory_usage: Get current GPU memory usage
  • get_memory_stats: Get comprehensive memory statistics
  • get_peak_gpu_memory_usage: Get peak GPU memory usage
  • get_wasm_memory_usage
  • init_panic_hook
  • reset_peak_gpu_memory: Reset peak GPU memory usage tracking
  • track_gpu_allocation: Track GPU memory allocation (called by WebGPU backend)
  • track_gpu_deallocation: Track GPU memory deallocation (called by WebGPU backend)
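
The GPU memory functions above describe a current-usage counter plus a peak (high-water mark) that can be reset. The sketch below illustrates that bookkeeping in isolation. It is not the crate's implementation: the class `GpuMemoryTracker` and its method names are hypothetical, and resetting the peak to the current usage (rather than to zero) is an assumption.

```javascript
// Illustrative current/peak byte counters; NOT the crate's implementation.
class GpuMemoryTracker {
  #current = 0; // bytes currently allocated
  #peak = 0;    // highest value #current has reached since the last reset

  trackAllocation(bytes) {
    this.#current += bytes;
    this.#peak = Math.max(this.#peak, this.#current);
  }

  trackDeallocation(bytes) {
    // Clamp at zero so an unmatched deallocation cannot produce a negative total.
    this.#current = Math.max(0, this.#current - bytes);
  }

  // Assumption: a reset restarts peak tracking from the current usage.
  resetPeak() {
    this.#peak = this.#current;
  }

  get usage() { return this.#current; }
  get peakUsage() { return this.#peak; }
}

// Usage: allocate two buffers, free one, then inspect the counters.
const tracker = new GpuMemoryTracker();
tracker.trackAllocation(100);
tracker.trackAllocation(50);
tracker.trackDeallocation(100);
```

After these calls the current usage has dropped while the peak still records the earlier maximum, which is the distinction between the "current" and "peak" getters listed above.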