Crate trustformers_wasm

§TrustformeRS WebAssembly Bindings

Run transformer models directly in the browser with WebAssembly and WebGPU acceleration.

This crate provides WebAssembly bindings for TrustformeRS, enabling transformer model inference in web browsers with near-native performance. It leverages WebGPU for GPU acceleration and Web Workers for parallel processing.

§Features

WebGPU acceleration: GPU compute in the browser via WebGPU API
Web Workers: Multi-threaded inference using Web Workers
Streaming inference: Progressive token generation for chat applications
No server-side inference: Once the weights are downloaded, models run entirely in-browser (no inference calls to a server)
Privacy-preserving: All computation happens client-side

§Quick Start

import init, { Model, Tokenizer } from './trustformers_wasm.js';

async function main() {
  // Initialize the WASM module
  await init();

  // Load model and tokenizer
  const model = await Model.from_pretrained("bert-base-uncased");
  const tokenizer = await Tokenizer.from_pretrained("bert-base-uncased");

  // Run inference
  const text = "Hello, world!";
  const tokens = tokenizer.encode(text);
  const output = await model.forward(tokens);

  console.log(output);
}

// Start the example and report any failure
main().catch(console.error);

§Architecture

WASM Core: Compiled Rust code for tensor operations
WebGPU Backend: GPU compute shaders for matrix operations
Web Workers: Parallel processing for batched inference
Shared Memory: Zero-copy data transfer between workers
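
The zero-copy idea in the last item can be illustrated with the standard SharedArrayBuffer API. This is a minimal sketch, not the crate's implementation: whether trustformers_wasm uses SharedArrayBuffer internally is an assumption, and makeSharedTensor is a hypothetical helper name. In browsers, SharedArrayBuffer is only available on cross-origin isolated pages.

```javascript
// Sketch only: `makeSharedTensor` is a hypothetical helper, not a crate API.
// Allocates a Float32Array backed by memory that several threads can map.
function makeSharedTensor(length) {
  const buffer = new SharedArrayBuffer(length * Float32Array.BYTES_PER_ELEMENT);
  return new Float32Array(buffer);
}

const producer = makeSharedTensor(4);

// A worker receiving `producer.buffer` via postMessage would build its own
// view like this; the underlying bytes are shared, not copied.
const consumer = new Float32Array(producer.buffer);

producer.set([1, 2, 3, 4]);
// `consumer` now reads the values 1, 2, 3, 4 without any transfer step.
console.log(Array.from(consumer));
```

Because both views alias the same memory, concurrent writers need their own synchronization (for example Atomics on an integer view); the sketch leaves that out.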

§Performance

WebGPU: roughly 50-100x faster than CPU-only WASM (the actual speedup varies with model and hardware)
SIMD: Vectorized operations via WASM SIMD
Streaming: Progressive inference for lower latency
Caching: Model weights cached in IndexedDB
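
Since the SIMD path depends on engine support, a page may want to probe for it before relying on it. The sketch below uses the standard WebAssembly.validate call on a tiny module that contains SIMD instructions; supportsWasmSimd and SIMD_PROBE are hypothetical names, and how the crate itself detects SIMD is not stated here.

```javascript
// Sketch only: `SIMD_PROBE` and `supportsWasmSimd` are hypothetical names.
// The module has one function returning v128:
//   i32.const 0; i8x16.splat; i8x16.popcnt
// Engines without WASM SIMD reject it during validation.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,                  // "\0asm" magic + version 1
  1, 5, 1, 96, 0, 1, 123,                       // type section: () -> v128
  3, 2, 1, 0,                                   // function section: one function, type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11, // code section with the SIMD body
]);

function supportsWasmSimd() {
  return typeof WebAssembly === "object" && WebAssembly.validate(SIMD_PROBE);
}

console.log("WASM SIMD available:", supportsWasmSimd());
```

Validation only checks that the engine accepts the instructions; it does not run them, so the probe is cheap enough to call at startup.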

§Browser Support

Chrome/Edge 113+ (WebGPU)
Firefox 121+ (WebGPU experimental)
Safari 18+ (WebGPU preview)
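
Given the uneven support above, a page would typically feature-detect WebGPU and fall back to CPU-only WASM. A minimal sketch using the standard navigator.gpu.requestAdapter() call follows; selectBackend and the "webgpu" / "wasm" labels are hypothetical and not part of this crate's API.

```javascript
// Sketch only: `selectBackend` and its return labels are hypothetical.
// `nav` is a navigator-like object, passed in so the logic is easy to test.
async function selectBackend(nav) {
  // navigator.gpu is only defined where WebGPU is exposed.
  if (!nav || !nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable GPU is found.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing as "no GPU available".
    return "wasm";
  }
}

selectBackend(globalThis.navigator).then((backend) => {
  console.log("Selected backend:", backend);
});
```

Checking the adapter rather than only the presence of navigator.gpu matters because a browser can expose the API on a machine with no usable GPU.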

§Build

wasm-pack build --target web --features webgpu

Re-exports§

pub use core::model;
pub use core::pipeline;
pub use core::tensor;
pub use core::tokenizer;
pub use core::utils;
pub use optimization::batch_processing;
pub use optimization::memory_pool;
pub use optimization::quantization;
pub use optimization::simd_tensor_ops;
pub use optimization::weight_compression;
pub use tensor::WasmTensor;
pub use auto_docs::create_default_doc_generator;
pub use auto_docs::create_html_doc_generator;
pub use auto_docs::create_markdown_doc_generator;
pub use auto_docs::get_version_info;
pub use auto_docs::AutoDocGenerator;
pub use auto_docs::DocConfig;
pub use auto_docs::DocFormat;
pub use auto_docs::DocTheme;
pub use auto_docs::VersionInfo;
pub use batch_processing::BatchConfig;
pub use batch_processing::BatchProcessor;
pub use batch_processing::BatchResponse;
pub use batch_processing::BatchingStrategy;
pub use batch_processing::Priority as BatchPriority;
pub use debug::DebugConfig;
pub use debug::DebugLogger;
pub use debug::LogLevel;
pub use debug::PerformanceMetrics;
pub use error::ErrorBuilder;
pub use error::ErrorCode;
pub use error::ErrorCollection;
pub use error::ErrorContext;
pub use error::ErrorHandler;
pub use error::ErrorSeverity;
pub use error::TrustformersError;
pub use error::TrustformersResult;
pub use events::EventData;
pub use events::EventEmittable;
pub use events::EventManager;
pub use events::EventPriority;
pub use events::EventType;
pub use multi_model_manager::create_development_multi_model_manager;
pub use multi_model_manager::create_production_multi_model_manager;
pub use multi_model_manager::DeploymentEnvironment;
pub use multi_model_manager::ModelPriority;
pub use multi_model_manager::ModelStatus;
pub use multi_model_manager::MultiModelConfig;
pub use multi_model_manager::MultiModelManager;
pub use performance::BottleneckType;
pub use performance::OperationType as ProfilerOperationType;
pub use performance::ProfilerConfig;
pub use performance::ResourceType;
pub use performance_profiler::create_development_profiler;
pub use performance_profiler::create_production_profiler;
pub use performance_profiler::PerformanceProfiler;
pub use plugin_framework::create_default_plugin_config;
pub use plugin_framework::create_plugin_context;
pub use plugin_framework::ExecutionMetrics;
pub use plugin_framework::ExecutionPriority;
pub use plugin_framework::ModelMetadata as PluginModelMetadata;
pub use plugin_framework::PerformanceBudget;
pub use plugin_framework::Plugin;
pub use plugin_framework::PluginConfig;
pub use plugin_framework::PluginContext;
pub use plugin_framework::PluginError;
pub use plugin_framework::PluginErrorCode;
pub use plugin_framework::PluginManager;
pub use plugin_framework::PluginMetadata;
pub use plugin_framework::PluginPermission;
pub use plugin_framework::PluginRegistry;
pub use plugin_framework::PluginResult;
pub use plugin_framework::PluginType;
pub use plugin_framework::ResourceLimits;
pub use plugins::ModelOptimizerPlugin;
pub use plugins::TextProcessorPlugin;
pub use plugins::VisualizationPlugin;
pub use quantization::QuantizationConfig;
pub use quantization::QuantizationPrecision;
pub use quantization::QuantizationStrategy;
pub use quantization::QuantizedModelData;
pub use quantization::WebQuantizer;
pub use weight_compression::CompressedModelData;
pub use weight_compression::CompressionConfig;
pub use weight_compression::CompressionLevel;
pub use weight_compression::CompressionStrategy;
pub use weight_compression::SparsityPattern;
pub use weight_compression::WeightCompressor;

Modules§

auto_docs: Automatic documentation generator from TypeScript definitions
compute
core
debug: Debug mode with comprehensive logging and performance monitoring
error: Comprehensive error handling for TrustformeRS WASM
events: Event system for lifecycle hooks and notifications
export: Model Export Module
layers
models
multi_model_manager: Multi-model management system for efficient model loading and switching
optimization: Optimization modules for TrustformeRS WASM
performance: Performance profiler modules
performance_profiler: Advanced performance profiler for ML inference optimization
plugin_framework
plugins

Macros§

debug_log: Macro for easy logging with automatic category detection
error: Utility macros for creating errors
error_builder

Structs§

InferenceSession
MemoryStats: Memory usage statistics
TrustformersWasm

Functions§

enable_simd
get_gpu_memory_usage: Get current GPU memory usage
get_memory_stats: Get comprehensive memory statistics
get_peak_gpu_memory_usage: Get peak GPU memory usage
get_wasm_memory_usage
init_panic_hook
reset_peak_gpu_memory: Reset peak GPU memory usage tracking
track_gpu_allocation: Track GPU memory allocation (called by WebGPU backend)
track_gpu_deallocation: Track GPU memory deallocation (called by WebGPU backend)
