TrustformeRS WebAssembly
WebAssembly bindings for the TrustformeRS transformer library, enabling transformer models to run directly in web browsers and Node.js environments with full WebGPU hardware acceleration.
Version: 0.1.0 | Status: Stable | Tests: 128 | SLoC: 55,504 | Last Updated: 2026-03-21
Features
- WebGPU Backend: 50-100x speedup over CPU via GPU compute shaders (wgpu 29.0 API)
- Web Workers Parallelism: Multi-threaded inference via SharedArrayBuffer
- IndexedDB Caching: Persistent model and KV-cache storage in the browser
- BERT WASM Model: Complete BERT implementation running in-browser
- React/Vue/Angular/Web Components: First-class framework bindings
- Streaming Inference: Token-by-token generation with streaming API
- SIMD Support: Hardware-accelerated tensor ops where available
- Mobile Optimization: Battery-aware, network-adaptive loading
- SciRS2 Integration: scirs2-core tensor operations in WASM
Building
Prerequisites
- Rust (latest stable)
- wasm-pack (install with: curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh)
Build Commands
# Build for all targets
# Or build individually:
Usage
Browser (Direct)
Node.js
const = require;
const tf = ;
const tensor = ;
console.log;
Webpack/Bundler
import * as wasm from './pkg-bundler/trustformers_wasm';
API Overview
Core Classes
TrustformersWasm
Main entry point for the library.
const tf = ;
console.log; // "0.1.0"
console.log; // true
WasmTensor
Core tensor operations.
// Creation
const a = ;
const b = ;
const c = ;
const d = ;
// Operations
const sum = a.;
const prod = a.;
const transposed = a.;
// Activations
const relu_out = a.;
const gelu_out = a.;
const softmax_out = a.;
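To make the three activations concrete, here is a minimal plain-JavaScript sketch of what each one computes on an ordinary array. It does not call into trustformers_wasm, and the function names are illustrative only. The GELU shown is the common tanh approximation, which is an assumption: the library may use the exact erf form instead.

```javascript
// Reference versions of the activations, for illustration only.
// These are NOT the library's implementation; they work on plain arrays.

// ReLU: clamp negative values to zero.
function relu(xs) {
  return xs.map((x) => Math.max(0, x));
}

// GELU, tanh approximation (assumption: the library may use the erf form).
function gelu(xs) {
  const c = Math.sqrt(2 / Math.PI);
  return xs.map((x) => 0.5 * x * (1 + Math.tanh(c * (x + 0.044715 * x ** 3))));
}

// Numerically stable softmax: subtract the maximum before exponentiating
// so that large inputs do not overflow to Infinity.
function softmax(xs) {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}
```

Softmax outputs always sum to 1, which makes a quick sanity check when comparing against the WASM results.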
Linear
Fully connected layer.
const linear = ;
const output = linear.;
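For readers unfamiliar with the term, a fully connected layer computes y = W x + b. The sketch below shows that arithmetic in plain JavaScript. The name linearForward and the [outFeatures][inFeatures] weight layout are assumptions made for illustration, not the library's API or memory layout.

```javascript
// Reference forward pass of a fully connected layer: y = W x + b.
// weight: array of rows, shape [outFeatures][inFeatures] (assumed layout).
// bias:   optional array of length outFeatures.
function linearForward(input, weight, bias) {
  return weight.map((row, i) => {
    if (row.length !== input.length) {
      throw new Error(`row ${i} has ${row.length} weights, expected ${input.length}`);
    }
    let acc = bias ? bias[i] : 0;
    for (let j = 0; j < row.length; j++) {
      acc += row[j] * input[j];
    }
    return acc;
  });
}
```

A layer with 2 inputs and 3 outputs therefore needs a 3x2 weight matrix and, if a bias is used, 3 bias values.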
BertModelWasm
BERT model running entirely in WASM.
const config = ;
const model = ;
const output = model.;
WebGPU Backend
import from './pkg-web/trustformers_wasm.js';
// Initialize WebGPU (50-100x speedup vs CPU)
const inference = await ;
// Streaming token generation
const generator = ;
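Token-by-token streaming is typically consumed with an async iteration loop. The sketch below shows that pattern with a stand-in token source. It assumes the generator can be treated as an async iterable of strings, which this README does not confirm; fakeTokenStream and collectStream are hypothetical names, not part of trustformers_wasm.

```javascript
// Hypothetical stand-in for a streaming generator: yields one token at a time.
async function* fakeTokenStream(tokens, delayMs = 0) {
  for (const token of tokens) {
    // Simulate the latency between generated tokens.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield token;
  }
}

// Consume any async iterable of tokens, reporting each one as it arrives
// and returning the concatenated text once the stream ends.
async function collectStream(stream, onToken = () => {}) {
  let text = '';
  for await (const token of stream) {
    onToken(token);
    text += token;
  }
  return text;
}
```

In a UI, the onToken callback is where each token would be appended to the page, so the user sees output before generation finishes.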
Framework Bindings
React
import from 'trustformers-react';
Vue
import from 'trustformers-vue';
export default
Angular
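import { Injectable } from '@angular/core';
import { lastValueFrom } from 'rxjs';
import { toArray } from 'rxjs/operators';
import { TrustformersService } from 'trustformers-angular';
@Injectable({ providedIn: 'root' })
export class AppComponent {
  constructor(private tf: TrustformersService) {}
  // Collect every streamed token into an array once generation completes.
  async generate(prompt: string) {
    return lastValueFrom(this.tf.generate(prompt).pipe(toArray()));
  }
}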
Web Components
Utilities
// Performance measurement
const timer = ;
// ... do work ...
console.log;
// Memory statistics
const stats = ;
console.log;
// Feature detection
console.log;
console.log;
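The same capabilities can also be probed with standard web platform APIs before loading the module. The sketch below is independent of trustformers_wasm; detectFeatures is a hypothetical helper name, and the SIMD probe is a tiny WebAssembly module containing SIMD instructions that is only validated, never executed.

```javascript
// Minimal WebAssembly module using v128 instructions; validation succeeds
// only when the engine supports WASM SIMD128.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0,
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
]);

// Detect runtime capabilities using standard APIs only.
// `scope` defaults to the global object and can be replaced for testing.
function detectFeatures(scope = globalThis) {
  const nav = scope.navigator;
  return {
    // WebGPU is exposed as navigator.gpu in supporting browsers.
    webgpu: Boolean(nav && 'gpu' in nav),
    // SharedArrayBuffer is hidden unless the page is cross-origin isolated.
    sharedArrayBuffer: typeof scope.SharedArrayBuffer === 'function',
    crossOriginIsolated: scope.crossOriginIsolated === true,
    simd: typeof WebAssembly === 'object' && WebAssembly.validate(SIMD_PROBE),
  };
}

console.log(detectFeatures());
```

The presence of navigator.gpu only shows that the API exists; requesting an adapter can still fail on unsupported hardware, so a CPU fallback is worth keeping.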
Feature Flags
- webgpu: WebGPU compute shader backend (wgpu 29.0)
- web-workers: Web Workers parallelism (SharedArrayBuffer)
- shared-memory: Shared memory for multi-threaded WASM
- kernel-fusion: Fused transformer kernels (MHA, FFN, LayerNorm+Residual)
- async-executor: Async Rust executor for WASM
- indexeddb: IndexedDB model and KV-cache persistence
- memory64: WASM memory64 for models >4GB
- streaming-loader: Progressive chunked model loading
- react-components: React hooks and component library
- vue-components: Vue composables and plugin
- angular-components: Angular services and directives
- web-components: Framework-agnostic custom elements
- playground: Interactive browser playground
- streaming-generation: Token-by-token streaming inference
- mobile-optimization: Battery/network-adaptive loading
- scirs2: SciRS2-core tensor operations
WebGPU Notes (wgpu 29.0)
This crate targets wgpu 29.0 with the following API specifics:
- InstanceDescriptor::new_without_display_handle(): headless instance creation
- bind_group_layouts accepts &[Option<&BindGroupLayout>] for sparse layouts
- Kernel fusion enabled for MHA (2.5x), FFN (1.8x), LayerNorm+Residual (1.5x)
Examples
See the examples/ directory for complete examples:
- index.html / playground.html: Interactive browser demo
- demo/: Full-featured playground application
- Node.js example in examples/
Performance Tips
- Enable WebGPU: Use Chrome 113+ / Edge 113+ for 50-100x speedup
- Enable SIMD: Compile with WASM SIMD128 target feature
- Batch operations: Process multiple inputs together
- Use IndexedDB caching: Avoid re-downloading models between sessions
- Enable kernel fusion: Build with the webgpu + kernel-fusion features
- Reuse tensors: Minimize allocations in hot loops
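As an illustration of the batching tip, the sketch below packs variable-length token ID sequences into one padded, flat buffer so they can be passed to a model in a single call. It is plain JavaScript; padBatch, the padding ID of 0, and the returned field names are assumptions for the example rather than part of the library.

```javascript
// Pack variable-length token ID sequences into one padded batch.
// Returns flat row-major buffers plus the [batchSize, maxLen] shape.
function padBatch(sequences, padId = 0) {
  const batchSize = sequences.length;
  const maxLen = Math.max(0, ...sequences.map((s) => s.length));
  // One allocation for the whole batch instead of one per sequence.
  const inputIds = new Int32Array(batchSize * maxLen).fill(padId);
  // 1 marks a real token, 0 marks padding.
  const attentionMask = new Uint8Array(batchSize * maxLen);
  sequences.forEach((seq, row) => {
    inputIds.set(seq, row * maxLen);
    attentionMask.fill(1, row * maxLen, row * maxLen + seq.length);
  });
  return { inputIds, attentionMask, shape: [batchSize, maxLen] };
}
```

Typed arrays like these can also be kept and refilled between calls when the shape stays the same, which is in the spirit of the tensor-reuse tip.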
Testing
# Run WASM tests
# Run with specific features
# Check compilation
128 unit tests with 100% pass rate, covering:
- Core tensor operations
- WebGPU backend (mock device)
- BERT forward pass
- Framework binding contracts
- Streaming generation
- IndexedDB model cache
Limitations
- WebGPU requires Chrome 113+, Edge 113+, or Safari (experimental)
- SharedArrayBuffer requires cross-origin isolation headers
- SIMD requires WASM SIMD128 browser support
- Memory typically capped at 2-4GB (use memory64 + quantization for large models)
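The cross-origin isolation requirement means the page must be served with two specific HTTP response headers. The sketch below shows those headers and a small helper that applies them to a Node.js-style response object. applyIsolationHeaders is a hypothetical name; how you attach headers depends on your server or hosting platform.

```javascript
// Headers that put a page into a cross-origin isolated state, which
// browsers require before exposing SharedArrayBuffer.
const ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Apply the headers to any response object that has a setHeader method,
// such as the one passed to a Node.js http request handler.
function applyIsolationHeaders(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}
```

With these headers in place, the global crossOriginIsolated flag reads true in the page. Note that require-corp also blocks cross-origin subresources that do not opt in, so model files served from another origin need matching CORS or Cross-Origin-Resource-Policy headers.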
License
Apache-2.0