Expand description
§RuvLLM WASM - Browser-Compatible LLM Inference Runtime
This crate provides WebAssembly bindings for the RuvLLM inference runtime, enabling LLM inference directly in web browsers.
§Features
- KV Cache Management: Two-tier KV cache with FP16 tail and quantized store
- Memory Pooling: Efficient buffer reuse for minimal allocation overhead
- Chat Templates: Support for Llama3, Mistral, Qwen, Phi, Gemma formats
- Intelligent Learning: HNSW Router (150x faster), MicroLoRA (<1ms adaptation), SONA loops
- TypeScript-Friendly: All types have getter/setter methods for easy JS interop
§Quick Start (JavaScript)
import init, { RuvLLMWasm, GenerateConfig, ChatMessageWasm, ChatTemplateWasm } from 'ruvllm-wasm';
async function main() {
// Initialize WASM module
await init();
// Create inference engine
const llm = new RuvLLMWasm();
llm.initialize();
// Format a chat conversation
const template = ChatTemplateWasm.llama3();
const messages = [
ChatMessageWasm.system("You are a helpful assistant."),
ChatMessageWasm.user("What is WebAssembly?"),
];
const prompt = template.format(messages);
console.log("Formatted prompt:", prompt);
// KV Cache management
const config = new KvCacheConfigWasm();
config.tailLength = 256;
const kvCache = new KvCacheWasm(config);
const stats = kvCache.stats();
console.log("Cache stats:", stats.toJson());
// Intelligent LLM with learning
const intelligentConfig = new IntelligentConfigWasm();
const intelligentLLM = new IntelligentLLMWasm(intelligentConfig);
// Process with routing, LoRA, and SONA learning
const embedding = new Float32Array(384);
const output = intelligentLLM.process(embedding, "user query", 0.9);
console.log("Intelligent stats:", intelligentLLM.stats());
}
main();§Building
# Build for browser (bundler target)
wasm-pack build --target bundler
# Build for Node.js
wasm-pack build --target nodejs
# Build for web (no bundler)
wasm-pack build --target web§Architecture
+-------------------+ +-------------------+
| JavaScript/TS |---->| wasm-bindgen |
| Application | | Bindings |
+-------------------+ +-------------------+
|
v
+-------------------+
| RuvLLM Core |
| (Rust WASM) |
+-------------------+
|
v
+-------------------+
| Memory Pool |
| KV Cache |
| Chat Templates |
+-------------------+§Memory Management
The WASM module uses efficient memory management strategies:
- Arena Allocator: O(1) bump allocation for inference temporaries
- Buffer Pool: Pre-allocated buffers in size classes (1KB-256KB)
- Two-Tier KV Cache: FP32 tail + u8 quantized store
§Browser Compatibility
Requires browsers with WebAssembly support:
- Chrome 57+
- Firefox 52+
- Safari 11+
- Edge 16+
Re-exports§
pub use hnsw_router::HnswRouterWasm;pub use hnsw_router::PatternWasm;pub use hnsw_router::RouteResultWasm;pub use sona_instant::SonaAdaptResultWasm;pub use sona_instant::SonaConfigWasm;pub use sona_instant::SonaInstantWasm;pub use sona_instant::SonaStatsWasm;pub use utils::error;pub use utils::log;pub use utils::now_ms;pub use utils::set_panic_hook;pub use utils::warn;pub use utils::Timer;pub use workers::ParallelInference;pub use workers::is_atomics_available;pub use workers::cross_origin_isolated;pub use workers::optimal_worker_count;pub use workers::feature_summary;pub use workers::detect_capability_level;pub use workers::supports_parallel_inference;pub use bindings::*;
Modules§
- bindings
- JavaScript/WASM Bindings for RuvLLM
- hnsw_
router - HNSW Semantic Router for Browser-Compatible Pattern Routing
- micro_
lora - MicroLoRA for WASM - Browser-Compatible Lightweight LoRA Adaptation
- sona_
instant - SONA Instant Loop - Browser-Compatible Instant Learning
- utils
- Utility functions for WASM environment
- workers
- Web Workers for Parallel Inference in WASM
Functions§
- health_
check - Perform a simple health check.
- init
- Initialize the WASM module.