Crate ruvllm_wasm

Crate ruvllm_wasm 

Source
Expand description

§RuvLLM WASM - Browser-Compatible LLM Inference Runtime

This crate provides WebAssembly bindings for the RuvLLM inference runtime, enabling LLM inference directly in web browsers.

§Features

  • KV Cache Management: Two-tier KV cache with FP16 tail and quantized store
  • Memory Pooling: Efficient buffer reuse for minimal allocation overhead
  • Chat Templates: Support for Llama3, Mistral, Qwen, Phi, Gemma formats
  • Intelligent Learning: HNSW Router (150x faster), MicroLoRA (<1ms adaptation), SONA loops
  • TypeScript-Friendly: All types have getter/setter methods for easy JS interop

§Quick Start (JavaScript)

import init, { RuvLLMWasm, GenerateConfig, ChatMessageWasm, ChatTemplateWasm } from 'ruvllm-wasm';

async function main() {
    // Initialize WASM module
    await init();

    // Create inference engine
    const llm = new RuvLLMWasm();
    llm.initialize();

    // Format a chat conversation
    const template = ChatTemplateWasm.llama3();
    const messages = [
        ChatMessageWasm.system("You are a helpful assistant."),
        ChatMessageWasm.user("What is WebAssembly?"),
    ];
    const prompt = template.format(messages);

    console.log("Formatted prompt:", prompt);

    // KV Cache management
    const config = new KvCacheConfigWasm();
    config.tailLength = 256;
    const kvCache = new KvCacheWasm(config);

    const stats = kvCache.stats();
    console.log("Cache stats:", stats.toJson());

    // Intelligent LLM with learning
    const intelligentConfig = new IntelligentConfigWasm();
    const intelligentLLM = new IntelligentLLMWasm(intelligentConfig);

    // Process with routing, LoRA, and SONA learning
    const embedding = new Float32Array(384);
    const output = intelligentLLM.process(embedding, "user query", 0.9);

    console.log("Intelligent stats:", intelligentLLM.stats());
}

main();

§Building

# Build for browser (bundler target)
wasm-pack build --target bundler

# Build for Node.js
wasm-pack build --target nodejs

# Build for web (no bundler)
wasm-pack build --target web

§Architecture

+-------------------+     +-------------------+
| JavaScript/TS     |---->| wasm-bindgen      |
| Application       |     | Bindings          |
+-------------------+     +-------------------+
                                  |
                                  v
                          +-------------------+
                          | RuvLLM Core       |
                          | (Rust WASM)       |
                          +-------------------+
                                  |
                                  v
                          +-------------------+
                          | Memory Pool       |
                          | KV Cache          |
                          | Chat Templates    |
                          +-------------------+

§Memory Management

The WASM module uses efficient memory management strategies:

  • Arena Allocator: O(1) bump allocation for inference temporaries
  • Buffer Pool: Pre-allocated buffers in size classes (1KB-256KB)
  • Two-Tier KV Cache: FP32 tail + u8 quantized store

§Browser Compatibility

Requires browsers with WebAssembly support:

  • Chrome 57+
  • Firefox 52+
  • Safari 11+
  • Edge 16+

Re-exports§

pub use hnsw_router::HnswRouterWasm;
pub use hnsw_router::PatternWasm;
pub use hnsw_router::RouteResultWasm;
pub use sona_instant::SonaAdaptResultWasm;
pub use sona_instant::SonaConfigWasm;
pub use sona_instant::SonaInstantWasm;
pub use sona_instant::SonaStatsWasm;
pub use utils::error;
pub use utils::log;
pub use utils::now_ms;
pub use utils::set_panic_hook;
pub use utils::warn;
pub use utils::Timer;
pub use workers::ParallelInference;
pub use workers::is_shared_array_buffer_available;
pub use workers::is_atomics_available;
pub use workers::cross_origin_isolated;
pub use workers::optimal_worker_count;
pub use workers::feature_summary;
pub use workers::detect_capability_level;
pub use workers::supports_parallel_inference;
pub use bindings::*;

Modules§

bindings
JavaScript/WASM Bindings for RuvLLM
hnsw_router
HNSW Semantic Router for Browser-Compatible Pattern Routing
micro_lora
MicroLoRA for WASM - Browser-Compatible Lightweight LoRA Adaptation
sona_instant
SONA Instant Loop - Browser-Compatible Instant Learning
utils
Utility functions for WASM environment
workers
Web Workers for Parallel Inference in WASM

Functions§

health_check
Perform a simple health check.
init
Initialize the WASM module.