Crate ruvllm_wasm

Expand description

§RuvLLM WASM - Browser-Compatible LLM Inference Runtime

This crate provides WebAssembly bindings for the RuvLLM inference runtime, enabling LLM inference directly in web browsers.

§Features

KV Cache Management: Two-tier KV cache with FP16 tail and quantized store
Memory Pooling: Efficient buffer reuse for minimal allocation overhead
Chat Templates: Support for Llama3, Mistral, Qwen, Phi, Gemma formats
Intelligent Learning: HNSW Router (150x faster), MicroLoRA (<1ms adaptation), SONA loops
TypeScript-Friendly: All types have getter/setter methods for easy JS interop

§Quick Start (JavaScript)

import init, { RuvLLMWasm, GenerateConfig, ChatMessageWasm, ChatTemplateWasm } from 'ruvllm-wasm';

async function main() {
    // Initialize WASM module
    await init();

    // Create inference engine
    const llm = new RuvLLMWasm();
    llm.initialize();

    // Format a chat conversation
    const template = ChatTemplateWasm.llama3();
    const messages = [
        ChatMessageWasm.system("You are a helpful assistant."),
        ChatMessageWasm.user("What is WebAssembly?"),
    ];
    const prompt = template.format(messages);

    console.log("Formatted prompt:", prompt);

    // KV Cache management
    const config = new KvCacheConfigWasm();
    config.tailLength = 256;
    const kvCache = new KvCacheWasm(config);

    const stats = kvCache.stats();
    console.log("Cache stats:", stats.toJson());

    // Intelligent LLM with learning
    const intelligentConfig = new IntelligentConfigWasm();
    const intelligentLLM = new IntelligentLLMWasm(intelligentConfig);

    // Process with routing, LoRA, and SONA learning
    const embedding = new Float32Array(384);
    const output = intelligentLLM.process(embedding, "user query", 0.9);

    console.log("Intelligent stats:", intelligentLLM.stats());
}

main();

§Building

# Build for browser (bundler target)
wasm-pack build --target bundler

# Build for Node.js
wasm-pack build --target nodejs

# Build for web (no bundler)
wasm-pack build --target web

§Architecture

+-------------------+     +-------------------+
| JavaScript/TS     |---->| wasm-bindgen      |
| Application       |     | Bindings          |
+-------------------+     +-------------------+
                                  |
                                  v
                          +-------------------+
                          | RuvLLM Core       |
                          | (Rust WASM)       |
                          +-------------------+
                                  |
                                  v
                          +-------------------+
                          | Memory Pool       |
                          | KV Cache          |
                          | Chat Templates    |
                          +-------------------+

§Memory Management

The WASM module uses efficient memory management strategies:

Arena Allocator: O(1) bump allocation for inference temporaries
Buffer Pool: Pre-allocated buffers in size classes (1KB-256KB)
Two-Tier KV Cache: FP32 tail + u8 quantized store

§Browser Compatibility

Requires browsers with WebAssembly support:

Chrome 57+
Firefox 52+
Safari 11+
Edge 16+

Re-exports§

pub use hnsw_router::HnswRouterWasm;
pub use hnsw_router::PatternWasm;
pub use hnsw_router::RouteResultWasm;
pub use sona_instant::SonaAdaptResultWasm;
pub use sona_instant::SonaConfigWasm;
pub use sona_instant::SonaInstantWasm;
pub use sona_instant::SonaStatsWasm;
pub use utils::error;
pub use utils::log;
pub use utils::now_ms;
pub use utils::set_panic_hook;
pub use utils::warn;
pub use utils::Timer;
pub use workers::ParallelInference;
pub use workers::is_shared_array_buffer_available;
pub use workers::is_atomics_available;
pub use workers::cross_origin_isolated;
pub use workers::optimal_worker_count;
pub use workers::feature_summary;
pub use workers::detect_capability_level;
pub use workers::supports_parallel_inference;
pub use bindings::*;

Modules§

bindings: JavaScript/WASM Bindings for RuvLLM
hnsw_router: HNSW Semantic Router for Browser-Compatible Pattern Routing
micro_lora: MicroLoRA for WASM - Browser-Compatible Lightweight LoRA Adaptation
sona_instant: SONA Instant Loop - Browser-Compatible Instant Learning
utils: Utility functions for WASM environment
workers: Web Workers for Parallel Inference in WASM

Functions§

health_check: Perform a simple health check.
init: Initialize the WASM module.

Crate ruvllm_wasm

Crate ruvllm_wasm Copy item path

§RuvLLM WASM - Browser-Compatible LLM Inference Runtime

§Features

§Quick Start (JavaScript)

§Building

§Architecture

§Memory Management

§Browser Compatibility

Re-exports§

Modules§

Functions§

Crate ruvllm_wasm