WebAssembly bindings for OxiLLaMa.
Exposes GGUF header parsing, Q4_0 dequantization, and full text generation to JavaScript/TypeScript via wasm-bindgen.
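For orientation, the fixed-size prefix of a GGUF file can be read in a few lines. The sketch below is not this crate's implementation: `GgufHeader`, `parse_gguf_header_sketch`, and `read_u64_le` are hypothetical names, and it assumes a little-endian file at GGUF version 2 or later (where the two counts are 64-bit).

```rust
/// Fixed-size GGUF header fields (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Read a little-endian u64 at `off`. The caller guarantees the bounds.
fn read_u64_le(bytes: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[off..off + 8]);
    u64::from_le_bytes(buf)
}

/// Parse the 24-byte GGUF prefix: magic, version, tensor count, metadata count.
/// Assumption: little-endian layout, version >= 2.
pub fn parse_gguf_header_sketch(bytes: &[u8]) -> Result<GgufHeader, String> {
    const HEADER_LEN: usize = 24;
    if bytes.len() < HEADER_LEN {
        return Err(format!("need {} bytes, got {}", HEADER_LEN, bytes.len()));
    }
    if !bytes.starts_with(b"GGUF") {
        return Err("missing GGUF magic".to_string());
    }
    let mut v = [0u8; 4];
    v.copy_from_slice(&bytes[4..8]);
    let version = u32::from_le_bytes(v);
    if version < 2 {
        return Err(format!("unsupported GGUF version {}", version));
    }
    Ok(GgufHeader {
        version,
        tensor_count: read_u64_le(bytes, 8),
        metadata_kv_count: read_u64_le(bytes, 16),
    })
}
```

The metadata key/value pairs and tensor descriptors follow this prefix; decoding them needs the full GGUF type table and is out of scope for this sketch.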
§Feature flags
| Feature | Default | Description |
|---|---|---|
| inference | yes | Enables generate() via oxillama-runtime with the pure-Rust unstable_wasm tokenizer backend. |
§Usage (generate)
import init, { generate } from './oxillama_wasm.js';
await init();
const modelResp = await fetch('model.gguf');
const modelBytes = new Uint8Array(await modelResp.arrayBuffer());
const tokenizerResp = await fetch('tokenizer.json');
const tokenizerJson = await tokenizerResp.text();
// Streaming: pass a callback to receive each token as it is generated.
const text = generate(modelBytes, tokenizerJson, "Hello, world!", 128,
(token) => process.stdout.write(token));
console.log(text);
Re-exports§
pub use service_worker::get_service_worker_script;
pub use service_worker::register_service_worker;
pub use service_worker::ServiceWorkerOptions;
pub use simd_check::get_simd128_status;
pub use streaming_loader::StreamingGgufLoader;
pub use streaming_loader::StreamingLoadOptions;
Modules§
- gpu_bridge - Async WebGPU bridge for OxiLLaMa WASM.
- idb_cache - IndexedDB model cache for persisting GGUF model bytes across page reloads.
- service_worker - Service-worker registration helper for the OxiLLaMa WASM browser build.
- simd_check - SIMD128 capability detection for the OxiLLaMa WASM build.
- streaming_load - Streaming / chunked GGUF loader.
- streaming_loader - Streaming GGUF loader with LRU tensor cache and on-demand byte-range fetching.
- webgpu - WebGPU compute bridge for accelerating dequantization in the browser.
- worker - Web-worker message-passing helpers for offloaded inference.
Structs§
- GgufMetadataJs - Typed GGUF metadata returned by parse_gguf_metadata.
- WasmEngine - Opaque handle wrapping a loaded InferenceEngine for use from JS.
Functions§
- dequant_q4_0 - Dequantize a buffer of Q4_0 blocks to an array of f32 values.
- dequant_q4_k - Dequantize a buffer of Q4_K blocks to an array of f32 values.
- dequant_q5_k - Dequantize a buffer of Q5_K blocks to an array of f32 values.
- dequant_q6_k - Dequantize a buffer of Q6_K blocks to an array of f32 values.
- generate - Run full text generation from an in-memory GGUF model.
- init - Initialize the WASM module (sets up panic hook for better error messages).
- list_tensor_names - Return all tensor names stored in a GGUF file as a JS array of strings.
- load_model_from_bytes_with_progress - Load a GGUF model from raw bytes, reporting progress via an optional JS callback.
- parse_gguf_header - Parse a GGUF file header from raw bytes and return key metadata as a JS object.
- parse_gguf_metadata - Parse a GGUF file and return typed metadata as a JS object.
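To make "dequantize a buffer of Q4_0 blocks" concrete, here is a minimal reference sketch of the Q4_0 layout as defined by ggml: each 18-byte block holds a little-endian f16 scale followed by 16 bytes packing 32 four-bit weights. This is not the crate's code; `dequant_q4_0_reference` and `f16_to_f32` are hypothetical names, and the exported function's actual signature and error handling may differ.

```rust
const QK4_0: usize = 32; // weights per Q4_0 block
const BLOCK_BYTES: usize = 2 + QK4_0 / 2; // f16 scale + 16 packed bytes = 18

/// Convert IEEE 754 half-precision bits to f32 (no external crates).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((bits >> 10) & 0x1F) as i32;
    let frac = (bits & 0x03FF) as f32;
    match exp {
        // Subnormal: frac * 2^-10 * 2^-14
        0 => sign * frac * 2f32.powi(-24),
        // Infinity or NaN
        0x1F => {
            if frac == 0.0 {
                sign * f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal: (1 + frac / 1024) * 2^(exp - 15)
        _ => sign * (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    }
}

/// Dequantize Q4_0 blocks. Element j of a block comes from the low nibble of
/// byte j, element j + 16 from the high nibble; both are offset by -8 and
/// multiplied by the block scale.
pub fn dequant_q4_0_reference(data: &[u8]) -> Result<Vec<f32>, String> {
    if data.len() % BLOCK_BYTES != 0 {
        return Err(format!(
            "input length {} is not a multiple of {}",
            data.len(),
            BLOCK_BYTES
        ));
    }
    let mut out = Vec::with_capacity(data.len() / BLOCK_BYTES * QK4_0);
    for block in data.chunks_exact(BLOCK_BYTES) {
        let d = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        let qs = &block[2..];
        // First half of the block: low nibbles.
        out.extend(qs.iter().map(|&q| ((q & 0x0F) as i32 - 8) as f32 * d));
        // Second half of the block: high nibbles.
        out.extend(qs.iter().map(|&q| ((q >> 4) as i32 - 8) as f32 * d));
    }
    Ok(out)
}
```

A scalar reference like this is mainly useful for checking the output of an accelerated path against known inputs. The K-quant formats (Q4_K, Q5_K, Q6_K) use larger super-blocks with per-sub-block scales and are not covered here.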