Crate oxillama_wasm

WebAssembly bindings for OxiLLaMa.

Exposes GGUF header parsing, Q4_0 dequantization, and full text generation to JavaScript/TypeScript via wasm-bindgen.

Feature flags

Feature     Default   Description
inference   yes       Enables generate() via oxillama-runtime with the pure-Rust unstable_wasm tokenizer backend.

Usage (generate)

import init, { generate } from './oxillama_wasm.js';
await init();

const modelResp = await fetch('model.gguf');
const modelBytes = new Uint8Array(await modelResp.arrayBuffer());
const tokenizerResp = await fetch('tokenizer.json');
const tokenizerJson = await tokenizerResp.text();

// Streaming: pass a callback to receive each token as it is generated.
const text = generate(modelBytes, tokenizerJson, "Hello, world!", 128,
    (token) => console.log(token)); // browser-safe; process.stdout exists only in Node.js
console.log(text);
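
A fetch of 'model.gguf' can succeed and still return something that is not a model (an HTML error page, for example), so it can be useful to sanity-check the bytes before passing them to generate(). The sketch below reads the fixed-size GGUF header by hand. It assumes GGUF version 2 or later, where both counts are little-endian 64-bit integers. readGgufFixedHeader and GgufFixedHeader are hypothetical names used for illustration and are not part of this crate; the crate's own parse_gguf_header is the supported route.

```typescript
// Hypothetical helper for illustration only; not part of oxillama_wasm.
// Assumes GGUF v2+ layout: magic "GGUF", u32 version, u64 tensor count,
// u64 metadata key/value count, all little-endian.
interface GgufFixedHeader {
  version: number;
  tensorCount: number;
  metadataKvCount: number;
}

// Read a little-endian u64 as a JS number (exact up to 2^53 - 1).
function readU64AsNumber(view: DataView, offset: number): number {
  const low = view.getUint32(offset, true);
  const high = view.getUint32(offset + 4, true);
  return high * 2 ** 32 + low;
}

function readGgufFixedHeader(bytes: Uint8Array): GgufFixedHeader {
  if (bytes.byteLength < 24) {
    throw new Error("buffer is too short to hold a GGUF header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The first four bytes must spell "GGUF" in ASCII.
  const magic = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3));
  if (magic !== "GGUF") {
    throw new Error("not a GGUF file: bad magic " + JSON.stringify(magic));
  }
  return {
    version: view.getUint32(4, true),
    tensorCount: readU64AsNumber(view, 8),
    metadataKvCount: readU64AsNumber(view, 16),
  };
}
```

Calling readGgufFixedHeader(modelBytes) right after the fetch surfaces a bad download as a clear error before any inference work starts.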

Re-exports

pub use service_worker::get_service_worker_script;
pub use service_worker::register_service_worker;
pub use service_worker::ServiceWorkerOptions;
pub use simd_check::get_simd128_status;
pub use streaming_loader::StreamingGgufLoader;
pub use streaming_loader::StreamingLoadOptions;

Modules

gpu_bridge
Async WebGPU bridge for OxiLLaMa WASM.
idb_cache
IndexedDB model cache for persisting GGUF model bytes across page reloads.
service_worker
Service-worker registration helper for the OxiLLaMa WASM browser build.
simd_check
SIMD128 capability detection for the OxiLLaMa WASM build.
streaming_load
Streaming / chunked GGUF loader.
streaming_loader
Streaming GGUF loader with LRU tensor cache and on-demand byte-range fetching.
webgpu
WebGPU compute bridge for accelerating dequantization in the browser.
worker
Web-worker message-passing helpers for offloaded inference.

Structs

GgufMetadataJs
Typed GGUF metadata returned by parse_gguf_metadata.
WasmEngine
Opaque handle wrapping a loaded InferenceEngine for use from JS.

Functions

dequant_q4_0
Dequantize a buffer of Q4_0 blocks to an array of f32 values.
dequant_q4_k
Dequantize a buffer of Q4_K blocks to an array of f32 values.
dequant_q5_k
Dequantize a buffer of Q5_K blocks to an array of f32 values.
dequant_q6_k
Dequantize a buffer of Q6_K blocks to an array of f32 values.
generate
Run full text generation from an in-memory GGUF model.
init
Initialize the WASM module (sets up panic hook for better error messages).
list_tensor_names
Return all tensor names stored in a GGUF file as a JS array of strings.
load_model_from_bytes_with_progress
Load a GGUF model from raw bytes, reporting progress via an optional JS callback.
parse_gguf_header
Parse a GGUF file header from raw bytes and return key metadata as a JS object.
parse_gguf_metadata
Parse a GGUF file and return typed metadata as a JS object.
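
To make concrete what "dequantize a buffer of Q4_0 blocks" involves, here is a reference sketch in TypeScript. It assumes the usual ggml Q4_0 layout, which this page does not spell out: each 18-byte block holds a little-endian f16 scale followed by 16 bytes packing 32 four-bit values, with low nibbles giving elements 0..15 and high nibbles giving elements 16..31. dequantQ4_0Reference and halfToFloat are hypothetical names and are not this crate's implementation; they can serve as a slow cross-check against the output of dequant_q4_0.

```typescript
// Illustrative reference only; not the crate's implementation.
// Assumes the ggml Q4_0 layout: 18-byte blocks of [f16 scale][16 packed bytes].
const Q4_0_BLOCK_BYTES = 18;
const Q4_0_BLOCK_VALUES = 32;

// Convert an IEEE 754 half-precision bit pattern to a JS number.
function halfToFloat(h: number): number {
  const sign = (h & 0x8000) !== 0 ? -1 : 1;
  const exp = (h >> 10) & 0x1f;
  const frac = h & 0x03ff;
  if (exp === 0) return sign * Math.pow(2, -14) * (frac / 1024); // subnormal or zero
  if (exp === 31) return frac !== 0 ? NaN : sign * Infinity;
  return sign * Math.pow(2, exp - 15) * (1 + frac / 1024);
}

function dequantQ4_0Reference(blocks: Uint8Array): Float32Array {
  if (blocks.byteLength % Q4_0_BLOCK_BYTES !== 0) {
    throw new Error("input length is not a multiple of the 18-byte Q4_0 block size");
  }
  const blockCount = blocks.byteLength / Q4_0_BLOCK_BYTES;
  const out = new Float32Array(blockCount * Q4_0_BLOCK_VALUES);
  const view = new DataView(blocks.buffer, blocks.byteOffset, blocks.byteLength);
  for (let b = 0; b < blockCount; b++) {
    const base = b * Q4_0_BLOCK_BYTES;
    const scale = halfToFloat(view.getUint16(base, true));
    for (let j = 0; j < 16; j++) {
      const packed = view.getUint8(base + 2 + j);
      // Each nibble is an unsigned value 0..15, re-centred to -8..7.
      out[b * Q4_0_BLOCK_VALUES + j] = ((packed & 0x0f) - 8) * scale;
      out[b * Q4_0_BLOCK_VALUES + j + 16] = ((packed >> 4) - 8) * scale;
    }
  }
  return out;
}
```

The K-quant formats (Q4_K, Q5_K, Q6_K) use larger super-blocks with per-sub-block scales, so this sketch does not carry over to them directly.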