wasmicro 0.0.1

Tiny transformer inference for the web. One file. No build step.

wasmicro runs transformer models (embeddings, classifiers, small LLMs) in any JavaScript environment — browser, Node.js, Cloudflare Workers, Electron — with a single small .wasm file. The same crate also runs natively, so the same code powers your tests, your benchmarks, and your production website.

Status

Pre-alpha. Working today:

Tensor core. Owned Tensor with inline shape. No Rc<RefCell>, no autograd, no training state.
Forward ops. matmul, matmul_t_b, linear, embedding, softmax, layer_norm, relu, silu, gelu_tanh, gelu_erf, elementwise math, multi-head self-attention, mean pooling.
BERT encoder. Full forward pass against the HuggingFace BERT weight layout (bert-base-uncased, sentence-transformers/*, etc.).
Model loader. ModelFile::parse(&[u8]) reads safetensors with a hand-rolled JSON parser. No serde, no serde_json in the library.
Converter CLI. wasmicro-convert <hf-model-id> <out-dir> downloads a model from the HuggingFace Hub and validates it.
WASM build + demo. GitHub Actions builds the WASM bundle and deploys a live demo page on every push to main.
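
To give a feel for the forward ops listed above, here is a minimal standalone sketch of two of them, softmax and mean pooling, written over plain f32 slices. It does not use wasmicro itself: the names softmax_row and mean_pool and their signatures are hypothetical and chosen for illustration, so the crate's real functions may look different.

```rust
/// Numerically stable softmax over one row, computed in place.
/// (Hypothetical helper for illustration, not wasmicro's API.)
fn softmax_row(row: &mut [f32]) {
    if row.is_empty() {
        return;
    }
    // Subtract the row maximum before exponentiating to avoid overflow.
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    // Normalize so the row sums to 1.
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// Mean pooling: average `seq_len` token vectors of width `hidden`
/// (stored row-major in `tokens`) into a single vector of width `hidden`.
/// (Hypothetical helper for illustration, not wasmicro's API.)
fn mean_pool(tokens: &[f32], seq_len: usize, hidden: usize) -> Vec<f32> {
    assert!(seq_len > 0, "need at least one token");
    assert_eq!(tokens.len(), seq_len * hidden, "buffer does not match shape");
    let mut out = vec![0.0f32; hidden];
    for t in 0..seq_len {
        for h in 0..hidden {
            out[h] += tokens[t * hidden + h];
        }
    }
    let inv = 1.0 / seq_len as f32;
    for v in out.iter_mut() {
        *v *= inv;
    }
    out
}
```

In a sentence-embedding model, mean pooling of this kind is what turns the per-token encoder outputs into one fixed-size vector per sentence.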

Quick start (using wasmicro in another project)

While iterating locally, a path dependency is the most convenient option:

[dependencies]
wasmicro = { path = "../wasmicro" }

A git dependency is just as easy:

[dependencies]
wasmicro = { git = "https://github.com/Xzdes/wasmicro" }

Version 0.0.1 is published on crates.io, so a registry dependency works as well:

[dependencies]
wasmicro = "0.0.1"

Use it:

use std::fs;
use wasmicro::{models::bert::{BertConfig, BertModel}, ModelFile};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = fs::read("model.safetensors")?;
    let file = ModelFile::parse(&bytes)?;

    let config = BertConfig::mini_lm_l6_v2();
    let model = BertModel::from_safetensors(&file, config, "")?;

    let input_ids = vec![101u32, 7592, 2088, 102]; // [CLS] hello world [SEP]
    let embedding = model.embed_sentence(&input_ids, None, None);

    println!("embedding dim: {:?}", embedding.shape().as_slice());
    Ok(())
}
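
For context on what a call like ModelFile::parse has to read: a safetensors file starts with an 8-byte little-endian header length, followed by that many bytes of UTF-8 JSON describing the tensors, followed by the raw tensor data. The sketch below only splits a buffer into those two parts. The function name split_safetensors is hypothetical, and this is not the crate's parser, which also has to interpret the JSON.

```rust
/// Split a safetensors buffer into (JSON header text, raw tensor bytes).
/// (Hypothetical helper for illustration, not wasmicro's API.)
fn split_safetensors(bytes: &[u8]) -> Result<(&str, &[u8]), String> {
    if bytes.len() < 8 {
        return Err("buffer too short for the 8-byte length prefix".to_string());
    }
    // The first 8 bytes hold the header length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&bytes[..8]);
    let header_len = u64::from_le_bytes(len_bytes);

    // Compare in u64 so an oversized length cannot wrap around on 32-bit targets.
    let available = (bytes.len() - 8) as u64;
    if header_len > available {
        return Err("header length exceeds buffer size".to_string());
    }
    let header_end = 8 + header_len as usize;

    // The header must be valid UTF-8 JSON; only the UTF-8 part is checked here.
    let header = std::str::from_utf8(&bytes[8..header_end])
        .map_err(|e| format!("header is not valid UTF-8: {}", e))?;
    Ok((header, &bytes[header_end..]))
}
```

Because the function borrows from the input slice and allocates nothing on the success path, it works the same whether the bytes came from fs::read, a network fetch, or a buffer handed over by JavaScript.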

Get a model

# Build the converter (one-time)
cargo build --release -p wasmicro-convert

# Download all-MiniLM-L6-v2 from the HuggingFace Hub
./target/release/wasmicro-convert \
    sentence-transformers/all-MiniLM-L6-v2 \
    ./models/mini-lm

Output:

models/mini-lm/
├── model.safetensors    (~ 87 MB, ready to pass to ModelFile::parse)
├── config.json
└── tokenizer.json

Building

# Native — tests, benchmarks, examples.
cargo test --workspace
cargo run --example load_safetensors

# WASM bundle (browser, ES modules).
wasm-pack build --release --target web --features wasm \
    --out-dir demo/pkg --out-name wasmicro
wasm-opt -Oz demo/pkg/wasmicro_bg.wasm -o demo/pkg/wasmicro_bg.wasm

# Serve the demo locally
cd demo && python -m http.server 8080

Demo

A live demo is built and deployed automatically by GitHub Actions on every push to main. The workflow is at .github/workflows/pages.yml.

To enable Pages on your fork:

In the fork's repository, open Settings → Pages → Build and deployment and set Source to GitHub Actions.
Then push to main. The pages workflow builds the WASM bundle, runs wasm-opt -Oz, and publishes demo/ to Pages.

Project layout

wasmicro/
├── src/                       # the library
│   ├── lib.rs
│   ├── tensor.rs              # owned f32 tensor + inline shape
│   ├── loader.rs              # safetensors parser (no serde)
│   ├── error.rs
│   ├── ops/                   # forward ops: matmul, attention, layernorm, …
│   ├── models/
│   │   └── bert.rs            # BertModel + forward + from_safetensors
│   └── wasm.rs                # wasm-bindgen surface (feature = "wasm")
├── tools/
│   └── wasmicro-convert/      # CLI to download & validate HF models
├── tests/                     # integration tests via the public API
├── examples/                  # runnable demos
├── demo/                      # static site for GitHub Pages
└── .github/workflows/         # CI + Pages deploy

Design rules

These are non-negotiable. Code that breaks them gets reverted.

Tiny WASM bundle. Target: < 250 KB after wasm-opt -Oz.
Forward only. No autograd, no optimizers, no training.
Owned tensors. Vec<f32>, no Rc, no RefCell.
No heavy dependencies. The library's default build pulls in only bytemuck. No ndarray, candle, rayon, serde_json, chrono. (The wasmicro-convert CLI is a separate crate — it can have any deps it likes.)
The host owns bytes. ModelFile::parse(&[u8]) works for files, fetches, mmap, ArrayBuffer — all the same to us.
Ops are free functions. Layers are functions, not objects.
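
As a rough illustration of the "owned tensors" and "ops are free functions" rules, the sketch below defines a tensor as a flat Vec<f32> plus a fixed-size inline shape, and a linear layer as a plain function over it. The struct layout, the rank limit of 4, and the function signatures are assumptions made for this example, not the crate's actual definitions. The weight is taken as [out, in], the layout PyTorch uses for linear layers.

```rust
/// Owned tensor: flat row-major data plus an inline shape (no heap allocation
/// for the dimensions, no Rc, no RefCell). Hypothetical layout for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Tensor {
    data: Vec<f32>,
    dims: [usize; 4],
    rank: usize,
}

impl Tensor {
    fn new(data: Vec<f32>, shape: &[usize]) -> Tensor {
        assert!(shape.len() <= 4, "this sketch supports rank <= 4");
        assert_eq!(data.len(), shape.iter().product::<usize>());
        let mut dims = [1usize; 4];
        dims[..shape.len()].copy_from_slice(shape);
        Tensor { data, dims, rank: shape.len() }
    }

    fn shape(&self) -> &[usize] {
        &self.dims[..self.rank]
    }
}

/// A layer as a free function: y = x * W^T + b,
/// with x: [rows, in], w: [out, in], b: [out].
fn linear(x: &Tensor, w: &Tensor, b: &Tensor) -> Tensor {
    assert_eq!(x.shape().len(), 2, "x must be [rows, in]");
    assert_eq!(w.shape().len(), 2, "w must be [out, in]");
    let (rows, n_in) = (x.shape()[0], x.shape()[1]);
    let n_out = w.shape()[0];
    assert_eq!(w.shape()[1], n_in, "inner dimensions must match");
    assert_eq!(b.shape(), &[n_out][..], "b must be [out]");

    let mut out = Vec::with_capacity(rows * n_out);
    for r in 0..rows {
        let x_row = &x.data[r * n_in..(r + 1) * n_in];
        for o in 0..n_out {
            let w_row = &w.data[o * n_in..(o + 1) * n_in];
            let dot: f32 = x_row.iter().zip(w_row).map(|(a, c)| a * c).sum();
            out.push(dot + b.data[o]);
        }
    }
    Tensor::new(out, &[rows, n_out])
}

/// An activation as a free function that consumes and returns its tensor.
fn relu(mut t: Tensor) -> Tensor {
    for v in t.data.iter_mut() {
        *v = (*v).max(0.0);
    }
    t
}
```

With this shape, a small feed-forward block is just function composition, for example relu(linear(&x, &w, &b)), and no layer object has to hold state between calls.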

Roadmap

Project skeleton
Plain tensor + shape
Forward ops: matmul, linear, embedding, softmax, layernorm, GELU/SiLU/ReLU
safetensors loader with no serde
Multi-head attention + mean pooling
BERT encoder forward + from_safetensors
HuggingFace → wasmicro converter CLI
CI + GitHub Pages deploy workflow
WASM demo page
WordPiece tokenizer (so the demo can run end-to-end without pre-tokenized input)
Real all-MiniLM-L6-v2 semantic-search demo
WASM SIMD128 paths
int8 quantization
GPT-2 + KV-cache
WebGPU backend

License

MIT OR Apache-2.0