wasmicro
Tiny transformer inference for the web. One file. No build step.
wasmicro runs transformer models (embeddings, classifiers, small LLMs) in
any JavaScript environment — browser, Node.js, Cloudflare Workers, Electron —
with a single small .wasm file. The same crate also runs natively, so the
same code powers your tests, your benchmarks, and your production website.
Status
Pre-alpha. Working today:
- Tensor core. Owned Tensor with inline shape. No Rc<RefCell>, no autograd, no training state.
- Forward ops. matmul, matmul_t_b, linear, embedding, softmax, layer_norm, relu, silu, gelu_tanh, gelu_erf, elementwise math, multi-head self-attention, mean pooling.
- BERT encoder. Full forward pass against the HuggingFace BERT weight layout (bert-base-uncased, sentence-transformers/*, etc.).
- Model loader. ModelFile::parse(&[u8]) reads safetensors with a hand-rolled JSON parser. No serde, no serde_json in the library.
- Converter CLI. wasmicro-convert <hf-model-id> <out-dir> downloads a model from the HuggingFace Hub and validates it.
- WASM build + demo. GitHub Actions builds the WASM bundle and deploys a live demo page on every push to main.
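To give a feel for what forward ops as plain functions can look like, here is a minimal sketch of two of the ops named above: a numerically stable softmax over one row, and masked mean pooling over token vectors. The function names, signatures, and the flat row-major layout are assumptions made for illustration; they are not taken from wasmicro's source and may differ from its real API.

```rust
/// Numerically stable softmax over a single row, computed in place.
/// Hypothetical signature, for illustration only.
pub fn softmax_row(row: &mut [f32]) {
    if row.is_empty() {
        return;
    }
    // Subtract the row maximum before exponentiating to avoid overflow.
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for v in row.iter_mut() {
        *v = (*v - max).exp();
        sum += *v;
    }
    for v in row.iter_mut() {
        *v /= sum;
    }
}

/// Masked mean pooling: averages the token vectors whose mask entry is non-zero.
/// `hidden` is assumed to be row-major with shape [seq_len, dim].
pub fn mean_pool(hidden: &[f32], mask: &[u8], dim: usize) -> Vec<f32> {
    assert_eq!(hidden.len(), mask.len() * dim, "hidden must be seq_len * dim");
    let mut out = vec![0.0f32; dim];
    let mut count = 0usize;
    for (t, &m) in mask.iter().enumerate() {
        if m == 0 {
            continue; // padding token: skip it
        }
        count += 1;
        let row = &hidden[t * dim..(t + 1) * dim];
        for (o, &h) in out.iter_mut().zip(row) {
            *o += h;
        }
    }
    if count > 0 {
        let inv = 1.0 / count as f32;
        for o in out.iter_mut() {
            *o *= inv;
        }
    }
    out
}
```

Both functions work on borrowed slices and return owned data, so they need no shared state and no allocator tricks, which is the property the status list emphasizes.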
Quick start (using wasmicro in another project)
The most convenient way is a path dependency while iterating locally:
[dependencies]
wasmicro = { path = "../wasmicro" }
A git dependency is just as easy:
[dependencies]
wasmicro = { git = "https://github.com/Xzdes/wasmicro" }
Once it is published, crates.io will be the recommended path:
[dependencies]
wasmicro = "0.0.1"
Use it:
use fs;
use ;
Get a model
# Build the converter (one-time)
# Download all-MiniLM-L6-v2 from the HuggingFace Hub
Output:
models/mini-lm/
├── model.safetensors (~ 87 MB, ready to pass to ModelFile::parse)
├── config.json
└── tokenizer.json
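For orientation, a safetensors file starts with an 8-byte little-endian length, followed by that many bytes of UTF-8 JSON describing each tensor, followed by the raw tensor data. The sketch below splits a byte buffer along those boundaries. It is an illustration of the file format only: split_safetensors is a hypothetical helper, not part of wasmicro, and it does not parse the JSON the way ModelFile::parse does.

```rust
/// Splits a safetensors buffer into (JSON header text, raw tensor bytes).
/// Returns None if the buffer is truncated or the header is not valid UTF-8.
/// Hypothetical helper for illustration; not wasmicro's API.
pub fn split_safetensors(bytes: &[u8]) -> Option<(&str, &[u8])> {
    // The first 8 bytes hold the header length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(bytes.get(..8)?);
    let header_len = u64::from_le_bytes(len_bytes);

    // Reject lengths that run past the end of the buffer.
    if header_len > (bytes.len() - 8) as u64 {
        return None;
    }
    let end = 8 + header_len as usize;

    // The header is JSON text; everything after it is tensor data.
    let header = std::str::from_utf8(&bytes[8..end]).ok()?;
    Some((header, &bytes[end..]))
}
```

Because the function only borrows a byte slice, the same code path serves a file read from disk, a fetched ArrayBuffer copied into WASM memory, or a memory-mapped region.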
Building
# Native — tests, benchmarks, examples.
# WASM bundle (browser, ES modules).
# Serve the demo locally
&&
Demo
A live demo is built and deployed automatically by GitHub Actions on every
push to main. The workflow is at .github/workflows/pages.yml.
To enable Pages on your fork:
- Settings → Pages → Build and deployment → Source: GitHub Actions.
- Push to main. The pages workflow builds the WASM bundle, runs wasm-opt -Oz, and publishes demo/ to Pages.
Project layout
wasmicro/
├── src/ # the library
│ ├── lib.rs
│ ├── tensor.rs # owned f32 tensor + inline shape
│ ├── loader.rs # safetensors parser (no serde)
│ ├── error.rs
│ ├── ops/ # forward ops: matmul, attention, layernorm, …
│ ├── models/
│ │ └── bert.rs # BertModel + forward + from_safetensors
│ └── wasm.rs # wasm-bindgen surface (feature = "wasm")
├── tools/
│ └── wasmicro-convert/ # CLI to download & validate HF models
├── tests/ # integration tests via the public API
├── examples/ # runnable demos
├── demo/ # static site for GitHub Pages
└── .github/workflows/ # CI + Pages deploy
Design rules
These are non-negotiable. Code that breaks them gets reverted.
- Tiny WASM bundle. Target: < 250 KB after wasm-opt -Oz.
- Forward only. No autograd, no optimizers, no training.
- Owned tensors. Vec<f32>, no Rc, no RefCell.
- No heavy dependencies. The library's default build pulls in only bytemuck. No ndarray, candle, rayon, serde_json, chrono. (The wasmicro-convert CLI is a separate crate, so it can have any deps it likes.)
- The host owns bytes. ModelFile::parse(&[u8]) works for files, fetches, mmap, ArrayBuffer; they are all the same to us.
- Ops are free functions. Layers are functions, not objects.
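As a rough illustration of the "owned tensors" and "ops are free functions" rules, the sketch below pairs a Vec<f32> with a fixed-size shape array and defines relu and matmul as plain functions. The struct layout, the MAX_RANK limit, and every name here are assumptions chosen for the example; the real definitions in src/tensor.rs and src/ops/ may look quite different.

```rust
/// Assumed upper bound on rank so the shape can live inline (no heap allocation).
const MAX_RANK: usize = 4;

/// An owned f32 tensor: the data is a plain Vec, the shape a fixed-size array.
/// Illustrative layout only, not wasmicro's actual Tensor.
#[derive(Debug, Clone, PartialEq)]
pub struct Tensor {
    data: Vec<f32>,
    shape: [usize; MAX_RANK],
    rank: usize,
}

impl Tensor {
    pub fn new(data: Vec<f32>, dims: &[usize]) -> Self {
        assert!(dims.len() <= MAX_RANK, "rank too large");
        assert_eq!(data.len(), dims.iter().product::<usize>(), "shape/data mismatch");
        let mut shape = [1usize; MAX_RANK];
        shape[..dims.len()].copy_from_slice(dims);
        Tensor { data, shape, rank: dims.len() }
    }

    pub fn shape(&self) -> &[usize] {
        &self.shape[..self.rank]
    }

    pub fn data(&self) -> &[f32] {
        &self.data
    }
}

/// Elementwise ReLU as a free function: borrows the input, returns a new tensor.
pub fn relu(x: &Tensor) -> Tensor {
    let data = x.data.iter().map(|&v| v.max(0.0)).collect();
    Tensor { data, shape: x.shape, rank: x.rank }
}

/// Naive row-major 2-D matrix multiply: [m, k] x [k, n] -> [m, n].
pub fn matmul(a: &Tensor, b: &Tensor) -> Tensor {
    assert!(a.rank == 2 && b.rank == 2, "matmul expects 2-D tensors");
    let (m, k, n) = (a.shape[0], a.shape[1], b.shape[1]);
    assert_eq!(k, b.shape[0], "inner dimensions must match");
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a.data[i * k + p];
            for j in 0..n {
                out[i * n + j] += a_ip * b.data[p * n + j];
            }
        }
    }
    Tensor::new(out, &[m, n])
}
```

With this shape, a "layer" is just a composition such as relu(&matmul(&x, &w)): there is no object holding hidden state, and no reference counting to pull into the WASM bundle.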
Roadmap
- Project skeleton
- Plain tensor + shape
- Forward ops: matmul, linear, embedding, softmax, layernorm, GELU/SiLU/ReLU
- safetensors loader with no serde
- Multi-head attention + mean pooling
- BERT encoder forward + from_safetensors
- HuggingFace → wasmicro converter CLI
- CI + GitHub Pages deploy workflow
- WASM demo page
- WordPiece tokenizer (so the demo can run end-to-end without pre-tokenized input)
- Real all-MiniLM-L6-v2 semantic-search demo
- WASM SIMD128 paths
- int8 quantization
- GPT-2 + KV-cache
- WebGPU backend
License
MIT OR Apache-2.0