§wasmicro
Tiny transformer inference for the web. One file. No build step.
§Design rules
- Tiny WASM bundle (target: < 250 KB after `wasm-opt -Oz`).
- Fast cold start (target: < 500 ms model load + first inference).
- Forward inference only: no autograd, no optimizers, no training.
- Owned tensors, no `Rc<RefCell>` indirection.
- Minimal dependencies. The full default build pulls in only `bytemuck`.
- Same code runs natively and in WASM. The library never opens files; callers pass bytes in via `ModelFile::parse`.
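To illustrate the bytes-in design, here is a minimal sketch of splitting a safetensors buffer without touching the filesystem. It assumes the standard safetensors layout (an 8-byte little-endian header length, a JSON header of that length, then raw tensor data). `split_safetensors` is a hypothetical helper written for this example; it is not part of wasmicro's API and says nothing about how `ModelFile::parse` is implemented.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not wasmicro API): splits a safetensors byte buffer
/// into (JSON header bytes, tensor data bytes). Returns `None` when the
/// buffer is too short for the length it declares.
pub fn split_safetensors(bytes: &[u8]) -> Option<(&[u8], &[u8])> {
    // The first 8 bytes hold the header length as a little-endian u64.
    let len_bytes = <[u8; 8]>::try_from(bytes.get(..8)?).ok()?;
    let header_len = usize::try_from(u64::from_le_bytes(len_bytes)).ok()?;

    // Checked arithmetic so a hostile length cannot overflow the offset.
    let header_end = 8usize.checked_add(header_len)?;

    // `get` returns None instead of panicking on out-of-range slices.
    let header = bytes.get(8..header_end)?;
    let data = bytes.get(header_end..)?;
    Some((header, data))
}
```

Because the function only borrows a `&[u8]`, the same code works whether the bytes come from `std::fs::read` natively or from a fetched buffer in the browser.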
§Quick start
use std::fs;
use wasmicro::{
    models::bert::{BertConfig, BertModel},
    ModelFile, WordPieceTokenizer,
};
let model_bytes = fs::read("model.safetensors").unwrap();
let vocab_bytes = fs::read("vocab.txt").unwrap();
let file = ModelFile::parse(&model_bytes).unwrap();
let tokenizer = WordPieceTokenizer::from_vocab_bytes(&vocab_bytes).unwrap();
let config = BertConfig::mini_lm_l6_v2();
let model = BertModel::from_safetensors(&file, config, "").unwrap();
let embedding = model.embed_text(&tokenizer, "hello world", 128).unwrap();
println!("embedding dim: {:?}", embedding.shape().as_slice());
Re-exports§
pub use error::Error;
pub use error::Result;
pub use loader::Dtype;
pub use loader::ModelFile;
pub use loader::TensorView;
pub use pipeline::Pipeline;
pub use quant::QuantizedTensorI8;
pub use quant::QuantizedTensorQ4;
pub use quant::QuantizedTensorU8;
pub use tensor::Shape;
pub use tensor::Tensor;
pub use tokenizer::bpe::BpeTokenizer;
pub use tokenizer::EncodedInput;
pub use tokenizer::WordPieceOptions;
pub use tokenizer::WordPieceTokenizer;
Modules§
- error: Error types used across the crate.
- loader: Safetensors model file loader.
- models: Pre-built transformer architectures.
- ops: Forward-only tensor operations.
- pipeline: High-level pipeline API; one call to load and run any supported model.
- quant: Small weight-only quantized tensor types.
- tensor: Plain, forward-only tensor.
- tokenizer: Minimal WordPiece tokenizer with Unicode support.
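For readers unfamiliar with WordPiece, the sketch below shows the greedy longest-match-first split that BERT-style tokenizers generally use for a single word. It is an illustration under stated assumptions, not wasmicro's tokenizer: `wordpiece_split` is a hypothetical function, and the `##` continuation prefix and `[UNK]` fallback follow the usual BERT conventions rather than anything confirmed by this crate's documentation.

```rust
use std::collections::HashSet;

/// Hypothetical sketch (not wasmicro API): splits one word into WordPiece
/// tokens by repeatedly taking the longest vocabulary entry that matches
/// at the current position. Non-initial pieces carry the "##" prefix.
pub fn wordpiece_split(word: &str, vocab: &HashSet<&str>) -> Vec<String> {
    // Work on chars so multi-byte Unicode is never cut mid-character.
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let mut end = chars.len();
        let mut found = None;

        // Shrink the candidate from the right until it is in the vocab.
        while end > start {
            let mut candidate: String = chars[start..end].iter().collect();
            if start > 0 {
                candidate.insert_str(0, "##");
            }
            if vocab.contains(candidate.as_str()) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }

        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            // No prefix matched: the whole word maps to the unknown token.
            None => return vec!["[UNK]".to_string()],
        }
    }
    pieces
}
```

With a vocabulary containing `un`, `##aff`, and `##able`, the word `unaffable` splits into those three pieces, while a word with no matching prefix collapses to a single `[UNK]`.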