# AGENTS.md
Onboarding notes for LLM coding assistants (Claude Code, Cursor, Aider, etc.)
working on `wasmicro`. Read this before making changes.
## What this crate is
A minimal forward-only inference library for transformer models. The target
deployment is a small WebAssembly bundle (< 250 KB after `wasm-opt -Oz`) that
runs in a browser, Node, Cloudflare Workers, or any other JS host. The same
crate also runs natively for tests and benchmarks.
## What this crate is NOT
- Not a training framework. No autograd, no backward pass, no optimizers.
- Not a general tensor library. It exists to run transformer-shaped
computations. Reject feature requests that pull the crate away from that.
- Not a wrapper around `candle`, `tch`, `ort`, or `ndarray`. Adding any of
these as a dependency defeats the purpose.
## Hard rules (library crate `wasmicro`)
These exist so the WASM bundle stays small and predictable.
1. **No `Rc`, no `RefCell`, no `Arc`, no `Mutex` in the tensor or op layer.**
Tensors own their data. If you need an out parameter, take `&mut Tensor`.
2. **No autograd state on `Tensor`.** No `requires_grad`, no `grad`,
no `ctx`. Forward ops produce new owned tensors or write into out params.
3. **No `ndarray`, no `candle*`, no `rayon`, no `serde_json`, no `chrono`,
no `getrandom`** in the default build. The default dependency set is
exactly `bytemuck`. Adding anything else needs an explicit justification
in the PR description and a corresponding WASM size measurement.
4. **No `std::fs` inside the library.** The host provides bytes via
`ModelFile::parse(&[u8])`. Examples and tests may read files; the library
itself may not.
5. **All identifiers, comments, doc-comments, and error messages in English.**
No exceptions.
6. **Ops are free functions.** Do not introduce a `trait Layer` or a
`Module` zoo. Models are user-written structs of weights with their own
`forward` method.
7. **Public errors go through `wasmicro::Error`.** Do not return
`Box<dyn Error>` or panic on user input (panics are fine for internal
shape contracts).
8. **Every new op needs unit tests with known-good values.** Use
approximate-equality helpers for floats; do not rely on bitwise equality.
## CLI crate `wasmicro-convert`
The converter at `tools/wasmicro-convert/` is a separate workspace member
and is NOT subject to the same dependency restrictions as the library. It
can use `hf-hub`, `serde_json`, `reqwest`, `clap` — anything reasonable for
a desktop CLI. It never ships in the WASM bundle.
## File map
```
src/
lib.rs Crate root. Module declarations + re-exports.
error.rs `Error` enum and `Result<T>` alias.
tensor.rs `Tensor` and `Shape`. Owned data, inline shape.
loader.rs Safetensors parser + hand-rolled JSON.
wasm.rs `#[wasm_bindgen]` bindings (only with `wasm` feature).
ops/
mod.rs Module aggregator.
matmul.rs `matmul` (A @ B) and `matmul_t_b` (A @ B^T).
linear.rs `linear(x, W, b)` — built on `matmul_t_b` + `add_bias`.
embedding.rs `embedding(ids, W)` — row lookup.
elementwise.rs `add`, `sub`, `mul`, `scale`, `add_bias`.
softmax.rs `softmax_last_dim`, numerically stable.
layernorm.rs `layer_norm`, Welford single-pass.
activations.rs `relu`, `silu`, `gelu_tanh`, `gelu_erf`.
attention.rs `multi_head_attention`, `mean_pool`.
models/
mod.rs Module aggregator.
bert.rs `BertConfig`, `BertModel`, `forward`, `from_safetensors`.
tests/ Integration tests that consume the public API only.
examples/ Runnable demos.
tools/wasmicro-convert/ Standalone CLI: download HF model + validate.
demo/ Static site deployed to GitHub Pages.
.github/workflows/ CI (test + clippy + wasm check) and Pages deploy.
```
## Conventions
### Tensor convention
- 32-bit float (`f32`) only. f16/bf16/int8 support, when added, will come
as separate dtypes and conversion paths — never silent.
- Row-major layout. `[m, k] @ [k, n] -> [m, n]`.
- Linear weights match PyTorch: `[out_features, in_features]`. Use
`matmul_t_b` when applying them.
- Multi-head attention treats batch=1 implicitly. Inputs are 2D
`[seq_len, hidden]`. Add batched paths only when needed.
### Ops API shape
```rust
/// One-line doc explaining the math.
///
/// - `x`: shape and meaning.
/// - returned tensor shape: ...
pub fn my_op(x: &Tensor /* , ... */) -> Tensor {
// validate shapes with `assert_eq!` / `assert!` and clear messages
// produce output as `Vec<f32>` of the right size
// return `Tensor::from_vec(out, &[...])`
}
```
### Errors
`error::Error` is a flat enum with no heap-allocated payloads in the common
path. The static `&'static str` in `InvalidHeader(...)` is the only context
payload — use it for parser-level specificity. Do not add `String`
payloads casually.
### Testing
- Unit tests live next to their code in `#[cfg(test)] mod tests`.
- Integration tests under `tests/` exercise only the public API the way a
downstream user would.
- Run before every commit: `cargo test --workspace && cargo check --target
wasm32-unknown-unknown --features wasm`.
## Commands
```bash
# Native check + test (whole workspace)
cargo check --workspace
cargo test --workspace
# WASM target check (catches std accidentally leaking into the library)
cargo check --target wasm32-unknown-unknown --features wasm
# Build the WASM bundle for a browser
wasm-pack build --release --target web --no-opt \
--out-dir demo/pkg --out-name wasmicro --features wasm
wasm-opt --enable-bulk-memory --enable-nontrapping-float-to-int -Oz \
demo/pkg/wasmicro_bg.wasm -o demo/pkg/wasmicro_bg.wasm
# Build the converter
cargo build --release -p wasmicro-convert
# Publish dry-run for the library
cargo publish --dry-run --allow-dirty -p wasmicro
```
## When in doubt
- Smaller surface beats more features.
- Fewer dependencies beat faster code that needs a new dependency.
- A copy of 20 lines beats pulling in a 5,000-line crate.
- If you cannot justify a change as "this makes the WASM bundle smaller or
the cold start faster, or it adds a transformer architecture we want to
support" — do not make the change.