//! Local in-browser model backend — Gemma 3 270M via Burn's wgpu/WebGPU
//! backend. Runs fully in the tab: no proxy, no `$LH`, no API key.
//!
//! **Status: scaffolding.** The first job of this module was to prove that the
//! Burn/wgpu tensor stack COMPILES on every target the crate builds for (native
//! AND `wasm32-unknown-unknown`). That was the WebGPU-via-Burn feasibility gate
//! for the whole local-inference approach, validated before porting Gemma. The
//! Gemma transformer now lives in [`gemma`]: written and compiling, but its
//! forward pass is not yet validated. The weight loader, tokenizer, and the
//! `ConnectionStrategy` / `Connection` impls land in the next phases.
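
// Sketch only (hypothetical module and function names, not this crate's
// loader): three of the per-tensor fixes the safetensors weight loader
// documented in this module performs, written against plain slices so they run
// on native and wasm32 alike. Assumptions, not confirmed by this file:
// checkpoint tensors are little-endian bf16; HF Gemma's RMSNorm scales by
// `(1 + w)` while the Burn norm scales by `w`; and the Burn attention rotates
// adjacent pairs (`2i`, `2i + 1`) whereas HF rotates split halves
// (`i`, `i + head_dim / 2`).
#[allow(dead_code)]
mod weight_fixup_sketch {
    /// Widen little-endian bf16 bytes to f32. A bf16 value is the top 16 bits
    /// of an IEEE-754 f32, so widening is an exact 16-bit left shift.
    pub fn bf16_le_bytes_to_f32(bytes: &[u8]) -> Vec<f32> {
        bytes
            .chunks_exact(2)
            .map(|pair| {
                let bits = u16::from_le_bytes([pair[0], pair[1]]);
                f32::from_bits(u32::from(bits) << 16)
            })
            .collect()
    }

    /// Bake Gemma's `(1 + w)` RMSNorm scale into the stored weight, so a norm
    /// that multiplies by its weight directly produces the same output.
    pub fn fold_gemma_rmsnorm_weight(w: &mut [f32]) {
        for x in w.iter_mut() {
            *x += 1.0;
        }
    }

    /// Output-row order for one head, `order[new] = old`: HF pairs row `i`
    /// with row `i + head_dim / 2`; the interleaved layout wants them adjacent.
    pub fn rope_interleave_order(head_dim: usize) -> Vec<usize> {
        assert!(head_dim % 2 == 0, "RoPE needs an even head_dim");
        let half = head_dim / 2;
        (0..half).flat_map(|i| [i, i + half]).collect()
    }

    /// Apply that order to every head of a row-major
    /// `[n_heads * head_dim, in_dim]` q/k projection weight (HF `[out, in]`
    /// layout, before any transpose). Using the same order for q and k leaves
    /// every head's q.k dot products unchanged.
    pub fn permute_head_rows(w: &[f32], n_heads: usize, head_dim: usize, in_dim: usize) -> Vec<f32> {
        assert_eq!(w.len(), n_heads * head_dim * in_dim, "shape mismatch");
        let order = rope_interleave_order(head_dim);
        let mut out = Vec::with_capacity(w.len());
        for h in 0..n_heads {
            for &old in &order {
                let start = (h * head_dim + old) * in_dim;
                out.extend_from_slice(&w[start..start + in_dim]);
            }
        }
        out
    }
}
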
/// Gemma 3 270M model architecture in Burn, verified against the official
/// `google/gemma-3-270m` config. Compiles native + wasm32; the forward pass is
/// not yet validated against reference logits — see the module docs.
/// Safetensors → Burn `GemmaModel` weight loader (HF→Burn rename, transpose,
/// bf16→f32, RMSNorm `(1+w)`, RoPE interleave permutation). Reads from
/// in-memory bytes with no filesystem access, so it is wasm-safe.
/// Gemma tokenizer (HF `tokenizers` crate, `unstable_wasm`) loaded from raw
/// `tokenizer.json` bytes.
/// Greedy (argmax) text generation over a loaded `GemmaModel`. v1 has no KV
/// cache, so every step recomputes the forward pass over the whole sequence;
/// the code path is identical on native and wasm32.
/// The `Connection` / `ConnectionStrategy` seam wiring the loaded Gemma engine
/// into the agent loop — mirrors the Anthropic backend. Weights are read from
/// OPFS; `send()` runs `generate()` and emits a text `Step`.
/// Best-effort textual tool-call parser: extracts the philschmid `tool_code`
/// markdown fence (`name(arg=val)`) from generated text into `(name, json
/// args)`, no `eval`. Drives the connection's parse-then-dispatch tool loop.
/// The Burn backend the local model runs on: `wgpu` — WebGPU on `wasm32`,
/// Vulkan/Metal/DX12 on native. Named here so the rest of the backend is
/// written once against `LocalBackend` and the concrete device is chosen at
/// the edges.
pub type LocalBackend = burn::backend::Wgpu;
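
// Sketch only (every name here is hypothetical, and the trait is a std-only
// stand-in for Burn's real `Backend` trait): the "write once against an alias,
// choose the concrete backend at the edges" pattern described above. Generic
// code is written against `B`; the single alias decides what gets
// monomorphised, and `std::any::type_name` reports the resulting concrete
// type, presumably the same kind of trick behind the compile-feasibility
// smoke's returned type name.
#[allow(dead_code)]
mod backend_alias_sketch {
    /// Stand-in for a tensor backend: just enough surface to be generic over.
    pub trait SketchBackend {
        fn label() -> &'static str;
    }

    /// Stand-in for the wgpu backend.
    pub struct SketchWgpu;

    impl SketchBackend for SketchWgpu {
        fn label() -> &'static str {
            "wgpu"
        }
    }

    /// The one place the concrete backend is named.
    pub type SketchLocal = SketchWgpu;

    /// "Model" code never mentions the concrete backend.
    pub fn describe<B: SketchBackend>() -> String {
        format!("running on {}", B::label())
    }

    /// Name of the type the alias resolves to after monomorphisation.
    pub fn resolved_name<B: SketchBackend>() -> &'static str {
        std::any::type_name::<B>()
    }
}
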
/// Compile-feasibility smoke. References the Burn tensor type on the wgpu
/// backend so `cargo check` is forced to build the entire Burn/wgpu/cubecl
/// dependency tree for the current target. Not wired into the agent loop —
/// it exists solely so the wasm32 build proves WebGPU-via-Burn compiles
/// before we invest in the Gemma port. Returns the monomorphised type name.