# EdgeBERT

**A pure Rust + WASM implementation for BERT inference with minimal dependencies**

[![crates.io](https://img.shields.io/crates/v/edgebert.svg)](https://crates.io/crates/edgebert)
[![docs.rs](https://docs.rs/edgebert/badge.svg)](https://docs.rs/edgebert)
[![Build Status](https://github.com/olafurjohannsson/edgebert/workflows/CI/badge.svg)](https://github.com/olafurjohannsson/edgebert/actions)

---

## Overview

EdgeBERT is a lightweight Rust implementation of BERT inference focused on native, edge, and WASM deployment. Run sentence-transformers models anywhere without Python or large ML runtimes.

## Status
- ✅ MiniLM-L6-v2 inference working
- ✅ WASM support
- ⚠️ Performance optimization ongoing
- 🚧 Additional models coming soon

## Contributions

All contributions are welcome; the project is at a very early stage.

## Components

- Encoder: runs inference to turn text into embeddings
- WordPiece tokenization: a small tokenizer implementation based on WordPiece (see the sketch below)
- Cross-platform: WebAssembly and native
- No Python or C/C++ dependencies, except OpenBLAS, which `ndarray` uses for vectorized matrix operations
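
The tokenizer follows WordPiece's greedy longest-match-first rule: at each step it takes the longest vocabulary entry that prefixes the remaining word, marks continuation pieces with `##`, and falls back to `[UNK]` when nothing matches. Here is a minimal illustrative sketch of that algorithm, not EdgeBERT's actual internals; `wordpiece` and the toy vocabulary are hypothetical:

```rust
use std::collections::HashSet;

/// Greedy longest-match-first WordPiece (simplified: no lowercasing or
/// punctuation splitting). `vocab` holds full tokens and continuation
/// pieces prefixed with "##".
fn wordpiece(word: &str, vocab: &HashSet<String>, unk: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found = None;
        // Shrink the candidate from the right until it is in the vocabulary.
        while start < end {
            let mut piece: String = chars[start..end].iter().collect();
            if start > 0 {
                piece = format!("##{piece}");
            }
            if vocab.contains(&piece) {
                found = Some(piece);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => pieces.push(piece),
            // No sub-piece matched: the whole word maps to [UNK].
            None => return vec![unk.to_string()],
        }
        start = end;
    }
    pieces
}

fn main() {
    let vocab: HashSet<String> =
        ["un", "##aff", "##able"].iter().map(|s| s.to_string()).collect();
    // Prints ["un", "##aff", "##able"]
    println!("{:?}", wordpiece("unaffable", &vocab, "[UNK]"));
}
```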

**Use this if you need:**
- Embeddings in pure Rust without Python/C++ dependencies
- BERT in browsers or edge devices
- Offline RAG systems (sketched after these lists)
- Small binary size over maximum speed

**Don't use this if you need:**
- Multiple model architectures (currently only MiniLM)
- GPU acceleration
- Production-grade performance (use ONNX Runtime instead)
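
As a taste of the RAG use case: with normalized embeddings, ranking documents against a query reduces to a dot product. A minimal sketch, assuming `encode` returns one `Vec<f32>` per input text as the basic example below suggests:

```rust
use anyhow::Result;
use edgebert::{Model, ModelType};

fn main() -> Result<()> {
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;
    let docs = vec!["Rust is a systems language", "BERT produces embeddings"];
    let query = vec!["What creates text embeddings?"];

    // With normalized embeddings, the dot product equals cosine similarity.
    let doc_embs = model.encode(docs.clone(), true)?;
    let query_emb = &model.encode(query, true)?[0];

    let mut ranked: Vec<(f32, &str)> = doc_embs
        .iter()
        .zip(&docs)
        .map(|(d, t)| (d.iter().zip(query_emb).map(|(a, b)| a * b).sum(), *t))
        .collect();
    ranked.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    println!("Best match: {}", ranked[0].1);
    Ok(())
}
```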

## Getting Started

### 1. Native Rust Application

For server-side or desktop applications, you can use the library directly.

**`Cargo.toml`**
```toml
[dependencies]
edgebert = "0.3.1"
anyhow = "1.0"
```

**`main.rs`**
```rust
use anyhow::Result;
use edgebert::{Model, ModelType};
fn main() -> Result<()> {
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;

    let texts = vec!["Hello world", "How are you"];
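    // The second argument requests normalized (unit-length) embeddings;
    // the Web Worker demo below reports an embedding norm of 1.000000.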
    let embeddings = model.encode(texts.clone(), true)?;

    for (i, embedding) in embeddings.iter().enumerate() {
        let n = embedding.len().min(10);
        println!("Text: {} == {:?}...", texts[i], &embedding[0..n]);
    }
    Ok(())
}
```

**Output:**
```
Text: Hello world == [-0.034439795, 0.030909885, 0.0066969804, 0.02608013, -0.03936993, -0.16037229, 0.06694216, -0.0065279473, -0.0474657, 0.014813968]...
Text: How are you == [-0.031447295, 0.03784213, 0.0761843, 0.045665547, -0.0012263817, -0.07488511, 0.08155286, 0.010209872, -0.11220472, 0.04075747]...
```

The full example is under `examples/basic.rs`; to build and run:
```bash
cargo run --release --example basic
```
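
For background on what `encode` returns: sentence-transformers models such as all-MiniLM-L6-v2 build a sentence embedding by mean-pooling the per-token BERT outputs and then optionally L2-normalizing the result. A minimal sketch of that post-processing step, written independently of EdgeBERT's internals (`pool` is a hypothetical helper, not part of the crate's API):

```rust
/// Mean-pool per-token vectors into one sentence vector and optionally
/// L2-normalize it, as sentence-transformers models do.
fn pool(token_vecs: &[Vec<f32>], normalize: bool) -> Vec<f32> {
    let dim = token_vecs[0].len();
    let mut out = vec![0.0f32; dim];
    for tv in token_vecs {
        for (o, v) in out.iter_mut().zip(tv) {
            *o += *v;
        }
    }
    let n = token_vecs.len() as f32;
    for o in out.iter_mut() {
        *o /= n;
    }
    if normalize {
        let norm = out.iter().map(|v| v * v).sum::<f32>().sqrt();
        if norm > 0.0 {
            for o in out.iter_mut() {
                *o /= norm;
            }
        }
    }
    out
}

fn main() {
    let tokens = vec![vec![1.0, 3.0], vec![3.0, 5.0]];
    // Mean is [2.0, 4.0]; after normalization the vector has unit length.
    println!("{:?}", pool(&tokens, true));
}
```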

### 2. WebAssembly

```bash
# Install wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

# Build
./scripts/wasm-build.sh

# Serve
cd examples && npx serve
```

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>EdgeBERT WASM Test</title>
</head>
<body>
<script type="module">
    import init, { WasmModel, WasmModelType } from './pkg/edgebert.js';

    async function run() {
        await init();

        const model = await WasmModel.from_type(WasmModelType.MiniLML6V2);
        const texts = ["Hello world", "How are you"];
        const embeddings = model.encode(texts, true);
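        // encode returns a Float32Array; show the first 10 values.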

        console.log("First 10 values:", embeddings.slice(0, 10));
    }

    run().catch(console.error);
</script>
</body>
</html>

```

**Output:**
```
First 10 values: Float32Array(10) [-0.034439802169799805, 0.03090989589691162, 0.006696964148432016, 0.02608015574514866, -0.03936990723013878, -0.16037224233150482, 0.06694218516349792, -0.006527911406010389, -0.04746570065617561, 0.014813981018960476, buffer: ArrayBuffer(40), byteLength: 40, byteOffset: 0, length: 10, Symbol(Symbol.toStringTag): 'Float32Array']
```

The full example is under `examples/basic.html`. To build, run `scripts/wasm-build.sh`, then start a local server from `examples/`; `npx serve` can serve WASM.

### 3. Web Workers

See `examples/worker.html` and `examples/worker.js` for how to use Web Workers with WebAssembly. The library handles both environments: when `window` is defined (as in `basic.html`) and when it is not (Web Workers).

After compiling the WASM build (with `scripts/wasm-build.sh` the output lands in `examples/pkg`), run `npx serve` and open `localhost:3000/worker`.

Clicking the generate-embeddings button after the model loads produces:

```
Encoding texts: ["Hello world","How are you?"]
Embeddings shape: [2, 384]
'Hello world' vs 'How are you?': 0.305
First embedding norm: 1.000000
First 10 values: [-0.0344, 0.0309, 0.0067, 0.0261, -0.0394, -0.1604, 0.0669, -0.0065, -0.0475, 0.0148]
```


### 4. Comparison with PyTorch

A small PyTorch example that encodes the same texts and reports their similarity:

```python
from sentence_transformers import SentenceTransformer
import torch
import torch.nn.functional as F

texts = ["Hello world", "How are you"]

model = SentenceTransformer('all-MiniLM-L6-v2')

embeddings = model.encode(texts, convert_to_tensor=True)

for i, emb in enumerate(embeddings):
    first10 = [f"{v:.4f}" for v in emb[:10].tolist()]
    print(f"Text: {texts[i]} == {first10}...")

print()
cos_sim = F.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"Cosine similarity ('{texts[0]}' vs '{texts[1]}'): {cos_sim.item():.4f}")

```

**Output from Python:**
```
Text: Hello world == ['-0.0345', '0.0310', '0.0067', '0.0261', '-0.0394', '-0.1603', '0.0669', '-0.0064', '-0.0475', '0.0148']...
Text: How are you == ['-0.0314', '0.0378', '0.0763', '0.0457', '-0.0012', '-0.0748', '0.0816', '0.0102', '-0.1122', '0.0407']...

Cosine similarity ('Hello world' vs 'How are you'): 0.3624

```

**EdgeBERT**

```rust
use anyhow::Result;
use edgebert::{Model, ModelType};

fn main() -> Result<()> {
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;
    let texts = vec!["Hello world", "How are you"];
    let embeddings = model.encode(texts.clone(), true)?;

    for (i, embedding) in embeddings.iter().enumerate() {
        let n = embedding.len().min(10);
        println!("Text: {} == {:?}...", texts[i], &embedding[0..n]);
    }

    let dot: f32 = embeddings[0].iter().zip(&embeddings[1]).map(|(a, b)| a * b).sum();
    let norm_a: f32 = embeddings[0].iter().map(|v| v * v).sum::<f32>().sqrt();
    let norm_b: f32 = embeddings[1].iter().map(|v| v * v).sum::<f32>().sqrt();
    let cos_sim = dot / (norm_a * norm_b);

    println!("\nCosine similarity ('{}' vs '{}'): {:.4}", texts[0], texts[1], cos_sim);

    Ok(())
}
```

**Output:**
```
Text: Hello world == [-0.034439795, 0.030909885, 0.0066969804, 0.02608013, -0.03936993, -0.16037229, 0.06694216, -0.0065279473, -0.0474657, 0.014813968]...
Text: How are you == [-0.031447295, 0.03784213, 0.0761843, 0.045665547, -0.0012263817, -0.07488511, 0.08155286, 0.010209872, -0.11220472, 0.04075747]...

Cosine similarity ('Hello world' vs 'How are you'): 0.3623

```

The cosine similarity is 0.3623 in Rust versus 0.3624 in Python, a difference of roughly 0.03%; the tiny discrepancy comes from floating-point rounding differences between the two stacks.
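
If you want to check this parity programmatically rather than by eye, compare the vectors element-wise against a small tolerance instead of testing for exact equality. A minimal sketch (`approx_eq` and the sample values are illustrative):

```rust
/// True if every pair of elements differs by at most `eps`.
fn approx_eq(a: &[f32], b: &[f32], eps: f32) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| (x - y).abs() <= eps)
}

fn main() {
    let rust_vals = [-0.034439795f32, 0.030909885];
    let python_vals = [-0.0345f32, 0.0310]; // rounded values from the Python run
    assert!(approx_eq(&rust_vals, &python_vals, 1e-3));
}
```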