# EdgeBERT

A pure Rust + WASM implementation of BERT inference with minimal dependencies.
## Overview

EdgeBERT is a lightweight Rust implementation of BERT inference focused on native, edge, and WebAssembly deployment. Run sentence-transformers models anywhere without Python or large ML runtimes.
## Status
- ✅ MiniLM-L6-v2 inference working
- ✅ WASM support
- ⚠️ Performance optimization ongoing
- 🚧 Additional models coming soon
## Contributions

All contributions are welcome; the project is at a very early stage.
## Components

- Encoder: runs inference to turn text into embeddings
- WordPiece tokenizer: a small tokenization implementation based on WordPiece (see the sketch after this list)
- Cross-platform: runs on WebAssembly and native targets
- No Python or C/C++ dependencies, except OpenBLAS, which ndarray uses for vectorized matrix operations
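The tokenizer follows WordPiece's greedy longest-match-first scheme: at each position it takes the longest vocabulary entry, marking word-internal continuations with a `##` prefix. A minimal sketch of that algorithm (illustrative only, not EdgeBERT's actual implementation):

```rust
use std::collections::HashSet;

/// Greedy longest-match-first WordPiece, sketched for illustration.
/// Returns None when the word cannot be covered by the vocabulary,
/// in which case the caller maps it to [UNK].
fn wordpiece_tokenize(word: &str, vocab: &HashSet<String>) -> Option<Vec<String>> {
    let chars: Vec<char> = word.chars().collect();
    let mut tokens = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        // Try the longest remaining substring first, shrinking until a vocab hit.
        let mut end = chars.len();
        let mut piece = None;
        while start < end {
            let mut sub: String = chars[start..end].iter().collect();
            if start > 0 {
                sub = format!("##{}", sub); // continuation pieces get the ## prefix
            }
            if vocab.contains(&sub) {
                piece = Some(sub);
                break;
            }
            end -= 1;
        }
        match piece {
            Some(p) => tokens.push(p),
            None => return None, // no vocabulary entry covers this position
        }
        start = end;
    }
    Some(tokens)
}
```

For example, with a vocabulary containing "play" and "##ing", the word "playing" tokenizes to ["play", "##ing"].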
Use this if you need:
- Embeddings in pure Rust without Python/C++ dependencies
- BERT in browsers or edge devices
- Offline RAG systems
- Small binary size over maximum speed
Don't use this if you need:
- Multiple model architectures (currently only MiniLM)
- GPU acceleration
- Production-grade performance (use ONNX Runtime instead)
## Getting Started

### 1. Native Rust Application
For server-side or desktop applications, you can use the library directly.
Cargo.toml

```toml
[dependencies]
edgebert = "0.3.0"
anyhow = "1.0" # for the Result alias used in the examples
```
main.rs

```rust
use anyhow::Result;
use edgebert::Model; // import path assumed; the full program is examples/basic.rs
```
Output:

```text
Text: Hello world == [-0.034439795, 0.030909885, 0.0066969804, 0.02608013, -0.03936993, -0.16037229, 0.06694216, -0.0065279473, -0.0474657, 0.014813968]...
Text: How are you == [-0.031447295, 0.03784213, 0.0761843, 0.045665547, -0.0012263817, -0.07488511, 0.08155286, 0.010209872, -0.11220472, 0.04075747]...
```
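For orientation, the program behind that output might look like the following minimal sketch; `Model`, `ModelType`, `from_pretrained`, and `encode` are illustrative names rather than the crate's confirmed API:

```rust
use anyhow::Result;
use edgebert::{Model, ModelType}; // illustrative names, not confirmed API

fn main() -> Result<()> {
    // Load MiniLM-L6-v2, the one model currently supported.
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;

    let texts = vec!["Hello world", "How are you"];
    // `true` asks for L2-normalized sentence embeddings.
    let embeddings = model.encode(&texts, true)?;

    for (text, emb) in texts.iter().zip(embeddings.iter()) {
        println!("Text: {} == {:?}...", text, &emb[..10]);
    }
    Ok(())
}
```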
You can see the full example under examples/basic.rs. To build and run it:

```bash
cargo run --example basic
```
### 2. WebAssembly

```bash
# Install wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

# Build
scripts/wasm-build.sh

# Serve
cd examples && npx serve
```
```html
<!DOCTYPE html>
<html>
  <body>
    <script type="module">
      // Module path assumed; wasm-pack's --target web output is loaded as an
      // ES module from examples/pkg. See examples/basic.html for the full page.
      import init from "./pkg/edgebert.js";
      await init();
      // ...create a model, encode some text, and log the first values.
    </script>
  </body>
</html>
```
Output:

```text
First 10 values: Float32Array(10) [-0.034439802169799805, 0.03090989589691162, 0.006696964148432016, 0.02608015574514866, -0.03936990723013878, -0.16037224233150482, 0.06694218516349792, -0.006527911406010389, -0.04746570065617561, 0.014813981018960476, buffer: ArrayBuffer(40), byteLength: 40, byteOffset: 0, length: 10, Symbol(Symbol.toStringTag): 'Float32Array']
```
You can see the full example under examples/basic.html. To build it, run scripts/wasm-build.sh, then start a local server inside examples/; `npx serve` can serve WASM.
### 3. Web Workers

See examples/worker.html and examples/worker.js for how to combine web workers and WebAssembly. The library handles both contexts: when `window` is defined, as in basic.html, and when it is not, as in web workers.

After compiling the WASM build (with scripts/wasm-build.sh the output lands in examples/pkg), run `npx serve` and open localhost:3000/worker.
Clicking "generate embeddings" after the model loads prints:

```text
Encoding texts: ["Hello world","How are you?"]
Embeddings shape: [2, 384]
'Hello world' vs 'How are you?': 0.305
First embedding norm: 1.000000
First 10 values: [-0.0344, 0.0309, 0.0067, 0.0261, -0.0394, -0.1604, 0.0669, -0.0065, -0.0475, 0.0148]
```
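The context detection mentioned above can be as simple as checking whether `window` exists; a minimal sketch of the idea with web-sys (illustrative, not necessarily how EdgeBERT does it):

```rust
use wasm_bindgen::prelude::*;

// In a browser main thread `window` is defined; in a web worker it is not,
// so its absence signals a worker context. (Illustrative sketch only.)
#[wasm_bindgen]
pub fn running_in_worker() -> bool {
    web_sys::window().is_none()
}
```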
### 4. Comparison with PyTorch

A small PyTorch example, via sentence-transformers, that encodes the same texts and shows their similarity:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = ["Hello world", "How are you"]
# Normalized embeddings make cosine similarity a plain dot product.
embeddings = model.encode(texts, normalize_embeddings=True)
similarity = embeddings[0] @ embeddings[1]

for text, emb in zip(texts, embeddings):
    print(f"Text: {text} == {[f'{v:.4f}' for v in emb[:10]]}...")
print(f"Cosine similarity ('Hello world' vs 'How are you'): {similarity:.4f}")
```
Output from Python:

```text
Text: Hello world == ['-0.0345', '0.0310', '0.0067', '0.0261', '-0.0394', '-0.1603', '0.0669', '-0.0064', '-0.0475', '0.0148']...
Text: How are you == ['-0.0314', '0.0378', '0.0763', '0.0457', '-0.0012', '-0.0748', '0.0816', '0.0102', '-0.1122', '0.0407']...
Cosine similarity ('Hello world' vs 'How are you'): 0.3624
```
EdgeBERT:

```rust
use anyhow::Result;
use edgebert::Model; // import path assumed; the program mirrors the sketch above
```
Output:

```text
Text: Hello world == ...
Text: How are you == ...
Cosine similarity: 0.3623
```
Rust yields a cosine similarity of 0.3623 versus 0.3624 for Python, a match of around 99.97%; the tiny discrepancy comes from floating-point rounding differences.
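Since both libraries L2-normalize their embeddings (note the norm of 1.000000 above), cosine similarity reduces to a plain dot product:

```rust
/// Cosine similarity of two L2-normalized embeddings; with unit vectors the
/// usual denominator is 1, so the dot product alone suffices.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```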