<p align="center">
<picture>
<img src="https://github.com/user-attachments/assets/3916a870-378a-4bad-b819-04fd3c92040a" width="50%" alt="Project logo"/>
</picture>
</p>
<p align="center">
<a href="https://github.com/mozilla-ai/encoderfile/actions/workflows/pre-commit.yml">
<img src="https://github.com/mozilla-ai/encoderfile/actions/workflows/pre-commit.yml/badge.svg" />
</a>
<a href="https://github.com/mozilla-ai/encoderfile/actions/workflows/ci.yml">
<img src="https://github.com/mozilla-ai/encoderfile/actions/workflows/ci.yml/badge.svg" />
</a>
<a href="https://github.com/mozilla-ai/encoderfile/actions/workflows/docs.yml">
<img src="https://github.com/mozilla-ai/encoderfile/actions/workflows/docs.yml/badge.svg" />
</a>
</p>
<p align="center">
<a href="https://discord.com/invite/KTA26kGRyv">
<img src="https://img.shields.io/discord/1089876418936180786" />
</a>
<a href="https://codspeed.io/mozilla-ai/encoderfile?utm_source=badge">
<img src="https://img.shields.io/endpoint?url=https://codspeed.io/badge.json" />
</a>
<a href="https://codecov.io/gh/mozilla-ai/encoderfile">
<img src="https://codecov.io/gh/mozilla-ai/encoderfile/graph/badge.svg?token=45KUDEYD8Z" />
</a>
</p>
## 🚀 Overview
Encoderfile packages transformer encoders—optionally with classification heads—into a single, self-contained executable.
No Python runtime, no dependencies, no network calls. Just a fast, portable binary that runs anywhere.
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures. It supports embedding, sequence classification, and token classification models—covering most encoder-based NLP tasks, from text similarity to tagging—all within one compact binary.
Under the hood, Encoderfile uses ONNX Runtime for inference, ensuring compatibility with a wide range of transformer architectures.
**Why?**
- **Smaller footprint:** a single binary measured in tens-to-hundreds of megabytes, not gigabytes of runtime and packages
- **Compliance-friendly:** deterministic, offline, security-boundary-safe
- **Integration-ready:** drop into existing systems as a CLI, microservice, or API without refactoring your stack
Encoderfiles can run as:
- REST API
- gRPC microservice
- CLI for batch processing
- MCP server (Model Context Protocol)
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/assets/encoderfile-dark.svg">
<source media="(prefers-color-scheme: light)" srcset="docs/assets/encoderfile-light.svg">
<img alt="Architecture Diagram" src="docs/assets/encoderfile-light.svg" width="80%">
</picture>
</p>
### Supported Architectures
Encoderfile supports the following Hugging Face model classes (and their ONNX-exported equivalents):
| Task | Model Class | Example Models |
| --- | --- | --- |
| **Embeddings / Feature Extraction** | `AutoModel`, `AutoModelForMaskedLM` | `bert-base-uncased`, `distilbert-base-uncased` |
| **Sequence Classification** | `AutoModelForSequenceClassification` | `distilbert-base-uncased-finetuned-sst-2-english`, `roberta-large-mnli` |
| **Token Classification** | `AutoModelForTokenClassification` | `dslim/bert-base-NER`, `bert-base-cased-finetuned-conll03-english` |
- ✅ All architectures must be encoder-only transformers — no decoders, no encoder–decoder hybrids (so no T5, no BART).
- ⚙️ Models must have ONNX-exported weights (`path/to/your/model/model.onnx`).
- 🧠 The ONNX graph inputs must include `input_ids` and may optionally include `attention_mask`.
- 🚫 Models relying on generation heads (`AutoModelForSeq2SeqLM`, `AutoModelForCausalLM`, etc.) are not supported.
- `XLNet`, `Transformer-XL`, and derivative architectures are not yet supported.
## 📦 Installation
### Option 1: Download Pre-built CLI Tool (Recommended)
Download the encoderfile CLI tool to build your own model binaries:
> **Note for Windows users:** Pre-built binaries are not available for Windows. Please see our guide on [building from source](https://mozilla-ai.github.io/encoderfile/reference/building/) for build instructions.

Move the binary to a location in your `PATH`:
```bash
# Linux/macOS
sudo mv encoderfile /usr/local/bin/
# Or add to your user bin
mkdir -p ~/.local/bin
mv encoderfile ~/.local/bin/
```
### Option 2: Build CLI Tool from Source
See our guide on [building from source](https://mozilla-ai.github.io/encoderfile/reference/building/) for detailed instructions on building the CLI tool from source.
Quick build:
```bash
cargo build --bin encoderfile --release
./target/release/encoderfile --help
```
## 🚀 Quick Start
### Step 1: Prepare Your Model
First, you need an ONNX-exported model. Export any HuggingFace model:
> Requires Python 3.13+ for ONNX export
```bash
# Install optimum for ONNX export (quotes keep the extras spec safe in zsh)
pip install "optimum[onnx]"
# Export a sentiment analysis model
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification \
  ./sentiment-model
```
### Step 2: Create Configuration File
Create `sentiment-config.yml`:
```yaml
encoderfile:
  name: sentiment-analyzer
  path: ./sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile
```
### Step 3: Build Your Encoderfile
Use the downloaded `encoderfile` CLI tool:
```bash
encoderfile build -f sentiment-config.yml
```
This creates a self-contained binary at `./build/sentiment-analyzer.encoderfile`.
### Step 4: Run Your Model
Start the server:
```bash
./build/sentiment-analyzer.encoderfile serve
```
The server will start on `http://localhost:8080` by default.
### Making Predictions
**Sentiment Analysis:**
```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      "This is the cutest cat ever!",
      "Boring video, waste of time",
      "These cats are so funny!"
    ]
  }'
```
**Response:**
```json
{
  "results": [
    {
      "logits": [0.00021549065, 0.9997845],
      "scores": [0.00021549074, 0.9997845],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    },
    {
      "logits": [0.9998148, 0.00018516644],
      "scores": [0.9998148, 0.0001851664],
      "predicted_index": 0,
      "predicted_label": "NEGATIVE"
    },
    {
      "logits": [0.00014975034, 0.9998503],
      "scores": [0.00014975043, 0.9998503],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    }
  ],
  "model_id": "sentiment-analyzer"
}
```
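In a client, a response like the one above can be consumed with a few lines of standard-library Python. The body below is pasted from the example; only the parsing code is ours:

```python
import json

# Response body from the curl example above
raw = """
{
  "results": [
    {"logits": [0.00021549065, 0.9997845], "scores": [0.00021549074, 0.9997845],
     "predicted_index": 1, "predicted_label": "POSITIVE"},
    {"logits": [0.9998148, 0.00018516644], "scores": [0.9998148, 0.0001851664],
     "predicted_index": 0, "predicted_label": "NEGATIVE"},
    {"logits": [0.00014975034, 0.9998503], "scores": [0.00014975043, 0.9998503],
     "predicted_index": 1, "predicted_label": "POSITIVE"}
  ],
  "model_id": "sentiment-analyzer"
}
"""

response = json.loads(raw)
labels = [r["predicted_label"] for r in response["results"]]
print(labels)  # ['POSITIVE', 'NEGATIVE', 'POSITIVE']
```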
**Embeddings:**
```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Hello world"],
    "normalize": true
  }'
```
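The embedding response shape is not reproduced here; assuming each result carries a float vector, vectors returned with `"normalize": true` are unit-length, so cosine similarity reduces to a dot product. A dependency-free sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; for unit-length (normalized)
    vectors the denominator is ~1 and this is just the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```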
**Token Classification (NER):**
```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Apple Inc. is located in Cupertino, California"]
  }'
```
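NER models such as `dslim/bert-base-NER` emit per-token BIO tags. The exact response fields are not shown here, so the token and tag lists below are illustrative; merging tags into entity spans can be sketched as:

```python
def merge_bio(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Collapse BIO tags into (entity_text, entity_type) spans."""
    entities: list[tuple[str, str]] = []
    current: list[str] = []
    ctype = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(token)
        else:  # "O" tag, or an I- tag that doesn't continue the open span
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        entities.append((" ".join(current), ctype))
    return entities

tokens = ["Apple", "Inc.", "is", "located", "in", "Cupertino", ",", "California"]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O", "B-LOC"]
print(merge_bio(tokens, tags))
# [('Apple Inc.', 'ORG'), ('Cupertino', 'LOC'), ('California', 'LOC')]
```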
## 🎯 Usage Modes
| Mode | Command | Output / Endpoint |
| --- | --- | --- |
| REST API | `./my-model.encoderfile serve` | `http://localhost:8080` |
| gRPC | `./my-model.encoderfile serve` | `localhost:50051` |
| CLI | `./my-model.encoderfile infer "text"` | stdout |
| MCP Server | `./my-model.encoderfile mcp` | — |
Both HTTP and gRPC servers start by default. Use `--disable-grpc` or `--disable-http` to run only one.
See the **[CLI Reference](https://mozilla-ai.github.io/encoderfile/latest/reference/cli/)** for all server options, port configuration, and output formats.
## 📚 Documentation
- **[Getting Started Guide](https://mozilla-ai.github.io/encoderfile/getting-started/)** - Step-by-step tutorial
- **[Building Guide](https://mozilla-ai.github.io/encoderfile/reference/building/)** - Build encoderfiles from ONNX models
- **[CLI Reference](https://mozilla-ai.github.io/encoderfile/reference/cli/)** - Complete command-line documentation
- **[API Reference](https://mozilla-ai.github.io/encoderfile/reference/api-reference/)** - REST, gRPC, and MCP API docs
## 🛠️ Building Custom Encoderfiles
Once you have the `encoderfile` CLI tool installed, you can build binaries from any compatible HuggingFace model.
See our guide on [building from source](https://mozilla-ai.github.io/encoderfile/reference/building/) for detailed instructions including:
- How to export models to ONNX format
- Configuration file options
- Advanced features (Lua transforms, custom paths, etc.)
- Troubleshooting tips
**Quick workflow:**
1. Export your model to ONNX: `optimum-cli export onnx ...`
2. Create a config file: `config.yml`
3. Build the binary: `encoderfile build -f config.yml`
4. Deploy anywhere: `./build/my-model.encoderfile serve`
## 🤝 Contributing
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### Development Setup
Make sure you have [Just](https://github.com/casey/just) installed.
```bash
# Clone the repository
git clone https://github.com/mozilla-ai/encoderfile.git
cd encoderfile
# Set up development environment
just setup
# Run tests
just test
# Build documentation
just docs
```
## 📄 License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- Built with [ONNX Runtime](https://onnxruntime.ai/)
- Inspired by [Llamafile](https://github.com/Mozilla-Ocho/llamafile)
- Powered by the Hugging Face model ecosystem
## 💬 Community
- [Discord](https://discord.com/invite/KTA26kGRyv) - Join our community
- [GitHub Issues](https://github.com/mozilla-ai/encoderfile/issues) - Report bugs or request features
- [GitHub Discussions](https://github.com/mozilla-ai/encoderfile/discussions) - Ask questions and share ideas