🚀 Overview
Encoderfile packages transformer encoders—optionally with classification heads—into a single, self-contained executable. No Python runtime, no dependencies, no network calls. Just a fast, portable binary that runs anywhere.
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures with optional classification heads. It supports embedding, sequence classification, and token classification models—covering most encoder-based NLP tasks, from text similarity to classification and tagging—all within one compact binary.
Under the hood, Encoderfile uses ONNX Runtime for inference, ensuring compatibility with a wide range of transformer architectures.
Why?
- Smaller footprint: a single binary measured in tens-to-hundreds of megabytes, not gigabytes of runtime and packages
- Compliance-friendly: deterministic, offline, security-boundary-safe
- Integration-ready: drop into existing systems as a CLI, microservice, or API without refactoring your stack
Encoderfiles can run as:
- REST API
- gRPC microservice
- CLI
- (Future) MCP server
- (Future) FFI support for near-universal cross-language embedding
Supported Architectures
Encoderfile supports the following Hugging Face model classes (and their ONNX-exported equivalents):
| Task | Supported classes | Example models |
|---|---|---|
| Embeddings / Feature Extraction | `AutoModel`, `AutoModelForMaskedLM` | `bert-base-uncased`, `distilbert-base-uncased` |
| Sequence Classification | `AutoModelForSequenceClassification` | `distilbert-base-uncased-finetuned-sst-2-english`, `roberta-large-mnli` |
| Token Classification | `AutoModelForTokenClassification` | `dslim/bert-base-NER`, `bert-base-cased-finetuned-conll03-english` |
- ✅ All architectures must be encoder-only transformers — no decoders, no encoder–decoder hybrids (so no T5, no BART).
- ⚙️ Models must have ONNX-exported weights (`path/to/your/model/model.onnx`).
- 🧠 The ONNX graph input must include `input_ids` and optionally `attention_mask`.
- 🚫 Models relying on generation heads (`AutoModelForSeq2SeqLM`, `AutoModelForCausalLM`, etc.) are not supported.
Gotchas
XLNet, Transformer-XL, and derivative architectures are not yet supported.
🧰 Setup
Prerequisites:
To set up your dev environment, run the following:
This will install Rust dependencies, create a virtual environment, and download model weights for integration tests (these will show up in models/).
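The project's actual setup command is not shown here; as a rough sketch of the manual equivalent of the steps described above (the repo may provide a single setup target instead, and the `optimum` install is an assumption about the export tooling):

```shell
# Hypothetical manual equivalent of the setup step described above:
cargo fetch                       # fetch Rust dependencies
python3 -m venv .venv             # create a virtual environment
source .venv/bin/activate
pip install "optimum[exporters]"  # assumed tooling for ONNX export
# Model weights for integration tests end up in models/.
```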
🏗️ Building an Encoderfile
Prepare your Model
To create an Encoderfile, you must have a HuggingFace model downloaded in an accessible directory. The model directory must have exported ONNX weights.
Export a Model
Task types: see the Hugging Face task guide for available tasks (feature-extraction, text-classification, token-classification, etc.).
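One common way to produce the required ONNX weights is the Hugging Face `optimum` exporter CLI. This is a sketch, assuming `optimum` is installed; the model name and output directory are placeholders:

```shell
pip install "optimum[exporters]"

# Export bert-base-uncased for feature extraction into my_model/
optimum-cli export onnx \
  --model bert-base-uncased \
  --task feature-extraction \
  my_model/
```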
Use a pre-exported model
Some models on HuggingFace already have ONNX weights in their repos.
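In that case you can download the repo directly instead of exporting. A sketch using the `huggingface_hub` CLI (the repo name below is illustrative, not an endorsement of a specific model):

```shell
pip install -U "huggingface_hub[cli]"
huggingface-cli download optimum/all-MiniLM-L6-v2 --local-dir my_model
```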
Your model directory should look like this:
```
my_model/
├── config.json
├── model.onnx
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── vocab.txt
```
Build the binary
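Since the finished binary lands in target/release/ (see below), a standard Cargo release build is presumably the step here; any model-specific configuration flags the project expects are not shown in this document:

```shell
cargo build --release
```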
Run REST Server
Your final binary is target/release/encoderfile. To run it as a server:
Default port: 8080 (override with --http-port)
```shell
chmod +x target/release/encoderfile
./target/release/encoderfile serve
```
REST API Usage
Embeddings
Extracts token-level embeddings
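A minimal curl sketch of an embeddings request. The route and payload shape are assumptions (check your build's API reference); the default port 8080 is from the section above:

```shell
# Route /embeddings and the {"texts": [...]} payload are assumed, not confirmed.
curl -s -X POST http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"texts": ["the quick brown fox"]}'
```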
Sequence Classification / Token Classification
Returns predictions and logits.
🔧 Walkthrough Example - Sequence Classification
Let's use encoderfile to perform sentiment analysis on a few input strings.
We'll work with distilbert-base-uncased-finetuned-sst-2-english, which is a fine-tuned version of the DistilBERT model.
Export Model to ONNX
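Assuming the `optimum` exporter CLI is installed, the export for this model might look like the following; the output directory name `sst2_model/` is just a choice:

```shell
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification \
  sst2_model/
```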
Build Encoderfile
Start Server
Use --http-port parameter to start the REST server on a specific port
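Combining the serve subcommand and the flag mentioned above, for example:

```shell
./target/release/encoderfile serve --http-port 3000
```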
Analyze Sentiment
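A curl sketch of the classification request; the route and payload shape are assumptions (adjust to your build's API), and the port is the 8080 default:

```shell
# Route /classify and the {"texts": [...]} payload are assumed, not confirmed.
curl -s -X POST http://localhost:8080/classify \
  -H "Content-Type: application/json" \
  -d '{"texts": ["I loved this movie!", "What a waste of time."]}'
```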
Expected Output