## 🚀 Overview
Encoderfile packages transformer encoders—optionally with classification heads—into a single, self-contained executable. No Python runtime, no dependencies, no network calls. Just a fast, portable binary that runs anywhere.
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures with optional classification heads. It supports embedding, sequence classification, and token classification models—covering most encoder-based NLP tasks, from text similarity to classification and tagging—all within one compact binary.
Under the hood, Encoderfile uses ONNX Runtime for inference, ensuring compatibility with a wide range of transformer architectures.
### Why?

- Smaller footprint: a single binary measured in tens to hundreds of megabytes, not gigabytes of runtime and packages
- Compliance-friendly: deterministic, fully offline, and easy to keep inside a strict security boundary
- Integration-ready: drop into existing systems as a CLI, microservice, or API without refactoring your stack
Encoderfiles can run as:
- REST API
- gRPC microservice
- CLI for batch processing
- MCP server (Model Context Protocol)
```mermaid
flowchart LR
    %% Styling
    classDef asset fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000;
    classDef tool fill:#fff8e1,stroke:#ff6f00,stroke-width:2px,stroke-dasharray: 5 5,color:#000;
    classDef process fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000;
    classDef artifact fill:#f5f5f5,stroke:#616161,stroke-width:2px,color:#000;
    classDef service fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#000;
    classDef client fill:#e3f2fd,stroke:#0277bd,stroke-width:2px,stroke-dasharray: 5 5,color:#000;

    subgraph Inputs ["1. Input Assets"]
        direction TB
        Onnx["ONNX Model<br/>(.onnx)"]:::asset
        Tok["Tokenizer Data<br/>(tokenizer.json)"]:::asset
        Config["Runtime Config<br/>(config.yml)"]:::asset
    end
    style Inputs fill:#e3f2fd,stroke:#0277bd,stroke-width:2px,stroke-dasharray: 5 5,color:#01579b

    subgraph Compile ["2. Compile Phase"]
        Compiler["Encoderfile Compiler<br/>(CLI Tool)"]:::asset
    end
    style Compile fill:#e3f2fd,stroke:#0277bd,stroke-width:2px,stroke-dasharray: 5 5,color:#01579b

    subgraph Build ["3. Build Phase"]
        direction TB
        Builder["Wrapper Process<br/>(Embeds Assets + Runtime)"]:::process
    end
    style Build fill:#fff8e1,stroke:#ff8f00,stroke-width:2px,color:#e65100

    subgraph Output ["4. Artifact"]
        Binary["Single Binary Executable<br/>(Static File)"]:::artifact
    end
    style Output fill:#fafafa,stroke:#546e7a,stroke-width:2px,stroke-dasharray: 5 5,color:#546e7a

    subgraph Runtime ["5. Runtime Phase"]
        direction TB
        Grpc["fa:fa-server gRPC Server<br/>(Protobuf)"]:::service
        Http["fa:fa-server HTTP Server<br/>(JSON)"]:::service
        MCP["fa:fa-server MCP Server<br/>(MCP)"]:::service
        Client["fa:fa-cloud Client Apps /<br/>MCP Agent"]:::client
    end
    style Runtime fill:#f1f8e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
    %% Connections
    Onnx & Tok & Config --> Builder
    Compiler -.->|"Orchestrates"| Builder
    Builder -->|"Outputs"| Binary

    %% Runtime Connections
    Binary -.->|"Executes"| Grpc
    Binary -.->|"Executes"| Http
    Binary -.->|"Executes"| MCP
    Grpc & Http & MCP -->|"Responds to"| Client
```
## Supported Architectures
Encoderfile supports the following Hugging Face model classes (and their ONNX-exported equivalents):
| Task | Supported classes | Example models |
|---|---|---|
| Embeddings / Feature Extraction | `AutoModel`, `AutoModelForMaskedLM` | `bert-base-uncased`, `distilbert-base-uncased` |
| Sequence Classification | `AutoModelForSequenceClassification` | `distilbert-base-uncased-finetuned-sst-2-english`, `roberta-large-mnli` |
| Token Classification | `AutoModelForTokenClassification` | `dslim/bert-base-NER`, `bert-base-cased-finetuned-conll03-english` |
- ✅ All architectures must be encoder-only transformers: no decoders, no encoder–decoder hybrids (so no T5, no BART).
- ⚙️ Models must have ONNX-exported weights (`path/to/your/model/model.onnx`).
- 🧠 The ONNX graph input must include `input_ids` and optionally `attention_mask` (a quick way to check is shown after this list).
- 🚫 Models relying on generation heads (`AutoModelForSeq2SeqLM`, `AutoModelForCausalLM`, etc.) are not supported.
- ❗ XLNet, Transformer-XL, and derivative architectures are not yet supported.
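To sanity-check an export against these requirements, you can list the graph inputs with the `onnx` Python package (the model path below is illustrative):

```bash
# Print the input names of the exported graph
python -c "import onnx; m = onnx.load('sentiment-model/model.onnx'); print([i.name for i in m.graph.input])"
# Expect 'input_ids' and optionally 'attention_mask'
# (BERT-style exports may also include 'token_type_ids')
```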
## 📦 Installation

### Option 1: Download Pre-built CLI Tool (Recommended)

Download the `encoderfile` CLI tool to build your own model binaries.
> **Note for Windows users:** Pre-built binaries are not available for Windows. Please see our guide on building from source.
Move the binary to a location in your `PATH`:

```bash
# Linux/macOS
sudo mv encoderfile /usr/local/bin/

# Or add to your user bin
mkdir -p ~/.local/bin
mv encoderfile ~/.local/bin/
```
### Option 2: Build CLI Tool from Source
See our guide on building from source for detailed instructions on building the CLI tool.
Quick build:
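A sketch, assuming the project is a Rust workspace (the repository URL is a placeholder):

```bash
git clone <repository-url>
cd encoderfile
cargo build --release
# With Cargo, the binary typically lands in target/release/encoderfile
```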
## 🚀 Quick Start

### Step 1: Prepare Your Model
First, you need an ONNX-exported model. Export any HuggingFace model:
```bash
# Install optimum for ONNX export
pip install "optimum[exporters]"

# Export a sentiment analysis model
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  ./sentiment-model
```
### Step 2: Create Configuration File

Create `sentiment-config.yml`:

```yaml
encoderfile:
  name: sentiment-analyzer
  path: ./sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile
```
### Step 3: Build Your Encoderfile

Use the downloaded `encoderfile` CLI tool:
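```bash
encoderfile build -f sentiment-config.yml
```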
This creates a self-contained binary at `./build/sentiment-analyzer.encoderfile`.
### Step 4: Run Your Model
Start the server:
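```bash
./build/sentiment-analyzer.encoderfile serve
```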
The server will start on http://localhost:8080 by default.
### Making Predictions

The examples below sketch the request shape; route and field names are assumptions, so check the API Reference for the actual schema.

**Sentiment Analysis:**
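```bash
# Route and payload field names are assumptions
curl -s -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["I love this product!"]}'
```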
Response (illustrative shape):
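```json
[
  {
    "label": "POSITIVE",
    "score": 0.9997
  }
]
```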
**Embeddings:**
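For an encoderfile built from an embedding model:

```bash
# Route name is an assumption
curl -s -X POST http://localhost:8080/embed \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["encode this sentence"]}'
```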
**Token Classification (NER):**
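For an encoderfile built from a token classification model:

```bash
# Route name is an assumption
curl -s -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["Hugging Face is based in New York City"]}'
```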
## 🎯 Usage Modes

### 1. REST API Server

Start an HTTP server (default port 8080), optionally with a custom configuration or with gRPC disabled. The flag names in the sketch below are assumptions; run the binary with `--help` or see the CLI Reference for the real ones.
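```bash
# Start with defaults (HTTP on port 8080)
./build/sentiment-analyzer.encoderfile serve

# Custom configuration (flag name is an assumption)
./build/sentiment-analyzer.encoderfile serve --port 9090

# Disable gRPC, HTTP only (flag name is an assumption)
./build/sentiment-analyzer.encoderfile serve --no-grpc
```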
### 2. gRPC Server

The default `serve` command also starts a gRPC server (port 50051); it can run gRPC-only or on a custom port. Flag names in the sketch below are assumptions:
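```bash
# Default gRPC server (port 50051)
./build/sentiment-analyzer.encoderfile serve

# gRPC only, no HTTP (flag name is an assumption)
./build/sentiment-analyzer.encoderfile serve --no-http

# Custom gRPC configuration (flag name is an assumption)
./build/sentiment-analyzer.encoderfile serve --grpc-port 50052
```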
### 3. CLI Inference

Run one-off inference without starting a server. The subcommand and flags shown here are assumptions; see the CLI Reference:

```bash
# Single input (subcommand name is an assumption)
./build/sentiment-analyzer.encoderfile infer "I love this product!"

# Multiple inputs
./build/sentiment-analyzer.encoderfile infer "first text" "second text"

# Save output to file
./build/sentiment-analyzer.encoderfile infer "some text" > results.json
```
### 4. MCP Server

Run as a Model Context Protocol server:
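A sketch; the subcommand name is an assumption:

```bash
# Serve the model over MCP (subcommand name is an assumption)
./build/sentiment-analyzer.encoderfile mcp
```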
## 🔧 Server Configuration

### Port Configuration

```bash
# Flag names below are assumptions; see the CLI Reference

# Custom HTTP port
./build/sentiment-analyzer.encoderfile serve --port 9090

# Custom gRPC port
./build/sentiment-analyzer.encoderfile serve --grpc-port 50052

# Both
./build/sentiment-analyzer.encoderfile serve --port 9090 --grpc-port 50052
```
### Hostname Configuration

Bind the server to a specific interface (the flag name is an assumption):
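```bash
./build/sentiment-analyzer.encoderfile serve --host 0.0.0.0
```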
### Service Selection

```bash
# HTTP only (flag names are assumptions)
./build/sentiment-analyzer.encoderfile serve --no-grpc

# gRPC only
./build/sentiment-analyzer.encoderfile serve --no-http
```
## 📚 Documentation
- Getting Started Guide - Step-by-step tutorial
- Building Guide - Build encoderfiles from ONNX models
- CLI Reference - Complete command-line documentation
- API Reference - REST, gRPC, and MCP API docs
## 🛠️ Building Custom Encoderfiles
Once you have the encoderfile CLI tool installed, you can build binaries from any compatible HuggingFace model.
See our Building Guide for detailed instructions, including:
- How to export models to ONNX format
- Configuration file options
- Advanced features (Lua transforms, custom paths, etc.)
- Troubleshooting tips
Quick workflow:

1. Export your model to ONNX: `optimum-cli export onnx ...`
2. Create a config file: `config.yml`
3. Build the binary: `encoderfile build -f config.yml`
4. Deploy anywhere: `./build/my-model.encoderfile serve`
## 🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
### Development Setup

See CONTRIBUTING.md for the development workflow: cloning the repository, setting up the environment, running tests, and building the documentation.
## 📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## 🙏 Acknowledgments
- Built with ONNX Runtime
- Inspired by Llamafile
- Powered by the Hugging Face model ecosystem
## 💬 Community
- Discord - Join our community
- GitHub Issues - Report bugs or request features
- GitHub Discussions - Ask questions and share ideas