## 🚀 Overview
Encoderfile packages transformer encoders—optionally with classification heads—into a single, self-contained executable. No Python runtime, no dependencies, no network calls. Just a fast, portable binary that runs anywhere.
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures with optional classification heads. It supports embedding, sequence classification, and token classification models—covering most encoder-based NLP tasks, from text similarity to classification and tagging—all within one compact binary.
Under the hood, Encoderfile uses ONNX Runtime for inference, ensuring compatibility with a wide range of transformer architectures.
### Why?
- Smaller footprint: a single binary measured in tens-to-hundreds of megabytes, not gigabytes of runtime and packages
- Compliance-friendly: deterministic, offline, security-boundary-safe
- Integration-ready: drop into existing systems as a CLI, microservice, or API without refactoring your stack
Encoderfiles can run as:
- REST API
- gRPC microservice
- CLI for batch processing
- MCP server (Model Context Protocol)
```mermaid
flowchart LR
  %% Styling
  classDef asset fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000;
  classDef tool fill:#fff8e1,stroke:#ff6f00,stroke-width:2px,stroke-dasharray: 5 5,color:#000;
  classDef process fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000;
  classDef artifact fill:#f5f5f5,stroke:#616161,stroke-width:2px,color:#000;
  classDef service fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#000;
  classDef client fill:#e3f2fd,stroke:#0277bd,stroke-width:2px,stroke-dasharray: 5 5,color:#000;

  subgraph Inputs ["1. Input Assets"]
    direction TB
    Onnx["ONNX Model<br/>(.onnx)"]:::asset
    Tok["Tokenizer Data<br/>(tokenizer.json)"]:::asset
    Config["Runtime Config<br/>(config.yml)"]:::asset
  end
  style Inputs fill:#e3f2fd,stroke:#0277bd,stroke-width:2px,stroke-dasharray: 5 5,color:#01579b

  subgraph Compile ["2. Compile Phase"]
    Compiler["Encoderfile Compiler<br/>(CLI Tool)"]:::asset
  end
  style Compile fill:#e3f2fd,stroke:#0277bd,stroke-width:2px,stroke-dasharray: 5 5,color:#01579b

  subgraph Build ["3. Build Phase"]
    direction TB
    Builder["Wrapper Process<br/>(Embeds Assets + Runtime)"]:::process
  end
  style Build fill:#fff8e1,stroke:#ff8f00,stroke-width:2px,color:#e65100

  subgraph Output ["4. Artifact"]
    Binary["Single Binary Executable<br/>(Static File)"]:::artifact
  end
  style Output fill:#fafafa,stroke:#546e7a,stroke-width:2px,stroke-dasharray: 5 5,color:#546e7a

  subgraph Runtime ["5. Runtime Phase"]
    direction TB
    %% fa:fa-server icons
    Grpc["fa:fa-server gRPC Server<br/>(Protobuf)"]:::service
    Http["fa:fa-server HTTP Server<br/>(JSON)"]:::service
    MCP["fa:fa-server MCP Server<br/>(MCP)"]:::service
    %% fa:fa-cloud icon
    Client["fa:fa-cloud Client Apps /<br/>MCP Agent"]:::client
  end
  style Runtime fill:#f1f8e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20

  %% Connections
  Onnx & Tok & Config --> Builder
  Compiler -.->|"Orchestrates"| Builder
  Builder -->|"Outputs"| Binary

  %% Runtime Connections
  Binary -.->|"Executes"| Grpc
  Binary -.->|"Executes"| Http
  Binary -.->|"Executes"| MCP
  Grpc & Http & MCP -->|"Responds to"| Client
```
## Supported Architectures
Encoderfile supports the following Hugging Face model classes (and their ONNX-exported equivalents):
| Task | Supported classes | Example models |
|---|---|---|
| Embeddings / Feature Extraction | `AutoModel`, `AutoModelForMaskedLM` | `bert-base-uncased`, `distilbert-base-uncased` |
| Sequence Classification | `AutoModelForSequenceClassification` | `distilbert-base-uncased-finetuned-sst-2-english`, `roberta-large-mnli` |
| Token Classification | `AutoModelForTokenClassification` | `dslim/bert-base-NER`, `bert-base-cased-finetuned-conll03-english` |
- ✅ All architectures must be encoder-only transformers: no decoders, no encoder–decoder hybrids (so no T5, no BART).
- ⚙️ Models must have ONNX-exported weights (`path/to/your/model/model.onnx`).
- 🧠 The ONNX graph inputs must include `input_ids` and, optionally, `attention_mask`.
- 🚫 Models relying on generation heads (`AutoModelForSeq2SeqLM`, `AutoModelForCausalLM`, etc.) are not supported.
- 🚫 XLNet, Transformer-XL, and derivative architectures are not yet supported.
## 📦 Installation

### Option 1: Download Pre-built CLI Tool (Recommended)
Download the encoderfile CLI tool to build your own model binaries:
Note for Windows users: Pre-built binaries are not available for Windows. Please see BUILDING.md for instructions on building from source.
Move the binary to a location in your `PATH`, for example:

```sh
# Linux/macOS
sudo mv encoderfile /usr/local/bin/

# Or add to your user bin
mkdir -p ~/.local/bin
mv encoderfile ~/.local/bin/
```
### Option 2: Build CLI Tool from Source
See BUILDING.md for detailed instructions on building the CLI tool from source.
Quick build:
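BUILDING.md is the authoritative reference; as a sketch, assuming the project builds as a standard Cargo workspace (the repository URL placeholder below is not a real value):

```shell
# Sketch only: repository URL and toolchain details are in BUILDING.md
git clone <repository-url> encoderfile
cd encoderfile
cargo build --release
```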
## 🚀 Quick Start

### Step 1: Prepare Your Model
First, you need an ONNX-exported model. Export any HuggingFace model:
```sh
# Install optimum for ONNX export
pip install "optimum[exporters]"

# Export a sentiment analysis model
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  ./sentiment-model
```
### Step 2: Create Configuration File

Create `sentiment-config.yml`:

```yaml
encoderfile:
  name: sentiment-analyzer
  path: ./sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile
```
### Step 3: Build Your Encoderfile
Use the downloaded encoderfile CLI tool:
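Point the `build` subcommand at the config file from Step 2:

```shell
encoderfile build -f sentiment-config.yml
```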
This creates a self-contained binary at ./build/sentiment-analyzer.encoderfile.
### Step 4: Run Your Model
Start the server:
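Use the `serve` subcommand on the binary produced in Step 3:

```shell
./build/sentiment-analyzer.encoderfile serve
```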
The server will start on http://localhost:8080 by default.
### Making Predictions
Sentiment Analysis:
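The concrete REST routes are defined by the binary's HTTP API (see the API Reference); the `/predict` path and request shape below are illustrative assumptions, not the documented schema:

```shell
# Hypothetical route and payload, for illustration only; see the API Reference
curl -s -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["I love this product!"]}'
```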
Response:
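The exact schema is in the API Reference; a sequence-classification response carries a label and score per input, roughly of this shape (values illustrative):

```json
{
  "results": [
    { "label": "POSITIVE", "score": 0.99 }
  ]
}
```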
Embeddings:
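For an embeddings-type encoderfile, a request would look similar (again, the route is an assumption; see the API Reference for the real path):

```shell
# Hypothetical route, for illustration only; see the API Reference
curl -s -X POST http://localhost:8080/embed \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["text to embed"]}'
```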
Token Classification (NER):
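For a token-classification (NER) encoderfile, similarly (hypothetical route; see the API Reference):

```shell
# Hypothetical route, for illustration only; see the API Reference
curl -s -X POST http://localhost:8080/token-classify \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["Hugging Face is based in New York City"]}'
```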
## 🎯 Usage Modes

### 1. REST API Server

Start an HTTP server (default port 8080). The server accepts custom configuration and can run with gRPC disabled (HTTP only); see Server Configuration below for the exact flags.
### 2. gRPC Server

Start with the default gRPC server (port 50051). The binary can also run gRPC only (no HTTP) or with custom gRPC configuration; see Server Configuration below for the exact flags.
### 3. CLI Inference

Run one-off inference without starting a server. The CLI supports a single input, multiple inputs, and saving output to a file; see the CLI Reference for the exact flags.
### 4. MCP Server

Run as a Model Context Protocol server; see the API Reference for details.
## 🔧 Server Configuration

### Port Configuration

The HTTP and gRPC ports can be set independently or together; see the CLI Reference for the exact flags.
### Hostname Configuration

The bind hostname is configurable; see the CLI Reference for the exact flag.

### Service Selection

The binary can serve HTTP only or gRPC only; see the CLI Reference for the exact flags.
## 📚 Documentation
- Getting Started Guide - Step-by-step tutorial
- Building Guide - Build encoderfiles from ONNX models
- CLI Reference - Complete command-line documentation
- API Reference - REST, gRPC, and MCP API docs
## 🛠️ Building Custom Encoderfiles
Once you have the encoderfile CLI tool installed, you can build binaries from any compatible HuggingFace model.
See BUILDING.md for detailed instructions including:
- How to export models to ONNX format
- Configuration file options
- Advanced features (Lua transforms, custom paths, etc.)
- Troubleshooting tips
Quick workflow:
1. Export your model to ONNX: `optimum-cli export onnx ...`
2. Create a config file: `config.yml`
3. Build the binary: `encoderfile build -f config.yml`
4. Deploy anywhere: `./build/my-model.encoderfile serve`
## 🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

### Development Setup

Clone the repository, set up the development environment, run the tests, and build the documentation as described in CONTRIBUTING.md.
## 📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## 🙏 Acknowledgments
- Built with ONNX Runtime
- Inspired by Llamafile
- Powered by the Hugging Face model ecosystem
## 💬 Community
- Discord - Join our community
- GitHub Issues - Report bugs or request features
- GitHub Discussions - Ask questions and share ideas