modelc
modelc compiles model weight files into standalone executable binaries that embed a deterministic weight blob (embedded_weights.bin) and expose a small HTTP API (Axum / Tokio).
Why modelc (vs. loading weights from disk)
A weight-file workflow keeps checkpoints on disk (or blob storage): a generic runtime parses the format, memory-maps tensors, then runs inference. That is flexible, since you can swap files without rebuilding, but you must always keep the runner, paths, versioning, and mounts in sync.
modelc favors the opposite trade‑off:
- Single deployable artifact. One executable embeds that snapshot’s weights at compile time (embedded_weights.bin in the generated crate). Fewer broken paths and “wrong checkpoint on prod” drift.
- Reproducible bundles. Build once; the binary ties together weights, embedded listen address, and metadata (/info) for a frozen snapshot auditors and CI can pin.
- Simpler operations. Serve a minimal HTTP binary without shipping a separate weight tree next to every replica (you still pay image/binary size instead of mounting files).
- Clear boundary. Parsing and tensor layout happen at compile time; runtime is deliberately small (/infer is still evolving toward real forward passes).
When weight files remain the better fit: rapid A/B swaps without rebuilds, very large checkpoints where embedding blows up images, multitenant “one server, many paths,” or ecosystems that assume on-disk formats (mmap, GGUF loaders, ONNX Runtime with external weights).
See SPEC.md for scope and limits.
Prerequisites
- Rust toolchain with cargo on your PATH (the compiler runs cargo build on a generated project).
Build
Produce the modelc binary with cargo build --release.
Binary path: ./target/release/modelc (or target/debug/modelc without --release). To put modelc on your PATH, run cargo install --path . from the repository root.
Test
Usage
Examples use the modelc command. Put it on your PATH with cargo install --path ., or after cargo build --release call the binary by path (see Build).
Inspect weights (tensor names, shapes, dtypes, sizes):
Compile to a standalone binary (default output: <stem>_serve next to the input):
From the built artifact in this repository (no install):
modelc --help and modelc --version show subcommands and semver plus a short git revision (from build.rs when .git is present).
compile networking
The generated model-serve binary binds to --bind (IP, default 0.0.0.0) plus --port (default 8080), unless --listen ADDR:PORT is set, which wins and is embedded verbatim (IPv6 literals such as [::1]:8080 are supported).
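The precedence rule above can be sketched in a few lines of Rust. This is a minimal illustration, not modelc's actual code: the function name resolve_listen is hypothetical, and it parses the --listen value to validate it, whereas the text above says the real tool embeds that string verbatim.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper: pick the address the generated server would bind to.
/// `--listen` wins when present; otherwise `--bind` and `--port` are combined.
fn resolve_listen(listen: Option<&str>, bind: &str, port: u16) -> Result<SocketAddr, String> {
    if let Some(addr) = listen {
        // Accepts IPv4 ("127.0.0.1:8080") and bracketed IPv6 ("[::1]:8080").
        return addr
            .parse::<SocketAddr>()
            .map_err(|e| format!("invalid --listen {addr:?}: {e}"));
    }
    let ip: IpAddr = bind
        .parse()
        .map_err(|e| format!("invalid --bind {bind:?}: {e}"))?;
    Ok(SocketAddr::new(ip, port))
}

fn main() {
    // Defaults named in the text: 0.0.0.0 and 8080.
    println!("{}", resolve_listen(None, "0.0.0.0", 8080).unwrap());
    // --listen overrides both --bind and --port.
    println!("{}", resolve_listen(Some("[::1]:8080"), "0.0.0.0", 9000).unwrap());
}
```

Parsing up front is a design choice of this sketch: it surfaces a malformed address at compile time of the bundle rather than when the server starts.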
compile other flags
- --arch: optional hint (llama, gpt2, …) stored in the model and surfaced in /info.
- --format / -f: weight format when extensions or magic-byte sniffing are ambiguous.
- --target: passed through to cargo build --target.
- --debug: builds the generated crate with Cargo’s debug profile instead of --release (release is the default).
Supported input formats (-f when needed):
| Flag value | Typical extensions |
|---|---|
| safetensors | .safetensors |
| gguf | .gguf, .bin with sniff / name heuristics |
| onnx | .onnx |
| pytorch | .pt, .pth, name-heuristic .bin |
Ambiguous files (e.g. extensionless or generic .bin): the CLI may sniff GGUF / zip (PyTorch-ish) / small Safetensors blobs (see SPEC.md).
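As a rough illustration of magic-byte sniffing, the sketch below checks the well-known signatures of the three formats mentioned: the ASCII magic GGUF, the zip local-file header PK\x03\x04, and the Safetensors layout (an 8-byte little-endian header length followed by a JSON header). The Format enum and sniff function are hypothetical names, and the exact checks and size limits modelc applies may differ; SPEC.md is the authority.

```rust
/// Hypothetical result type, loosely mirroring the -f flag values.
#[derive(Debug, PartialEq)]
enum Format {
    Gguf,
    PytorchZip,
    Safetensors,
    Unknown,
}

/// Guess a weight format from the leading bytes of a file.
/// For the Safetensors check, `bytes` must contain at least the full header.
fn sniff(bytes: &[u8]) -> Format {
    if bytes.starts_with(b"GGUF") {
        return Format::Gguf;
    }
    // Zip archives (used by modern PyTorch checkpoints) start with "PK\x03\x04".
    if bytes.starts_with(b"PK\x03\x04") {
        return Format::PytorchZip;
    }
    // Safetensors: u64 little-endian header length, then a JSON object.
    if bytes.len() > 8 {
        let mut len = [0u8; 8];
        len.copy_from_slice(&bytes[..8]);
        let header_len = u64::from_le_bytes(len);
        let remaining = (bytes.len() - 8) as u64;
        if header_len > 0 && header_len <= remaining && bytes[8] == b'{' {
            return Format::Safetensors;
        }
    }
    Format::Unknown
}

fn main() {
    println!("{:?}", sniff(b"GGUF\x03\x00\x00\x00"));
    println!("{:?}", sniff(b"not a model"));
}
```

When sniffing returns Unknown, passing -f explicitly removes the ambiguity.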
Generated server HTTP API (model-serve)
| Method | Path | Body / response |
|---|---|---|
| GET | /info | JSON: name, architecture, total_params, total_bytes, tensors (names). |
| POST | /infer | Request JSON: { "input": [f32, ...] }. Response: { "output": [f32, ...] } (placeholder passthrough until real graph lowering exists). |
Both responses are application/json.
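To show the request shape for /infer concretely, here is a dependency-free client sketch that builds a raw HTTP/1.1 request with the standard library. The helper names infer_request and send are hypothetical, and the address 127.0.0.1:8080 is only an example; a real client would more likely use an HTTP crate and a JSON library. The hand-rolled JSON assumes finite inputs, because NaN and infinity are not valid JSON numbers.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a raw HTTP/1.1 POST /infer request with body {"input":[...]}.
fn infer_request(host: &str, input: &[f32]) -> String {
    let values: Vec<String> = input.iter().map(|v| v.to_string()).collect();
    let body = format!("{{\"input\":[{}]}}", values.join(","));
    format!(
        "POST /infer HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Send the request and return the raw response (status line, headers, body).
fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    // "Connection: close" lets us read until the server closes the socket.
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let request = infer_request("127.0.0.1:8080", &[1.0, 0.5]);
    // Only contact a server when an address is passed on the command line.
    match std::env::args().nth(1) {
        Some(addr) => match send(&addr, &request) {
            Ok(response) => println!("{response}"),
            Err(e) => eprintln!("request failed: {e}"),
        },
        None => print!("{request}"),
    }
}
```

Run without arguments, the program only prints the request it would send; run with an address such as 127.0.0.1:8080, it posts to a running model-serve binary and prints the raw response.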
Format references (parsers / exports)
- Safetensors — huggingface/safetensors
- GGUF — GGML GGUF notes
- ONNX — onnx.ai
- PyTorch checkpoints — pickle/zip layouts; prefer Safetensors exports for portability.
Repository layout
- src/: CLI, parsers, ModelIR, codegen, runtime helpers.
- examples/: usage examples.
- tests/: integration tests.
See SPEC.md, ARCHITECTURE.md, and TODO.md.
License
Licensed under the Apache License, Version 2.0 (LICENSE).