modelc

modelc compiles model weight files into standalone executable binaries that embed a deterministic weight blob (embedded_weights.bin) and expose a small HTTP API (Axum / Tokio).

Why modelc (vs. loading weights from disk)

A weight-file workflow keeps checkpoints on disk (or in blob storage): a generic runtime parses the format, memory-maps the tensors, then runs inference. That is flexible, since you can swap files without rebuilding, but every deployment has to keep the runner, file paths, checkpoint versions, and volume mounts in sync.

modelc favors the opposite trade-off:

Single deployable artifact. One executable embeds a snapshot's weights at compile time (embedded_weights.bin in the generated crate). That means fewer broken paths and less “wrong checkpoint on prod” drift.
Reproducible bundles. Build once; the binary ties together the weights, the embedded listen address, and the metadata served at /info, giving a frozen snapshot that auditors and CI can pin.
Simpler operations. Serve a minimal HTTP binary without shipping a separate weight tree next to every replica. The cost moves into image or binary size rather than mounted files.
Clear boundary. Parsing and tensor layout happen at compile time; runtime is deliberately small (/infer is still evolving toward real forward passes).

When weight files remain the better fit: rapid A/B swaps without rebuilds, very large checkpoints where embedding blows up images, multitenant “one server, many paths,” or ecosystems that assume on-disk formats (mmap, GGUF loaders, ONNX Runtime with external weights).

See SPEC.md for scope and limits.

Prerequisites

Rust toolchain with cargo on your PATH (modelc compile runs cargo build on a generated project).

Build

Produce the modelc binary:

cargo build --release

Binary path: ./target/release/modelc (or target/debug/modelc without --release). To put modelc on your PATH:

cargo install --path .

Test

cargo test

Usage

Examples use the modelc command. Put it on your PATH with cargo install --path ., or after cargo build --release call the binary by path (see Build).

Inspect weights (tensor names, shapes, dtypes, sizes):

modelc inspect path/to/model.safetensors
modelc inspect path/to/file -f gguf

Compile to a standalone binary (default output: <stem>_serve next to the input):

modelc compile path/to/model.safetensors -o ./my-model-serve
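The default output rule (<stem>_serve next to the input) can be sketched with the standard library alone. This is an illustration, not modelc's actual code: the helper name default_output is hypothetical, and it assumes "stem" means the file name with only its final extension removed, with no platform suffix such as .exe appended.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive `<stem>_serve` beside the input file.
/// Returns None when the path has no file name or it is not valid UTF-8.
fn default_output(input: &Path) -> Option<PathBuf> {
    // file_stem() drops only the final extension:
    // "model.safetensors" becomes "model".
    let stem = input.file_stem()?.to_str()?;
    // with_file_name() keeps the parent directory and swaps the last component.
    Some(input.with_file_name(format!("{stem}_serve")))
}
```

Under these assumptions, path/to/model.safetensors maps to path/to/model_serve; passing -o overrides the default entirely.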

From the built artifact in this repository (no install):

./target/release/modelc inspect path/to/model.safetensors
./target/release/modelc compile path/to/model.safetensors -o ./my-model-serve

modelc --help lists the subcommands; modelc --version prints the semver plus a short git revision (added by build.rs when .git is present).

`compile` networking

The generated model-serve binary listens on the address chosen at compile time: --bind (IP, default 0.0.0.0) combined with --port (default 8080). If --listen ADDR:PORT is set, it takes precedence over both and is embedded verbatim (IPv6 literals such as [::1]:8080 are supported).
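The precedence between these flags can be sketched as a small function. This is a minimal illustration under stated assumptions, not the code modelc ships: resolve_listen is a hypothetical name, and parsing --listen into a SocketAddr is an extra validation step added here, whereas the text above says the real value is embedded verbatim.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper mirroring the documented precedence:
/// `--listen` wins; otherwise `--bind` and `--port` are combined.
fn resolve_listen(bind: &str, port: u16, listen: Option<&str>) -> Result<SocketAddr, String> {
    match listen {
        // An explicit ADDR:PORT overrides both --bind and --port.
        Some(addr) => addr
            .parse::<SocketAddr>()
            .map_err(|e| format!("invalid --listen {addr:?}: {e}")),
        // Otherwise pair the bind IP with the port.
        None => {
            let ip = bind
                .parse::<IpAddr>()
                .map_err(|e| format!("invalid --bind {bind:?}: {e}"))?;
            Ok(SocketAddr::new(ip, port))
        }
    }
}
```

With the documented defaults, resolve_listen("0.0.0.0", 8080, None) yields 0.0.0.0:8080, while passing Some("[::1]:9000") ignores the bind and port arguments.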

`compile` other flags

--arch — optional hint (llama, gpt2, …) stored in the model and surfaced in /info.
--format / -f — weight format when extensions or magic-byte sniffing are ambiguous.
--target — passed through to cargo build --target.
--debug — builds the generated crate with Cargo’s debug profile instead of --release (release is the default).

Supported input formats (-f when needed):

| Flag value | Typical extensions |
| --- | --- |
| `safetensors` | `.safetensors` |
| `gguf` | `.gguf`, `.bin` with sniff / name heuristics |
| `onnx` | `.onnx` |
| `pytorch` | `.pt`, `.pth`, name-heuristic `.bin` |

Ambiguous files (e.g. extensionless or generic .bin): the CLI may sniff GGUF / zip (PyTorch-ish) / small Safetensors blobs (see SPEC.md).
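As a rough picture of what magic-byte sniffing can look like, the sketch below checks the leading bytes of a file. It is an assumption-laden illustration rather than modelc's implementation (SPEC.md is authoritative): the Sniffed enum and sniff function are hypothetical names. The byte patterns themselves are standard: GGUF files start with the ASCII bytes GGUF, zip archives with PK\x03\x04, and Safetensors files with a little-endian u64 header length followed by a JSON object. The Safetensors check here only succeeds when the whole header fits in the bytes provided, which is one possible reading of "small" blobs.

```rust
/// Formats this sketch can recognise from leading bytes.
/// ONNX is protobuf-encoded and has no fixed magic, so it is not covered.
#[derive(Debug, PartialEq)]
enum Sniffed {
    Gguf,
    Zip, // PyTorch checkpoints saved in the zip layout
    Safetensors,
}

/// Hypothetical sniffer over the first bytes of a file.
fn sniff(bytes: &[u8]) -> Option<Sniffed> {
    if bytes.starts_with(b"GGUF") {
        return Some(Sniffed::Gguf);
    }
    if bytes.starts_with(b"PK\x03\x04") {
        return Some(Sniffed::Zip);
    }
    // Safetensors: 8-byte little-endian header length, then a JSON header.
    if bytes.len() > 8 {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&bytes[..8]);
        let header_len = u64::from_le_bytes(raw);
        let available = (bytes.len() - 8) as u64;
        if header_len > 0 && header_len <= available && bytes[8] == b'{' {
            return Some(Sniffed::Safetensors);
        }
    }
    None
}
```

When sniffing returns nothing, passing -f explicitly is the reliable path.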

Generated server HTTP API (`model-serve`)

| Method | Path | Body / response |
| --- | --- | --- |
| `GET` | `/info` | JSON: `name`, `architecture`, `total_params`, `total_bytes`, `tensors` (names). |
| `POST` | `/infer` | Request JSON: `{ "input": [f32, ...] }`. Response: `{ "output": [f32, ...] }` (placeholder passthrough until real graph lowering exists). |

Both responses are application/json.
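A client for /infer can be sketched with only the standard library. Everything named here (infer_body, infer_request, post_infer) is hypothetical, and the hand-written HTTP/1.1 request is a stand-in for a real HTTP client library. The sketch assumes the server accepts a plain JSON array of numbers under "input" and closes the connection after responding to a Connection: close request.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Serialize `{"input":[...]}`. Returns None for NaN or infinite values,
/// which JSON cannot represent.
fn infer_body(input: &[f32]) -> Option<String> {
    if input.iter().any(|v| !v.is_finite()) {
        return None;
    }
    // Debug formatting keeps a decimal point, e.g. 1.0 rather than 1.
    let items: Vec<String> = input.iter().map(|v| format!("{v:?}")).collect();
    Some(format!("{{\"input\":[{}]}}", items.join(",")))
}

/// Build a minimal HTTP/1.1 POST request for the /infer route.
fn infer_request(host: &str, body: &str) -> String {
    format!(
        "POST /infer HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Send the request and return the raw response (status line, headers, body).
fn post_infer(addr: &str, input: &[f32]) -> io::Result<String> {
    let body = infer_body(input)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "non-finite input"))?;
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(infer_request(addr, &body).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Against a server compiled with the default address, post_infer("127.0.0.1:8080", &[1.0, 2.5]) would send {"input":[1.0,2.5]}; given the placeholder passthrough described above, the response body is expected to carry the same values under "output".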

Format references (parsers / exports)

Safetensors — huggingface/safetensors
GGUF — GGML GGUF notes
ONNX — onnx.ai
PyTorch checkpoints — pickle/zip layouts; prefer Safetensors exports for portability.

Repository layout

src/ — CLI, parsers, Model IR, codegen, runtime helpers.
examples/ — usage examples.
tests/ — integration tests.

See SPEC.md, ARCHITECTURE.md, and TODO.md.

License

Licensed under the Apache License, Version 2.0 (LICENSE).

modelc 0.1.0