modelc 0.1.0

Compile model weight files to standalone executable binaries
Documentation

modelc

modelc compiles model weight files into standalone executable binaries that embed a deterministic weight blob (embedded_weights.bin) and expose a small HTTP API (Axum / Tokio).

Why modelc (vs. loading weights from disk)

A weight-file workflow keeps checkpoints on disk (or blob storage): a generic runtime parses the format, memory-maps tensors, then runs inference. That is flexible—swap files without rebuilding—but you always coordinate runner + paths + versioning + mounts.

modelc favors the opposite trade‑off:

  • Single deployable artifact. One executable embeds that snapshot’s weights at compile time (embedded_weights.bin in the generated crate). Fewer broken paths and “wrong checkpoint on prod” drift.
  • Reproducible bundles. Build once; the binary ties together weights, embedded listen address, and metadata (/info) for a frozen snapshot auditors and CI can pin.
  • Simpler operations. Serve a minimal HTTP binary without shipping a separate weight tree next to every replica (you still pay image/binary size instead of mounting files).
  • Clear boundary. Parsing and tensor layout happen at compile time; runtime is deliberately small (/infer is still evolving toward real forward passes).

When weight files remain the better fit: rapid A/B swaps without rebuilds, very large checkpoints where embedding blows up images, multitenant “one server, many paths,” or ecosystems that assume on-disk formats (mmap, GGUF loaders, ONNX Runtime with external weights).

See SPEC.md for scope and limits.

Prerequisites

  • Rust toolchain with cargo on your PATH (the compiler runs cargo build on a generated project).

Build

Produce the modelc binary:

cargo build --release

Binary path: ./target/release/modelc (or target/debug/modelc without --release). To put modelc on your PATH:

cargo install --path .

Test

cargo test

Usage

Examples use the modelc command. Put it on your PATH with cargo install --path ., or after cargo build --release call the binary by path (see Build).

Inspect weights (tensor names, shapes, dtypes, sizes):

modelc inspect path/to/model.safetensors
modelc inspect path/to/file -f gguf

Compile to a standalone binary (default output: <stem>_serve next to the input):

modelc compile path/to/model.safetensors -o ./my-model-serve

From the built artifact in this repository (no install):

./target/release/modelc inspect path/to/model.safetensors
./target/release/modelc compile path/to/model.safetensors -o ./my-model-serve

modelc --help and modelc --version show subcommands and semver plus a short git revision (from build.rs when .git is present).

compile networking

The generated model-serve binary binds to --bind (IP, default 0.0.0.0) plus --port (default 8080), unless --listen ADDR:PORT is set, which wins and is embedded verbatim (IPv6 literals such as [::1]:8080 are supported).

compile other flags

  • --arch — optional hint (llama, gpt2, …) stored in the model and surfaced in /info.
  • --format / -f — weight format when extensions or magic-byte sniffing are ambiguous.
  • --target — passed through to cargo build --target.
  • --debug — builds the generated crate with Cargo’s debug profile instead of --release (release is the default).

Supported input formats (-f when needed):

Flag value Typical extensions
safetensors .safetensors
gguf .gguf, .bin with sniff / name heuristics
onnx .onnx
pytorch .pt, .pth, name-heuristic .bin

Ambiguous files (e.g. extensionless or generic .bin): the CLI may sniff GGUF / zip (PyTorch-ish) / small Safetensors blobs (see SPEC.md).

Generated server HTTP API (model-serve)

Method Path Body / response
GET /info JSON: name, architecture, total_params, total_bytes, tensors (names).
POST /infer Request JSON: { "input": [f32, ...] }. Response: { "output": [f32, ...] } (placeholder passthrough until real graph lowering exists).

Both responses are application/json.

Format references (parsers / exports)

Repository layout

  • src/ — CLI, parsers, Model IR, codegen, runtime helpers.
  • examples/ — usage examples.
  • tests/ — integration tests.

See SPEC.md, ARCHITECTURE.md, and TODO.md.

License

Licensed under the Apache License, Version 2.0 (LICENSE).