neuronbox-runtime 1.0.1

Local ML runner: declarative neuron.yaml, model store, daemon, and terminal dashboard

NeuronBox

Build, run, and iterate on local AI workloads (training, fine-tuning, inference, benchmarks) with one workflow.

Website: neuronbox.dev

Describe your project once in neuron.yaml:

  • where model weights live (HF id, local folder, or file)
  • which Python stack to use
  • GPU expectations
  • which script to run

NeuronBox then handles the rest: reusable hashed virtualenvs, model store, environment wiring, and runtime visibility through neuron stats and neuron dashboard.

For stronger isolation, use neuron run --oci with runtime.mode: oci (Docker path only).

neuron opens a short getting-started screen, and neuron help lists all commands.

Scope: NeuronBox is a local-first stack: CLI, neurond, Unix-socket protocol, terminal dashboard, and shared model store. It is not a hosted multi-tenant cloud.

License: GNU Affero General Public License v3 (open source). SPDX identifier in manifests: AGPL-3.0-only. If you cannot meet AGPL obligations (e.g. closed-source SaaS), you need a commercial license — see LICENSING.md and contact neuronbox.contact@proton.me.


Quick start (60 seconds)

# 1) Build
cargo build -p neuronbox-cli -p neuronbox-runtime --bin neurond

# 2) Create a project
mkdir ~/my-llm-project && cd ~/my-llm-project
/path/to/neuronbox/target/debug/neuron init --template inference

# 3) Pull weights
/path/to/neuronbox/target/debug/neuron pull org/model

# 4) Run
/path/to/neuronbox/target/debug/neuron run

# 5) Observe
/path/to/neuronbox/target/debug/neuron dashboard

You can pin exact model revisions with neuron pull org/model --revision <sha-or-tag>.


Why NeuronBox (at a glance)

  1. Declare the job, not the plumbing: gpu.min_vram, runtime.packages, model.source, and entrypoint live in neuron.yaml instead of ad-hoc CUDA matrices and one-off volume maps.

  2. Model store, not a 50 GB image layer: weights are first-class artifacts in ~/.neuronbox/store, shared across projects, with paths exposed via NEURONBOX_MODEL_DIR (and related vars). neuron pull fetches Hugging Face–style org/model trees into that store (see neuron pull --help for aliases and local paths).

  3. Hot-swap for iteration: neuron swap updates daemon state and ~/.neuronbox/swap_signal.json; neuron serve runs a long-lived Python worker that can react without cold-starting your whole stack for every weight change.

  4. One view of the machine: neuron host inspect and neuron gpu list summarize Metal, ROCm, CUDA, and optional NVML so laptops and Linux servers share one mental model.


Tutorial: end-to-end

1. Build the binaries

From your clone:

cd neuronbox
cargo build -p neuronbox-cli -p neuronbox-runtime --bin neurond

You need target/debug/neuron and target/debug/neurond side by side (or set NEUROND_PATH to the daemon binary). Add target/debug to PATH if you like.

./target/debug/neuron          # welcome screen
./target/debug/neuron help     # full command list

2. Create a project

mkdir ~/my-llm-project && cd ~/my-llm-project
/path/to/neuronbox/target/debug/neuron init

Edit neuron.yaml: model, entrypoint, runtime.packages, gpu.min_vram, runtime.mode (host vs oci), etc. Schema: specs/neuron.yaml.schema.json.
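
If you want to sanity-check the manifest before running, you can validate it against that schema from Python. A minimal sketch, assuming PyYAML and jsonschema are installed and the schema path is adjusted to your clone:

import json
from pathlib import Path

import yaml        # PyYAML
import jsonschema  # third-party validator, not shipped with NeuronBox

# Load the project manifest and the published schema, then validate.
manifest = yaml.safe_load(Path("neuron.yaml").read_text())
schema = json.loads(Path("/path/to/neuronbox/specs/neuron.yaml.schema.json").read_text())

jsonschema.validate(instance=manifest, schema=schema)
print("neuron.yaml validates; entrypoint:", manifest.get("entrypoint"))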

The template sets entrypoint: train.py — create that script (or change entrypoint to your own file) before neuron run.
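
A placeholder train.py only needs to read what neuron run exports. The sketch below prints the resolved model directory; the commented transformers lines are an assumption about your stack rather than a requirement:

import os
from pathlib import Path

# Both variables are set by neuron run before the script starts.
model_dir = Path(os.environ["NEURONBOX_MODEL_DIR"])
session = os.environ.get("NEURONBOX_SESSION_NAME", "local")

print(f"[{session}] weights resolved to {model_dir}")
for path in sorted(model_dir.iterdir())[:10]:
    print("  ", path.name)

# From here, load with your framework of choice, e.g. (if transformers is in runtime.packages):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(model_dir)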

3. Get weights

  • Hub-style id (one slash, no colon):

    /path/to/neuronbox/target/debug/neuron pull org/model
    

    Artifacts land under ~/.neuronbox/store by default.

  • Local tree or file (.gguf, .safetensors, …): set model.source: local and model.name to the path; no pull step.

  • Container images are not pulled by neuron pull. Use docker pull yourself, or neuron oci prepare when building a runc bundle (docs/OCI_AND_DOCKER.md).

Optional: HF_TOKEN in the environment for private Hub repos.

4. Run the entrypoint

/path/to/neuronbox/target/debug/neuron run

From the directory that contains neuron.yaml, or point at another manifest:

neuron run -f path/to/neuron.yaml

neuron run resolves the model (pull if needed for Hub ids), ensures the hashed venv, sets NEURONBOX_MODEL_DIR, NEURONBOX_SESSION_NAME, NEURONBOX_SESSION_VRAM_MB, and related vars, then spawns your entrypoint script. It registers the child with neurond and unregisters when the process exits.

Shortcut: neuron run org/model with a single HF-style argument only pulls and prints where the model lives—you still need a neuron.yaml and entrypoint to execute code.

neuron run tries to start neurond if the socket is down (best effort). If stats / dashboard cannot connect, run neuron daemon in another terminal.

5. Watch the machine

/path/to/neuronbox/target/debug/neuron dashboard   # TUI: sessions + charts + host/GPU
/path/to/neuronbox/target/debug/neuron stats         # plain-text snapshot

Default socket: ~/.neuronbox/neuron.sock, overridable with NEURONBOX_SOCKET.


How a run works

  • Virtualenv: the path under store/envs/ is a hash of runtime.python, runtime.cuda, and runtime.packages. Same manifest shape ⇒ same env. Optional requirements.lock in that env dir + neuron lock for pinned installs (a sketch of the reuse idea follows this list).
  • Installer: prefers uv pip install when uv is on PATH; otherwise pip. Empty packages and no CUDA/ROCm extra index ⇒ no pip invocation.
  • Pinned revisions: set model.revision in neuron.yaml (or use neuron pull org/model --revision <sha-or-tag>) for reproducible model snapshots.
  • ROCm index control: set runtime.rocm (for example 6.0) to control the ROCm PyTorch extra-index URL when ROCm is detected.
  • Model path: NEURONBOX_MODEL_DIR points at the resolved tree (store or local); NEURONBOX_MODEL_PATH is set when the manifest points at a single weights file.
  • Soft VRAM check: if gpu.min_vram is set and the host reports GPU memory, neuron run can warn when estimates exceed what looks available (non-blocking).
  • Child environment: inherited PYTHONPATH is removed unless you set PYTHONPATH under env: in neuron.yaml (avoids IDE-injected paths breaking venv numpy/torch).
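
As a rough illustration of the virtualenv reuse idea promised above (not NeuronBox's actual hashing code), the env key can be thought of as a digest over the runtime fields:

import hashlib

def env_key(python, cuda, packages):
    # Illustrative only: NeuronBox's real hash inputs and encoding may differ.
    blob = "|".join([python, cuda or "", *sorted(packages)])
    return hashlib.sha256(blob.encode()).hexdigest()[:16]

print(env_key("3.11", "12.1", ["torch==2.3.0", "transformers"]))
# Same python / cuda / packages -> same key -> the same store/envs/<key> dir is reused.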

Daemon, sessions, and throughput

neurond keeps an in-memory registry of sessions (name, PID, estimated VRAM, tokens_per_sec). neuron run sends register_session after spawn and unregister_session after exit.

Automatic throughput detection

When neuron run spawns your entrypoint, it sets NEURONBOX_AUTOHOOK=1 and injects a valid SDK path into PYTHONPATH (NEURONBOX_SDK, local repo SDK, user SDK path, or bundled SDK extract). This installs lightweight hooks that automatically report tok/s for:

  • transformers: GenerationMixin.generate
  • vLLM: LLM.generate
  • llama.cpp (Python): Llama.__call__, Llama.create_completion
  • OpenAI client: Completions.create (local endpoints)

The hooks measure wall-clock time and output token count, then push updates to the daemon. No code change required in your script.

For neuron serve hot-swap flows, swap_signal.json can include resolved_model_dir. When present, workers should prefer it over model_ref so reloads stay local/store-aligned.
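
A custom serve-style worker that follows this convention could poll the signal file along these lines; the polling interval and the reload_model hook are placeholders, and the field names come from this README and specs/swap-signal.schema.json:

import json
import time
from pathlib import Path

signal_path = Path.home() / ".neuronbox" / "swap_signal.json"
current = None

while True:
    if signal_path.exists():
        sig = json.loads(signal_path.read_text())
        # Prefer the resolved store/local directory when the daemon provides it.
        target = sig.get("resolved_model_dir") or sig.get("model_ref")
        if target and target != current:
            print("swap requested ->", target)
            # reload_model(target)  # hypothetical: your framework-specific reload
            current = target
    time.sleep(1.0)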

For unsupported frameworks or custom pipelines, you can call neuronbox.DaemonClient().call("register_session", ...) with the same PID and an updated tokens_per_sec (see specs/daemon-sessions.md).
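
A sketch of such a manual update is below. The payload keys and the call(method, params) shape are illustrative (specs/daemon-sessions.md is the authoritative reference), and run_my_pipeline stands in for your own generation loop:

import os
import time

import neuronbox  # pip install -e sdk/ from the repo root

def run_my_pipeline():
    # Stand-in for your real generation code; return the number of new tokens.
    time.sleep(0.1)
    return 128

client = neuronbox.DaemonClient()

start = time.perf_counter()
generated_tokens = run_my_pipeline()
tok_s = generated_tokens / (time.perf_counter() - start)

# Field names here are illustrative; specs/daemon-sessions.md defines the real payload.
client.call("register_session", {
    "name": os.environ.get("NEURONBOX_SESSION_NAME", "custom"),
    "pid": os.getpid(),
    "tokens_per_sec": round(tok_s, 2),
})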

Protocol types: runtime/src/protocol.rs.


Dashboard and demo mode

  • neuron dashboard — real Stats from the daemon, HostProbe for OS/arch/backends/GPUs, ~10 Hz UI refresh for session table and throughput history (history is client-side, not stored in the daemon).

  • neuron dashboard --demo (Unix) — starts synthetic sessions (helper sleep PIDs), animated tok/s, a mock swap model, and optional synthetic VRAM styling. Quit with q / Esc so the demo task can unregister. For cosmetic gauges on real hardware without fake sessions, you can set NEURONBOX_DEMO_SYNTHETIC_METRICS=1 (see docs/GPU_VRAM.md).


Use cases

  • Training, LoRA, eval, batch inference: one manifest ties code + weights + Python + GPU; same commands on laptop or server.
  • Large models and shared disks: central store; projects reference paths instead of duplicating trees.
  • Reproducible envs: hashed env dirs + neuron lock / requirements.lock.
  • Visibility: daemon + dashboard / stats for sessions and reported tok/s.
  • Optional isolation: neuron run --oci when runtime.mode: oci and you want Docker mounts + NVIDIA toolkit without hand-written docker run.
  • Mixed hardware: neuron host inspect / neuron gpu list for support and CI notes.

NeuronBox vs Docker

  • Primary unit: Docker works from an image + container; NeuronBox works from neuron.yaml + host paths.
  • Strength: Docker offers portability, isolation, and orchestration; NeuronBox offers fast iteration on metal (hashed venvs, model store, one command to run the manifest).
  • ML weights: with Docker you map volumes yourself; NeuronBox has native pull/store and NEURONBOX_* wiring.
  • When to prefer Docker alone: production parity, K8s.
  • When NeuronBox helps: daily local work; Docker only when you opt into OCI.

CLI reference

  • neuron: welcome screen
  • neuron help: full help
  • neuron init: create neuron.yaml in the current directory
  • neuron init --template NAME: create from a template (inference, finetune, local-model)
  • neuron init --list-templates: list available templates
  • neuron doctor: diagnostic checks for the NeuronBox environment
  • neuron doctor --strict: exit non-zero on any warning (for CI)
  • neuron pull <id>: ML artifacts (HF-style org/model, configured alias, or local path) → store
  • neuron pull <id> --revision SHA: pull a specific HF commit or tag
  • neuron run: run the entrypoint from neuron.yaml (host by default)
  • neuron run -f FILE: use another manifest path
  • neuron run --gpu 0: set CUDA_VISIBLE_DEVICES for the child
  • neuron run --vram 12gb: CLI VRAM hint for the session record
  • neuron run --oci: force the Docker OCI path (requires runtime.mode: oci alignment; Linux + NVIDIA for GPU containers)
  • neuron run org/model: pull-only shortcut when a single HF-like argument is given
  • neuron serve [-f FILE]: long-lived worker + swap signal (same venv resolution as run)
  • neuron swap MODEL: update the daemon's active model and swap_signal.json
  • neuron stats: plain-text snapshot of sessions, GPU lines, and swap state
  • neuron dashboard: full-screen TUI
  • neuron dashboard --demo: TUI + built-in mock load (Unix)
  • neuron host inspect: JSON HostSnapshot
  • neuron gpu list: detected GPUs
  • neuron model list: store index
  • neuron model list --sizes: store index with disk usage
  • neuron model du: disk usage for all models
  • neuron model prune <id>: remove a model (dry-run by default)
  • neuron model prune <id> --execute: actually delete the model
  • neuron lock [-f FILE]: write requirements.lock into the hashed env (uv pip compile)
  • neuron daemon: run neurond in the foreground
  • neuron oci prepare: build a runc bundle (Docker on host for rootfs export)
  • neuron oci runc: run runc against a prepared bundle

Container note

Use neuron pull for model artifacts (HF ids, aliases, local paths).
For container images, use docker pull, or NeuronBox OCI commands (neuron oci ..., neuron run --oci) when you want containerized project execution with NeuronBox mounts.


Environment variables

  • NEURONBOX_SOCKET: Unix socket path for neurond (default ~/.neuronbox/neuron.sock)
  • NEUROND_PATH: path to neurond if not beside neuron
  • HF_TOKEN: authenticated Hub downloads for neuron pull
  • NEURONBOX_SDK: override path to the SDK directory (for auto-hooks)
  • NEURONBOX_DISABLE_AUTOHOOK: 1 / true / yes disables the automatic throughput hooks
  • NEURONBOX_HF_LAYOUT: copy (default) or symlink; how HF models are stored (symlink is Unix only)
  • NEURONBOX_METRICS_LOG: path to an NDJSON file for throughput metrics logging
  • NEURONBOX_DEMO_SYNTHETIC_METRICS: 1 / true / yes enables extra synthetic styling in the dashboard (optional)
  • NEURONBOX_DISABLE_VRAM_WATCH: disables the daemon VRAM watch path (e.g. demo spawn)
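
If NEURONBOX_METRICS_LOG is set, throughput updates are appended as NDJSON; a minimal reader sketch whose only assumption is the one-JSON-object-per-line layout (the field set is not spelled out here):

import json
from pathlib import Path

# Path assumed to match whatever you exported as NEURONBOX_METRICS_LOG for the run.
for line in Path("throughput.ndjson").read_text().splitlines():
    if line.strip():
        event = json.loads(line)  # one JSON object per throughput update
        print(event)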

Set per-project secrets and flags under env: in neuron.yaml (applied to run / serve children).


Prerequisites and build

  • Rust (workspace; see rust-toolchain.toml if present)
  • Python 3 on PATH (version should match runtime.python in your manifest when possible)
  • uv (optional, recommended for faster pip installs)
  • GPU tooling (optional): NVIDIA, AMD, or Apple Silicon; see neuron host inspect

cargo build --workspace

Linux + NVIDIA (richer reporting when linked):

cargo build -p neuronbox-cli --features nvml
cargo build -p neuronbox-runtime --features nvml

Outputs: target/debug/neuron, target/debug/neurond (or release/).

cargo install --path cli

installs neuron; install or copy neurond accordingly, or rely on NEUROND_PATH.


References

  • docs/CLI_UX.md: welcome screen, theme, dashboard behavior
  • docs/OCI_AND_DOCKER.md: when Docker runs
  • specs/neuron.yaml.schema.json: manifest schema
  • specs/swap-signal.schema.json: swap signal file
  • specs/daemon-sessions.md: socket protocol, sessions, tok/s updates
  • docs/MULTI_GPU.md: multi-GPU / DDP
  • docs/GPU_VRAM.md: VRAM, NVML, NEURONBOX_DISABLE_VRAM_WATCH
  • docs/SECURITY.md: socket trust, limits, model trust
  • specs/examples/: example YAML snippets

Repository layout

  • cli/ — neuron binary; cli/scripts/serve_worker.py (used by neuron serve)
  • runtime/ — shared library + neurond
  • specs/ — JSON Schema, protocol docs, YAML examples
  • sdk/ — optional Python client for the daemon socket (sdk/neuronbox/client.py); pip install -e sdk/ from the repo root if you want it on your PYTHONPATH

Contributing

Small changes welcome. Before opening a PR:

cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace

By contributing, you agree your contributions are licensed under the same terms as the project (AGPL v3 for the open-source distribution; see LICENSING.md). For security-sensitive issues, see docs/SECURITY.md.