NeuronBox
Build, run, and iterate on local AI workloads (training, fine-tuning, inference, benchmarks) with one workflow.
Website: neuronbox.dev

Describe your project once in neuron.yaml:
- where model weights live (HF id, local folder, or file)
- which Python stack to use
- GPU expectations
- which script to run
NeuronBox then handles the rest: reusable hashed virtualenvs, model store, environment wiring, and runtime visibility through neuron stats and neuron dashboard.
For stronger isolation, use neuron run --oci with runtime.mode: oci (Docker path only).
neuron opens a short getting-started screen, and neuron help lists all commands.
Scope: NeuronBox is a local-first stack: CLI, neurond, Unix-socket protocol, terminal dashboard, and shared model store. It is not a hosted multi-tenant cloud.
License: GNU Affero General Public License v3 (open source). SPDX identifier in manifests: AGPL-3.0-only. If you cannot meet AGPL obligations (e.g. closed-source SaaS), you need a commercial license — see LICENSING.md and contact neuronbox.contact@proton.me.
Contents
- Quick start (60 seconds)
- Why NeuronBox (at a glance)
- Tutorial: end-to-end
- How a run works
- Daemon, sessions, and throughput
- Dashboard and demo mode
- Use cases
- NeuronBox vs Docker
- CLI reference
- Environment variables
- Prerequisites and build
- References
- Repository layout
- Contributing
Quick start (60 seconds)
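In full, the loop looks like this (a sketch: the project directory and model id are placeholders, and cargo build assumes the Rust workspace described under Prerequisites and build):

```sh
# 1) Build
cargo build

# 2) Create a project
mkdir my-project && cd my-project
neuron init

# 3) Pull weights
neuron pull org/model

# 4) Run
neuron run

# 5) Observe
neuron stats        # or: neuron dashboard
```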
You can pin exact model revisions with neuron pull org/model --revision <sha-or-tag>.
Why NeuronBox (at a glance)
- Declare the job, not the plumbing: gpu.min_vram, runtime.packages, model.source, and entrypoint live in neuron.yaml instead of ad-hoc CUDA matrices and one-off volume maps.
- Model store, not a 50 GB image layer: weights are first-class artifacts in ~/.neuronbox/store, shared across projects, with paths exposed via NEURONBOX_MODEL_DIR (and related vars). neuron pull fetches Hugging Face–style org/model trees into that store (see neuron pull --help for aliases and local paths).
- Hot-swap for iteration: neuron swap updates daemon state and ~/.neuronbox/swap_signal.json; neuron serve runs a long-lived Python worker that can react without cold-starting your whole stack for every weight change.
- One view of the machine: neuron host inspect and neuron gpu list summarize Metal, ROCm, CUDA, and optional NVML so laptops and Linux servers share one mental model.
Tutorial: end-to-end
1. Build the binaries
From your clone:
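A plain Cargo workspace build is enough (a sketch; release builds work the same way with --release):

```sh
cargo build
```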
You need target/debug/neuron and target/debug/neurond side by side (or set NEUROND_PATH to the daemon binary). Add target/debug to PATH if you like.
2. Create a project
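For example (the directory name is a placeholder):

```sh
mkdir my-project && cd my-project
neuron init                      # or: neuron init --template inference
```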
Edit neuron.yaml: model, entrypoint, runtime.packages, gpu.min_vram, runtime.mode (host vs oci), etc. Schema: specs/neuron.yaml.schema.json.
The template sets entrypoint: train.py — create that script (or change entrypoint to your own file) before neuron run.
3. Get weights
- Hub-style id (one slash, no colon): fetch it with neuron pull, as sketched after this list. Artifacts land under ~/.neuronbox/store by default.
- Local tree or file (.gguf, .safetensors, …): set model.source: local and model.name to the path; no pull step.
- Container images are not pulled by neuron pull. Use docker pull yourself, or neuron oci prepare when building a runc bundle (docs/OCI_AND_DOCKER.md).
Optional: HF_TOKEN in the environment for private Hub repos.
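For example (org/model is a placeholder id):

```sh
neuron pull org/model                            # fetch into ~/.neuronbox/store
neuron pull org/model --revision <sha-or-tag>    # pin an exact snapshot
```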
4. Run the entrypoint
From the directory that contains neuron.yaml, or point at another manifest:
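For example (the -f path is a placeholder):

```sh
neuron run                          # manifest in the current directory
neuron run -f path/to/neuron.yaml   # manifest elsewhere
```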
neuron run resolves the model (pull if needed for Hub ids), ensures the hashed venv, sets NEURONBOX_MODEL_DIR, NEURONBOX_SESSION_NAME, NEURONBOX_SESSION_VRAM_MB, and related vars, then spawns your entrypoint script. It registers the child with neurond and unregisters when the process exits.
Shortcut: neuron run org/model with a single HF-style argument only pulls and prints where the model lives—you still need a neuron.yaml and entrypoint to execute code.
neuron run tries to start neurond if the socket is down (best effort). If stats / dashboard cannot connect, run neuron daemon in another terminal.
5. Watch the machine
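Either view works; both talk to neurond over the socket:

```sh
neuron stats       # one-shot text summary: sessions, GPU lines, swap
neuron dashboard   # full-screen TUI
```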
Default socket: ~/.neuronbox/neuron.sock, overridable with NEURONBOX_SOCKET.
How a run works
| Piece | Behavior |
|---|---|
| Virtualenv | Path under store/envs/ is a hash of runtime.python, runtime.cuda, and runtime.packages. Same manifest shape ⇒ same env. Optional requirements.lock in that env dir + neuron lock for pinned installs. |
| Installer | Prefers uv pip install when uv is on PATH; otherwise pip. Empty packages and no CUDA/ROCm extra index ⇒ no pip invocation. |
| Pinned revisions | Set model.revision in neuron.yaml (or use neuron pull org/model --revision <sha-or-tag>) for reproducible model snapshots. |
| ROCm index control | Set runtime.rocm (for example 6.0) to control the ROCm PyTorch extra-index URL when ROCm is detected. |
| Model path | NEURONBOX_MODEL_DIR points at the resolved tree (store or local). NEURONBOX_MODEL_PATH when the manifest points at a single weights file. |
| Soft VRAM check | If gpu.min_vram is set and the host reports GPU memory, neuron run can warn when estimates exceed what looks available (non-blocking). |
| Child environment | Inherited PYTHONPATH is removed unless you set PYTHONPATH under env: in neuron.yaml (avoids IDE-injected paths breaking venv numpy/torch). |
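Putting those fields together, a minimal manifest might look like the sketch below (every value is a placeholder; specs/neuron.yaml.schema.json is the authoritative schema):

```yaml
model:
  source: local                    # local tree/file; Hub ids go through neuron pull
  name: /data/weights/model.safetensors
  # revision: <sha-or-tag>         # for Hub-sourced models
runtime:
  python: "3.11"
  packages: [torch, transformers]
gpu:
  min_vram: 12gb
entrypoint: train.py
env:
  WANDB_MODE: offline              # applied to run/serve children
```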
Daemon, sessions, and throughput
neurond keeps an in-memory registry of sessions (name, PID, estimated VRAM, tokens_per_sec). neuron run sends register_session after spawn and unregister_session after exit.
Automatic throughput detection
When neuron run spawns your entrypoint, it sets NEURONBOX_AUTOHOOK=1 and injects a valid SDK path into PYTHONPATH (NEURONBOX_SDK, local repo SDK, user SDK path, or bundled SDK extract). This installs lightweight hooks that automatically report tok/s for:
| Framework | Hooked method |
|---|---|
| transformers | GenerationMixin.generate |
| vLLM | LLM.generate |
| llama.cpp (Python) | Llama.__call__, Llama.create_completion |
| OpenAI client | Completions.create (local endpoints) |
The hooks measure wall-clock time and output token count, then push updates to the daemon. No code change required in your script.
For neuron serve hot-swap flows, swap_signal.json can include resolved_model_dir. When present, workers should prefer it over model_ref so reloads stay local/store-aligned.
For unsupported frameworks or custom pipelines, you can call neuronbox.DaemonClient().call("register_session", ...) with the same PID and an updated tokens_per_sec (see specs/daemon-sessions.md).
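A minimal sketch of that fallback, assuming the sdk/ client is installed (pip install -e sdk/); the payload shape beyond pid and tokens_per_sec is an assumption, so check specs/daemon-sessions.md for the real schema:

```python
import os

import neuronbox  # optional SDK: pip install -e sdk/ from the repo root

# Re-register this process with an updated throughput figure.
# Field names here are assumptions; specs/daemon-sessions.md is authoritative.
neuronbox.DaemonClient().call("register_session", {
    "pid": os.getpid(),
    "tokens_per_sec": 42.5,
})
```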
Protocol types: runtime/src/protocol.rs.
Dashboard and demo mode
- neuron dashboard — real Stats from the daemon, HostProbe for OS/arch/backends/GPUs, ~10 Hz UI refresh for the session table and throughput history (history is client-side, not stored in the daemon).
- neuron dashboard --demo (Unix) — starts synthetic sessions (helper sleep PIDs), animated tok/s, a mock swap model, and optional synthetic VRAM styling. Quit with q/Esc so the demo task can unregister. For cosmetic gauges on real hardware without fake sessions, you can set NEURONBOX_DEMO_SYNTHETIC_METRICS=1 (see docs/GPU_VRAM.md).
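For example:

```sh
neuron dashboard --demo                               # synthetic sessions (Unix)
NEURONBOX_DEMO_SYNTHETIC_METRICS=1 neuron dashboard   # cosmetic gauges, no fake sessions
```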
Use cases
| Scenario | Why NeuronBox |
|---|---|
| Training, LoRA, eval, batch inference | One manifest ties code + weights + Python + GPU; same commands on laptop or server. |
| Large models and shared disks | Central store; projects reference paths, not duplicate trees. |
| Reproducible envs | Hashed env dirs + neuron lock / requirements.lock. |
| Visibility | Daemon + dashboard / stats for sessions and reported tok/s. |
| Optional isolation | neuron run --oci when runtime.mode: oci and you want Docker mounts + NVIDIA toolkit without hand-written docker run. |
| Mixed hardware | neuron host inspect / neuron gpu list for support and CI notes. |
NeuronBox vs Docker
| | Docker | NeuronBox |
|---|---|---|
| Primary unit | Image + container | neuron.yaml + host paths |
| Strength | Portability, isolation, orchestration | Fast iteration on metal: hashed venvs, model store, one command to run the manifest |
| ML weights | You map volumes yourself | Native pull/store, NEURONBOX_* wiring |
| When to prefer Docker alone | Production parity, K8s | — |
| When NeuronBox helps | — | Daily local work; Docker only when you opt into OCI |
CLI reference
| Command | Role |
|---|---|
| neuron | Welcome screen |
| neuron help | Full help |
| neuron init | Create neuron.yaml in the current directory |
| neuron init --template NAME | Create from template (inference, finetune, local-model) |
| neuron init --list-templates | List available templates |
| neuron doctor | Diagnostic checks for the NeuronBox environment |
| neuron doctor --strict | Exit non-zero on any warning (for CI) |
| neuron pull <id> | ML artifacts: HF-style org/model, configured alias, or local path → store |
| neuron pull <id> --revision SHA | Pull a specific HF commit or tag |
| neuron run | Run entrypoint from neuron.yaml (host by default) |
| neuron run -f FILE | Use another manifest path |
| neuron run --gpu 0 | Sets CUDA_VISIBLE_DEVICES for the child |
| neuron run --vram 12gb | CLI VRAM hint for the session record |
| neuron run --oci | Force Docker OCI path (requires runtime.mode: oci alignment; Linux+NVIDIA for GPU containers) |
| neuron run org/model | Pull-only shortcut when a single HF-like arg is given |
| neuron serve [-f FILE] | Long-lived worker + swap signal (same venv resolution as run) |
| neuron swap MODEL | Daemon active model + swap_signal.json |
| neuron stats | Text: sessions + GPU lines + swap |
| neuron dashboard | Full-screen TUI |
| neuron dashboard --demo | TUI + built-in mock load (Unix) |
| neuron host inspect | JSON HostSnapshot |
| neuron gpu list | Detected GPUs |
| neuron model list | Store index |
| neuron model list --sizes | Store index with disk usage |
| neuron model du | Disk usage for all models |
| neuron model prune <id> | Remove a model (dry-run by default) |
| neuron model prune <id> --execute | Actually delete the model |
| neuron lock [-f FILE] | Write requirements.lock into the hashed env (uv pip compile) |
| neuron daemon | Run neurond in the foreground |
| neuron oci prepare | Runc bundle (Docker on host for rootfs export) |
| neuron oci runc | Run runc against a prepared bundle |
Container note
Use neuron pull for model artifacts (HF ids, aliases, local paths).
For container images, use docker pull, or NeuronBox OCI commands (neuron oci ..., neuron run --oci) when you want containerized project execution with NeuronBox mounts.
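The split, in hedged form (the image tag is just an example of a public Docker image):

```sh
docker pull python:3.11-slim   # container images: plain Docker
neuron pull org/model          # model weights: NeuronBox store
```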
Environment variables
| Variable | Purpose |
|---|---|
| NEURONBOX_SOCKET | Unix socket path for neurond (default ~/.neuronbox/neuron.sock) |
| NEUROND_PATH | Path to neurond if not beside neuron |
| HF_TOKEN | Authenticated Hub downloads for neuron pull |
| NEURONBOX_SDK | Override path to the SDK directory (for auto-hooks) |
| NEURONBOX_DISABLE_AUTOHOOK | 1 / true / yes — disable automatic throughput hooks |
| NEURONBOX_HF_LAYOUT | copy (default) or symlink — how to store HF models (Unix only for symlink) |
| NEURONBOX_METRICS_LOG | Path to NDJSON file for throughput metrics logging |
| NEURONBOX_DEMO_SYNTHETIC_METRICS | 1 / true / yes — extra synthetic styling in dashboard (optional) |
| NEURONBOX_DISABLE_VRAM_WATCH | Disables daemon VRAM watch path (e.g. demo spawn) |
Set per-project secrets and flags in neuron.yaml → env: (applied to run / serve children).
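For example, pointing the CLI at a non-default daemon socket (the path is illustrative):

```sh
NEURONBOX_SOCKET=/tmp/neuron.sock neuron stats
```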
Prerequisites and build
- Rust (workspace; see rust-toolchain.toml if present)
- Python 3 on PATH (version should match runtime.python in your manifest when possible)
- uv (optional, recommended for faster pip installs)
- GPU tooling (optional): NVIDIA, AMD, or Apple Silicon; see neuron host inspect
Linux + NVIDIA hosts get richer reporting when NVML support is linked in at build time.
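A typical workspace build, as a sketch:

```sh
cargo build             # binaries in target/debug/
cargo build --release   # binaries in target/release/
```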
Outputs: target/debug/neuron, target/debug/neurond (or release/).
If your install method provides only neuron, install or copy neurond alongside it, or rely on NEUROND_PATH.
References
| Doc | Topic |
|---|---|
| docs/CLI_UX.md | Welcome screen, theme, dashboard behavior |
| docs/OCI_AND_DOCKER.md | When Docker runs |
| specs/neuron.yaml.schema.json | Manifest schema |
| specs/swap-signal.schema.json | Swap signal file |
| specs/daemon-sessions.md | Socket protocol, sessions, tok/s updates |
| docs/MULTI_GPU.md | Multi-GPU / DDP |
| docs/GPU_VRAM.md | VRAM, NVML, NEURONBOX_DISABLE_VRAM_WATCH |
| docs/SECURITY.md | Socket trust, limits, model trust |
| specs/examples/ | Example YAML snippets |
Repository layout
- cli/ — neuron binary; cli/scripts/serve_worker.py (used by neuron serve)
- runtime/ — shared library + neurond
- specs/ — JSON Schema, protocol docs, YAML examples
- sdk/ — optional Python client for the daemon socket (sdk/neuronbox/client.py); pip install -e sdk/ from the repo root if you want it on your PYTHONPATH
Contributing
Small changes are welcome. Before opening a PR, keep the following in mind:
By contributing, you agree your contributions are licensed under the same terms as the project (AGPL v3 for the open-source distribution; see LICENSING.md). For security-sensitive issues, see docs/SECURITY.md.