# burn_dragon 🔥🐉🐣

Burn-based inference and training of the Dragon model.

## current state

This repository is now organized as a generic Dragon framework with curated library-facing APIs:

- root facade: `burn_dragon::api`
- core Dragon concepts: `burn_dragon_core::api`
- fused execution layer: `burn_dragon_kernel::api`
- checkpoint/deployment helpers: `burn_dragon_checkpoint::api`
- shared stream semantics: `burn_dragon_stream::api`
- multimodal composition: `burn_dragon_multimodal::api`
- domain adapters: `burn_dragon_language::api`, `burn_dragon_vision::api`, `burn_dragon_graph::api`, `burn_dragon_sudoku::api`

Lower-level surfaces remain available for advanced use through `::api::expert::*`, but they are not the recommended entrypoint.
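To consume the root facade from another crate, the manifest entry is an ordinary Cargo dependency (the version here is taken from the compatibility table later in this README; adjust to the release matching your `burn` version):

```toml
[dependencies]
# match the burn_dragon release to your burn version
# (see the "compatible burn versions" table in this README)
burn_dragon = "0.4"
```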
Framework status/tracking:
- framework support matrix
- Dragon Hatchling alignment spec
- Dragon Hatchling progress tracker
- Dragon mHC roadmap
- Dragon multimodal VL-JEPA roadmap
- Dragon multimodal progress tracker
## features
- cached inference
- training benchmarks and reporting
- wasm deployments
- burnpack multipart deployment + streamed web initialization
- native burnpack bootstrap/cache helpers
- payload-agnostic stream/TBPTT crate
- topology-agnostic structured recurrent state contracts
- fused recurrent, local-grid, structured-pyramid, and sparse-graph kernels
- attention residuals and block-attention-residual connectors
- graph recurrent adapters with compiled execution
- RoPE positional embeddings
- additive/reference bitnet b1.58 low-bit training + packed artifact export
- sparsity metrics and visualization
- mHC core integration (experimental)
- vision dragon cortical column/stack
- recurrent, foveated saccade training and inference
- vision v-jepa 2.1 training surface
- vision rac training family (experimental)
- multimodal vl-jepa composition foundation
- GDPO
- adaptive tool discovery
- conditional (deep) gating
- document-coherent dataloading and scale mixup
- episodic memory
- hierarchical, memory-aware recurrent state
- mixture-of-expert routing
- neuromorphic backend
- streaming, sparse synaptic backpropagation
- temporal neuron dampening
Dataset configuration (built-in presets and Hugging Face examples) is documented inline in `config/language/base.toml`.
## library quickstart
Compile-checked examples:
- `examples/README.md`
- `examples/core_bdh_api.rs`
- `examples/vision_pyramid_api.rs`
- `examples/stream_api.rs`
- `examples/multimodal_vl_jepa_api.rs`
- `examples/graph_compiled_executor_api.rs`
- `examples/checkpoint_export_api.rs`
Typical imports through the root facade:

```rust
use burn_dragon::api::core;
use burn_dragon::api::checkpoint;
use burn_dragon::api::stream;
use burn_dragon::api::multimodal;
use burn_dragon::api::vision;
use burn_dragon::api::graph;
```
Use `burn_dragon::api::expert::*` only when you need lower-level compiled plans or internal layout/kernel details.
## deployment checkpoints
The shared deployment/checkpoint crate is `burn_dragon_checkpoint`.
Recommended deployment path:

- export model weights as burnpack (`.bpk`)
- optionally split them into `*.bpk.parts.json` + `*.bpk.part-*` shards for web/CDN delivery
- load them through the shared multipart helpers, the native bootstrap/cache helpers, or the web `loadModelFromUrl(...)` path
Current status:
- multipart burnpack loading is shared and supported
- shared burnpack bundle export helpers are supported
- streamed web initialization is supported
- native cache/bootstrap resolution for remote burnpack bundles is supported in the shared checkpoint crate
- monolithic-or-parts burnpack loading is supported in CLI inference
- dragon checkpoint export is supported in CLI through `export_burnpack`
- vision encoder checkpoint export is supported in CLI through `export_burnpack` for `distill`, `lejepa`, and `video_lejepa`
- sudoku checkpoint export is supported in CLI through `export_burnpack`
- graph checkpoint export is supported in CLI through `export_burnpack`
- multimodal VL-JEPA checkpoint export is supported in CLI through `export_burnpack`
- exporters currently expect checkpoints recorded by the current Burn runtime/layout used by this repository
- multimodal VL-JEPA runtime/config/export support now covers image-text and video-text composition on top of the shared stream crate
- the main remaining gaps are broader vision/video wrappers beyond encoder-style export and true deployment quantization beyond float downcast
Example language deployment export:
Example graph deployment export:
Example vision encoder deployment export:
The vision-encoder family exports the student VisionDragon encoder from the supported vision
training wrappers, including distill, lejepa, and video_lejepa.
Example multimodal VL-JEPA deployment export:
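All of the export families above route through the `export_burnpack` CLI path. As a purely hypothetical sketch of the invocation shape (the subcommand position and flag names are illustrative assumptions, not the actual CLI surface; consult the CLI's own help output for the real flags):

```shell
# hypothetical flags -- check `cargo run -p burn_dragon_cli -- --help`
cargo run -p burn_dragon_cli --release -- export_burnpack \
    --checkpoint <path-to-checkpoint> \
    --out model.bpk
```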
Example multimodal VL-JEPA training config chains:
```shell
# reproducible low-budget image-text smoke
# reproducible low-budget video-text smoke
```
Tracked real runs on the current codepath:

- image-text MNIST-label-text smoke: `runs/multimodal/wgpu/fat-operation`
- video-text centered-MNIST-label-text smoke: `runs/multimodal/wgpu/verdant-wealth`
The cheap centered-MNIST video smoke is the current recommended reproducible multimodal video benchmark. The Moving-MNIST label-text path remains available as a harder temporal benchmark, but it is not the recommended low-budget validation target today.
Example sudoku deployment export:
## training

```shell
cargo run -p burn_dragon_cli --release
```

(defaults to the cuda backend)
## inference

```shell
cargo run -p burn_dragon_cli --bin infer -- --max-tokens 2048 --streaming
```
## benchmarks

```shell
cargo bench -p burn_dragon --features train,benchmark
```

(executes both wgpu and cuda benchmarks), then open `target/criterion/report/index.html`.
## compatible burn versions

| burn_dragon | burn |
|---|---|
| 0.4 | 0.21.0-pre.2 |
| 0.2 | 0.19 |
| 0.1 | 0.18 |
## citation
If you found this useful, copy the below citation.
## license

Licensed under either of

- Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (http://opensource.org/licenses/MIT)

at your option.
## contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
