jepa-rs is an alpha-stage Rust implementation of JEPA (Joint Embedding Predictive Architecture) — the self-supervised learning framework from Yann LeCun and Meta AI for learning world models that predict in representation space rather than pixel space.
jepa-rs provides modular, backend-agnostic building blocks for I-JEPA (images), V-JEPA (video), and hierarchical world models, built on top of the burn deep learning framework. It includes a CLI and interactive TUI dashboard, safetensors checkpoint loading, ONNX metadata inspection, and a pretrained model registry for Facebook Research models.
```
                 ┌──────────────┐
                 │   Context    │──── Encoder ────┐
                 │  (visible)   │                 │
Image/Video ─────┤              │         ┌───────▼───────┐
                 │   Target     │         │   Predictor   │──── predicted repr ──┐
                 │   (masked)   │──┐      └───────────────┘                      │
                 └──────────────┘  │                                             │
                                   │      ┌───────────────┐                      │
                                   └──────│ Target Encoder│──── target repr ──┐  │
                                    EMA   │   (frozen)    │                   │  │
                                          └───────────────┘                   │  │
                                          ┌───────────────┐                   │  │
                                          │  Energy Loss  │◄──────────────────┴──┘
                                          └───────────────┘
```
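The training signal reduces to an energy between the predicted and target representations. A minimal sketch of the L2 variant in plain Rust (illustrative only, not the `jepa-core` `EnergyFn` API):

```rust
/// Mean squared L2 energy between a predicted and a target representation.
/// Lower energy means the predictor matched the target encoder's output.
fn l2_energy(predicted: &[f32], target: &[f32]) -> f32 {
    assert_eq!(predicted.len(), target.len());
    predicted
        .iter()
        .zip(target)
        .map(|(p, t)| (p - t) * (p - t))
        .sum::<f32>()
        / predicted.len() as f32
}
```

Identical representations give zero energy; training pushes the predictor's output toward the EMA target encoder's output.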
Why jepa-rs?
| | jepa-rs | Python (PyTorch) |
|---|---|---|
| Runtime | Native binary, no Python/CUDA dependency | Requires Python + PyTorch + CUDA |
| Inference | Safetensors checkpoint loading, ONNX metadata | PyTorch runtime |
| Memory | Rust ownership, no GC pauses | Python GC + PyTorch allocator |
| Backend | Any burn backend (CPU, GPU, WebGPU, WASM) | CUDA-centric |
| Type safety | Compile-time tensor shape checks | Runtime shape errors |
| Deployment | Single static binary | Docker + Python environment |
Pretrained Models
jepa-rs supports loading official Facebook Research pretrained JEPA models:
| Model | Architecture | Params | Resolution | Dataset | Weights |
|---|---|---|---|---|---|
| I-JEPA ViT-H/14 | ViT-Huge, patch 14 | 632M | 224x224 | ImageNet-1K | Download · HuggingFace |
| I-JEPA ViT-H/16-448 | ViT-Huge, patch 16 | 632M | 448x448 | ImageNet-1K | Download · HuggingFace |
| I-JEPA ViT-H/14 | ViT-Huge, patch 14 | 632M | 224x224 | ImageNet-22K | Download |
| I-JEPA ViT-G/16 | ViT-Giant, patch 16 | 1.0B | 224x224 | ImageNet-22K | Download |
| V-JEPA ViT-L/16 | ViT-Large, patch 16 | 304M | 224x224 | VideoMix2M | Download |
| V-JEPA ViT-H/16 | ViT-Huge, patch 16 | 632M | 224x224 | VideoMix2M | Download |
Quick Start
Installation
```toml
# Cargo.toml
[dependencies]
jepa-core = "0.1.0"
jepa-vision = "0.1.0"
jepa-compat = "0.1.0" # For ONNX + checkpoint loading
```
CLI
The `jepa` binary provides a unified CLI for the workspace (file arguments below are illustrative placeholders):

```sh
# Install the CLI from crates.io
cargo install jepa

# Or install from the local workspace checkout
cargo install --path crates/jepa

# Launch the interactive TUI dashboard
jepa

# List pretrained models in the registry
jepa models

# Inspect a safetensors checkpoint
jepa inspect model.safetensors

# Analyze checkpoint with key remapping
jepa checkpoint model.safetensors

# Launch a training run
jepa train --preset vit-base-16

# Train from a normal image directory tree with deterministic resize/crop/normalize
jepa train --preset vit-base-16 --dataset-dir ./images

# Train from a safetensors image tensor dataset [N, C, H, W]
jepa train --preset vit-base-16 --dataset images.safetensors --dataset-key images

# Encode inputs through a safetensors checkpoint
jepa encode model.safetensors

# Or through an ONNX model
jepa encode model.onnx
```
The CLI `train` command now runs real strict masked-image optimization with AdamW and EMA. It chooses one input source per run:

- `--dataset-dir <PATH>` for a recursive image-folder dataset (jpg, jpeg, png, webp) with decode, RGB conversion, shorter-side resize, center crop, CHW tensor conversion, and normalization
- `--dataset <FILE> --dataset-key <KEY>` for a safetensors image tensor shaped `[N, C, H, W]`
- no dataset flags for the synthetic random-tensor fallback
Image-folder preprocessing defaults to the preset image size for `--crop-size` and the ImageNet RGB normalization statistics when `--mean` and `--std` are omitted. Dataset loading is currently single-threaded.
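For reference, the normalization applied per RGB channel is `(pixel / 255 - mean) / std` with the standard ImageNet statistics. A standalone sketch (not the jepa-rs preprocessing code itself):

```rust
// Standard ImageNet RGB statistics (the defaults when --mean/--std are omitted).
const IMAGENET_MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const IMAGENET_STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Normalize one RGB pixel per channel: (pixel / 255 - mean) / std.
fn normalize_pixel(rgb: [u8; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for c in 0..3 {
        out[c] = (rgb[c] as f32 / 255.0 - IMAGENET_MEAN[c]) / IMAGENET_STD[c];
    }
    out
}
```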
`jepa encode` runs real encoder weights for `.safetensors` and `.onnx` inputs; other file extensions still fall back to the preset demo path.
Runnable Examples
The `jepa` crate now ships runnable examples under `crates/jepa/examples/` that exercise the real training command instead of mocking the CLI path:
```sh
# Create a tiny recursive image-folder dataset under target/example-data/jepa/
cargo run -p jepa --example prepare_demo_image_folder

# Train for 2 steps on that generated image-folder dataset
cargo run -p jepa --example train_image_folder_demo

# Train for 2 steps with the synthetic fallback path
cargo run -p jepa --example train_synthetic_demo
```
The image-folder example deliberately uses a very small generated dataset (6 PNG files across nested subdirectories). That is enough for a meaningful smoke demo of recursive dataset discovery, decode, resize, crop, normalize, batching, masking, optimizer updates, and EMA without checking a large image corpus into git. It is not large enough to demonstrate real representation learning quality; it is an execution demo, not a benchmark dataset.
The TUI now incorporates these demos in the Training tab as a guided demo runner. Launch `jepa`, switch to tab 3, choose a demo with `j`/`k`, and press Enter to run it. The panel streams real run logs, step metrics, loss/energy charts, and a short interpretation of what happened.
The TUI Inference tab (tab 4) adds a separate guided walkthrough for encoder
inference. It runs deterministic demo image patterns through a preset ViT,
streams phase changes, per-sample latency and embedding statistics, and explains
what the representation telemetry means. The walkthrough is intentionally a
pipeline demo rather than a pretrained semantic benchmark.
If you want to run the CLI directly after generating the demo dataset (flags beyond `--dataset-dir` follow the `train` options above):

```sh
cargo run -p jepa -- train --dataset-dir target/example-data/jepa
```
Loading SafeTensors Checkpoints
Checkpoint loading and PyTorch-to-burn key remapping live in `jepa-compat`:

```rust
use jepa_compat::{load_checkpoint, ijepa_vit_keymap};
```
Building JEPA Models from Scratch
Model configs live in `jepa-vision`, core types in `jepa-core`, and any burn backend can serve as `B` (module paths follow the crate layout described under Architecture):

```rust
use burn::backend::NdArray;
use jepa_core::InputShape;
use jepa_vision::{IJepaConfig, VitConfig};

type B = NdArray<f32>;
```
Browse Available Models
```rust
use jepa_compat::ModelRegistry;
```
Architecture
```
jepa-rs/
├── jepa-core      Core traits, tensor wrappers, masking, energy, EMA
│   ├── Encoder            Trait for context/target encoders
│   ├── Predictor          Trait for latent predictors
│   ├── EnergyFn           L2, Cosine, SmoothL1 energy functions
│   ├── MaskingStrategy    Block, MultiBlock, Spatiotemporal masking
│   ├── CollapseReg        VICReg, BarlowTwins collapse prevention
│   └── EMA                Exponential moving average with cosine schedule
│
├── jepa-vision    Vision transformers and JEPA models
│   ├── VitEncoder         ViT-S/B/L/H/G with 2D RoPE
│   ├── IJepa              I-JEPA pipeline (image)
│   ├── VJepa              V-JEPA pipeline (video, 3D tubelets)
│   └── Predictor          Transformer-based cross-attention predictor
│
├── jepa-world     World models and planning
│   ├── ActionPredictor    Action-conditioned latent prediction
│   ├── Planner            Random shooting planner with cost functions
│   ├── HierarchicalJepa   Multi-level H-JEPA
│   └── ShortTermMemory    Sliding-window memory for temporal context
│
├── jepa-train     Training orchestration
│   ├── TrainConfig        Learning rate schedules, EMA config
│   ├── JepaComponents     Generic forward step orchestration
│   └── CheckpointMeta     Save/resume metadata
│
├── jepa-compat    Model compatibility and interop
│   ├── ModelRegistry      Pretrained model catalog (Facebook Research)
│   ├── SafeTensors        Load .safetensors checkpoints
│   ├── KeyMap             PyTorch → burn key remapping
│   └── OnnxModelInfo      ONNX metadata inspection and initializer loading
│
└── jepa           CLI and interactive TUI dashboard
    ├── CLI                models, inspect, checkpoint, train, encode commands
    └── TUI                Dashboard, Models, Training, Checkpoint, About tabs
```
All tensor-bearing APIs are generic over `B: Backend`, allowing transparent execution on CPU (`NdArray`), GPU (`WGPU`), or WebAssembly backends.
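The `EMA` component keeps the target encoder as an exponential moving average of the context encoder's weights. Conceptually, per parameter (a plain-Rust sketch, not the `jepa-core` `Ema` API):

```rust
/// One EMA step: target <- momentum * target + (1 - momentum) * context.
/// With momentum near 1.0 the target encoder changes slowly and stays stable.
fn ema_update(target: &mut [f32], context: &[f32], momentum: f32) {
    for (t, &c) in target.iter_mut().zip(context.iter()) {
        *t = momentum * *t + (1.0 - momentum) * c;
    }
}
```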
ONNX Support
jepa-rs provides ONNX metadata inspection and initializer loading through jepa-compat. This allows inspecting model structure, input/output specs, and importing weight initializers from .onnx files.
Current scope: metadata inspection and weight import are production-ready. Tract-based ONNX graph execution exists (OnnxSession, OnnxEncoder) but is not yet production-grade — it is functional for prototyping and testing.
Examples
| Example | Description | Run command |
|---|---|---|
| `jepa` | Interactive TUI dashboard | `cargo run -p jepa` |
| `jepa models` | Browse pretrained model registry | `cargo run -p jepa -- models` |
| `jepa train` | Launch a training run | `cargo run -p jepa -- train --preset vit-base-16` |
| `prepare_demo_image_folder` | Generate a tiny recursive dataset for `--dataset-dir` demos | `cargo run -p jepa --example prepare_demo_image_folder` |
| `train_image_folder_demo` | Run the real `jepa train` image-folder path on generated images | `cargo run -p jepa --example train_image_folder_demo` |
| `train_synthetic_demo` | Run the real `jepa train` synthetic fallback path | `cargo run -p jepa --example train_synthetic_demo` |
| `ijepa_demo` | Full I-JEPA forward pass pipeline | `cargo run -p jepa-vision --example ijepa_demo` |
| `ijepa_train_loop` | Training loop with metrics | `cargo run -p jepa-vision --example ijepa_train_loop` |
| `world_model_planning` | World model with random shooting | `cargo run -p jepa-world --example world_model_planning` |
| `model_registry` | Browse pretrained models (library) | `cargo run -p jepa-compat --example model_registry` |
Build & Test
```sh
# Build everything
cargo build --workspace

# Run all tests
cargo test --workspace

# Lint
cargo clippy --workspace --all-targets

# Format check
cargo fmt --all -- --check

# Generate docs
cargo doc --workspace --no-deps

# Run differential parity tests (strict image fixtures)
cargo test --workspace parity

# Target a single crate
cargo test -p jepa-core
```
Extended quality gates
```sh
# Code coverage (requires cargo-llvm-cov)
cargo llvm-cov --workspace

# Fuzz testing (requires cargo-fuzz)
(cd fuzz && cargo +nightly fuzz run <fuzz-target>)

# Benchmark smoke test
cargo bench --workspace
```
Project Status
Alpha — suitable for research, experimentation, and extension.
What works
- Complete I-JEPA and V-JEPA architectures with strict masked-encoder paths
- CLI with 6 commands (`models`, `inspect`, `checkpoint`, `train`, `encode`, `tui`)
- Interactive TUI dashboard with 6 tabs (Dashboard, Models, Training, Inference, Checkpoint, About)
- SafeTensors checkpoint loading with automatic key remapping
- ONNX metadata inspection and initializer loading
- Pretrained model registry with download URLs
- Differential parity tests against 3 checked-in strict image fixtures
- Comprehensive test suite (365 tests), property-based testing, fuzz targets
- All standard ViT configs: ViT-S/16, ViT-B/16, ViT-L/16, ViT-H/14, ViT-H/16, ViT-G/16
Known limitations
- The generic trainer slices tokens after the encoder forward; strict pre-attention masking is available via `IJepa::forward_step_strict` and `VJepa::forward_step_strict`
- ONNX graph execution (`OnnxSession`, `OnnxEncoder`) is prototype-grade; only metadata inspection and initializer loading are production-ready
- Differential parity runs in CI for strict image fixtures; broader video parity is pending
- First-time crates.io release must be published in dependency order because the workspace crates depend on each other by version
JEPA Variants: What We Implement
The JEPA family has grown across several papers. Here is exactly what jepa-rs implements and how each component maps to a specific paper and reference codebase.
I-JEPA (Image)
| Paper | Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (Assran et al., CVPR 2023) |
| Reference code | facebookresearch/ijepa (archived) |
| jepa-rs struct | IJepa<B> in jepa-vision (crates/jepa-vision/src/image.rs) |
| What it does | Self-supervised image representation learning. A ViT context-encoder sees only visible patches; a lightweight predictor predicts representations of masked target patches. The target-encoder is an EMA copy of the context-encoder. |
| Masking | BlockMasking — contiguous rectangular blocks on the 2D patch grid. |
| Faithful path | IJepa::forward_step_strict — filters tokens before encoder self-attention (matches the paper). |
| Approximate path | JepaComponents::forward_step in jepa-train — encodes full input then slices (post-encoder masking; cheaper but not faithful). |
| Parity status | 3 checked-in strict image fixtures verified in CI. |
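To make block masking concrete, here is a standalone sketch of a rectangular block mask on a 2D patch grid (illustrative only, not the `jepa-core` `MaskingStrategy` API):

```rust
/// Mark a contiguous `block_h` x `block_w` rectangle of patches as masked
/// targets on a `grid_h` x `grid_w` patch grid (row-major layout).
fn block_mask(
    grid_h: usize,
    grid_w: usize,
    top: usize,
    left: usize,
    block_h: usize,
    block_w: usize,
) -> Vec<bool> {
    let mut mask = vec![false; grid_h * grid_w];
    for row in top..(top + block_h).min(grid_h) {
        for col in left..(left + block_w).min(grid_w) {
            mask[row * grid_w + col] = true;
        }
    }
    mask
}
```

The context encoder then sees only the `false` patches, while the predictor is trained to produce the target encoder's representations at the `true` positions.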
V-JEPA (Video)
| Paper | Revisiting Feature Prediction for Learning Visual Representations from Video (Bardes et al., 2024) |
| Reference code | facebookresearch/jepa |
| jepa-rs struct | VJepa<B> in jepa-vision (crates/jepa-vision/src/video.rs) |
| What it does | Extends I-JEPA to video. A ViT encoder processes 3D tubelets (space + time) with 3D RoPE. |
| Masking | SpatiotemporalMasking — contiguous 3D regions in the spatiotemporal grid. |
| Faithful path | VJepa::forward_step_strict — pre-attention masking. |
| Parity status | Implemented but strict video parity not yet proven (pending). |
V-JEPA 2 features
| Paper | V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (Bardes et al., 2025) |
| Reference code | facebookresearch/vjepa2 |
| jepa-rs support | Not a separate struct. The VJepa<B> struct can be configured with V-JEPA 2 features. |
| What we take from V-JEPA 2 | Cosine momentum schedule for EMA — CosineMomentumSchedule in jepa-core (Ema::with_cosine_schedule). Momentum ramps from base (e.g. 0.996) to 1.0 over training. Also: MultiBlockMasking strategy, ViT-Giant/14 preset. |
| What we don't implement | The full V-JEPA 2 training recipe, attentive probing, or the planning/action heads from the paper. |
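The cosine momentum ramp described above can be sketched in a few lines (illustrative; `CosineMomentumSchedule` in `jepa-core` may differ in detail):

```rust
/// Cosine ramp of EMA momentum from `base` (e.g. 0.996) toward 1.0 over
/// `total_steps`, so the target encoder freezes progressively late in training.
fn cosine_momentum(base: f64, step: usize, total_steps: usize) -> f64 {
    let progress = (step as f64 / total_steps as f64).min(1.0);
    // 0.0 at step 0, rising smoothly to 1.0 at the final step.
    let ramp = 0.5 * (1.0 - (std::f64::consts::PI * progress).cos());
    base + (1.0 - base) * ramp
}
```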
Hierarchical JEPA (H-JEPA) — experimental
| Paper | Inspired by A Path Towards Autonomous Machine Intelligence (LeCun, 2022) — the original JEPA position paper describes hierarchical prediction as a long-term goal. No standalone H-JEPA paper exists yet. |
| jepa-rs struct | HierarchicalJepa<B> in jepa-world (crates/jepa-world/src/hierarchy.rs) |
| What it does | Stacks multiple JEPA levels at different temporal strides (e.g. stride 2, 6, 24). Each level has its own encoder and predictor. This is experimental — no reference implementation exists. |
Action-Conditioned World Model — experimental
| Paper | Draws from both the LeCun position paper and V-JEPA 2 (planning component). |
| jepa-rs structs | Action<B>, ActionConditionedPredictor<B> trait, RandomShootingPlanner in jepa-world (crates/jepa-world/src/action.rs, crates/jepa-world/src/planner.rs) |
| What it does | Predicts next-state representations given current state + action. Supports random-shooting (CEM) planning. This is experimental. |
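Random-shooting planning is simple to sketch: sample candidate action sequences, roll each out through the predictor, and keep the lowest-cost sequence. A toy standalone version with a 1-D additive "dynamics model" standing in for the learned latent predictor (not the `jepa-world` API):

```rust
/// Toy random-shooting planner: pick the action sequence whose rollout
/// lands closest to `goal`. A real planner would roll out the learned
/// latent predictor and score representations with a cost function.
fn plan(state: f32, goal: f32, horizon: usize, candidates: usize) -> Vec<f32> {
    let mut best_cost = f32::INFINITY;
    let mut best = vec![0.0; horizon];
    for i in 0..candidates {
        // Deterministic pseudo-random actions in [-1.0, 1.0] keep the
        // sketch reproducible without a rand dependency.
        let actions: Vec<f32> = (0..horizon)
            .map(|t| ((i * 31 + t * 17) % 21) as f32 / 10.0 - 1.0)
            .collect();
        let mut s = state;
        for &a in &actions {
            s += a; // toy dynamics: next state = state + action
        }
        let cost = (s - goal).abs();
        if cost < best_cost {
            best_cost = cost;
            best = actions;
        }
    }
    best
}
```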
What about EB-JEPA?
EB-JEPA (Terver et al., 2026) is a separate lightweight Python library for energy-based JEPA. jepa-rs is not an implementation of EB-JEPA. We reference it for comparison only. The energy functions in jepa-core (L2, Cosine, SmoothL1) are standard loss formulations, not the EB-JEPA energy framework.
Quick summary
| Variant | Paper | jepa-rs struct | Status |
|---|---|---|---|
| I-JEPA | Assran et al. 2023 | `IJepa<B>` | Strict path implemented, parity verified |
| V-JEPA | Bardes et al. 2024 | `VJepa<B>` | Strict path implemented, parity pending |
| V-JEPA 2 | Bardes et al. 2025 | `VJepa<B>` + cosine EMA schedule | Select features only |
| H-JEPA | LeCun 2022 (position paper) | `HierarchicalJepa<B>` | Experimental, no reference impl |
| World model | LeCun 2022 + V-JEPA 2 | `ActionConditionedPredictor`, `RandomShootingPlanner` | Experimental |
| EB-JEPA | Terver et al. 2026 | Not implemented | Referenced for comparison only |
References
Papers
| Paper | Focus |
|---|---|
| A Path Towards Autonomous Machine Intelligence | JEPA position paper — hierarchical world models (LeCun, 2022) |
| I-JEPA | Self-supervised image learning with masked prediction in latent space (Assran et al., CVPR 2023) |
| V-JEPA | Extension to video with spatiotemporal masking (Bardes et al., 2024) |
| V-JEPA 2 | Video understanding, prediction, and planning (Bardes et al., 2025) |
| EB-JEPA | Lightweight energy-based JEPA library — referenced for comparison (Terver et al., 2026) |
Official reference implementations
| Repo | Models | Relationship to jepa-rs |
|---|---|---|
| `facebookresearch/ijepa` | I-JEPA (archived) | Primary reference for `IJepa<B>` and key remapping |
| `facebookresearch/jepa` | V-JEPA | Primary reference for `VJepa<B>` |
| `facebookresearch/vjepa2` | V-JEPA 2 | Reference for cosine EMA schedule, ViT-G config |
| `facebookresearch/eb_jepa` | EB-JEPA tutorial | Not implemented — comparison only |
Contributing
See CONTRIBUTING.md for guidelines.
License
MIT License. See LICENSE for details.