jepa-rs is an alpha-stage Rust implementation of JEPA (Joint Embedding Predictive Architecture) — the self-supervised learning framework from Yann LeCun and Meta AI for learning world models that predict in representation space rather than pixel space.
jepa-rs provides modular, backend-agnostic building blocks for I-JEPA (images), V-JEPA (video), and hierarchical world models, built on top of the burn deep learning framework. It includes a CLI and interactive TUI dashboard, safetensors checkpoint loading, ONNX metadata inspection, and a pretrained model registry for Facebook Research models.
```
                 ┌──────────────┐
                 │   Context    │──── Encoder ────┐
                 │  (visible)   │                 │
Image/Video ─────┤              │         ┌───────▼───────┐
                 │   Target     │         │   Predictor   │──── predicted repr ──┐
                 │   (masked)   │──┐      └───────────────┘                      │
                 └──────────────┘  │                                             │
                                   │      ┌───────────────┐                      │
                                   └──────│ Target Encoder│──── target repr ──┐  │
                                    EMA   │   (frozen)    │                   │  │
                                          └───────────────┘                   │  │
                                          ┌───────────────┐                   │  │
                                          │  Energy Loss  │◄──────────────────┴──┘
                                          └───────────────┘
```
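The training signal reduces to an energy between the predicted and target representations. A minimal sketch of the L2 variant in plain Rust (illustrative only, not the `jepa-core` `EnergyFn` API):

```rust
/// Mean squared L2 energy between a predicted and a target representation.
/// Lower energy means the predictor matched the target encoder's output.
fn l2_energy(predicted: &[f32], target: &[f32]) -> f32 {
    assert_eq!(predicted.len(), target.len());
    predicted
        .iter()
        .zip(target)
        .map(|(p, t)| (p - t) * (p - t))
        .sum::<f32>()
        / predicted.len() as f32
}
```

Identical representations give zero energy; training pushes the predictor's output toward the EMA target encoder's output.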
Why jepa-rs?
| | jepa-rs | Python (PyTorch) |
|---|---|---|
| Runtime | Native binary, no Python/CUDA dependency | Requires Python + PyTorch + CUDA |
| Inference | Safetensors checkpoint loading, ONNX metadata | PyTorch runtime |
| Memory | Rust ownership, no GC pauses | Python GC + PyTorch allocator |
| Backend | Any burn backend (CPU, GPU, WebGPU, WASM) | CUDA-centric |
| Type safety | Compile-time tensor shape checks | Runtime shape errors |
| Deployment | Single static binary | Docker + Python environment |
Pretrained Models
jepa-rs supports loading official Facebook Research pretrained JEPA models:
| Model | Architecture | Params | Resolution | Dataset | Weights |
|---|---|---|---|---|---|
| I-JEPA ViT-H/14 | ViT-Huge, patch 14 | 632M | 224x224 | ImageNet-1K | Download · HuggingFace |
| I-JEPA ViT-H/16-448 | ViT-Huge, patch 16 | 632M | 448x448 | ImageNet-1K | Download · HuggingFace |
| I-JEPA ViT-H/14 | ViT-Huge, patch 14 | 632M | 224x224 | ImageNet-22K | Download |
| I-JEPA ViT-G/16 | ViT-Giant, patch 16 | 1.0B | 224x224 | ImageNet-22K | Download |
| V-JEPA ViT-L/16 | ViT-Large, patch 16 | 304M | 224x224 | VideoMix2M | Download |
| V-JEPA ViT-H/16 | ViT-Huge, patch 16 | 632M | 224x224 | VideoMix2M | Download |
Quick Start
Installation
```toml
# Cargo.toml
[dependencies]
jepa-core = "0.1.0"
jepa-vision = "0.1.0"
jepa-compat = "0.1.0" # For ONNX + checkpoint loading
```
CLI
The `jepa` binary provides a unified CLI for the workspace (file arguments below are illustrative placeholders):

```sh
# Install the CLI from crates.io
cargo install jepa

# Or install from the local workspace checkout
cargo install --path crates/jepa

# Launch the interactive TUI dashboard
jepa

# List pretrained models in the registry
jepa models

# Inspect a safetensors checkpoint
jepa inspect model.safetensors

# Analyze checkpoint with key remapping
jepa checkpoint model.safetensors

# Launch a training run
jepa train --preset vit-base-16

# Train from a normal image directory tree with deterministic resize/crop/normalize
jepa train --preset vit-base-16 --dataset-dir ./images

# Train from a safetensors image tensor dataset [N, C, H, W]
jepa train --preset vit-base-16 --dataset images.safetensors --dataset-key images

# Encode inputs through a safetensors checkpoint
jepa encode model.safetensors

# Or through an ONNX model
jepa encode model.onnx
```
The CLI `train` command now runs real strict masked-image optimization with AdamW and EMA. It chooses one input source per run:

- `--dataset-dir <PATH>` for a recursive image-folder dataset (jpg, jpeg, png, webp) with decode, RGB conversion, shorter-side resize, center crop, CHW tensor conversion, and normalization
- `--dataset <FILE> --dataset-key <KEY>` for a safetensors image tensor shaped `[N, C, H, W]`
- no dataset flags for the synthetic random-tensor fallback
Image-folder preprocessing defaults to the preset image size for `--crop-size` and the ImageNet RGB normalization statistics when `--mean` and `--std` are omitted. Dataset loading is currently single-threaded.
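For reference, the normalization applied per RGB channel is `(pixel / 255 - mean) / std` with the standard ImageNet statistics. A standalone sketch (not the jepa-rs preprocessing code itself):

```rust
// Standard ImageNet RGB statistics (the defaults when --mean/--std are omitted).
const IMAGENET_MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const IMAGENET_STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Normalize one RGB pixel per channel: (pixel / 255 - mean) / std.
fn normalize_pixel(rgb: [u8; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for c in 0..3 {
        out[c] = (rgb[c] as f32 / 255.0 - IMAGENET_MEAN[c]) / IMAGENET_STD[c];
    }
    out
}
```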
`jepa encode` runs real encoder weights for `.safetensors` and `.onnx` inputs; other file extensions still fall back to the preset demo path.
Runnable Examples
The `jepa` crate now ships runnable examples under `crates/jepa/examples/` that exercise the real training command instead of mocking the CLI path:
```sh
# Create a tiny recursive image-folder dataset under target/example-data/jepa/
cargo run -p jepa --example prepare_demo_image_folder

# Train for 2 steps on that generated image-folder dataset
cargo run -p jepa --example train_image_folder_demo

# Train for 2 steps with the synthetic fallback path
cargo run -p jepa --example train_synthetic_demo
```
The image-folder example deliberately uses a very small generated dataset (6 PNG files across nested subdirectories). That is enough for a meaningful smoke demo of recursive dataset discovery, decode, resize, crop, normalize, batching, masking, optimizer updates, and EMA without checking a large image corpus into git. It is not large enough to demonstrate real representation learning quality; it is an execution demo, not a benchmark dataset.
The TUI now incorporates these demos in the Training tab as a guided demo runner. Launch `jepa`, switch to tab 3, choose a demo with `j`/`k`, and press Enter to run it. The panel streams real run logs, step metrics, loss/energy charts, and a short interpretation of what happened.
The TUI Inference tab (tab 4) adds a separate guided walkthrough for encoder
inference. It runs deterministic demo image patterns through a preset ViT,
streams phase changes, per-sample latency and embedding statistics, and explains
what the representation telemetry means. The walkthrough is intentionally a
pipeline demo rather than a pretrained semantic benchmark.
If you want to run the CLI directly after generating the demo dataset (flags beyond `--dataset-dir` follow the `train` options above):

```sh
cargo run -p jepa -- train --dataset-dir target/example-data/jepa
```
Loading SafeTensors Checkpoints
Checkpoint loading and PyTorch-to-burn key remapping live in `jepa-compat`:

```rust
use jepa_compat::{load_checkpoint, ijepa_vit_keymap};
```
Building JEPA Models from Scratch
Model configs live in `jepa-vision`, core types in `jepa-core`, and any burn backend can serve as `B` (module paths follow the crate layout described under Architecture):

```rust
use burn::backend::NdArray;
use jepa_core::InputShape;
use jepa_vision::{IJepaConfig, VitConfig};

type B = NdArray<f32>;
```
Browse Available Models
```rust
use jepa_compat::ModelRegistry;
```
Architecture
```
jepa-rs/
├── jepa-core      Core traits, tensor wrappers, masking, energy, EMA
│   ├── Encoder            Trait for context/target encoders
│   ├── Predictor          Trait for latent predictors
│   ├── EnergyFn           L2, Cosine, SmoothL1 energy functions
│   ├── MaskingStrategy    Block, MultiBlock, Spatiotemporal masking
│   ├── CollapseReg        VICReg, BarlowTwins collapse prevention
│   └── EMA                Exponential moving average with cosine schedule
│
├── jepa-vision    Vision transformers and JEPA models
│   ├── VitEncoder         ViT-S/B/L/H/G with 2D RoPE
│   ├── IJepa              I-JEPA pipeline (image)
│   ├── VJepa              V-JEPA pipeline (video, 3D tubelets)
│   └── Predictor          Transformer-based cross-attention predictor
│
├── jepa-world     World models and planning
│   ├── ActionPredictor    Action-conditioned latent prediction
│   ├── Planner            Random shooting planner with cost functions
│   ├── HierarchicalJepa   Multi-level H-JEPA
│   └── ShortTermMemory    Sliding-window memory for temporal context
│
├── jepa-train     Training orchestration
│   ├── TrainConfig        Learning rate schedules, EMA config
│   ├── JepaComponents     Generic forward step orchestration
│   └── CheckpointMeta     Save/resume metadata
│
├── jepa-compat    Model compatibility and interop
│   ├── ModelRegistry      Pretrained model catalog (Facebook Research)
│   ├── SafeTensors        Load .safetensors checkpoints
│   ├── KeyMap             PyTorch → burn key remapping
│   └── OnnxModelInfo      ONNX metadata inspection and initializer loading
│
└── jepa           CLI and interactive TUI dashboard
    ├── CLI                models, inspect, checkpoint, train, encode commands
    └── TUI                Dashboard, Models, Training, Checkpoint, About tabs
```
All tensor-bearing APIs are generic over `B: Backend`, allowing transparent execution on CPU (`NdArray`), GPU (`WGPU`), or WebAssembly backends.
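The `EMA` component keeps the target encoder as an exponential moving average of the context encoder's weights. Conceptually, per parameter (a plain-Rust sketch, not the `jepa-core` `Ema` API):

```rust
/// One EMA step: target <- momentum * target + (1 - momentum) * context.
/// With momentum near 1.0 the target encoder changes slowly and stays stable.
fn ema_update(target: &mut [f32], context: &[f32], momentum: f32) {
    for (t, &c) in target.iter_mut().zip(context.iter()) {
        *t = momentum * *t + (1.0 - momentum) * c;
    }
}
```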
ONNX Support
jepa-rs provides ONNX metadata inspection and initializer loading through jepa-compat. This allows inspecting model structure, input/output specs, and importing weight initializers from .onnx files.
Current scope: metadata inspection and weight import are production-ready. Tract-based ONNX graph execution exists (OnnxSession, OnnxEncoder) but is not yet production-grade — it is functional for prototyping and testing.
Examples
| Example | Description | Run command |
|---|---|---|
| `jepa` | Interactive TUI dashboard | `cargo run -p jepa` |
| `jepa models` | Browse pretrained model registry | `cargo run -p jepa -- models` |
| `jepa train` | Launch a training run | `cargo run -p jepa -- train --preset vit-base-16` |
| `prepare_demo_image_folder` | Generate a tiny recursive dataset for `--dataset-dir` demos | `cargo run -p jepa --example prepare_demo_image_folder` |
| `train_image_folder_demo` | Run the real `jepa train` image-folder path on generated images | `cargo run -p jepa --example train_image_folder_demo` |
| `train_synthetic_demo` | Run the real `jepa train` synthetic fallback path | `cargo run -p jepa --example train_synthetic_demo` |
| `ijepa_demo` | Full I-JEPA forward pass pipeline | `cargo run -p jepa-vision --example ijepa_demo` |
| `ijepa_train_loop` | Training loop with metrics | `cargo run -p jepa-vision --example ijepa_train_loop` |
| `world_model_planning` | World model with random shooting | `cargo run -p jepa-world --example world_model_planning` |
| `model_registry` | Browse pretrained models (library) | `cargo run -p jepa-compat --example model_registry` |
Build & Test
```sh
# Build everything
cargo build --workspace

# Run all tests
cargo test --workspace

# Lint
cargo clippy --workspace --all-targets

# Format check
cargo fmt --all -- --check

# Generate docs
cargo doc --workspace --no-deps

# Run differential parity tests (strict image fixtures)
cargo test --workspace parity

# Target a single crate
cargo test -p jepa-core
```
Extended quality gates
```sh
# Code coverage (requires cargo-llvm-cov)
cargo llvm-cov --workspace

# Fuzz testing (requires cargo-fuzz)
(cd fuzz && cargo +nightly fuzz run <fuzz-target>)

# Benchmark smoke test
cargo bench --workspace
```
Project Status
Alpha — suitable for research, experimentation, and extension.
What works
- Complete I-JEPA and V-JEPA architectures with strict masked-encoder paths
- CLI with 6 commands (`models`, `inspect`, `checkpoint`, `train`, `encode`, `tui`)
- Interactive TUI dashboard with 6 tabs (Dashboard, Models, Training, Inference, Checkpoint, About)
- SafeTensors checkpoint loading with automatic key remapping
- ONNX metadata inspection and initializer loading
- Pretrained model registry with download URLs
- Differential parity tests against 3 checked-in strict image fixtures
- Comprehensive test suite (365 tests), property-based testing, fuzz targets
- All standard ViT configs: ViT-S/16, ViT-B/16, ViT-L/16, ViT-H/14, ViT-H/16, ViT-G/16
Known limitations
- The generic trainer slices tokens after the encoder forward; strict pre-attention masking is available via `IJepa::forward_step_strict` and `VJepa::forward_step_strict`
- ONNX graph execution (`OnnxSession`, `OnnxEncoder`) is prototype-grade; only metadata inspection and initializer loading are production-ready
- Differential parity runs in CI for strict image fixtures; broader video parity is pending
- First-time crates.io release must be published in dependency order because the workspace crates depend on each other by version
JEPA Variants: What We Implement
The JEPA family has grown across several papers. Here is exactly what jepa-rs implements and how each component maps to a specific paper and reference codebase.
I-JEPA (Image)
| Paper | Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (Assran et al., CVPR 2023) |
| Reference code | facebookresearch/ijepa (archived) |
| jepa-rs struct | IJepa<B> in jepa-vision (crates/jepa-vision/src/image.rs) |
| What it does | Self-supervised image representation learning. A ViT context-encoder sees only visible patches; a lightweight predictor predicts representations of masked target patches. The target-encoder is an EMA copy of the context-encoder. |
| Masking | BlockMasking — contiguous rectangular blocks on the 2D patch grid. |
| Faithful path | IJepa::forward_step_strict — filters tokens before encoder self-attention (matches the paper). |
| Approximate path | JepaComponents::forward_step in jepa-train — encodes full input then slices (post-encoder masking; cheaper but not faithful). |
| Parity status | 3 checked-in strict image fixtures verified in CI. |
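To make block masking concrete, here is a standalone sketch of a rectangular block mask on a 2D patch grid (illustrative only, not the `jepa-core` `MaskingStrategy` API):

```rust
/// Mark a contiguous `block_h` x `block_w` rectangle of patches as masked
/// targets on a `grid_h` x `grid_w` patch grid (row-major layout).
fn block_mask(
    grid_h: usize,
    grid_w: usize,
    top: usize,
    left: usize,
    block_h: usize,
    block_w: usize,
) -> Vec<bool> {
    let mut mask = vec![false; grid_h * grid_w];
    for row in top..(top + block_h).min(grid_h) {
        for col in left..(left + block_w).min(grid_w) {
            mask[row * grid_w + col] = true;
        }
    }
    mask
}
```

The context encoder then sees only the `false` patches, while the predictor is trained to produce the target encoder's representations at the `true` positions.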
V-JEPA (Video)
| Paper | Revisiting Feature Prediction for Learning Visual Representations from Video (Bardes et al., 2024) |
| Reference code | facebookresearch/jepa |
| jepa-rs struct | VJepa<B> in jepa-vision (crates/jepa-vision/src/video.rs) |
| What it does | Extends I-JEPA to video. A ViT encoder processes 3D tubelets (space + time) with 3D RoPE. |
| Masking | SpatiotemporalMasking — contiguous 3D regions in the spatiotemporal grid. |
| Faithful path | VJepa::forward_step_strict — pre-attention masking. |
| Parity status | Implemented but strict video parity not yet proven (pending). |
V-JEPA 2 features
| Paper | V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (Bardes et al., 2025) |
| Reference code | facebookresearch/vjepa2 |
| jepa-rs support | Not a separate struct. The VJepa<B> struct can be configured with V-JEPA 2 features. |
| What we take from V-JEPA 2 | Cosine momentum schedule for EMA — CosineMomentumSchedule in jepa-core (Ema::with_cosine_schedule). Momentum ramps from base (e.g. 0.996) to 1.0 over training. Also: MultiBlockMasking strategy, ViT-Giant/14 preset. |
| What we don't implement | The full V-JEPA 2 training recipe, attentive probing, or the planning/action heads from the paper. |
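The cosine momentum ramp described above can be sketched in a few lines (illustrative; `CosineMomentumSchedule` in `jepa-core` may differ in detail):

```rust
/// Cosine ramp of EMA momentum from `base` (e.g. 0.996) toward 1.0 over
/// `total_steps`, so the target encoder freezes progressively late in training.
fn cosine_momentum(base: f64, step: usize, total_steps: usize) -> f64 {
    let progress = (step as f64 / total_steps as f64).min(1.0);
    // 0.0 at step 0, rising smoothly to 1.0 at the final step.
    let ramp = 0.5 * (1.0 - (std::f64::consts::PI * progress).cos());
    base + (1.0 - base) * ramp
}
```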
Hierarchical JEPA (H-JEPA) — experimental
| Paper | Inspired by A Path Towards Autonomous Machine Intelligence (LeCun, 2022) — the original JEPA position paper describes hierarchical prediction as a long-term goal. No standalone H-JEPA paper exists yet. |
| jepa-rs struct | HierarchicalJepa<B> in jepa-world (crates/jepa-world/src/hierarchy.rs) |
| What it does | Stacks multiple JEPA levels at different temporal strides (e.g. stride 2, 6, 24). Each level has its own encoder and predictor. This is experimental — no reference implementation exists. |
Action-Conditioned World Model — experimental
| Paper | Draws from both the LeCun position paper and V-JEPA 2 (planning component). |
| jepa-rs structs | Action<B>, ActionConditionedPredictor<B> trait, RandomShootingPlanner in jepa-world (crates/jepa-world/src/action.rs, crates/jepa-world/src/planner.rs) |
| What it does | Predicts next-state representations given current state + action. Supports random-shooting (CEM) planning. This is experimental. |
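Random-shooting planning is simple to sketch: sample candidate action sequences, roll each out through the predictor, and keep the lowest-cost sequence. A toy standalone version with a 1-D additive "dynamics model" standing in for the learned latent predictor (not the `jepa-world` API):

```rust
/// Toy random-shooting planner: pick the action sequence whose rollout
/// lands closest to `goal`. A real planner would roll out the learned
/// latent predictor and score representations with a cost function.
fn plan(state: f32, goal: f32, horizon: usize, candidates: usize) -> Vec<f32> {
    let mut best_cost = f32::INFINITY;
    let mut best = vec![0.0; horizon];
    for i in 0..candidates {
        // Deterministic pseudo-random actions in [-1.0, 1.0] keep the
        // sketch reproducible without a rand dependency.
        let actions: Vec<f32> = (0..horizon)
            .map(|t| ((i * 31 + t * 17) % 21) as f32 / 10.0 - 1.0)
            .collect();
        let mut s = state;
        for &a in &actions {
            s += a; // toy dynamics: next state = state + action
        }
        let cost = (s - goal).abs();
        if cost < best_cost {
            best_cost = cost;
            best = actions;
        }
    }
    best
}
```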
What about EB-JEPA?
EB-JEPA (Terver et al., 2026) is a separate lightweight Python library for energy-based JEPA. jepa-rs is not an implementation of EB-JEPA. We reference it for comparison only. The energy functions in jepa-core (L2, Cosine, SmoothL1) are standard loss formulations, not the EB-JEPA energy framework.
Quick summary
| Variant | Paper | jepa-rs struct | Status |
|---|---|---|---|
| I-JEPA | Assran et al. 2023 | `IJepa<B>` | Strict path implemented, parity verified |
| V-JEPA | Bardes et al. 2024 | `VJepa<B>` | Strict path implemented, parity pending |
| V-JEPA 2 | Bardes et al. 2025 | `VJepa<B>` + cosine EMA schedule | Select features only |
| H-JEPA | LeCun 2022 (position paper) | `HierarchicalJepa<B>` | Experimental, no reference impl |
| World model | LeCun 2022 + V-JEPA 2 | `ActionConditionedPredictor`, `RandomShootingPlanner` | Experimental |
| EB-JEPA | Terver et al. 2026 | Not implemented | Referenced for comparison only |
References
Papers
| Paper | Focus |
|---|---|
| A Path Towards Autonomous Machine Intelligence | JEPA position paper — hierarchical world models (LeCun, 2022) |
| I-JEPA | Self-supervised image learning with masked prediction in latent space (Assran et al., CVPR 2023) |
| V-JEPA | Extension to video with spatiotemporal masking (Bardes et al., 2024) |
| V-JEPA 2 | Video understanding, prediction, and planning (Bardes et al., 2025) |
| EB-JEPA | Lightweight energy-based JEPA library — referenced for comparison (Terver et al., 2026) |
Official reference implementations
| Repo | Models | Relationship to jepa-rs |
|---|---|---|
| `facebookresearch/ijepa` | I-JEPA (archived) | Primary reference for `IJepa<B>` and key remapping |
| `facebookresearch/jepa` | V-JEPA | Primary reference for `VJepa<B>` |
| `facebookresearch/vjepa2` | V-JEPA 2 | Reference for cosine EMA schedule, ViT-G config |
| `facebookresearch/eb_jepa` | EB-JEPA tutorial | Not implemented — comparison only |
Contributing
See CONTRIBUTING.md for guidelines.
License
MIT License. See LICENSE for details.