jepa 0.1.0

CLI and TUI for the jepa-rs workspace
<p align="center">
  <h1 align="center">jepa-rs</h1>
  <p align="center">
    <strong>Joint Embedding Predictive Architecture in Rust</strong>
  </p>
  <p align="center">
    <a href="https://github.com/AbdelStark/jepa-rs/actions"><img src="https://img.shields.io/github/actions/workflow/status/AbdelStark/jepa-rs/ci.yml?branch=main&style=flat-square&logo=github&label=CI" alt="CI"></a>
    <a href="https://github.com/AbdelStark/jepa-rs/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square" alt="License: MIT"></a>
    <a href="https://docs.rs/jepa-core"><img src="https://img.shields.io/badge/docs-docs.rs-blue.svg?style=flat-square" alt="docs.rs"></a>
    <a href="https://crates.io/crates/jepa-core"><img src="https://img.shields.io/crates/v/jepa-core.svg?style=flat-square" alt="crates.io"></a>
  </p>
</p>

---

Alpha Rust implementation of **JEPA** (Joint Embedding Predictive Architecture) — the self-supervised learning framework from [Yann LeCun and Meta AI](https://openreview.net/pdf?id=BZ5a1r-kVsf) for learning world models that predict in representation space rather than pixel space.

**jepa-rs** provides modular, backend-agnostic building blocks for I-JEPA (images), V-JEPA (video), and hierarchical world models, built on top of the [burn](https://burn.dev) deep learning framework. It includes a CLI and interactive TUI dashboard, safetensors checkpoint loading, ONNX metadata inspection, and a pretrained model registry for Facebook Research models.

```
                    ┌──────────────┐
                    │   Context    │──── Encoder ────┐
                    │   (visible)  │                 │
   Image/Video ─────┤              │         ┌───────▼───────┐
                    │   Target     │         │   Predictor   │──── predicted repr
                    │   (masked)   │──┐      └───────────────┘          │
                    └──────────────┘  │                                 │
                                      │      ┌───────────────┐         │
                                      └──────│ Target Encoder│── target repr
                                        EMA  │   (frozen)    │         │
                                             └───────────────┘          │
                                             ┌───────────────┐          │
                                             │  Energy Loss  │◄─────────┘
                                             └───────────────┘
```

## Why jepa-rs?

| | jepa-rs | Python (PyTorch) |
|---|---|---|
| **Runtime** | Native binary, no Python/CUDA dependency | Requires Python + PyTorch + CUDA |
| **Inference** | Safetensors checkpoint loading, ONNX metadata | PyTorch runtime |
| **Memory** | Rust ownership, no GC pauses | Python GC + PyTorch allocator |
| **Backend** | Any burn backend (CPU, GPU, WebGPU, WASM) | CUDA-centric |
| **Type safety** | Compile-time tensor rank checks (burn's const-generic `Tensor<B, D>`) | Runtime shape errors |
| **Deployment** | Single static binary | Docker + Python environment |

## Pretrained Models

jepa-rs supports loading official Facebook Research pretrained JEPA models:

| Model | Architecture | Params | Resolution | Dataset | Weights |
|-------|-------------|--------|-----------|---------|---------|
| **I-JEPA ViT-H/14** | ViT-Huge, patch 14 | 632M | 224x224 | ImageNet-1K | [Download](https://dl.fbaipublicfiles.com/ijepa/IN1K-vit.h.14-300e.pth.tar) \| [HuggingFace](https://huggingface.co/facebook/ijepa_vith14_1k) |
| **I-JEPA ViT-H/16-448** | ViT-Huge, patch 16 | 632M | 448x448 | ImageNet-1K | [Download](https://dl.fbaipublicfiles.com/ijepa/IN1K-vit.h.16.448-300e.pth.tar) \| [HuggingFace](https://huggingface.co/facebook/ijepa_vith16_448) |
| **I-JEPA ViT-H/14** | ViT-Huge, patch 14 | 632M | 224x224 | ImageNet-22K | [Download](https://dl.fbaipublicfiles.com/ijepa/IN22K-vit.h.14-900e.pth.tar) |
| **I-JEPA ViT-G/16** | ViT-Giant, patch 16 | 1.0B | 224x224 | ImageNet-22K | [Download](https://dl.fbaipublicfiles.com/ijepa/IN22K-vit.g.16-600e.pth.tar) |
| **V-JEPA ViT-L/16** | ViT-Large, patch 16 | 304M | 224x224 | VideoMix2M | [Download](https://dl.fbaipublicfiles.com/jepa/vit.l.16-k400-300e.pth.tar) |
| **V-JEPA ViT-H/16** | ViT-Huge, patch 16 | 632M | 224x224 | VideoMix2M | [Download](https://dl.fbaipublicfiles.com/jepa/vit.h.16-k400-300e.pth.tar) |
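
The resolution and patch-size columns determine how many tokens each encoder sees per image (or per frame). A minimal sketch of that arithmetic in plain Rust follows; `patch_grid` is a hypothetical helper written for this README, not part of the jepa-rs API.

```rust
/// Patches per side and in total for a square image.
/// Hypothetical helper for illustration; not part of the jepa-rs API.
fn patch_grid(resolution: usize, patch_size: usize) -> Option<(usize, usize)> {
    // A ViT patch embedding needs the image side to be divisible by the patch size.
    if patch_size == 0 || resolution % patch_size != 0 {
        return None;
    }
    let side = resolution / patch_size;
    Some((side, side * side))
}

fn main() {
    // (model, resolution, patch size) taken from the table above.
    let rows = [
        ("I-JEPA ViT-H/14", 224, 14),
        ("I-JEPA ViT-H/16-448", 448, 16),
        ("V-JEPA ViT-L/16 (per frame)", 224, 16),
    ];
    for (name, res, patch) in rows {
        match patch_grid(res, patch) {
            Some((side, total)) => println!("{name}: {side}x{side} grid, {total} patches"),
            None => println!("{name}: resolution not divisible by patch size"),
        }
    }
}
```

The 16x16 grid for ViT-H/14 at 224x224 is the same grid used in the masking example further down.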

## Quick Start

### Installation

```toml
# Cargo.toml
[dependencies]
jepa-core   = "0.1.0"
jepa-vision = "0.1.0"
jepa-compat = "0.1.0"  # For ONNX + checkpoint loading
```

### CLI

The `jepa` binary provides a unified CLI for the workspace:

```bash
# Install the CLI from crates.io
cargo install jepa

# Or install from the local workspace checkout
cargo install --path crates/jepa

# Launch the interactive TUI dashboard
jepa

# List pretrained models in the registry
jepa models

# Inspect a safetensors checkpoint
jepa inspect model.safetensors

# Analyze checkpoint with key remapping
jepa checkpoint model.safetensors --keymap ijepa --verbose

# Launch a training run
jepa train --preset vit-base-16 --steps 10 --batch-size 1 --lr 1e-3

# Train from a normal image directory tree with deterministic resize/crop/normalize
jepa train --preset vit-base-16 --steps 100 --batch-size 4 \
  --dataset-dir ./images/train --resize 256 --crop-size 224 --shuffle

# Train from a safetensors image tensor dataset [N, C, H, W]
jepa train --preset vit-base-16 --steps 100 --batch-size 1 \
  --dataset train.safetensors --dataset-key images

# Encode inputs through a safetensors checkpoint
jepa encode --model model.safetensors --preset vit-base-16

# Or through an ONNX model
jepa encode --model model.onnx --height 224 --width 224
```

The CLI `train` command now runs real strict masked-image optimization with
AdamW and EMA. It chooses one input source per run:

- `--dataset-dir <PATH>` for a recursive image-folder dataset (`jpg`, `jpeg`, `png`, `webp`) with decode, RGB conversion, shorter-side resize, center crop, CHW tensor conversion, and normalization
- `--dataset <FILE> --dataset-key <KEY>` for a safetensors image tensor shaped `[N, C, H, W]`
- no dataset flags for the synthetic random-tensor fallback

Image-folder preprocessing defaults to the preset image size for `--crop-size`
and the ImageNet RGB normalization statistics when `--mean` and `--std` are
omitted. Dataset loading is currently single-threaded.

`jepa encode` executes real encoder weights for `.safetensors` and `.onnx`
inputs; other extensions still fall back to the preset demo path.
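
The image-folder preprocessing steps listed above (shorter-side resize, center crop, normalization) come down to a little geometry and one per-channel formula. The sketch below shows that arithmetic only. It is not the CLI's implementation: the function names are hypothetical, pixel interpolation is omitted, and the rounding rule is an assumption. The mean and standard deviation are the commonly used ImageNet RGB statistics.

```rust
/// Commonly used ImageNet RGB normalization statistics.
const IMAGENET_MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const IMAGENET_STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Scale so the shorter side equals `target`, preserving aspect ratio.
/// Rounding to the nearest pixel is an assumption of this sketch.
fn shorter_side_resize(width: u32, height: u32, target: u32) -> (u32, u32) {
    let scale = target as f32 / width.min(height).max(1) as f32;
    let w = (width as f32 * scale).round().max(1.0) as u32;
    let h = (height as f32 * scale).round().max(1.0) as u32;
    (w, h)
}

/// Top-left corner of a centered `crop x crop` window.
fn center_crop_origin(width: u32, height: u32, crop: u32) -> (u32, u32) {
    (width.saturating_sub(crop) / 2, height.saturating_sub(crop) / 2)
}

/// Map one 8-bit channel value to a normalized float.
fn normalize(value: u8, channel: usize) -> f32 {
    (value as f32 / 255.0 - IMAGENET_MEAN[channel]) / IMAGENET_STD[channel]
}

fn main() {
    // A 640x480 source image with --resize 256 --crop-size 224.
    let (w, h) = shorter_side_resize(640, 480, 256);
    let (left, top) = center_crop_origin(w, h, 224);
    println!("resized to {w}x{h}, crop starts at ({left}, {top})");
    println!("red channel value 128 normalizes to {:.4}", normalize(128, 0));
}
```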

### Runnable Examples

The `jepa` crate now ships runnable examples under
`crates/jepa/examples/` that exercise the real training command instead of
mocking the CLI path:

```bash
# Create a tiny recursive image-folder dataset under target/example-data/jepa/
cargo run -p jepa --example prepare_demo_image_folder

# Train for 2 steps on that generated image-folder dataset
cargo run -p jepa --example train_image_folder_demo

# Train for 2 steps with the synthetic fallback path
cargo run -p jepa --example train_synthetic_demo
```

The image-folder example deliberately uses a very small generated dataset
(6 PNG files across nested subdirectories). That is enough for a meaningful
smoke demo of recursive dataset discovery, decode, resize, crop, normalize,
batching, masking, optimizer updates, and EMA without checking a large image
corpus into git. It is not large enough to demonstrate real representation
learning quality; it is an execution demo, not a benchmark dataset.

The TUI now incorporates these demos in the `Training` tab as a guided demo
runner. Launch `jepa`, switch to tab `3`, choose a demo with `j/k`, and press
`Enter` to run it. The panel streams real run logs, step metrics, loss/energy
charts, and a short interpretation of what happened.

The TUI `Inference` tab on `4` adds a separate guided walkthrough for encoder
inference. It runs deterministic demo image patterns through a preset ViT,
streams phase changes, per-sample latency and embedding statistics, and explains
what the representation telemetry means. The walkthrough is intentionally a
pipeline demo rather than a pretrained semantic benchmark.

If you want to run the CLI directly after generating the demo dataset:

```bash
cargo run -p jepa -- train --preset vit-small-16 --steps 2 --batch-size 2 \
  --dataset-dir target/example-data/jepa/demo-image-folder \
  --resize 256 --crop-size 224 --shuffle --dataset-limit 6
```

### Loading SafeTensors Checkpoints

```rust
use jepa_compat::safetensors::load_checkpoint;
use jepa_compat::keymap::ijepa_vit_keymap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mappings = ijepa_vit_keymap();
    let checkpoint = load_checkpoint("model.safetensors", &mappings)?;

    println!("Loaded {} tensors", checkpoint.len());
    for key in checkpoint.keys() {
        println!("  {}: {:?}", key, checkpoint.get(key).unwrap().shape);
    }
    Ok(())
}
```
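
Key remapping is what lets PyTorch-style parameter names line up with the names the Rust modules expect. The sketch below illustrates the idea with prefix rewriting in plain Rust. The rule table and target names are hypothetical and do not reproduce what `ijepa_vit_keymap` returns; the only detail taken from common PyTorch practice is that checkpoints saved from `DistributedDataParallel` often carry a `module.` prefix.

```rust
/// Rewrite a PyTorch-style parameter name using (from-prefix, to-prefix) rules.
/// The rules passed in `main` are hypothetical examples, not the real keymap.
fn remap_key(key: &str, rules: &[(&str, &str)]) -> String {
    // Drop the wrapper prefix that DistributedDataParallel adds, if present.
    let key = key.strip_prefix("module.").unwrap_or(key);
    for &(from, to) in rules {
        if let Some(rest) = key.strip_prefix(from) {
            return format!("{to}{rest}");
        }
    }
    // No rule matched: keep the (unwrapped) name unchanged.
    key.to_string()
}

fn main() {
    let rules = [
        ("patch_embed.proj.", "patch_embed.projection."),
        ("blocks.", "layers."),
    ];
    for key in [
        "module.blocks.0.attn.qkv.weight",
        "patch_embed.proj.bias",
        "norm.weight",
    ] {
        println!("{key} -> {}", remap_key(key, &rules));
    }
}
```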

### Building JEPA Models from Scratch

```rust
use burn::prelude::*;
use burn_ndarray::NdArray;
use jepa_core::masking::{BlockMasking, MaskingStrategy};
use jepa_core::types::InputShape;
use jepa_vision::image::IJepaConfig;
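use jepa_vision::vit::VitConfig;
// `seed_from_u64` is a method of the `SeedableRng` trait, which must be in scope.
use rand_chacha::rand_core::SeedableRng;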

type B = NdArray<f32>;

fn main() {
    let device = burn_ndarray::NdArrayDevice::Cpu;

    // Configure I-JEPA with ViT-Huge/14 (matches Facebook pretrained)
    let config = IJepaConfig {
        encoder: VitConfig::vit_huge_patch14(),
        predictor: jepa_vision::image::TransformerPredictorConfig {
            encoder_embed_dim: 1280,
            predictor_embed_dim: 384,
            num_layers: 12,
            num_heads: 12,
            max_target_len: 256,
        },
    };
    let model = config.init::<B>(&device);

    // Generate masks (I-JEPA block masking)
    let shape = InputShape::Image { height: 16, width: 16 }; // 224/14 = 16
    let mut rng = rand_chacha::ChaCha8Rng::seed_from_u64(42);
    let masking = BlockMasking {
        num_targets: 4,
        target_scale: (0.15, 0.2),
        target_aspect_ratio: (0.75, 1.5),
    };
    let mask = masking.generate_mask(&shape, &mut rng);

    println!("Context tokens: {}, Target tokens: {}",
             mask.context_indices.len(), mask.target_indices.len());
}
```
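
To make the `target_scale` and `target_aspect_ratio` parameters concrete, the sketch below derives the size of one rectangular block on the patch grid and lists the flattened indices it covers. It is a simplified illustration in plain Rust, not the `BlockMasking` implementation: the function names are hypothetical, the block corner is fixed instead of sampled, and the context is taken as the plain complement of a single target block.

```rust
/// Height and width (in patches) of a block covering `scale` of the grid
/// with the given height/width aspect ratio.
fn block_dims(grid_h: usize, grid_w: usize, scale: f64, aspect: f64) -> (usize, usize) {
    let area = (grid_h * grid_w) as f64 * scale;
    let h = (area * aspect).sqrt().round() as usize;
    let w = (area / aspect).sqrt().round() as usize;
    // Keep the block non-empty and inside the grid.
    (h.clamp(1, grid_h), w.clamp(1, grid_w))
}

/// Flattened row-major patch indices of a block whose top-left corner is (top, left).
fn block_indices(top: usize, left: usize, h: usize, w: usize, grid_w: usize) -> Vec<usize> {
    (top..top + h)
        .flat_map(|row| (left..left + w).map(move |col| row * grid_w + col))
        .collect()
}

fn main() {
    let (grid_h, grid_w) = (16, 16); // 224 / 14 = 16 patches per side
    let (h, w) = block_dims(grid_h, grid_w, 0.2, 1.0);
    // Fixed corner for reproducibility; a real strategy would sample it from an RNG.
    let target = block_indices(2, 3, h, w, grid_w);
    // Context here is every patch that is not in the target block.
    let context: Vec<usize> = (0..grid_h * grid_w)
        .filter(|i| !target.contains(i))
        .collect();
    println!(
        "block {h}x{w}: {} target tokens, {} context tokens",
        target.len(),
        context.len()
    );
}
```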

### Browse Available Models

```rust
use jepa_compat::registry::{list_models, find_model};

fn main() {
    for model in list_models() {
        println!("{}: {} ({}, {})",
            model.name,
            model.param_count_human(),
            model.architecture,
            model.pretrained_on);
    }

    // Search for a specific model
    if let Some(m) = find_model("vit-h/14") {
        println!("\nFound: {} with {} patches",
            m.name, m.num_patches());
    }
}
```

## Architecture

```text
jepa-rs/
├── jepa-core        Core traits, tensor wrappers, masking, energy, EMA
│   ├── Encoder          Trait for context/target encoders
│   ├── Predictor        Trait for latent predictors
│   ├── EnergyFn         L2, Cosine, SmoothL1 energy functions
│   ├── MaskingStrategy  Block, MultiBlock, Spatiotemporal masking
│   ├── CollapseReg      VICReg, BarlowTwins collapse prevention
│   └── EMA              Exponential moving average with cosine schedule
│
├── jepa-vision      Vision transformers and JEPA models
│   ├── VitEncoder       ViT-S/B/L/H/G with 2D RoPE
│   ├── IJepa            I-JEPA pipeline (image)
│   ├── VJepa            V-JEPA pipeline (video, 3D tubelets)
│   └── Predictor        Transformer-based cross-attention predictor
│
├── jepa-world       World models and planning
│   ├── ActionPredictor  Action-conditioned latent prediction
│   ├── Planner          Random shooting planner with cost functions
│   ├── HierarchicalJepa Multi-level H-JEPA
│   └── ShortTermMemory  Sliding-window memory for temporal context
│
├── jepa-train       Training orchestration
│   ├── TrainConfig      Learning rate schedules, EMA config
│   ├── JepaComponents   Generic forward step orchestration
│   └── CheckpointMeta   Save/resume metadata
│
├── jepa-compat      Model compatibility and interop
│   ├── ModelRegistry     Pretrained model catalog (Facebook Research)
│   ├── SafeTensors       Load .safetensors checkpoints
│   ├── KeyMap            PyTorch → burn key remapping
│   └── OnnxModelInfo     ONNX metadata inspection and initializer loading
│
└── jepa             CLI and interactive TUI dashboard
    ├── CLI               models, inspect, checkpoint, train, encode, tui commands
    └── TUI               Dashboard, Models, Training, Inference, Checkpoint, About tabs
```

All tensor-bearing APIs are generic over `B: Backend`, allowing transparent execution on CPU (NdArray), GPU (WGPU), or WebAssembly backends.

## ONNX Support

jepa-rs provides ONNX metadata inspection and initializer loading through `jepa-compat`. This allows inspecting model structure, input/output specs, and importing weight initializers from `.onnx` files.

**Current scope**: metadata inspection and weight import are production-ready. Tract-based ONNX graph execution exists (`OnnxSession`, `OnnxEncoder`) but is not yet production-grade — it is functional for prototyping and testing.

## Examples

| Example | Description | Run command |
|---------|-------------|-------------|
| `jepa` | Interactive TUI dashboard | `cargo run -p jepa` |
| `jepa models` | Browse pretrained model registry | `cargo run -p jepa -- models` |
| `jepa train` | Launch a training run | `cargo run -p jepa -- train --preset vit-base-16` |
| `prepare_demo_image_folder` | Generate a tiny recursive dataset for `--dataset-dir` demos | `cargo run -p jepa --example prepare_demo_image_folder` |
| `train_image_folder_demo` | Run the real `jepa train` image-folder path on generated images | `cargo run -p jepa --example train_image_folder_demo` |
| `train_synthetic_demo` | Run the real `jepa train` synthetic fallback path | `cargo run -p jepa --example train_synthetic_demo` |
| `ijepa_demo` | Full I-JEPA forward pass pipeline | `cargo run -p jepa-vision --example ijepa_demo` |
| `ijepa_train_loop` | Training loop with metrics | `cargo run -p jepa-vision --example ijepa_train_loop` |
| `world_model_planning` | World model with random shooting | `cargo run -p jepa-world --example world_model_planning` |
| `model_registry` | Browse pretrained models (library) | `cargo run -p jepa-compat --example model_registry` |

## Build & Test

```bash
# Build everything
cargo build --workspace

# Run all tests
cargo test --workspace

# Lint
cargo clippy --workspace --all-targets -- -D warnings

# Format check
cargo fmt -- --check

# Generate docs
cargo doc --workspace --no-deps --open

# Run differential parity tests
scripts/run_parity_suite.sh

# Target a single crate
cargo test -p jepa-core
cargo test -p jepa-vision
cargo test -p jepa-compat
```

### Extended quality gates

```bash
# Code coverage (requires cargo-llvm-cov)
cargo llvm-cov --workspace --all-features --fail-under-lines 80

# Fuzz testing (requires cargo-fuzz)
(cd fuzz && cargo fuzz run masking -- -runs=1000)

# Benchmark smoke test
cargo bench --workspace --no-run
```

## Project Status

**Alpha** — suitable for research, experimentation, and extension.

### What works

- Complete I-JEPA and V-JEPA architectures with strict masked-encoder paths
- CLI with 6 commands (`models`, `inspect`, `checkpoint`, `train`, `encode`, `tui`)
- Interactive TUI dashboard with 6 tabs (Dashboard, Models, Training, Inference, Checkpoint, About)
- SafeTensors checkpoint loading with automatic key remapping
- ONNX metadata inspection and initializer loading
- Pretrained model registry with download URLs
- Differential parity tests against 3 checked-in strict image fixtures
- Comprehensive test suite (365 tests), property-based testing, fuzz targets
- All standard ViT configs: ViT-S/16, ViT-B/16, ViT-L/16, ViT-H/14, ViT-H/16, ViT-G/16

### Known limitations

- The generic trainer slices tokens after encoder forward; strict pre-attention masking is available via `IJepa::forward_step_strict` and `VJepa::forward_step_strict`
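- ONNX graph execution (tract-based `OnnxSession`, `OnnxEncoder`) works for prototyping and testing but is not production-grade; only metadata inspection and initializer loading are production-ready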
- Differential parity runs in CI for strict image fixtures; broader video parity is pending
- First-time crates.io release must be published in dependency order because the workspace crates depend on each other by version

## JEPA Variants: What We Implement

The JEPA family has grown across several papers. Here is exactly what jepa-rs implements and how each component maps to a specific paper and reference codebase.

### I-JEPA (Image)

| | |
|---|---|
| **Paper** | [Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture](https://arxiv.org/abs/2301.08243) (Assran et al., CVPR 2023) |
| **Reference code** | [`facebookresearch/ijepa`](https://github.com/facebookresearch/ijepa) (archived) |
| **jepa-rs struct** | `IJepa<B>` in `jepa-vision` ([`crates/jepa-vision/src/image.rs`](crates/jepa-vision/src/image.rs)) |
| **What it does** | Self-supervised image representation learning. A ViT context-encoder sees only visible patches; a lightweight predictor predicts representations of masked target patches. The target-encoder is an EMA copy of the context-encoder. |
| **Masking** | `BlockMasking` — contiguous rectangular blocks on the 2D patch grid. |
| **Faithful path** | `IJepa::forward_step_strict` — filters tokens *before* encoder self-attention (matches the paper). |
| **Approximate path** | `JepaComponents::forward_step` in `jepa-train` — encodes full input then slices (post-encoder masking; cheaper but not faithful). |
| **Parity status** | 3 checked-in strict image fixtures verified in CI. |
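
Why the two paths differ can be seen with a toy encoder. In the sketch below, plain Rust with hypothetical function names, the "encoder" adds the mean of all its inputs to every token, a crude stand-in for the global mixing that self-attention performs. Filtering before encoding keeps the masked token out of the context representations; slicing after encoding lets it leak in.

```rust
/// Toy stand-in for self-attention: every output token is its input plus the
/// mean of all tokens the encoder was given (global mixing).
fn toy_encoder(tokens: &[f32]) -> Vec<f32> {
    if tokens.is_empty() {
        return Vec::new();
    }
    let mean = tokens.iter().sum::<f32>() / tokens.len() as f32;
    tokens.iter().map(|t| t + mean).collect()
}

/// Strict path: drop masked tokens first, then encode only the visible ones.
fn encode_strict(tokens: &[f32], visible: &[usize]) -> Vec<f32> {
    let kept: Vec<f32> = visible.iter().map(|&i| tokens[i]).collect();
    toy_encoder(&kept)
}

/// Approximate path: encode everything, then slice out the visible positions.
fn encode_then_slice(tokens: &[f32], visible: &[usize]) -> Vec<f32> {
    let encoded = toy_encoder(tokens);
    visible.iter().map(|&i| encoded[i]).collect()
}

fn main() {
    let tokens = [1.0, 2.0, 3.0, 10.0]; // the last token is a masked target
    let visible = [0, 1, 2];
    println!("strict:      {:?}", encode_strict(&tokens, &visible));
    println!("approximate: {:?}", encode_then_slice(&tokens, &visible));
}
```

The strict result depends only on the three visible tokens, while the approximate result shifts with the value of the masked one.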

### V-JEPA (Video)

| | |
|---|---|
| **Paper** | [Revisiting Feature Prediction for Learning Visual Representations from Video](https://arxiv.org/abs/2404.08471) (Bardes et al., 2024) |
| **Reference code** | [`facebookresearch/jepa`](https://github.com/facebookresearch/jepa) |
| **jepa-rs struct** | `VJepa<B>` in `jepa-vision` ([`crates/jepa-vision/src/video.rs`](crates/jepa-vision/src/video.rs)) |
| **What it does** | Extends I-JEPA to video. A ViT encoder processes 3D tubelets (space + time) with 3D RoPE. |
| **Masking** | `SpatiotemporalMasking` — contiguous 3D regions in the spatiotemporal grid. |
| **Faithful path** | `VJepa::forward_step_strict` — pre-attention masking. |
| **Parity status** | Implemented but strict video parity not yet proven (pending). |
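
A 3D tubelet spans a small window in time as well as a spatial patch, so the token grid has three axes. The sketch below computes that grid and a row-major flattening in plain Rust. The helper names are hypothetical, and the clip length (16 frames) and tubelet depth (2 frames) are assumptions chosen for illustration, not values read from `VJepa<B>`.

```rust
/// Token grid (time, height, width) for a clip split into 3D tubelets.
/// Returns None when a dimension is not divisible by its tubelet extent.
fn tubelet_grid(
    frames: usize,
    height: usize,
    width: usize,
    tubelet_t: usize,
    patch: usize,
) -> Option<(usize, usize, usize)> {
    if tubelet_t == 0 || patch == 0 {
        return None;
    }
    if frames % tubelet_t != 0 || height % patch != 0 || width % patch != 0 {
        return None;
    }
    Some((frames / tubelet_t, height / patch, width / patch))
}

/// Row-major flattened index of the tubelet at (t, y, x).
fn flat_index(t: usize, y: usize, x: usize, grid: (usize, usize, usize)) -> usize {
    (t * grid.1 + y) * grid.2 + x
}

fn main() {
    // Assumed clip: 16 frames of 224x224, tubelets of 2 frames x 16x16 pixels.
    if let Some(grid) = tubelet_grid(16, 224, 224, 2, 16) {
        let total = grid.0 * grid.1 * grid.2;
        println!("grid {:?} -> {total} tokens", grid);
        println!("last token index: {}", flat_index(grid.0 - 1, grid.1 - 1, grid.2 - 1, grid));
    }
}
```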

### V-JEPA 2 features

| | |
|---|---|
| **Paper** | [V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning](https://arxiv.org/abs/2506.09985) (Assran et al., 2025) |
| **Reference code** | [`facebookresearch/vjepa2`](https://github.com/facebookresearch/vjepa2) |
| **jepa-rs support** | Not a separate struct. The `VJepa<B>` struct can be configured with V-JEPA 2 features. |
| **What we take from V-JEPA 2** | **Cosine momentum schedule** for EMA — `CosineMomentumSchedule` in `jepa-core` (`Ema::with_cosine_schedule`). Momentum ramps from base (e.g. 0.996) to 1.0 over training. Also: `MultiBlockMasking` strategy, ViT-Giant/14 preset. |
| **What we don't implement** | The full V-JEPA 2 training recipe, attentive probing, or the planning/action heads from the paper. |
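
The momentum ramp and the EMA update it feeds can be sketched in a few lines of plain Rust. This uses a standard half-cosine ramp from the base momentum to 1.0; whether `CosineMomentumSchedule` uses exactly this formula is an assumption, and both function names are hypothetical.

```rust
use std::f64::consts::PI;

/// Momentum at `step` of `total_steps`, rising from `base` to 1.0 on a half-cosine.
/// Assumed formula; the schedule in jepa-core may be parameterized differently.
fn cosine_momentum(base: f64, step: usize, total_steps: usize) -> f64 {
    if total_steps == 0 {
        return 1.0;
    }
    let progress = step.min(total_steps) as f64 / total_steps as f64;
    1.0 - (1.0 - base) * 0.5 * (1.0 + (PI * progress).cos())
}

/// In-place EMA update: target <- m * target + (1 - m) * online.
fn ema_update(target: &mut [f64], online: &[f64], momentum: f64) {
    for (t, o) in target.iter_mut().zip(online) {
        *t = momentum * *t + (1.0 - momentum) * *o;
    }
}

fn main() {
    let total_steps = 1000;
    let online = [1.0, -1.0]; // stand-in for context-encoder weights
    let mut target = [0.0, 0.0]; // stand-in for target-encoder weights
    for step in 0..total_steps {
        let m = cosine_momentum(0.996, step, total_steps);
        ema_update(&mut target, &online, m);
    }
    println!("momentum at start: {:.4}", cosine_momentum(0.996, 0, total_steps));
    println!("momentum at end:   {:.4}", cosine_momentum(0.996, total_steps, total_steps));
    println!("target after training: {:?}", target);
}
```

As the momentum approaches 1.0 the target encoder changes less and less per step, which is the point of the ramp.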

### Hierarchical JEPA (H-JEPA) — experimental

| | |
|---|---|
| **Paper** | Inspired by [A Path Towards Autonomous Machine Intelligence](https://openreview.net/pdf?id=BZ5a1r-kVsf) (LeCun, 2022). The original JEPA position paper describes hierarchical prediction as a long-term goal. No standalone H-JEPA paper exists yet. |
| **jepa-rs struct** | `HierarchicalJepa<B>` in `jepa-world` ([`crates/jepa-world/src/hierarchy.rs`](crates/jepa-world/src/hierarchy.rs)) |
| **What it does** | Stacks multiple JEPA levels at different temporal strides (e.g. stride 2, 6, 24). Each level has its own encoder and predictor. This is **experimental** — no reference implementation exists. |
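
One way to picture the temporal strides is as plain subsampling of the timeline, with higher levels seeing fewer, more widely spaced steps. The sketch below assumes exactly that reading of "stride"; the semantics inside `HierarchicalJepa<B>` may differ, and `level_timesteps` is a hypothetical helper.

```rust
/// Timesteps a level with the given stride would observe in `0..len`.
/// Assumes "stride" means plain subsampling; the jepa-world semantics may differ.
fn level_timesteps(len: usize, stride: usize) -> Vec<usize> {
    (0..len).step_by(stride.max(1)).collect()
}

fn main() {
    // The example strides from the table above, over a 48-step sequence.
    for stride in [2, 6, 24] {
        let steps = level_timesteps(48, stride);
        println!(
            "stride {stride:>2}: {} steps, starting {:?}",
            steps.len(),
            &steps[..steps.len().min(4)]
        );
    }
}
```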

### Action-Conditioned World Model — experimental

| | |
|---|---|
| **Paper** | Draws from both the LeCun position paper and [V-JEPA 2](https://arxiv.org/abs/2506.09985) (planning component). |
| **jepa-rs structs** | `Action<B>`, `ActionConditionedPredictor<B>` trait, `RandomShootingPlanner` in `jepa-world` ([`crates/jepa-world/src/action.rs`](crates/jepa-world/src/action.rs), [`crates/jepa-world/src/planner.rs`](crates/jepa-world/src/planner.rs)) |
| **What it does** | Predicts next-state representations given current state + action. Supports random-shooting planning (sample candidate action sequences, keep the lowest-cost one). This is **experimental**. |
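
Random shooting itself is simple enough to show end to end. The sketch below is a self-contained toy in plain Rust, not the `RandomShootingPlanner` API: the state is a single number, the dynamics (`state + action`) and cost (distance to a goal) are made up for illustration, and a small xorshift generator stands in for a real RNG crate. It samples candidate action sequences, rolls each one out, and keeps the cheapest, with no iterative refinement of the sampling distribution.

```rust
/// Tiny deterministic xorshift generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    /// Uniform sample in [-1, 1).
    fn next_action(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64 * 2.0 - 1.0
    }
}

/// Toy latent dynamics: the next state is the current state plus the action.
fn step(state: f64, action: f64) -> f64 {
    state + action
}

/// Random shooting: sample `candidates` action sequences, roll each out through
/// the dynamics, and keep the one whose final state is closest to `goal`.
fn random_shooting(
    start: f64,
    goal: f64,
    horizon: usize,
    candidates: usize,
    seed: u64,
) -> (Vec<f64>, f64) {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let mut best = (Vec::new(), f64::INFINITY);
    for _ in 0..candidates {
        let plan: Vec<f64> = (0..horizon).map(|_| rng.next_action()).collect();
        let end = plan.iter().fold(start, |s, &a| step(s, a));
        let cost = (end - goal).abs();
        if cost < best.1 {
            best = (plan, cost);
        }
    }
    best
}

fn main() {
    let (plan, cost) = random_shooting(0.0, 2.0, 4, 256, 42);
    println!("best plan: {:?}", plan);
    println!("distance to goal after rollout: {:.4}", cost);
}
```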

### What about EB-JEPA?

[EB-JEPA](https://arxiv.org/abs/2602.03604) (Terver et al., 2026) is a separate lightweight Python library for energy-based JEPA. jepa-rs is **not** an implementation of EB-JEPA. We reference it for comparison only. The energy functions in `jepa-core` (L2, Cosine, SmoothL1) are standard loss formulations, not the EB-JEPA energy framework.
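
For reference, the three standard formulations named above can be written out on plain slices as follows. This is a sketch of the textbook definitions, not the `jepa-core` `EnergyFn` implementations: the function names, the mean reduction, and the `beta` parameter are assumptions.

```rust
/// Mean squared L2 distance between predicted and target representations.
fn l2_energy(pred: &[f32], target: &[f32]) -> f32 {
    let n = pred.len().max(1) as f32;
    pred.iter().zip(target).map(|(p, t)| (p - t).powi(2)).sum::<f32>() / n
}

/// Cosine energy: 1 - cosine similarity (0 when the vectors point the same way).
fn cosine_energy(pred: &[f32], target: &[f32]) -> f32 {
    let dot: f32 = pred.iter().zip(target).map(|(p, t)| p * t).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    // The floor on the denominator avoids dividing by zero for all-zero inputs.
    1.0 - dot / (norm(pred) * norm(target)).max(1e-8)
}

/// Smooth L1 (Huber-style): quadratic below `beta`, linear above it.
/// `beta` must be positive.
fn smooth_l1_energy(pred: &[f32], target: &[f32], beta: f32) -> f32 {
    let n = pred.len().max(1) as f32;
    pred.iter()
        .zip(target)
        .map(|(p, t)| {
            let d = (p - t).abs();
            if d < beta { 0.5 * d * d / beta } else { d - 0.5 * beta }
        })
        .sum::<f32>()
        / n
}

fn main() {
    let pred = [0.0, 0.0, 1.0];
    let target = [0.5, 3.0, 1.0];
    println!("L2:        {:.4}", l2_energy(&pred, &target));
    println!("cosine:    {:.4}", cosine_energy(&pred, &target));
    println!("smooth L1: {:.4}", smooth_l1_energy(&pred, &target, 1.0));
}
```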

### Quick summary

| Variant | Paper | jepa-rs struct | Status |
|---------|-------|----------------|--------|
| I-JEPA | Assran et al. 2023 | `IJepa<B>` | Strict path implemented, parity verified |
| V-JEPA | Bardes et al. 2024 | `VJepa<B>` | Strict path implemented, parity pending |
| V-JEPA 2 | Assran et al. 2025 | `VJepa<B>` + cosine EMA schedule | Select features only |
| H-JEPA | LeCun 2022 (position paper) | `HierarchicalJepa<B>` | Experimental, no reference impl |
| World model | LeCun 2022 + V-JEPA 2 | `ActionConditionedPredictor`, `RandomShootingPlanner` | Experimental |
| EB-JEPA | Terver et al. 2026 | **Not implemented** | Referenced for comparison only |

## References

### Papers

| Paper | Focus |
|-------|-------|
| [A Path Towards Autonomous Machine Intelligence](https://openreview.net/pdf?id=BZ5a1r-kVsf) | JEPA position paper: hierarchical world models (LeCun, 2022) |
| [I-JEPA](https://arxiv.org/abs/2301.08243) | Self-supervised image learning with masked prediction in latent space (Assran et al., CVPR 2023) |
| [V-JEPA](https://arxiv.org/abs/2404.08471) | Extension to video with spatiotemporal masking (Bardes et al., 2024) |
| [V-JEPA 2](https://arxiv.org/abs/2506.09985) | Video understanding, prediction, and planning (Assran et al., 2025) |
| [EB-JEPA](https://arxiv.org/abs/2602.03604) | Lightweight energy-based JEPA library, referenced for comparison (Terver et al., 2026) |

### Official reference implementations

| Repo | Models | Relationship to jepa-rs |
|------|--------|------------------------|
| [`facebookresearch/ijepa`](https://github.com/facebookresearch/ijepa) | I-JEPA (archived) | Primary reference for `IJepa<B>` and key remapping |
| [`facebookresearch/jepa`](https://github.com/facebookresearch/jepa) | V-JEPA | Primary reference for `VJepa<B>` |
| [`facebookresearch/vjepa2`](https://github.com/facebookresearch/vjepa2) | V-JEPA 2 | Reference for cosine EMA schedule, ViT-G config |
| [`facebookresearch/eb_jepa`](https://github.com/facebookresearch/eb_jepa) | EB-JEPA tutorial | Not implemented; comparison only |

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.

## License

MIT License. See [LICENSE](./LICENSE) for details.

---

<p align="center">
  <sub>Built with <a href="https://burn.dev">burn</a> and <a href="https://github.com/sonos/tract">tract</a></sub>
</p>