# axonml-vision
<p align="center">
<img src="https://raw.githubusercontent.com/AutomataNexus/AxonML/main/AxonML-logo.png" alt="AxonML Logo" width="200"/>
</p>
<p align="center">
<a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License"></a>
<a href="https://www.rust-lang.org/"><img src="https://img.shields.io/badge/Rust-1.75%2B-orange.svg" alt="Rust"></a>
<a href="https://crates.io/crates/axonml-vision"><img src="https://img.shields.io/badge/crates.io-0.6.1-green.svg" alt="Version"></a>
<a href="https://github.com/axonml/axonml"><img src="https://img.shields.io/badge/part_of-AxonML-purple.svg" alt="Part of AxonML"></a>
</p>
## Overview
**axonml-vision** provides the computer-vision stack for AxonML: image-specific transforms, loaders for classical vision datasets (MNIST, Fashion-MNIST, CIFAR-10/100) plus synthetic variants, and a wide catalog of neural-network architectures covering classification, detection, dense prediction, anomaly detection, VQA, 3D reconstruction, and biometrics. A pretrained-weights hub with on-disk caching rounds it out.
## Features
- **Image transforms** — `Resize`, `CenterCrop`, `RandomHorizontalFlip`, `RandomVerticalFlip`, `RandomRotation`, `ColorJitter`, `Grayscale`, `ImageNormalize` (presets: `imagenet`, `mnist`, `cifar10`), `Pad`, `ToTensorImage`.
- **Datasets** — real-file loaders for `MNIST`, `FashionMNIST`, `CIFAR10`, `CIFAR100`, and synthetic variants `SyntheticMNIST` / `SyntheticCIFAR` for fast tests.
- **Classification** — `LeNet`, `MLP`, `SimpleCNN`, `ResNet` (`resnet18`, `resnet34`, `BasicBlock`, `Bottleneck`), `VGG` (`vgg11`, `vgg13`, `vgg16`, `vgg19` with optional batch-norm), `VisionTransformer` (`vit_base`, `vit_large`).
- **Detection** — `BlazeFace` (dual-scale 128×128 face detector, 896 anchors), `RetinaFace` (ResNet34 backbone + multi-level FPN head), `DETR` (transformer-based, `small` preset), `NanoDet` (mobile-class detector), and `Helios` (YOLO-family detector in five sizes, Nano/Small/Medium/Large/XLarge, with loss utilities `HeliosLoss`, `CIoULoss`, and `TaskAlignedAssigner`).
- **Novel detection architectures** — `Nexus` (predictive dual-pathway detector with multi-scale fusion, an object-memory bank, and predictive-coding surprise gating), `Phantom` (temporal event-driven face detection with a pseudo-event encoder and GRU-based face-state tracker), and `NightVision` (multi-domain infrared detector with a thermal stem, CSP backbone, thermal FPN, YOLOX-style decoupled heads, and `ThermalDomain` domain tagging).
- **Dense prediction** — `DPT` (depth transformer, `small`/`base` presets) and `FastDepth` (mobile depth estimator).
- **Anomaly detection** — `PatchCore` and `StudentTeacher`, both with `default_rgb()` constructors.
- **Visual Question Answering** — `VQAModel` (`small` preset).
- **3D reconstruction** — `Aegis3D`: Fourier-feature SDF networks (`LocalSDF` + `GlobalSDF`), adaptive octree spatial indexing, differentiable sphere-tracing renderer, and marching-cubes mesh extraction.
- **FPN infrastructure** — shared `FPN` (feature pyramid network) used by multiple detectors.
- **Aegis biometric identity suite** — `AegisIdentity` orchestrator with `full` / `face_only` / `edge_minimal` constructors; modality models `MnemosyneIdentity` (face), `AriadneFingerprint`, `EchoSpeaker` (voice), `ArgusIris`, plus `ThemisFusion` (uncertainty-weighted fusion). Supports enrollment, verification, forensic verification, liveness assessment, secure verification, and 1:N identification. Companion losses: `AngularMarginLoss`, `CenterLoss`, `ContrastiveLoss`, `CrystallizationLoss`, `DiversityRegularization`, `EchoLoss`, `ArgusLoss`, `LivenessLoss`, `ThemisLoss`.
- **Model Hub** — `download_weights`, `load_state_dict`, `list_models`, `model_info`, `is_cached`, `model_registry`, with on-disk caching.
- **CUDA feature** — optional `cuda` cargo feature propagates to core/tensor/autograd/nn.
## Modules
| Module | Description |
|--------|-------------|
| `transforms` | Image data-augmentation and preprocessing transforms |
| `datasets` | MNIST, Fashion-MNIST, CIFAR-10/100 loaders plus synthetic variants |
| `models` | All neural network architectures (see below) |
| `models::biometric` | Aegis biometric suite (Mnemosyne, Ariadne, Echo, Argus, Themis + identity orchestrator) |
| `models::helios` | YOLO-style object detector with 5 size variants |
| `models::nexus` | Predictive dual-pathway detector with object memory |
| `models::phantom` | Temporal event-driven face detection |
| `models::nightvision` | Multi-domain infrared detection |
| `models::aegis3d` | Octree-adaptive neural implicit surface reconstruction |
| `camera` | Camera I/O utilities |
| `edge` | Edge-deployment helpers |
| `hub` | Pretrained model weights management |
| `image_io` | Image load/save helpers |
| `losses` | Vision-specific loss functions |
| `ops` | Low-level vision ops |
| `training` | Training utilities |
## Usage
Add to your `Cargo.toml`:
```toml
[dependencies]
axonml-vision = "0.6.1"
```
### Loading Datasets
```rust
use axonml_vision::prelude::*;
// Synthetic MNIST for fast tests
let train_data = SyntheticMNIST::train();
let test_data = SyntheticMNIST::test();
// Synthetic CIFAR-10
let cifar = SyntheticCIFAR::small();
let (image, label) = train_data.get(0).unwrap();
assert_eq!(image.shape(), &[1, 28, 28]); // MNIST: 1 channel, 28x28
assert_eq!(label.shape(), &[10]); // One-hot encoded
```
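The labels above are one-hot encoded (shape `[10]`). As a minimal standalone illustration of that format, here is the index-to-one-hot mapping the loaders apply; `one_hot` is a hypothetical helper, not part of the crate:

```rust
// Hypothetical helper illustrating the one-hot label format the dataset
// loaders return: class index -> vector with 1.0 at that index.
fn one_hot(class: usize, num_classes: usize) -> Vec<f32> {
    let mut v = vec![0.0; num_classes];
    v[class] = 1.0;
    v
}

fn main() {
    let label = one_hot(3, 10); // class "3" for MNIST
    assert_eq!(label.len(), 10);
    assert_eq!(label[3], 1.0);
    assert_eq!(label.iter().sum::<f32>(), 1.0); // exactly one hot entry
    println!("{label:?}");
}
```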
### Image Transforms
```rust
use axonml_vision::{Resize, CenterCrop, RandomHorizontalFlip, ImageNormalize};
use axonml_data::{Compose, Transform};
// `image` is an existing [3, H, W] image tensor (e.g. loaded via `image_io`)
let transform = Compose::empty()
    .add(Resize::new(256, 256))
    .add(CenterCrop::new(224, 224))
    .add(RandomHorizontalFlip::new())
    .add(ImageNormalize::imagenet());
let output = transform.apply(&image);
assert_eq!(output.shape(), &[3, 224, 224]);
```
### Normalization Presets
```rust
use axonml_vision::ImageNormalize;
let imagenet = ImageNormalize::imagenet(); // mean=[0.485,0.456,0.406] std=[0.229,0.224,0.225]
let mnist = ImageNormalize::mnist(); // mean=[0.1307] std=[0.3081]
let cifar10 = ImageNormalize::cifar10(); // mean=[0.4914,0.4822,0.4465] std=[0.2470,0.2435,0.2616]
```
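Each preset applies the standard per-channel normalization `(x - mean[c]) / std[c]`. A self-contained sketch of that arithmetic with the ImageNet statistics quoted above (`normalize_pixel` is illustrative, not the crate's API):

```rust
// Per-channel normalization as applied by `ImageNormalize`:
// out = (pixel - mean[c]) / std[c]. Standalone sketch, not the crate's API.
const IMAGENET_MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const IMAGENET_STD: [f32; 3] = [0.229, 0.224, 0.225];

fn normalize_pixel(value: f32, channel: usize) -> f32 {
    (value - IMAGENET_MEAN[channel]) / IMAGENET_STD[channel]
}

fn main() {
    // A mid-gray pixel (0.5) in the red channel:
    let n = normalize_pixel(0.5, 0); // (0.5 - 0.485) / 0.229
    assert!((n - 0.0655).abs() < 1e-3);
    // A pixel equal to the channel mean normalizes to 0:
    assert!(normalize_pixel(0.485, 0).abs() < 1e-6);
    println!("{n:.4}");
}
```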
### Classification Models
```rust
use axonml_vision::{LeNet, MLP, SimpleCNN};
use axonml_vision::models::{resnet18, resnet34, vgg16, vit_base};
use axonml_nn::Module;
use axonml_autograd::Variable;
use axonml_tensor::Tensor;
let lenet = LeNet::new(); // [N, 1, 28, 28] -> [N, 10]
let mlp = MLP::for_mnist(); // 784 -> 256 -> 128 -> 10
let rn18 = resnet18(1000); // ImageNet classes
let vgg = vgg16(1000, /*batch_norm=*/ true);
let vit = vit_base(1000);
// Forward pass through LeNet
let input = Variable::new(Tensor::randn(&[1, 1, 28, 28]), false);
let logits = lenet.forward(&input); // [1, 10]
```
### Detection Models
```rust
use axonml_vision::models::{BlazeFace, RetinaFace, NanoDet, DETR};
use axonml_vision::models::helios::{Helios, HeliosSize};
let blaze = BlazeFace::new(); // dual-scale 128x128 face detector
let retina = RetinaFace::new(); // ResNet34 backbone
let nanodet = NanoDet::new(/*num_classes=*/ 80);
let detr = DETR::small(10);
let helios = Helios::small(80); // also: new(config), large(num_classes)
```
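The 896-anchor figure for `BlazeFace` follows the anchor scheme of the original BlazeFace paper: an SSD-style 16×16 feature map with 2 anchors per cell plus an 8×8 map with 6 per cell. The arithmetic:

```rust
// Where BlazeFace's 896 anchors come from (anchor scheme from the
// original BlazeFace paper, on a 128x128 input):
fn main() {
    let coarse = 16 * 16 * 2; // 16x16 map, 2 anchors per cell = 512
    let fine = 8 * 8 * 6;     // 8x8 map, 6 anchors per cell  = 384
    assert_eq!(coarse + fine, 896);
    println!("total anchors: {}", coarse + fine);
}
```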
### Novel Detection Architectures
```rust
use axonml_vision::models::{Nexus, Phantom, NightVision, NightVisionConfig};
let nexus = Nexus::default(); // predictive dual-pathway + object memory
let phantom = Phantom::default(); // event-driven temporal face detector
let night = NightVision::new(NightVisionConfig::default());
```
### Dense Prediction & Anomaly / VQA
```rust
use axonml_vision::models::{DPT, FastDepth, PatchCore, StudentTeacher, VQAModel};
let dpt = DPT::small(); // transformer depth
let fast = FastDepth::new(); // mobile depth
let patch = PatchCore::default_rgb(); // anomaly detection, 256-d features
let st = StudentTeacher::default_rgb(); // student-teacher anomaly
let vqa = VQAModel::small(100, 50); // vocab=100, answers=50
```
### Aegis3D — 3D Reconstruction
```rust
use axonml_vision::models::{Aegis3D, aegis3d::{Aegis3DConfig, AABB, SphereTracingConfig}};
let aegis3d = Aegis3D::new(); // Fourier-feature SDF + adaptive octree + sphere tracing + marching cubes
```
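The differentiable renderer in Aegis3D is built on sphere tracing: march along a ray, stepping by the SDF value, which safely bounds the distance to the nearest surface. A minimal self-contained sketch of that idea against an analytic unit-sphere SDF (an illustration of the technique, not the crate's renderer):

```rust
// Minimal sphere-tracing loop against an analytic SDF (unit sphere at the
// origin). Illustrates the technique Aegis3D's renderer is built on; this
// is not the crate's implementation.
fn sdf_sphere(p: [f32; 3], radius: f32) -> f32 {
    (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]).sqrt() - radius
}

/// March from `origin` along unit-length `dir`; return the hit distance if
/// the SDF drops below `eps` within the step/distance budget.
fn sphere_trace(origin: [f32; 3], dir: [f32; 3]) -> Option<f32> {
    let (eps, max_dist, max_steps) = (1e-4_f32, 100.0_f32, 128);
    let mut t = 0.0_f32;
    for _ in 0..max_steps {
        let p = [
            origin[0] + t * dir[0],
            origin[1] + t * dir[1],
            origin[2] + t * dir[2],
        ];
        let d = sdf_sphere(p, 1.0);
        if d < eps {
            return Some(t); // surface hit
        }
        t += d; // safe step: the SDF bounds the distance to the surface
        if t > max_dist {
            break;
        }
    }
    None
}

fn main() {
    // Camera at z = -3 looking down +z hits the unit sphere at t = 2.
    let hit = sphere_trace([0.0, 0.0, -3.0], [0.0, 0.0, 1.0]).unwrap();
    assert!((hit - 2.0).abs() < 1e-2);
    // A ray pointing away from the sphere never hits.
    assert!(sphere_trace([0.0, 0.0, -3.0], [1.0, 0.0, 0.0]).is_none());
    println!("hit at t = {hit:.4}");
}
```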
### Full Training Pipeline
```rust
use axonml_vision::prelude::*;
use axonml_data::DataLoader;
use axonml_optim::{Adam, Optimizer};
use axonml_nn::{CrossEntropyLoss, Module};
let dataset = SyntheticMNIST::train();
let loader = DataLoader::new(dataset, 32).shuffle(true);
let model = LeNet::new();
let mut optim = Adam::new(model.parameters(), 0.001);
let loss_fn = CrossEntropyLoss::new();
for batch in loader.iter() {
    let input = Variable::new(batch.data, true);
    let target = batch.targets;
    optim.zero_grad();
    let output = model.forward(&input);
    let loss = loss_fn.compute(&output, &target);
    loss.backward();
    optim.step();
}
```
### Model Hub for Pretrained Weights
```rust
use axonml_vision::hub::{
    download_weights, load_state_dict, list_models, model_info, is_cached, model_registry,
};
for model in list_models() {
    println!("{}: {} classes, {:.1} MB", model.name, model.num_classes,
        model.size_bytes as f64 / 1_000_000.0);
}
if let Some(info) = model_info("resnet18") {
    println!("Top-1 accuracy: {:.2}%", info.accuracy);
}
if !is_cached("resnet18") {
    let path = download_weights("resnet18", /*force=*/ false)?;
    let state_dict = load_state_dict(&path)?;
    // model.load_state_dict(state_dict);
}
```
### Aegis Identity — Biometric Framework
Unified biometric identity system with four modality-specific architectures (face, fingerprint, voice, iris) plus `ThemisFusion` for uncertainty-weighted evidence fusion. Designed for edge deployment (sub-2 MB total in the `edge_minimal` configuration).
```rust
use axonml_vision::models::biometric::{
    AegisIdentity, BiometricEvidence, BiometricModality,
};
use axonml_autograd::Variable;
use axonml_tensor::Tensor;
// Full multimodal system — face + fingerprint + voice + iris
let mut aegis = AegisIdentity::full();
// Or smaller deployments:
let face_only = AegisIdentity::face_only();
let edge = AegisIdentity::edge_minimal();
// Enroll
let face = Variable::new(Tensor::randn(&[1, 3, 64, 64]), false);
let evidence = BiometricEvidence::new().with_face(face);
let enrolled = aegis.enroll(1001, &evidence);
// Verify
let probe = BiometricEvidence::new()
    .with_face(Variable::new(Tensor::randn(&[1, 3, 64, 64]), false));
let verification = aegis.verify(1001, &probe);
println!("match={}, score={:.3}, confidence={:.3}",
    verification.is_match, verification.match_score, verification.confidence);
// Forensic verification with per-modality scores and cross-modal consistency
let (result, forensic) = aegis.verify_forensic(1001, &probe);
// Anti-spoofing liveness
let liveness = aegis.assess_liveness(&evidence);
// Quality -> liveness -> verification secure pipeline
let secure = aegis.secure_verify(1001, &evidence);
// 1:N identification
let ident = aegis.identify(&probe);
```
**Modality architectures:**
| Model | Modality | Novel idea |
|-------|----------|------------|
| `MnemosyneIdentity` | Face | Identity crystallizes via GRU attractor convergence |
| `AriadneFingerprint` | Fingerprint | Ridge event fields with Gabor wavelets |
| `EchoSpeaker` | Voice | Identity = unpredictable speech residuals |
| `ArgusIris` | Iris | Polar-native radial / angular Conv1d encoding (backed by `polar::polar_unwrap`) |
| `ThemisFusion` | Fusion | Belief propagation with uncertainty gating |
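The fusion idea behind `ThemisFusion` can be illustrated with inverse-variance weighting: each modality's match score is weighted by the inverse of its uncertainty, so confident modalities dominate the fused decision. A standalone sketch of that weighting (an illustration of the principle, not the crate's belief-propagation algorithm):

```rust
// Standalone sketch of uncertainty-weighted score fusion in the spirit of
// `ThemisFusion`: weight each (score, uncertainty) pair by 1/uncertainty.
// Illustrative only; not the crate's belief-propagation algorithm.
fn fuse(scores_and_uncertainty: &[(f32, f32)]) -> f32 {
    let (num, den) = scores_and_uncertainty.iter().fold((0.0, 0.0), |(n, d), &(s, u)| {
        let w = 1.0 / u.max(1e-6); // inverse-variance weight
        (n + w * s, d + w)
    });
    num / den
}

fn main() {
    // A confident face score (low uncertainty) outweighs a noisy voice score.
    let fused = fuse(&[(0.9, 0.01), (0.4, 0.25)]);
    assert!(fused > 0.85 && fused < 0.9);
    println!("fused score: {fused:.3}");
}
```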
## Feature flags
- `default = ["download"]` — enables `reqwest` for hub downloads.
- `cuda` — propagates CUDA support to `axonml-tensor`, `axonml-nn`, `axonml-autograd`, `axonml-core`.
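To enable CUDA support, turn on the `cuda` feature in your `Cargo.toml`:

```toml
[dependencies]
axonml-vision = { version = "0.6.1", features = ["cuda"] }
```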
## Tests
```bash
cargo test -p axonml-vision
```
## License
Licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](../../LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License ([LICENSE-MIT](../../LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
---
_Last updated: 2026-04-16 (v0.6.1)_