oxicuda-vision
Vision Transformer & CLIP primitives for OxiCUDA -- ViT patch embedding, multi-head self-attention, CLIP contrastive learning, FPN, RoI Align, and DETR decoder, all in pure Rust.
Part of the OxiCUDA project. See the workspace README for the full crate map.
Overview
oxicuda-vision provides the architectural pieces of modern vision deep
learning: a strided Conv2D PatchEmbed, sinusoidal and learnable positional
embeddings, a complete ViT block (pre-norm MHSA + MLP) plus encoder stack
and full ViTModel, a CLIP vision encoder + projection head + InfoNCE
contrastive loss, geometric / photometric / normalisation image
augmentations, a Feature Pyramid Network with lateral 1x1 convolutions and
top-down pathway, and a DETR-style decoder with roi_align and
bipartite_match primitives.
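
As a rough illustration of the InfoNCE contrastive loss mentioned above, the following standalone sketch computes a symmetric loss over a batch of paired embeddings stored as flat row-major slices. The function name `info_nce_sketch`, its signature, and the assumption that rows are already L2-normalised are all hypothetical; this is not the crate's `info_nce_loss` API.

```rust
/// Hypothetical helper (not the crate's API): symmetric InfoNCE over `n`
/// paired embeddings of width `d`, both stored row-major in flat slices.
/// Assumes every row is already L2-normalised.
fn info_nce_sketch(a: &[f32], b: &[f32], n: usize, d: usize, temperature: f32) -> f32 {
    assert!(n > 0 && temperature > 0.0);
    assert_eq!(a.len(), n * d);
    assert_eq!(b.len(), n * d);

    // logits[i * n + j] = dot(a_i, b_j) / temperature
    let mut logits = vec![0.0f32; n * n];
    for i in 0..n {
        for j in 0..n {
            let dot: f32 = (0..d).map(|k| a[i * d + k] * b[j * d + k]).sum();
            logits[i * n + j] = dot / temperature;
        }
    }

    // Mean cross-entropy with the diagonal as the target class, taken either
    // along rows (a -> b) or along columns (b -> a).
    let ce = |along_rows: bool| -> f32 {
        let mut total = 0.0f32;
        for i in 0..n {
            let at = |j: usize| if along_rows { logits[i * n + j] } else { logits[j * n + i] };
            // Subtract the max before exponentiating for numerical stability.
            let max = (0..n).map(|j| at(j)).fold(f32::NEG_INFINITY, f32::max);
            let lse = max + (0..n).map(|j| (at(j) - max).exp()).sum::<f32>().ln();
            total += lse - at(i);
        }
        total / n as f32
    };

    0.5 * (ce(true) + ce(false))
}
```

With perfectly aligned pairs, lowering `temperature` sharpens the softmax and drives the loss toward zero, which is the behaviour a contrastive objective is expected to show.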
All forward passes operate on flat row-major Vec<f32> tensors so the same
code drives CPU unit tests, PTX kernel verification, and CPU-only
deployments. PTX kernels (patch_embed, bilinear_interp,
contrastive_loss, roi_align, image_normalize, adaptive_avg_pool,
focal_loss) are emitted for SM 7.5 through SM 12.0. The only crate
dependency is thiserror.
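
To make the flat row-major convention concrete, here is a minimal sketch of per-channel image normalisation on a flat `f32` buffer. The channel-major (CHW) layout, the function name `normalize_chw`, and the indexing formula are assumptions for illustration; the crate's actual layout and `image_normalize` interface may differ. The constants are the widely used ImageNet channel statistics.

```rust
/// Commonly used ImageNet per-channel statistics (RGB order).
const IMAGENET_MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const IMAGENET_STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Hypothetical helper: normalise a flat CHW image in place.
/// Element (c, y, x) is assumed to live at index `c * h * w + y * w + x`.
fn normalize_chw(image: &mut [f32], h: usize, w: usize, mean: &[f32], std: &[f32]) {
    let channels = mean.len();
    assert!(h > 0 && w > 0);
    assert_eq!(std.len(), channels);
    assert_eq!(image.len(), channels * h * w);

    // Each chunk of h * w values is one channel plane under the CHW assumption.
    for (c, plane) in image.chunks_exact_mut(h * w).enumerate() {
        for v in plane.iter_mut() {
            *v = (*v - mean[c]) / std[c];
        }
    }
}
```

Because the buffer is a plain slice, the same function can be exercised in a CPU unit test and used as a reference when checking the output of a GPU kernel.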
Modules
| Module | Description |
|---|---|
| error | VisionError / VisionResult |
| handle | VisionHandle, SmVersion, LcgRng |
| patch_embed | PatchEmbed, PatchEmbedConfig, LearnablePosEmbed, pos_2d_sincos, add_pos_embed, prepend_cls |
| vit | ViTBlock, ViTEncoder, ViTModel, ViTConfig::tiny() |
| clip | ClipVisionEncoder, ClipVisionConfig, ProjectionHead, info_nce_loss |
| augment | AugOp, Pipeline; Resize, RandomCrop, HorizontalFlip, ImageNet normalize |
| fpn | FeatureMap, FpnConfig, Fpn, LateralConv1x1 |
| detection | roi_align, bipartite_match, DetrDecoder, DetrConfig::tiny() |
| ptx_kernels | PTX for the seven kernels listed above |
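
RoI Align reads feature maps at fractional coordinates, which is typically done with bilinear interpolation. The sketch below shows that sampling step for a single-channel row-major map. The name `bilinear_sample`, the clamping of out-of-range coordinates, and the signature are assumptions for illustration, not the crate's `roi_align` or `bilinear_interp` interface.

```rust
/// Hypothetical helper: sample a single-channel `h x w` row-major map at a
/// fractional location `(y, x)`, clamping coordinates to the valid range.
fn bilinear_sample(map: &[f32], h: usize, w: usize, y: f32, x: f32) -> f32 {
    assert!(h > 0 && w > 0);
    assert_eq!(map.len(), h * w);

    let y = y.clamp(0.0, (h - 1) as f32);
    let x = x.clamp(0.0, (w - 1) as f32);

    // Integer corners surrounding the sample point.
    let (y0, x0) = (y.floor() as usize, x.floor() as usize);
    let (y1, x1) = ((y0 + 1).min(h - 1), (x0 + 1).min(w - 1));

    // Fractional offsets inside the cell.
    let (dy, dx) = (y - y0 as f32, x - x0 as f32);

    // Interpolate along x on the top and bottom rows, then along y.
    let top = map[y0 * w + x0] * (1.0 - dx) + map[y0 * w + x1] * dx;
    let bottom = map[y1 * w + x0] * (1.0 - dx) + map[y1 * w + x1] * dx;
    top * (1.0 - dy) + bottom * dy
}
```

A full RoI Align would average several such samples per output bin; this sketch covers only the single-point lookup.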
Quick Start
use *;
let mut rng = new;
// 32x32 RGB image, patch_size = 4, embed_dim = 16 -> 64 patch tokens.
let cfg = new?;
let pe = new;
let image = vec!;
let tokens = pe.forward?;
assert_eq!;
// Tiny ViT classifier (10-way).
let model = new?;
let logits = model.forward?;
assert_eq!;
# Ok::
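
The token count in the comment above follows from simple arithmetic: a 32x32 image cut into non-overlapping 4x4 patches gives (32 / 4) * (32 / 4) = 64 patches, and each patch is projected to `embed_dim` values, so the flat token buffer holds 64 * 16 = 1024 floats. The sketch below spells this out. `patch_grid` and `token_buffer_len` are hypothetical helpers, and rejecting sizes that are not a multiple of the patch size is an assumption rather than documented crate behaviour. The count excludes any class token.

```rust
/// Hypothetical helper: number of non-overlapping patches along each axis.
/// Returns `None` when the image size is not a multiple of the patch size.
fn patch_grid(height: usize, width: usize, patch: usize) -> Option<(usize, usize)> {
    if patch == 0 || height % patch != 0 || width % patch != 0 {
        return None;
    }
    Some((height / patch, width / patch))
}

/// Hypothetical helper: length of the flat row-major token buffer,
/// with one row of `embed_dim` values per patch.
fn token_buffer_len(height: usize, width: usize, patch: usize, embed_dim: usize) -> Option<usize> {
    let (rows, cols) = patch_grid(height, width, patch)?;
    Some(rows * cols * embed_dim)
}
```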
Status
| Item | Value |
|---|---|
| Version | 0.2.0 |
| Release date | 2026-06-16 |
| Default features | Pure Rust (thiserror only) |
| unwrap() | 0 in production code |
License
Apache-2.0 -- (C) 2026 COOLJAPAN OU (Team KitaSan)