oxicuda-vision 0.2.0

Vision Transformer & CLIP primitives for OxiCUDA: ViT patch embedding, multi-head self-attention, CLIP contrastive learning, FPN, RoI align, DETR decoder — pure Rust, zero CUDA SDK dependency.
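The description above lists ViT patch embedding among the primitives. The sketch below is a minimal illustration of the standard idea, not oxicuda-vision's API. It splits an image into non-overlapping `p x p` patches, flattens each one, and applies a shared linear projection, which is equivalent to a convolution with kernel size and stride both equal to `p`. It makes some assumptions: `patchify` and `embed` are hypothetical helpers, the image is a channel-first, row-major `f32` buffer, and class tokens and position embeddings are left out.

```rust
/// Hypothetical helper: splits a `c x h x w` image (channel-first, row-major)
/// into non-overlapping `p x p` patches. Each patch is flattened to a vector of
/// length `c * p * p` (channel, then row, then column), and patches are emitted
/// in raster order (left to right, top to bottom).
fn patchify(img: &[f32], c: usize, h: usize, w: usize, p: usize) -> Vec<Vec<f32>> {
    assert_eq!(img.len(), c * h * w, "image buffer has the wrong length");
    assert!(h % p == 0 && w % p == 0, "image size must be divisible by patch size");
    let mut patches = Vec::with_capacity((h / p) * (w / p));
    for py in 0..h / p {
        for px in 0..w / p {
            let mut patch = Vec::with_capacity(c * p * p);
            for ch in 0..c {
                for dy in 0..p {
                    for dx in 0..p {
                        let (y, x) = (py * p + dy, px * p + dx);
                        patch.push(img[ch * h * w + y * w + x]);
                    }
                }
            }
            patches.push(patch);
        }
    }
    patches
}

/// Hypothetical helper: projects every flattened patch to `d = bias.len()`
/// dimensions with the same weights, `token = W * patch + b`, where `weight`
/// is a row-major `d x (c*p*p)` matrix.
fn embed(patches: &[Vec<f32>], weight: &[f32], bias: &[f32]) -> Vec<Vec<f32>> {
    let d = bias.len();
    patches
        .iter()
        .map(|patch| {
            let k = patch.len();
            assert_eq!(weight.len(), d * k, "weight must be d x (c*p*p)");
            (0..d)
                .map(|i| {
                    let row = &weight[i * k..(i + 1) * k];
                    row.iter().zip(patch).map(|(a, b)| a * b).sum::<f32>() + bias[i]
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}
```

With `p = 16`, a 224×224 image yields (224/16)² = 196 patch tokens. Every patch goes through the same projection, so the token count grows quadratically as the patch size shrinks.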
//! Image segmentation models.
//!
//! Provides:
//! - **`sam`**: a compact, faithful CPU reference implementation of the
//!   *Segment Anything Model* (Kirillov et al. 2023), comprising a ViT image
//!   encoder, a point/box/mask prompt encoder with random-Fourier positional
//!   embeddings, and a two-way transformer mask decoder that predicts masks
//!   together with IoU quality scores.

pub mod sam;

pub use sam::{
    ImageEncoder, MaskDecoder, MaskPrediction, MultiHeadAttention, PositionEmbeddingRandom,
    PromptEncoder, Sam, SamConfig, TwoWayAttentionBlock, TwoWayBlockOutput, TwoWayTransformer,
};
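The module docs say the prompt encoder uses random-Fourier positional embeddings. The self-contained sketch below follows the formulation published with SAM. A point normalized to [0, 1]² is mapped to [-1, 1]² and multiplied by a fixed random Gaussian matrix of shape 2 × n. The output is the sine and the cosine of 2π times that projection, giving 2n features. This is not the crate's `PositionEmbeddingRandom` API: `RandomFourierPe`, `encode`, and the `SplitMix64` generator (used here only to avoid an external `rand` dependency) are hypothetical stand-ins.

```rust
use std::f32::consts::PI;

/// Hypothetical stand-in PRNG (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample strictly inside (0, 1), so `ln` below stays finite.
    fn uniform(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.uniform(), self.uniform());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Hypothetical random-Fourier positional encoder. Column j of the 2 x n
/// Gaussian matrix is stored as `g[j] = [weight_for_x, weight_for_y]`.
struct RandomFourierPe {
    g: Vec<[f32; 2]>,
}

impl RandomFourierPe {
    /// Draws the fixed (non-trainable) Gaussian matrix, scaled by `scale`.
    fn new(num_feats: usize, scale: f32, seed: u64) -> Self {
        let mut rng = SplitMix64(seed);
        let g = (0..num_feats)
            .map(|_| [rng.gaussian() as f32 * scale, rng.gaussian() as f32 * scale])
            .collect();
        Self { g }
    }

    /// Encodes a point with normalized coordinates `(x, y)` in [0, 1]^2.
    /// Returns `[sin(2*pi*f_0..f_{n-1}), cos(2*pi*f_0..f_{n-1})]`, length 2n.
    fn encode(&self, x: f32, y: f32) -> Vec<f32> {
        // Map [0, 1] to [-1, 1] before projecting.
        let (cx, cy) = (2.0 * x - 1.0, 2.0 * y - 1.0);
        let angles: Vec<f32> = self
            .g
            .iter()
            .map(|&[gx, gy]| 2.0 * PI * (cx * gx + cy * gy))
            .collect();
        let mut out: Vec<f32> = angles.iter().map(|a| a.sin()).collect();
        out.extend(angles.iter().map(|a| a.cos()));
        out
    }
}
```

Because the matrix is drawn once and then held fixed, the same point always maps to the same embedding. The `scale` factor sets the spread of frequencies, and with it how finely nearby points are distinguished.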