Crate oxigaf_diffusion

§oxigaf-diffusion

Multi-view diffusion model inference for GAF.

Implements the full pipeline: CLIP image encoding → multi-view U-Net denoising with camera-conditioned cross-view attention → VAE decoding.

§Cargo Features

This crate supports the following feature flags:

default = ["accelerate", "flash_attention"]: Default feature set, giving CPU-only inference with BLAS acceleration and flash attention
accelerate: Uses platform-native BLAS/LAPACK for CPU tensor operations
- macOS: Accelerate framework
- Linux: OpenBLAS or Intel MKL
cuda (platform-specific): Enables NVIDIA GPU acceleration via CUDA
- Requires CUDA toolkit installed
- Not available on macOS
metal (platform-specific): Enables Apple Silicon GPU acceleration via Metal
- macOS only
- Optimized for M1/M2/M3 chips
flash_attention (enabled by default): Memory-efficient attention that needs O(N) memory instead of O(N²) in the sequence length N
- Reduces memory usage by 2-4× for large images
- Tiled computation for better cache locality
mixed_precision (planned, not yet implemented): FP16/BF16 inference for reduced memory usage
- Faster on GPUs with Tensor Cores
- Lower memory footprint
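
The tiled computation behind the flash_attention feature can be illustrated with the online-softmax trick. The following is a minimal sketch in plain Rust for a single query vector, not this crate's implementation: the names `dot`, `naive_attention`, and `tiled_attention` are hypothetical, and the inputs are assumed to be non-empty with matching lengths. The tiled variant holds only one block of scores at a time and rescales its running accumulator whenever a larger score appears.

```rust
/// Dot product of two equally sized vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Reference attention for one query: materialises all N scores at once.
fn naive_attention(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = keys.iter().map(|k| dot(q, k) * scale).collect();
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = weights.iter().sum();
    let mut out = vec![0.0f32; values[0].len()];
    for (w, v) in weights.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += w / sum * x;
        }
    }
    out
}

/// Tiled attention for one query: processes keys/values in blocks and keeps
/// only a running max, a running softmax denominator, and an accumulator.
fn tiled_attention(
    q: &[f32],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
    block: usize,
) -> Vec<f32> {
    assert!(block > 0, "block size must be positive");
    let scale = 1.0 / (q.len() as f32).sqrt();
    let mut running_max = f32::NEG_INFINITY;
    let mut running_sum = 0.0f32;
    let mut acc = vec![0.0f32; values[0].len()];

    for (k_block, v_block) in keys.chunks(block).zip(values.chunks(block)) {
        // Scores for this block only: at most `block` values live at once.
        let scores: Vec<f32> = k_block.iter().map(|k| dot(q, k) * scale).collect();
        let block_max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let new_max = running_max.max(block_max);

        // Rescale what was accumulated under the old maximum.
        // On the first block this factor is exp(-inf) = 0, which is harmless
        // because the accumulator and the sum are still zero.
        let correction = (running_max - new_max).exp();
        running_sum *= correction;
        for a in acc.iter_mut() {
            *a *= correction;
        }

        for (s, v) in scores.iter().zip(v_block) {
            let w = (s - new_max).exp();
            running_sum += w;
            for (a, x) in acc.iter_mut().zip(v) {
                *a += w * x;
            }
        }
        running_max = new_max;
    }

    // Normalise once at the end.
    acc.iter().map(|a| a / running_sum).collect()
}
```

Both functions compute the same softmax-weighted average; the difference is only in how many scores are held in memory at once. A real kernel would additionally tile over queries and operate on tensors rather than `Vec<f32>`.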

Example usage:

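# In Cargo.toml, under [dependencies]. The entries below are alternatives: use exactly one.
# For CPU-only with flash attention (the default feature set)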
oxigaf-diffusion = { version = "0.1", default-features = true }

# For Apple Silicon with Metal acceleration
oxigaf-diffusion = { version = "0.1", features = ["metal", "flash_attention"] }

# For NVIDIA GPU with CUDA
oxigaf-diffusion = { version = "0.1", features = ["cuda", "flash_attention"] }
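
Downstream code sometimes needs to know which backend was compiled in. As a rough sketch, assuming the downstream crate declares its own `cuda` and `metal` features that forward to the ones above (for example `cuda = ["oxigaf-diffusion/cuda"]`), the choice can be made with the standard `cfg!` macro. The `Backend` enum and `select_backend` function are hypothetical names, not part of this crate.

```rust
/// Hypothetical backend tag used by a downstream application.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cuda,
    Metal,
    Cpu,
}

/// Picks a backend from the features the downstream crate was built with.
/// `cfg!` is evaluated at compile time and expands to `true` or `false`.
fn select_backend() -> Backend {
    if cfg!(feature = "cuda") {
        Backend::Cuda
    } else if cfg!(feature = "metal") {
        Backend::Metal
    } else {
        // Neither GPU feature enabled: fall back to the CPU path.
        Backend::Cpu
    }
}
```

With no GPU feature enabled, `select_backend()` returns `Backend::Cpu`.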

Re-exports§

pub use clip::ClipImageEncoder;
pub use config::DiffusionConfig;
pub use pipeline::MultiViewDiffusionPipeline;
pub use pipeline::MultiViewOutput;
pub use scheduler::DdimScheduler;
pub use scheduler::PredictionType;
pub use unet::MultiViewUNet;
pub use upsampler::LatentUpsampler;
pub use upsampler::UpsamplerMode;
pub use vae::Vae;
pub use flash_attention::flash_attention;
pub use flash_attention::flash_attention_with_config;
pub use flash_attention::FlashAttention;
pub use flash_attention::FlashAttentionConfig;

Modules§

attention: Attention-based building blocks for multi-view diffusion.
camera: Camera-pose conditioning MLP.
clip: CLIP image encoder for reference-image conditioning.
config: Configuration for the multi-view diffusion pipeline.
flash_attention: Flash Attention: Memory-efficient attention with block-wise computation.
pipeline: Full multi-view diffusion pipeline.
scheduler: DDIM scheduler with v-prediction parameterisation.
unet: Multi-view U-Net with camera-conditioned cross-view attention.
upsampler: Latent upsampler (32×32 → 64×64).
vae: Variational Autoencoder (SD 2.1 compatible).
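
To make the scheduler entry above concrete, here is a minimal sketch of one deterministic DDIM update (eta = 0) under the v-prediction parameterisation. It uses the standard formulas and is not the `DdimScheduler` API: the function name `ddim_step_v` and its slice-based signature are assumptions chosen for illustration, and `alpha_bar_t` / `alpha_bar_prev` denote the cumulative alpha products at the current and previous timestep.

```rust
/// One deterministic DDIM step (eta = 0) for a v-prediction model.
///
/// With a = sqrt(alpha_bar_t) and s = sqrt(1 - alpha_bar_t):
///   x0  = a * x_t - s * v
///   eps = a * v   + s * x_t
/// and the previous sample is
///   sqrt(alpha_bar_prev) * x0 + sqrt(1 - alpha_bar_prev) * eps.
fn ddim_step_v(x_t: &[f32], v: &[f32], alpha_bar_t: f32, alpha_bar_prev: f32) -> Vec<f32> {
    let (a_t, s_t) = (alpha_bar_t.sqrt(), (1.0 - alpha_bar_t).sqrt());
    let (a_p, s_p) = (alpha_bar_prev.sqrt(), (1.0 - alpha_bar_prev).sqrt());

    x_t.iter()
        .zip(v)
        .map(|(&x, &v)| {
            // Recover the clean-sample and noise estimates from v.
            let x0 = a_t * x - s_t * v;
            let eps = a_t * v + s_t * x;
            // Re-noise the clean estimate to the previous noise level.
            a_p * x0 + s_p * eps
        })
        .collect()
}
```

Setting `alpha_bar_prev` to 1.0 returns the clean-sample estimate directly, which is a convenient sanity check for the parameterisation.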

Enums§

DiffusionError: Errors that can occur during diffusion model operations.

Type Aliases§

DiffusionResult: Result type for diffusion operations.
