Crate oxigaf_diffusion


§oxigaf-diffusion

Multi-view diffusion model inference for GAF.

Implements the full pipeline: CLIP image encoding → multi-view U-Net denoising with camera-conditioned cross-view attention → VAE decoding.
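The data flow through these three stages can be sketched as follows. This is a schematic with toy stand-ins, not the crate's actual API (the real types are `ClipImageEncoder`, `MultiViewUNet`, and `Vae`); the shapes (1024-dim CLIP embedding, 4×32×32 latents per view, 3×256×256 decoded images) are illustrative assumptions.

```rust
// Toy stand-ins illustrating the pipeline stages and their data flow.
struct ClipEmbedding(Vec<f32>); // reference-image conditioning vector
struct Latent(Vec<f32>);        // one view's latent, 4 * 32 * 32
struct Image(Vec<f32>);         // one decoded view, 3 * 256 * 256

// Stage 1: encode the reference image into a conditioning embedding.
fn encode_reference(_pixels: &[f32]) -> ClipEmbedding {
    ClipEmbedding(vec![0.0; 1024])
}

// Stage 2: denoise all views jointly. In the real U-Net each step would
// attend across views with camera conditioning; here a decay stands in
// for the DDIM update.
fn denoise_views(_cond: &ClipEmbedding, n_views: usize, steps: usize) -> Vec<Latent> {
    let mut latents: Vec<Latent> =
        (0..n_views).map(|_| Latent(vec![1.0; 4 * 32 * 32])).collect();
    for _ in 0..steps {
        for l in latents.iter_mut() {
            for x in l.0.iter_mut() {
                *x *= 0.9; // stand-in for one denoising step
            }
        }
    }
    latents
}

// Stage 3: VAE decode, 4x32x32 latent -> 3x256x256 image (8x spatial).
fn decode(_latent: &Latent) -> Image {
    Image(vec![0.0; 3 * 256 * 256])
}

fn main() {
    let cond = encode_reference(&vec![0.0; 3 * 224 * 224]);
    let latents = denoise_views(&cond, 4, 30);
    let images: Vec<Image> = latents.iter().map(decode).collect();
    assert_eq!(images.len(), 4);
    assert_eq!(images[0].0.len(), 3 * 256 * 256);
}
```

The real pipeline is wrapped up by `MultiViewDiffusionPipeline`, which returns a `MultiViewOutput`.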

§Cargo Features

This crate supports the following feature flags:

  • default = ["accelerate", "flash_attention"]: Default features for CPU-only inference with optimizations

  • accelerate: Uses platform-native BLAS/LAPACK for CPU tensor operations

    • macOS: Accelerate framework
    • Linux: OpenBLAS or Intel MKL
  • cuda (platform-specific): Enables NVIDIA GPU acceleration via CUDA

    • Requires CUDA toolkit installed
    • Not available on macOS
  • metal (platform-specific): Enables Apple Silicon GPU acceleration via Metal

    • macOS only
    • Optimized for M1/M2/M3 chips
  • flash_attention (enabled by default): Memory-efficient attention that uses O(N) memory instead of O(N²)

    • Reduces memory usage by 2-4× for large images
    • Tiled computation for better cache locality
  • mixed_precision (planned, not yet implemented): FP16/BF16 inference for reduced memory usage

    • Faster on GPUs with Tensor Cores
    • Lower memory footprint

Example usage:

```toml
# In Cargo.toml

# For CPU-only with flash attention
oxigaf-diffusion = { version = "0.1", default-features = true }

# For Apple Silicon with Metal acceleration
oxigaf-diffusion = { version = "0.1", features = ["metal", "flash_attention"] }

# For NVIDIA GPU with CUDA
oxigaf-diffusion = { version = "0.1", features = ["cuda", "flash_attention"] }
```
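The trick behind the flash_attention feature's O(N) memory usage can be illustrated with an online-softmax sketch: keys and values are processed block by block, keeping only a running max, running normaliser, and running weighted sum instead of materialising the full N×N score matrix. This toy version handles a single query with scalar values; the block size and shapes are assumptions, not the crate's actual kernel.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Reference implementation: materialises all N scores at once.
fn naive_attention(q: &[f32], keys: &[Vec<f32>], values: &[f32]) -> f32 {
    let scores: Vec<f32> = keys.iter().map(|k| dot(q, k)).collect();
    let m = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - m).exp()).collect();
    let z: f32 = exps.iter().sum();
    exps.iter().zip(values).map(|(e, v)| e * v / z).sum()
}

// Block-wise version: O(1) state per query regardless of sequence length.
fn flash_attention_1q(q: &[f32], keys: &[Vec<f32>], values: &[f32], block: usize) -> f32 {
    let (mut m, mut z, mut acc) = (f32::NEG_INFINITY, 0.0f32, 0.0f32);
    for (kb, vb) in keys.chunks(block).zip(values.chunks(block)) {
        for (k, &v) in kb.iter().zip(vb) {
            let s = dot(q, k);
            let m_new = m.max(s);
            // Rescale running statistics to the new max, then add this entry.
            let scale = (m - m_new).exp();
            z = z * scale + (s - m_new).exp();
            acc = acc * scale + (s - m_new).exp() * v;
            m = m_new;
        }
    }
    acc / z
}

fn main() {
    let q = vec![0.3, -0.7, 1.1];
    let keys: Vec<Vec<f32>> = (0..8).map(|i| vec![i as f32 * 0.1, 0.2, -0.3]).collect();
    let values: Vec<f32> = (0..8).map(|i| i as f32).collect();
    let a = naive_attention(&q, &keys, &values);
    let b = flash_attention_1q(&q, &keys, &values, 3);
    assert!((a - b).abs() < 1e-4);
}
```

Both versions compute the same softmax-weighted sum; the block-wise one never holds more than one block of scores, which is where the 2-4× memory reduction for large images comes from.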

Re-exports§

pub use clip::ClipImageEncoder;
pub use config::DiffusionConfig;
pub use pipeline::MultiViewDiffusionPipeline;
pub use pipeline::MultiViewOutput;
pub use scheduler::DdimScheduler;
pub use scheduler::PredictionType;
pub use unet::MultiViewUNet;
pub use upsampler::LatentUpsampler;
pub use upsampler::UpsamplerMode;
pub use vae::Vae;
pub use flash_attention::flash_attention;
pub use flash_attention::flash_attention_with_config;
pub use flash_attention::FlashAttention;
pub use flash_attention::FlashAttentionConfig;

Modules§

attention
Attention-based building blocks for multi-view diffusion.
camera
Camera-pose conditioning MLP.
clip
CLIP image encoder for reference-image conditioning.
config
Configuration for the multi-view diffusion pipeline.
flash_attention
Flash Attention: Memory-efficient attention with block-wise computation.
pipeline
Full multi-view diffusion pipeline.
scheduler
DDIM scheduler with v-prediction parameterisation.
unet
Multi-view U-Net with camera-conditioned cross-view attention.
upsampler
Latent upsampler (32×32 → 64×64).
vae
Variational Autoencoder (SD 2.1 compatible).
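The scheduler module's v-prediction parameterisation defines v = √ᾱ_t·ε − √(1−ᾱ_t)·x₀, from which the clean sample and noise are recovered before a deterministic DDIM step. A minimal scalar sketch, assuming standard v-prediction DDIM (the crate's actual `DdimScheduler` API is not shown here, and the ᾱ values are made up for illustration):

```rust
// Recover the predicted clean sample x0 and noise eps from a v-prediction.
fn from_v(x_t: f32, v: f32, alpha_bar: f32) -> (f32, f32) {
    let (a, b) = (alpha_bar.sqrt(), (1.0 - alpha_bar).sqrt());
    let x0 = a * x_t - b * v;  // predicted clean sample
    let eps = b * x_t + a * v; // predicted noise
    (x0, eps)
}

// One deterministic DDIM step from noise level alpha_bar_t to alpha_bar_prev.
fn ddim_step(x_t: f32, v: f32, alpha_bar_t: f32, alpha_bar_prev: f32) -> f32 {
    let (x0, eps) = from_v(x_t, v, alpha_bar_t);
    alpha_bar_prev.sqrt() * x0 + (1.0 - alpha_bar_prev).sqrt() * eps
}

fn main() {
    // Build x_t from a known clean sample and noise, then check that the
    // exact v recovers both.
    let (x0, eps, ab) = (0.8f32, -0.5f32, 0.6f32);
    let x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps;
    let v = ab.sqrt() * eps - (1.0 - ab).sqrt() * x0;
    let (x0_hat, eps_hat) = from_v(x_t, v, ab);
    assert!((x0_hat - x0).abs() < 1e-5);
    assert!((eps_hat - eps).abs() < 1e-5);
    // A step toward alpha_bar_prev = 0.8 reuses the same x0, eps estimates.
    let x_prev = ddim_step(x_t, v, ab, 0.8);
    let expected = 0.8f32.sqrt() * x0 + 0.2f32.sqrt() * eps;
    assert!((x_prev - expected).abs() < 1e-5);
}
```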

Enums§

DiffusionError
Errors that can occur during diffusion model operations.

Type Aliases§

DiffusionResult
Result type for diffusion operations.