§oxigaf-diffusion
Multi-view diffusion model inference for GAF.
Implements the full pipeline: CLIP image encoding → multi-view U-Net denoising with camera-conditioned cross-view attention → VAE decoding.
§Cargo Features
This crate supports the following feature flags:
- default = ["accelerate", "flash_attention"]: Default features for CPU-only inference with optimizations
- accelerate: Uses platform-native BLAS/LAPACK for CPU tensor operations
  - macOS: Accelerate framework
  - Linux: OpenBLAS or Intel MKL
- cuda (platform-specific): Enables NVIDIA GPU acceleration via CUDA
  - Requires the CUDA toolkit to be installed
  - Not available on macOS
- metal (platform-specific): Enables Apple Silicon GPU acceleration via Metal
  - macOS only
  - Optimized for M1/M2/M3 chips
- flash_attention (enabled by default): Memory-efficient attention with O(N) memory complexity instead of O(N²)
  - Reduces memory usage by 2-4× for large images
  - Tiled computation for better cache locality
- mixed_precision (planned, not yet implemented): FP16/BF16 inference for reduced memory usage
  - Faster on GPUs with Tensor Cores
  - Lower memory footprint
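The tiled, memory-efficient attention named in the flash_attention feature can be illustrated with a small CPU sketch. This is not the crate's implementation: the function names `tiled_attention_row` and `naive_attention_row`, the `Vec<f32>` layout, and the single-query, single-head setup are all assumptions made for illustration. It shows the core idea only: keys and values are visited one tile at a time, and a running maximum and running sum (an "online" softmax) replace the full row of attention scores.

```rust
/// Dot product of two equally sized vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Attention output for ONE query vector, computed tile by tile.
/// Hypothetical helper: only `tile` scores are held in memory at a time.
pub fn tiled_attention_row(
    query: &[f32],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
    tile: usize,
) -> Vec<f32> {
    assert!(tile > 0, "tile size must be positive");
    assert_eq!(keys.len(), values.len());
    let scale = 1.0 / (query.len() as f32).sqrt();
    let dim_v = values.first().map_or(0, |v| v.len());

    let mut running_max = f32::NEG_INFINITY; // max score seen so far
    let mut running_sum = 0.0f32; // softmax denominator relative to running_max
    let mut acc = vec![0.0f32; dim_v]; // unnormalised weighted sum of values

    for (k_tile, v_tile) in keys.chunks(tile).zip(values.chunks(tile)) {
        // Scores for this tile only.
        let scores: Vec<f32> = k_tile.iter().map(|k| dot(query, k) * scale).collect();
        let tile_max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let new_max = running_max.max(tile_max);

        // Rescale what has been accumulated so far to the new maximum.
        let correction = (running_max - new_max).exp();
        running_sum *= correction;
        for a in acc.iter_mut() {
            *a *= correction;
        }

        // Fold this tile into the running statistics.
        for (s, v) in scores.iter().zip(v_tile) {
            let w = (s - new_max).exp();
            running_sum += w;
            for (a, x) in acc.iter_mut().zip(v) {
                *a += w * x;
            }
        }
        running_max = new_max;
    }

    for a in acc.iter_mut() {
        *a /= running_sum;
    }
    acc
}

/// Reference version that materialises every score before the softmax.
pub fn naive_attention_row(query: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let scale = 1.0 / (query.len() as f32).sqrt();
    let scores: Vec<f32> = keys.iter().map(|k| dot(query, k) * scale).collect();
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = weights.iter().sum();
    let mut out = vec![0.0f32; values.first().map_or(0, |v| v.len())];
    for (w, v) in weights.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += w / sum * x;
        }
    }
    out
}
```

Both functions should agree up to floating-point rounding. The tiled one keeps at most `tile` scores per query alive, which is where the memory saving comes from once this is applied across all N queries.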
Example feature selection in Cargo.toml (pick one of the entries below):
# Under [dependencies]
# For CPU-only with flash attention
oxigaf-diffusion = { version = "0.1", default-features = true }
# For Apple Silicon with Metal acceleration
oxigaf-diffusion = { version = "0.1", features = ["metal", "flash_attention"] }
# For NVIDIA GPU with CUDA
oxigaf-diffusion = { version = "0.1", features = ["cuda", "flash_attention"] }
Re-exports§
- pub use clip::ClipImageEncoder;
- pub use config::DiffusionConfig;
- pub use pipeline::MultiViewDiffusionPipeline;
- pub use pipeline::MultiViewOutput;
- pub use scheduler::DdimScheduler;
- pub use scheduler::PredictionType;
- pub use unet::MultiViewUNet;
- pub use upsampler::LatentUpsampler;
- pub use upsampler::UpsamplerMode;
- pub use vae::Vae;
- pub use flash_attention::flash_attention;
- pub use flash_attention::flash_attention_with_config;
- pub use flash_attention::FlashAttention;
- pub use flash_attention::FlashAttentionConfig;
Modules§
- attention
- Attention-based building blocks for multi-view diffusion.
- camera
- Camera-pose conditioning MLP.
- clip
- CLIP image encoder for reference-image conditioning.
- config
- Configuration for the multi-view diffusion pipeline.
- flash_attention
- Flash Attention: Memory-efficient attention with block-wise computation.
- pipeline
- Full multi-view diffusion pipeline.
- scheduler
- DDIM scheduler with v-prediction parameterisation.
- unet
- Multi-view U-Net with camera-conditioned cross-view attention.
- upsampler
- Latent upsampler for 32×32 → 64×64 latent upsampling.
- vae
- Variational Autoencoder (SD 2.1 compatible).
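The scheduler module is described above as DDIM with v-prediction parameterisation. As a rough illustration of what one such update computes, here is a minimal sketch of a deterministic (eta = 0) DDIM step under the standard v-prediction definition v = sqrt(ᾱ_t)·ε − sqrt(1 − ᾱ_t)·x₀. The function name `ddim_step_v_prediction` and its flat-slice signature are assumptions for illustration and are not taken from the crate's DdimScheduler API.

```rust
/// One deterministic DDIM step (eta = 0) for a model that predicts `v`.
/// Hypothetical helper operating on flat slices.
///
/// `alpha_bar_t` and `alpha_bar_prev` are the cumulative alpha products at the
/// current and the previous (less noisy) timestep.
pub fn ddim_step_v_prediction(
    sample: &[f32],
    v_pred: &[f32],
    alpha_bar_t: f32,
    alpha_bar_prev: f32,
) -> Vec<f32> {
    assert_eq!(sample.len(), v_pred.len());
    let (a_t, s_t) = (alpha_bar_t.sqrt(), (1.0 - alpha_bar_t).sqrt());
    let (a_prev, s_prev) = (alpha_bar_prev.sqrt(), (1.0 - alpha_bar_prev).sqrt());

    sample
        .iter()
        .zip(v_pred)
        .map(|(&x_t, &v)| {
            // Recover the clean-sample and noise estimates from v.
            let x0 = a_t * x_t - s_t * v;
            let eps = s_t * x_t + a_t * v;
            // Re-noise the clean estimate to the previous timestep's level.
            a_prev * x0 + s_prev * eps
        })
        .collect()
}
```

With `alpha_bar_prev = 1.0` the step returns the clean-sample estimate directly, which is a convenient sanity check for the sign conventions.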
Enums§
- DiffusionError
- Errors that can occur during diffusion model operations.
Type Aliases§
- DiffusionResult
- Result type for diffusion operations.
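An error enum paired with a result alias usually follows the common Rust pattern sketched below. The page does not show the actual definitions, so everything here is a stand-in: `SketchError`, its variants, `SketchResult`, and `check_latent_len` are hypothetical names chosen for illustration and do not describe the crate's real error variants or functions.

```rust
use std::fmt;

/// Hypothetical stand-in for a crate-level error enum.
#[derive(Debug)]
pub enum SketchError {
    /// Illustrative variant: a configuration value was rejected.
    InvalidConfig(String),
    /// Illustrative variant: a buffer had an unexpected length.
    ShapeMismatch { expected: usize, actual: usize },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::InvalidConfig(msg) => write!(f, "invalid configuration: {msg}"),
            SketchError::ShapeMismatch { expected, actual } => {
                write!(f, "expected {expected} latent values, got {actual}")
            }
        }
    }
}

impl std::error::Error for SketchError {}

/// Hypothetical stand-in for a crate-level result alias.
pub type SketchResult<T> = Result<T, SketchError>;

/// Example of a fallible function that returns the alias.
pub fn check_latent_len(latent: &[f32], expected: usize) -> SketchResult<()> {
    if latent.len() == expected {
        Ok(())
    } else {
        Err(SketchError::ShapeMismatch { expected, actual: latent.len() })
    }
}
```

The alias keeps signatures short and lets callers propagate failures with `?` without repeating the error type.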