pub struct DiffusionConfig {
pub num_views: usize,
pub guidance_scale: f64,
pub num_inference_steps: usize,
pub upsampler_steps: usize,
pub image_size: usize,
pub latent_size: usize,
pub latent_channels: usize,
pub unet_in_channels: usize,
pub unet_out_channels: usize,
pub cross_attention_dim: usize,
pub clip_embed_dim: usize,
pub time_embed_dim: usize,
pub base_channels: usize,
pub channel_mult: Vec<usize>,
pub layers_per_block: usize,
pub attention_head_dim: Vec<usize>,
pub transformer_layers_per_block: Vec<usize>,
pub norm_num_groups: usize,
pub norm_eps: f64,
pub camera_pose_dim: usize,
pub use_linear_projection: bool,
pub vae_scale_factor: f64,
pub use_flash_attention: bool,
pub flash_attention_block_size: usize,
pub upsampler_mode: Option<UpsamplerMode>,
}
Full configuration for the multi-view diffusion model.
Contains all hyperparameters for the diffusion pipeline, including U-Net architecture, attention settings, CFG parameters, and optional upsampling.
§Examples
use oxigaf_diffusion::DiffusionConfig;
// Use default configuration (256×256, guidance_scale=3.0)
let config = DiffusionConfig::default();
// Customize guidance scale for stronger conditioning
let mut config = DiffusionConfig::default();
config.guidance_scale = 7.5;
// Enable upsampling for 512×512 output
use oxigaf_diffusion::UpsamplerMode;
config.upsampler_mode = Some(UpsamplerMode::SdX2);
§Fields
num_views: usize
Number of views to generate simultaneously (default: 4).
guidance_scale: f64
Classifier-free guidance scale for IP-Adapter conditioning (default: 3.0).
Controls the strength of reference image conditioning. Must be >= 1.0. Higher values increase identity preservation but may reduce diversity.
- 1.0: No guidance (pure conditional)
- 3.0-7.5: Balanced (recommended)
- >10.0: Strong conditioning (may oversaturate)
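The effect of the scale can be sketched independently of the crate. The helper below is hypothetical (not part of `oxigaf_diffusion`); it shows the standard classifier-free guidance combination, where the final noise prediction moves from the unconditional output toward the conditional one:

```rust
/// Hypothetical sketch of classifier-free guidance: interpolate past the
/// unconditional prediction in the direction of the conditional one.
/// A scale of 1.0 returns the conditional prediction unchanged.
fn cfg_combine(uncond: &[f64], cond: &[f64], guidance_scale: f64) -> Vec<f64> {
    uncond
        .iter()
        .zip(cond)
        .map(|(u, c)| u + guidance_scale * (c - u))
        .collect()
}

fn main() {
    let uncond = [0.0, 1.0];
    let cond = [1.0, 2.0];
    // scale 1.0: pure conditional prediction
    assert_eq!(cfg_combine(&uncond, &cond, 1.0), vec![1.0, 2.0]);
    // scale 3.0: pushed further in the conditional direction
    assert_eq!(cfg_combine(&uncond, &cond, 3.0), vec![3.0, 4.0]);
    println!("ok");
}
```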
num_inference_steps: usize
Number of DDIM denoising steps (default: 50).
upsampler_steps: usize
Number of latent upsampler denoising steps (default: 10).
image_size: usize
Input/output image resolution before upscaling (default: 256).
latent_size: usize
Latent spatial size (image_size / 8).
latent_channels: usize
Number of latent channels produced by the VAE (default: 4).
unet_in_channels: usize
U-Net input channels: latent_channels plus normal-map latent channels (default: 8).
unet_out_channels: usize
U-Net output channels (default: 4).
cross_attention_dim: usize
Cross-attention dimension (SD 2.1 = 1024).
clip_embed_dim: usize
CLIP image embedding dimension (ViT-H/14 = 1280).
time_embed_dim: usize
Time embedding dimension (default: 1280).
base_channels: usize
Base channel count for the U-Net (default: 320).
channel_mult: Vec<usize>
Channel multipliers per U-Net stage.
layers_per_block: usize
Layers per block in the U-Net.
attention_head_dim: Vec<usize>
Attention head dimension for each U-Net stage.
transformer_layers_per_block: Vec<usize>
Number of transformer blocks per attention stage.
norm_num_groups: usize
Number of groups for group normalization (default: 32).
norm_eps: f64
Group-norm epsilon.
camera_pose_dim: usize
Camera pose input dimension (4×3 flattened = 12).
use_linear_projection: bool
Whether to use linear projection in the spatial transformer.
vae_scale_factor: f64
VAE scaling factor for the latent space.
use_flash_attention: bool
Whether to use flash attention for memory-efficient attention with O(N) memory usage. When enabled, attention is computed block-wise with an online softmax; when disabled, the standard implementation materializes the full O(N²) score matrix. Default: true (when the feature is enabled).
flash_attention_block_size: usize
Block size for flash attention's tiled computation. Larger blocks use more memory but may be faster due to better cache utilization. Default: 64.
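The online-softmax idea behind the block-wise computation can be illustrated on a single score row, independently of the crate (a simplified sketch; the function name is hypothetical):

```rust
/// One-pass (online) softmax normalization constant computed block by
/// block: a running maximum `m` and a rescaled running sum `l` are
/// maintained, so the full score vector never has to be held at once.
fn online_softmax_denominator(scores: &[f64], block_size: usize) -> f64 {
    let mut m = f64::NEG_INFINITY;
    let mut l = 0.0;
    for block in scores.chunks(block_size) {
        let block_max = block.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let new_m = m.max(block_max);
        // rescale the running sum to the new maximum, then fold in this block
        l = l * (m - new_m).exp()
            + block.iter().map(|s| (s - new_m).exp()).sum::<f64>();
        m = new_m;
    }
    l
}

fn main() {
    let scores = [1.0, 2.0, 3.0, 4.0, 5.0];
    let blocked = online_softmax_denominator(&scores, 2);
    // matches the naive two-pass computation
    let max = 5.0;
    let naive: f64 = scores.iter().map(|s| (s - max).exp()).sum();
    assert!((blocked - naive).abs() < 1e-12);
    println!("ok");
}
```

Larger `block_size` values process more scores per rescaling step, which is the cache-utilization trade-off the field description refers to.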
upsampler_mode: Option<UpsamplerMode>
Upsampler mode for latent upsampling (32×32 → 64×64).
- None: No upsampling, output is 256×256
- Some(SdX2): Use sd-x2-latent-upscaler, output is 512×512
- Some(BilinearVae): Use bilinear upsampling, output is 512×512
Default: None (256×256 output).
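The resolutions listed above reduce to a simple rule: either upsampler doubles the output. The sketch below redeclares a minimal `UpsamplerMode` locally for illustration; `output_resolution` is a hypothetical helper, not part of the crate:

```rust
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum UpsamplerMode {
    SdX2,
    BilinearVae,
}

/// Output resolution given the base image size and the optional upsampler.
fn output_resolution(image_size: usize, mode: Option<UpsamplerMode>) -> usize {
    match mode {
        None => image_size,        // no upsampling: 256 -> 256
        Some(_) => image_size * 2, // both modes double: 256 -> 512
    }
}

fn main() {
    assert_eq!(output_resolution(256, None), 256);
    assert_eq!(output_resolution(256, Some(UpsamplerMode::SdX2)), 512);
    assert_eq!(output_resolution(256, Some(UpsamplerMode::BilinearVae)), 512);
    println!("ok");
}
```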
Implementations§
impl DiffusionConfig
pub fn stage_channels(&self, stage: usize) -> usize
Channel count for a given U-Net stage index.
pub fn num_stages(&self) -> usize
Total number of U-Net stages.
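A plausible reading of these two helpers, assuming stage channels derive from `base_channels` and `channel_mult` as in common U-Net configurations (the crate's actual implementation may differ):

```rust
/// Minimal stand-in for the relevant fields of the real config.
struct DiffusionConfig {
    base_channels: usize,
    channel_mult: Vec<usize>,
}

impl DiffusionConfig {
    /// Channel count at `stage`: base channels times that stage's multiplier.
    fn stage_channels(&self, stage: usize) -> usize {
        self.base_channels * self.channel_mult[stage]
    }

    /// One stage per channel multiplier.
    fn num_stages(&self) -> usize {
        self.channel_mult.len()
    }
}

fn main() {
    // SD-2.1-style defaults: 320 base channels, multipliers [1, 2, 4, 4]
    let cfg = DiffusionConfig {
        base_channels: 320,
        channel_mult: vec![1, 2, 4, 4],
    };
    assert_eq!(cfg.num_stages(), 4);
    assert_eq!(cfg.stage_channels(2), 1280);
    println!("ok");
}
```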
Trait Implementations§
impl Clone for DiffusionConfig
fn clone(&self) -> DiffusionConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for DiffusionConfig
Auto Trait Implementations§
impl Freeze for DiffusionConfig
impl RefUnwindSafe for DiffusionConfig
impl Send for DiffusionConfig
impl Sync for DiffusionConfig
impl Unpin for DiffusionConfig
impl UnsafeUnpin for DiffusionConfig
impl UnwindSafe for DiffusionConfig
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.