CLIP image encoder for reference-image conditioning.
Implements the ViT-H/14 CLIP image encoder used to extract per-image embeddings that condition the multi-view diffusion model via IP-adapter cross-attention.
Structs§
- ClipImageEncoder - CLIP ViT image encoder.
- ClipVisionConfig - CLIP vision model configuration (ViT-H/14 defaults).
Functions§
- build_clip_encoder - Build a CLIP encoder from a DiffusionConfig with default ViT-H/14 settings.
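To make the "ViT-H/14 defaults" concrete, here is a minimal sketch of what such a configuration typically holds. The type `VisionConfigSketch` and all of its field and method names are hypothetical and are not this crate's API. The numbers are the commonly published ViT-H/14 hyperparameters (224-pixel input, 14-pixel patches, width 1280, 32 layers, 16 heads); the defaults actually used by ClipVisionConfig may differ.

```rust
/// Hypothetical stand-in for a ViT vision configuration.
/// Field names are illustrative only, not taken from this crate.
#[derive(Debug, Clone, PartialEq)]
pub struct VisionConfigSketch {
    pub image_size: usize,     // input resolution in pixels (square)
    pub patch_size: usize,     // side length of one patch in pixels
    pub hidden_size: usize,    // transformer width
    pub num_layers: usize,     // number of transformer blocks
    pub num_heads: usize,      // attention heads per block
    pub mlp_size: usize,       // feed-forward inner dimension
    pub projection_dim: usize, // size of the projected image embedding
}

impl VisionConfigSketch {
    /// Commonly published ViT-H/14 hyperparameters (assumed, not read from the crate).
    pub fn vit_h_14() -> Self {
        Self {
            image_size: 224,
            patch_size: 14,
            hidden_size: 1280,
            num_layers: 32,
            num_heads: 16,
            mlp_size: 5120,
            projection_dim: 1024,
        }
    }

    /// Number of non-overlapping patches the image is split into.
    pub fn num_patches(&self) -> usize {
        let per_side = self.image_size / self.patch_size;
        per_side * per_side
    }

    /// Token sequence length: one class token plus all patch tokens.
    pub fn sequence_len(&self) -> usize {
        self.num_patches() + 1
    }

    /// Per-head attention dimension.
    pub fn head_dim(&self) -> usize {
        self.hidden_size / self.num_heads
    }
}
```

Under these assumed values, `VisionConfigSketch::vit_h_14().sequence_len()` gives the number of tokens the encoder produces per reference image, which is the sequence an IP-adapter style cross-attention layer would attend over if it consumes per-token features rather than a single pooled embedding.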