Module clip

CLIP image encoder for reference-image conditioning.

Implements the ViT-H/14 CLIP image encoder used to extract per-image embeddings that condition the multi-view diffusion model via IP-adapter cross-attention.
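As a rough illustration of what "ViT-H/14 defaults" implies for tensor shapes, the sketch below derives the token sequence length and per-head width from the commonly published ViT-H/14 hyperparameters (224x224 input, 14x14 patches, hidden width 1280, 32 layers, 16 heads). The struct `VisionConfigSketch` and all of its field and method names are hypothetical stand-ins, not the actual fields of `ClipVisionConfig`, and the default values are an assumption that this page does not confirm.

```rust
/// Hypothetical stand-in for `ClipVisionConfig`; names and values are
/// assumptions based on the commonly published ViT-H/14 hyperparameters.
#[derive(Debug, Clone, PartialEq)]
pub struct VisionConfigSketch {
    /// Input image side length in pixels.
    pub image_size: usize,
    /// Side length of each square patch in pixels.
    pub patch_size: usize,
    /// Transformer hidden width.
    pub hidden_size: usize,
    /// Number of transformer layers.
    pub num_layers: usize,
    /// Number of attention heads per layer.
    pub num_heads: usize,
}

impl Default for VisionConfigSketch {
    fn default() -> Self {
        Self {
            image_size: 224,
            patch_size: 14,
            hidden_size: 1280,
            num_layers: 32,
            num_heads: 16,
        }
    }
}

impl VisionConfigSketch {
    /// Number of non-overlapping patches the image is split into.
    pub fn num_patches(&self) -> usize {
        let per_side = self.image_size / self.patch_size;
        per_side * per_side
    }

    /// Token sequence length: one class token plus one token per patch.
    pub fn seq_len(&self) -> usize {
        self.num_patches() + 1
    }

    /// Width of each attention head.
    pub fn head_dim(&self) -> usize {
        self.hidden_size / self.num_heads
    }
}
```

Under these assumed defaults, `VisionConfigSketch::default().seq_len()` gives 257 tokens per image (256 patches plus the class token), each of width 1280. Which of these tokens the encoder passes on to the IP-adapter cross-attention layers is not stated on this page.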

Structs

ClipImageEncoder
CLIP ViT image encoder.
ClipVisionConfig
CLIP vision model configuration (ViT-H/14 defaults).

Functions

build_clip_encoder
Build a ClipImageEncoder from a DiffusionConfig, using the default ViT-H/14 settings.