pub struct MultiViewTransformerBlock { /* private fields */ }
A transformer block with multi-view cross-attention support.
Each block contains:
- Self-attention (within each view)
- Cross-view attention (across all N views)
- Text/prompt cross-attention
- IP cross-attention (reference image CLIP embedding)
- Feed-forward network
Implementations§
impl MultiViewTransformerBlock
pub fn new(
    vs: VarBuilder<'_>,
    dim: usize,
    n_heads: usize,
    d_head: usize,
    context_dim: usize,
    ip_dim: usize,
    num_views: usize,
) -> Result<Self>
Create a new multi-view transformer block with standard attention.
pub fn new_with_flash(
    vs: VarBuilder<'_>,
    dim: usize,
    n_heads: usize,
    d_head: usize,
    context_dim: usize,
    ip_dim: usize,
    num_views: usize,
    use_flash_attention: bool,
    flash_block_size: usize,
) -> Result<Self>
Create a new multi-view transformer block with optional flash attention.
§Arguments
- vs - Variable builder for weight initialization
- dim - Hidden dimension
- n_heads - Number of attention heads
- d_head - Dimension per head
- context_dim - Text cross-attention context dimension
- ip_dim - IP-adapter context dimension
- num_views - Number of views for cross-view attention
- use_flash_attention - Whether to use flash attention
- flash_block_size - Block size for flash attention tiling
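The argument list does not say how these values relate to one another. The following is a minimal sketch of two common relationships, not the crate's actual code: `inner_dim` and `flash_tiles` are hypothetical helpers, and both rest on assumptions (that the attention projection width is n_heads * d_head, and that flash attention walks the sequence in chunks of flash_block_size tokens).

```rust
/// Width of the attention projections, ASSUMING the usual n_heads * d_head
/// convention (the crate may compute this differently).
pub fn inner_dim(n_heads: usize, d_head: usize) -> usize {
    n_heads * d_head
}

/// Number of tiles needed to cover `seq_len` tokens, ASSUMING flash attention
/// processes the sequence in chunks of `flash_block_size` tokens.
/// Returns None when flash attention is off or the block size is zero.
pub fn flash_tiles(
    use_flash_attention: bool,
    flash_block_size: usize,
    seq_len: usize,
) -> Option<usize> {
    if !use_flash_attention || flash_block_size == 0 {
        return None;
    }
    // Ceiling division: a partially filled last tile still counts.
    Some((seq_len + flash_block_size - 1) / flash_block_size)
}
```

Under these assumptions, n_heads = 8 and d_head = 40 give an inner width of 320, and a sequence of 1000 tokens with flash_block_size = 64 needs 16 tiles.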
pub fn forward(
    &self,
    xs: &Tensor,
    context: Option<&Tensor>,
    ip_tokens: Option<&Tensor>,
) -> Result<Tensor>
Forward pass.
- xs: (B*num_views, seq_len, dim), spatial tokens for all views (batched)
- context: (B*num_views, ctx_len, context_dim), text encoder hidden states
- ip_tokens: (B*num_views, ip_len, ip_dim), CLIP image embedding tokens
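The shapes above imply some bookkeeping that can be checked before calling forward. The sketch below works on plain shape tuples rather than real tensors. `Shape3`, `cross_view_shape`, and `check_forward_shapes` are hypothetical names, and the regrouping to (B, num_views*seq_len, dim) is an assumption about how cross-view attention is commonly arranged, not something this page confirms.

```rust
/// Shape of a rank-3 tensor: (batch, tokens, channels).
pub type Shape3 = (usize, usize, usize);

/// ASSUMED regrouping for cross-view attention:
/// (B*num_views, seq_len, dim) -> (B, num_views*seq_len, dim),
/// so one attention call sees the tokens of every view of a sample.
pub fn cross_view_shape(xs: Shape3, num_views: usize) -> Result<Shape3, String> {
    let (bn, seq_len, dim) = xs;
    if num_views == 0 || bn % num_views != 0 {
        return Err(format!(
            "leading dim {} is not a multiple of num_views {}",
            bn, num_views
        ));
    }
    Ok((bn / num_views, num_views * seq_len, dim))
}

/// Check that each optional conditioning input shares the leading
/// dimension (B*num_views) of `xs`, as the documented shapes require.
pub fn check_forward_shapes(
    xs: Shape3,
    context: Option<Shape3>,
    ip_tokens: Option<Shape3>,
) -> Result<(), String> {
    let named = [("context", context), ("ip_tokens", ip_tokens)];
    for (name, shape) in named.iter() {
        if let Some((b, _, _)) = shape {
            if *b != xs.0 {
                return Err(format!("{} has batch {}, expected {}", name, b, xs.0));
            }
        }
    }
    Ok(())
}
```

For example, with B = 2 and num_views = 4, an xs of shape (8, 1024, 320) would regroup to (2, 4096, 320), and a context of shape (2, 77, 1024) would be rejected because its leading dimension was not repeated per view.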
Trait Implementations§
Auto Trait Implementations§
impl Freeze for MultiViewTransformerBlock
impl !RefUnwindSafe for MultiViewTransformerBlock
impl Send for MultiViewTransformerBlock
impl Sync for MultiViewTransformerBlock
impl Unpin for MultiViewTransformerBlock
impl UnsafeUnpin for MultiViewTransformerBlock
impl !UnwindSafe for MultiViewTransformerBlock
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.