
Module pos_embed

Parent module: oxicuda_vision::patch_embed

2-D positional encodings for Vision Transformers.

Provides:

pos_2d_sincos: deterministic (fixed, not learned) 2-D sinusoidal positional encoding in the style of MAE. The first half of dim encodes the row (H) axis; the second half encodes the column (W) axis.
LearnablePosEmbed: a simple learned position table.
add_pos_embed: in-place addition of position embeddings to tokens.
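The 2-D sin-cos scheme can be sketched on the CPU as follows. This is a hypothetical illustration, not the crate's implementation. It assumes a row-major token order (token index = r * grid_w + c), a flat output of shape [grid_h * grid_w, dim], dim divisible by 4, and the MAE-style 1-D rule: for a position p and k = (dim / 2) / 2 frequencies, omega_i = 10000^(-i / k), and the half-embedding is [sin(p * omega_0), ..., sin(p * omega_(k-1)), cos(p * omega_0), ..., cos(p * omega_(k-1))]. The names sincos_1d and pos_2d_sincos_sketch are made up for this example.

```rust
/// Hypothetical helper: write a 1-D sin-cos encoding of `pos` into `out`.
/// The first half of `out` holds sines, the second half cosines (len must be even).
fn sincos_1d(pos: f32, out: &mut [f32]) {
    let half = out.len() / 2;
    for i in 0..half {
        // Frequency omega_i = 1 / 10000^(i / half), MAE-style (assumption).
        let omega = 1.0f32 / 10000f32.powf(i as f32 / half as f32);
        let angle = pos * omega;
        out[i] = angle.sin();
        out[half + i] = angle.cos();
    }
}

/// Hypothetical sketch of a 2-D sin-cos table, returned row-major as
/// [grid_h * grid_w, dim]. First dim/2 values encode the row (H) index,
/// last dim/2 values encode the column (W) index.
pub fn pos_2d_sincos_sketch(grid_h: usize, grid_w: usize, dim: usize) -> Vec<f32> {
    assert!(dim % 4 == 0, "dim must be divisible by 4");
    let half = dim / 2;
    let mut table = vec![0.0f32; grid_h * grid_w * dim];
    for r in 0..grid_h {
        for c in 0..grid_w {
            // Assumed row-major token order: token = r * grid_w + c.
            let start = (r * grid_w + c) * dim;
            let token = &mut table[start..start + dim];
            let (row_part, col_part) = token.split_at_mut(half);
            sincos_1d(r as f32, row_part);
            sincos_1d(c as f32, col_part);
        }
    }
    table
}
```

Because the table depends only on grid_h, grid_w and dim, it can be computed once and reused for every batch; nothing in it is trained.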

Structs

LearnablePosEmbed: learnable position embedding table of shape [n_positions, embed_dim].
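A learned table of shape [n_positions, embed_dim] can be pictured with the minimal CPU sketch below. It is hypothetical: the type name LearnablePosEmbedSketch, its fields, the zero initialisation, and the plain SGD update are all assumptions made for illustration and are not the crate's LearnablePosEmbed API.

```rust
/// Hypothetical stand-in for a learnable [n_positions, embed_dim] table.
pub struct LearnablePosEmbedSketch {
    pub n_positions: usize,
    pub embed_dim: usize,
    /// Row-major parameters: row p holds the embedding for position p.
    pub weight: Vec<f32>,
}

impl LearnablePosEmbedSketch {
    /// Zero-initialised table (an assumption chosen for determinism).
    pub fn zeros(n_positions: usize, embed_dim: usize) -> Self {
        Self {
            n_positions,
            embed_dim,
            weight: vec![0.0; n_positions * embed_dim],
        }
    }

    /// Borrow the embedding vector for position `p`.
    pub fn row(&self, p: usize) -> &[f32] {
        assert!(p < self.n_positions, "position out of range");
        let start = p * self.embed_dim;
        &self.weight[start..start + self.embed_dim]
    }

    /// Plain SGD update: weight -= lr * grad (grad has the same shape as weight).
    pub fn sgd_step(&mut self, grad: &[f32], lr: f32) {
        assert_eq!(grad.len(), self.weight.len(), "grad shape mismatch");
        for (w, g) in self.weight.iter_mut().zip(grad) {
            *w -= lr * g;
        }
    }
}
```

Unlike the sin-cos table, this one is a trainable parameter, so its size fixes the number of positions the model can address.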

Functions

add_pos_embed: Add positional embeddings to a token sequence in-place.
pos_2d_sincos: Compute a 2-D sinusoidal positional encoding for a grid_h × grid_w grid.
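In-place addition of a position table to a batch of token sequences amounts to a broadcast add over the batch axis. The CPU sketch below is hypothetical: the name add_pos_embed_sketch, the row-major [batch, n_tokens, dim] token layout, and the [n_tokens, dim] table layout are assumptions, and the real add_pos_embed may operate on device buffers with a different signature.

```rust
/// Hypothetical sketch: `tokens` is row-major [batch, n_tokens, dim] and
/// `pos` is [n_tokens, dim]. The same table is added to every sample in place.
pub fn add_pos_embed_sketch(tokens: &mut [f32], pos: &[f32], batch: usize) {
    assert!(batch > 0, "batch must be positive");
    assert!(!pos.is_empty(), "pos must not be empty");
    assert_eq!(tokens.len() % batch, 0, "tokens length must be a multiple of batch");
    let per_sample = tokens.len() / batch;
    assert_eq!(per_sample, pos.len(), "pos must have shape [n_tokens, dim]");
    // Walk the batch one sample at a time and add the table element-wise.
    for sample in tokens.chunks_exact_mut(per_sample) {
        for (t, p) in sample.iter_mut().zip(pos) {
            *t += *p;
        }
    }
}
```

Adding in place avoids allocating a second [batch, n_tokens, dim] buffer; the position table itself is only read.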