
Module pos_embed



2-D positional encodings for Vision Transformers.

Provides:

  • pos_2d_sincos: deterministic 2-D sinusoidal positional encoding as used in MAE / BEiT / DeiT. The first half of the embedding dimension (dim) encodes the row (H) axis, and the second half encodes the column (W) axis.
  • LearnablePosEmbed: a simple learned position table.
  • add_pos_embed: in-place addition of position embeddings to tokens.
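Written out, the row/column split described above can be expressed as follows. The base 10000, the requirement that dim (D) be divisible by 4, and the "all sines, then all cosines" ordering inside each half follow the MAE reference code and are assumptions about this module, not something these docs confirm. Here r is the token's grid row, c its grid column, and ω is the vector of D/4 frequencies:

```latex
\omega_k = 10000^{-4k/D}, \qquad k = 0, \dots, D/4 - 1
\mathrm{PE}(r, c) = \big[\, \sin(r\boldsymbol{\omega}),\ \cos(r\boldsymbol{\omega}),\ \sin(c\boldsymbol{\omega}),\ \cos(c\boldsymbol{\omega}) \,\big] \in \mathbb{R}^{D}
```

The first two blocks (D/2 entries) depend only on r and the last two only on c, so tokens in the same row share the first half of their encoding.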

Structs

LearnablePosEmbed
Learnable position embedding table: [n_positions, embed_dim].
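A minimal sketch of what a learned [n_positions, embed_dim] table amounts to, assuming a flat row-major Vec<f32> for storage. The field names, the zeros constructor, row, and sgd_step are hypothetical illustrations, not this crate's actual API:

```rust
/// Hypothetical stand-in for `LearnablePosEmbed`:
/// a row-major table of shape [n_positions, embed_dim].
pub struct LearnablePosEmbed {
    pub n_positions: usize,
    pub embed_dim: usize,
    pub weight: Vec<f32>, // len == n_positions * embed_dim
}

impl LearnablePosEmbed {
    /// Zero-initialised table (real models usually start from small random values).
    pub fn zeros(n_positions: usize, embed_dim: usize) -> Self {
        Self { n_positions, embed_dim, weight: vec![0.0; n_positions * embed_dim] }
    }

    /// Borrow the embedding row for position `p`.
    pub fn row(&self, p: usize) -> &[f32] {
        assert!(p < self.n_positions, "position out of range");
        &self.weight[p * self.embed_dim..(p + 1) * self.embed_dim]
    }

    /// Plain SGD update with a gradient shaped like `weight`:
    /// this is what makes the table "learnable".
    pub fn sgd_step(&mut self, grad: &[f32], lr: f32) {
        assert_eq!(grad.len(), self.weight.len(), "gradient shape mismatch");
        for (w, g) in self.weight.iter_mut().zip(grad) {
            *w -= lr * g;
        }
    }
}
```

Unlike the sinusoidal encoding, the table has a fixed n_positions, so it only covers the sequence length (or grid size) it was created for.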

Functions

add_pos_embed
Add positional embeddings to a token sequence in-place.
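A minimal sketch of in-place addition, assuming both tokens and position embeddings are flat row-major f32 buffers sharing the same dim, with at least as many position rows as tokens. The signature is hypothetical, and the sketch does not handle any class-token offset the real function might apply:

```rust
/// Hypothetical sketch of `add_pos_embed`.
/// `tokens`: row-major [n_tokens, dim], modified in place.
/// `pos`:    row-major [n_positions, dim], n_positions >= n_tokens.
pub fn add_pos_embed(tokens: &mut [f32], pos: &[f32], dim: usize) {
    assert!(dim > 0 && tokens.len() % dim == 0, "tokens must be [n_tokens, dim]");
    assert!(pos.len() % dim == 0, "pos must be [n_positions, dim]");
    assert!(pos.len() >= tokens.len(), "fewer position rows than tokens");
    // Same row-major layout and dim: element i of `tokens` pairs with element i of `pos`.
    // `zip` stops at the end of `tokens`, so extra position rows are ignored.
    for (t, p) in tokens.iter_mut().zip(pos) {
        *t += *p;
    }
}
```

Working in place avoids allocating a second [n_tokens, dim] buffer on every forward pass.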
pos_2d_sincos
Compute a 2-D sinusoidal positional encoding for a grid_h × grid_w grid.
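A minimal sketch of a grid_h × grid_w sinusoidal encoding, assuming an MAE-style construction: each token's row index fills the first half of dim and its column index fills the second half, each half laid out as sines followed by cosines with frequencies 10000^(-k/(dim/4)). The signature (grid_h, grid_w, dim) -> Vec<f32> and the helper sincos_1d are hypothetical, not the crate's confirmed API:

```rust
/// Hypothetical helper: 1-D sincos encoding of `pos` into `out` (even length).
/// Writes sin terms to the first half of `out` and cos terms to the second half.
fn sincos_1d(pos: f32, out: &mut [f32]) {
    let n_freq = out.len() / 2;
    for i in 0..n_freq {
        // omega_i = 1 / 10000^(i / n_freq), following the MAE reference code (assumption)
        let omega = 1.0 / 10000f32.powf(i as f32 / n_freq as f32);
        let angle = pos * omega;
        out[i] = angle.sin();
        out[n_freq + i] = angle.cos();
    }
}

/// Hypothetical sketch of `pos_2d_sincos`: returns a row-major
/// [grid_h * grid_w, dim] buffer; token index = row * grid_w + col.
pub fn pos_2d_sincos(grid_h: usize, grid_w: usize, dim: usize) -> Vec<f32> {
    assert!(dim % 4 == 0, "dim must be divisible by 4");
    let half = dim / 2;
    let mut out = vec![0.0f32; grid_h * grid_w * dim];
    for r in 0..grid_h {
        for c in 0..grid_w {
            let start = (r * grid_w + c) * dim;
            let token = &mut out[start..start + dim];
            // First half encodes the row (H) axis, second half the column (W) axis.
            let (row_half, col_half) = token.split_at_mut(half);
            sincos_1d(r as f32, row_half);
            sincos_1d(c as f32, col_half);
        }
    }
    out
}
```

Because the table is a pure function of (grid_h, grid_w, dim), it needs no training and can be recomputed for a different patch grid instead of being interpolated like a learned table.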