2-D positional encodings for Vision Transformers.
Provides:
- pos_2d_sincos: deterministic 2-D sinusoidal positional encoding as used in MAE / BEiT / DeiT. The first half of dim encodes the row (H) axis; the second half encodes the column (W) axis.
- LearnablePosEmbed: a simple learned position table.
- add_pos_embed: in-place addition of position embeddings to tokens.
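
The row/column split described above can be illustrated with a minimal, dependency-free sketch. It is not this module's implementation: the names sincos_1d and sincos_2d_sketch, the Vec<Vec<f32>> return type, the frequency base of 10000, the sines-then-cosines layout inside each half, and the row-major ordering of grid positions are all assumptions made for illustration.

```rust
/// Hypothetical helper: 1-D sinusoidal encoding of a single coordinate.
/// Assumed layout: `half / 2` sines followed by `half / 2` cosines.
fn sincos_1d(half: usize, pos: f32) -> Vec<f32> {
    let n_freq = half / 2;
    // Assumed frequency schedule: omega_k = 1 / 10000^(k / n_freq).
    let omegas: Vec<f32> = (0..n_freq)
        .map(|k| 1.0 / 10000.0_f32.powf(k as f32 / n_freq as f32))
        .collect();
    let mut out = Vec::with_capacity(half);
    out.extend(omegas.iter().map(|w| (pos * w).sin()));
    out.extend(omegas.iter().map(|w| (pos * w).cos()));
    out
}

/// Hypothetical sketch of a 2-D sin-cos table with `grid_h * grid_w` rows
/// of length `dim`. The first half of each row encodes the row (H) index,
/// the second half encodes the column (W) index.
fn sincos_2d_sketch(dim: usize, grid_h: usize, grid_w: usize) -> Vec<Vec<f32>> {
    assert!(dim % 4 == 0, "dim must be divisible by 4 in this sketch");
    let half = dim / 2;
    let mut table = Vec::with_capacity(grid_h * grid_w);
    for r in 0..grid_h {
        // The row part is shared by every cell in the same grid row.
        let row_part = sincos_1d(half, r as f32);
        for c in 0..grid_w {
            let mut v = row_part.clone();
            v.extend(sincos_1d(half, c as f32));
            table.push(v);
        }
    }
    table
}
```

Under these assumptions, sincos_2d_sketch(8, 2, 3) yields 6 vectors of length 8, and two cells in the same grid row share their first 4 values.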
Structs§
- LearnablePosEmbed - Learnable position embedding table: [n_positions, embed_dim].
Functions§
- add_pos_embed - Add positional embeddings to a token sequence in-place.
- pos_2d_sincos - Compute a 2-D sinusoidal positional encoding for a grid_h × grid_w grid.
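
As a rough illustration of how a [n_positions, embed_dim] table might be added to tokens in place, the sketch below uses flat f32 buffers. The type PosTableSketch, the function add_pos_in_place, the flat row-major layout, and the broadcast over a leading batch dimension are hypothetical; the real LearnablePosEmbed and add_pos_embed signatures are not shown on this page and may differ.

```rust
/// Hypothetical stand-in for a learned position table of shape
/// [n_positions, embed_dim], stored row-major in a flat buffer.
struct PosTableSketch {
    n_positions: usize,
    embed_dim: usize,
    weight: Vec<f32>,
}

/// Hypothetical in-place addition. `tokens` is assumed to be a flat
/// [batch, n_positions, embed_dim] buffer; the same table is added to
/// every sequence in the batch.
fn add_pos_in_place(tokens: &mut [f32], table: &PosTableSketch) {
    let per_seq = table.n_positions * table.embed_dim;
    assert_eq!(table.weight.len(), per_seq, "table shape mismatch");
    assert!(
        per_seq > 0 && tokens.len() % per_seq == 0,
        "token buffer is not a whole number of sequences"
    );
    for seq in tokens.chunks_exact_mut(per_seq) {
        for (t, p) in seq.iter_mut().zip(&table.weight) {
            *t += *p;
        }
    }
}
```

Mutating the token buffer directly avoids allocating a second tensor of the same size, which is presumably why the addition is offered as an in-place operation.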