Module data2vec

data2vec — Baevski et al. 2022, ICML.

Unified self-supervised learning via teacher-student masked prediction:

A teacher network (EMA of the student) encodes the full, unmasked input and produces target representations.
The student encoder receives the masked input and predicts the teacher’s representations at masked positions.
The loss is the smooth-L1 (Huber) loss between L2-normalised student predictions and L2-normalised teacher targets, averaged over masked tokens only.

 θ_teacher   ← m · θ_teacher + (1−m) · θ_student            [EMA update]
 target[:,j] ← target[:,j] / (‖target[:,j]‖₂ + ε)           [per-dimension L2 normalisation over the batch]
 L           = mean huber(student_pred − target, β)          [masked positions only]
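
The three update rules above can be sketched in plain Rust as follows. This is an illustrative sketch only, not this module's implementation: the function names (ema_update_sketch, normalize_targets_sketch, huber, masked_huber_mean), the f32 element type, and the Vec<Vec<f32>> layout are assumptions. It also assumes the standard Huber form (0.5·r² inside the threshold β, β·(|r| − 0.5·β) outside); the smooth-L1 variant used by some libraries differs from it by a factor of 1/β.

```rust
/// EMA update, element-wise: teacher <- m * teacher + (1 - m) * student.
/// (Hypothetical helper; parameters are assumed to be flat f32 slices.)
fn ema_update_sketch(teacher: &mut [f32], student: &[f32], m: f32) {
    assert_eq!(teacher.len(), student.len());
    for (t, s) in teacher.iter_mut().zip(student) {
        *t = m * *t + (1.0 - m) * *s;
    }
}

/// Divide every feature column j by its L2 norm taken over the batch (plus eps).
/// `targets` is assumed to be laid out as [batch][dim] with equal row lengths.
fn normalize_targets_sketch(targets: &mut [Vec<f32>], eps: f32) {
    let dim = targets.first().map_or(0, |row| row.len());
    for j in 0..dim {
        let norm = targets.iter().map(|row| row[j] * row[j]).sum::<f32>().sqrt();
        for row in targets.iter_mut() {
            row[j] /= norm + eps;
        }
    }
}

/// Huber loss of a single residual r with threshold beta (standard form).
fn huber(r: f32, beta: f32) -> f32 {
    let a = r.abs();
    if a <= beta {
        0.5 * r * r
    } else {
        beta * (a - 0.5 * beta)
    }
}

/// Mean Huber loss over the elements of masked tokens only.
/// `pred` and `target` are [n_tokens][dim]; mask[i] == true marks token i as masked.
/// Returns 0.0 when no token is masked.
fn masked_huber_mean(pred: &[Vec<f32>], target: &[Vec<f32>], mask: &[bool], beta: f32) -> f32 {
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for ((p, t), &is_masked) in pred.iter().zip(target).zip(mask) {
        if !is_masked {
            continue; // unmasked tokens do not contribute to the loss
        }
        for (a, b) in p.iter().zip(t) {
            sum += huber(a - b, beta);
            count += 1;
        }
    }
    if count == 0 { 0.0 } else { sum / count as f32 }
}
```

In a training loop one would typically normalise the teacher targets first, compute the masked loss, take an optimiser step on the student, and only then apply the EMA update to the teacher.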

Reference: “data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language”, Baevski et al., ICML 2022.

Structs§

Data2VecConfig: Hyper-parameters for the data2vec training objective.
Data2VecResult: Output of a single data2vec loss computation.
Data2VecState: Mutable state that tracks the teacher EMA parameter vector and training step.

Functions§

data2vec_batch_loss: Compute the mean data2vec loss over a batch of samples.
data2vec_loss: Compute the data2vec loss for a single sample.
data2vec_mask: Generate a boolean mask of length n_tokens with exactly floor(n_tokens × mask_ratio) positions set to true (= masked).
huber_loss: Per-element Huber (smooth-L1) loss, averaged over all elements.
normalize_teacher_targets: Normalise teacher representations along the batch dimension in-place.
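
The masking contract described for data2vec_mask (exactly floor(n_tokens × mask_ratio) positions set to true) can be met with a partial Fisher-Yates shuffle. The sketch below is an assumption about one possible approach, not the crate's actual signature or algorithm: mask_sketch, its seed parameter, and the small xorshift generator are hypothetical, chosen only to keep the example free of external dependencies.

```rust
/// Build a mask with exactly floor(n_tokens * mask_ratio) entries set to true.
/// (Hypothetical helper; the seed parameter and the PRNG are illustrative.)
fn mask_sketch(n_tokens: usize, mask_ratio: f64, seed: u64) -> Vec<bool> {
    // Clamp the ratio so the number of masked tokens never exceeds n_tokens.
    let k = ((n_tokens as f64) * mask_ratio.clamp(0.0, 1.0)).floor() as usize;

    let mut idx: Vec<usize> = (0..n_tokens).collect();
    let mut state = seed | 1; // xorshift state must be non-zero

    // Partial Fisher-Yates: after step i, idx[..=i] holds i + 1 distinct
    // positions drawn from the not-yet-chosen ones.
    for i in 0..k {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        // The modulo introduces a small bias, acceptable for a sketch.
        let j = i + (state % (n_tokens - i) as u64) as usize;
        idx.swap(i, j);
    }

    let mut mask = vec![false; n_tokens];
    for &i in &idx[..k] {
        mask[i] = true;
    }
    mask
}
```

Because the first k entries of the index vector are always distinct, the count of masked positions is exact rather than merely expected, which is the property the function description calls out.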
