Transformer models.

Modules

ALBERT (Lan et al., 2020).
BERT (Devlin et al., 2018).
RoBERTa (Liu et al., 2019) and XLM-RoBERTa (Conneau et al., 2019).
Word embeddings with sinusoidal position embeddings (see the sketch after this list).
SqueezeBERT (Iandola et al., 2020) combined with ALBERT (Lan et al., 2020).
SqueezeBERT (Iandola et al., 2020).
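
The sinusoidal position embeddings above follow Vaswani et al. (2017). As a reference, here is a minimal, self-contained sketch in plain Rust of how such embeddings are typically computed; it does not use this crate's API, and the function name is illustrative only.

```rust
/// Sketch of sinusoidal position embeddings (Vaswani et al., 2017):
/// PE(pos, 2i)     = sin(pos / 10000^(2i / dims))
/// PE(pos, 2i + 1) = cos(pos / 10000^(2i / dims))
fn sinusoidal_embeddings(n_positions: usize, dims: usize) -> Vec<Vec<f32>> {
    (0..n_positions)
        .map(|pos| {
            (0..dims)
                .map(|i| {
                    // Each even/odd pair (2i, 2i + 1) shares one frequency.
                    let angle = pos as f32
                        / 10000f32.powf((2 * (i / 2)) as f32 / dims as f32);
                    if i % 2 == 0 {
                        angle.sin()
                    } else {
                        angle.cos()
                    }
                })
                .collect()
        })
        .collect()
}

fn main() {
    // Embeddings for 4 positions, 8 dimensions each.
    let embeddings = sinusoidal_embeddings(4, 8);
    println!("{:?}", embeddings[1]);
}
```

Because these embeddings are a fixed function of position, they require no training and extend naturally to positions not seen during training.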

Structs

Hidden layer output and attention.
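
As an illustration only, a struct bundling a hidden layer's output with its attention could look like the following; the name `HiddenLayer`, the field names, and the `Tensor` placeholder are assumptions, not this crate's actual definitions.

```rust
// Placeholder tensor type for the sketch; a real implementation would use
// the crate's tensor type.
type Tensor = Vec<f32>;

/// Hypothetical sketch: the output of a hidden layer together with the
/// attention weights that produced it.
pub struct HiddenLayer {
    /// Output of the layer.
    pub output: Tensor,
    /// Attention weights of the layer.
    pub attention: Tensor,
}
```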

Enums

Output of a BERT layer.
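
A plausible shape for such an enum, building on the hypothetical `HiddenLayer` sketch above; the variant names are assumptions. It distinguishes the embedding layer, which has no attention, from encoder layers, which do.

```rust
/// Hypothetical sketch: output of a single BERT layer.
pub enum LayerOutput {
    /// Output of the embedding layer (no attention).
    Embedding(Tensor),
    /// Output of an encoder layer, including its attention.
    EncoderWithAttention(HiddenLayer),
}
```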

Traits

Encoder networks.
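
Finally, a minimal sketch of what an encoder trait might look like, reusing the hypothetical types above; the trait name, method, and signature are assumptions rather than the crate's actual API.

```rust
/// Hypothetical sketch: an encoder network turns a batch of input pieces
/// into the outputs of all layers.
pub trait Encoder {
    /// Encode the inputs, returning one `LayerOutput` per layer.
    fn encode(&self, input: &Tensor) -> Vec<LayerOutput>;
}
```

Returning per-layer outputs rather than only the final layer lets downstream tasks select or mix layers, for example when probing intermediate representations.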