Module models
Torch implementation of language models
- albert
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (Lan et al.)
- bart
- BART (Lewis et al.)
- bert
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al.)
- deberta
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention (He et al.)
- deberta_v2
- DeBERTa V2 (He et al.)
- distilbert
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (Sanh et al.)
- electra
- Electra: Pre-training Text Encoders as Discriminators Rather Than Generators (Clark et al.)
- fnet
- FNet, Mixing Tokens with Fourier Transforms (Lee-Thorp et al.)
- gpt2
- GPT2 (Radford et al.)
- gpt_j
- GPT-J
- gpt_neo
- GPT-Neo
- longformer
- Longformer: The Long-Document Transformer (Beltagy et al.)
- longt5
- LongT5 (Efficient Text-To-Text Transformer for Long Sequences)
- m2m_100
- M2M-100 (Fan et al.)
- marian
- Marian
- mbart
- MBart (Liu et al.)
- mobilebert
- MobileBERT (A Compact Task-agnostic BERT for Resource-Limited Devices)
- nllb
- NLLB: No Language Left Behind (NLLB Team et al.)
- openai_gpt
- GPT (Radford et al.)
- pegasus
- Pegasus (Zhang et al.)
- prophetnet
- ProphetNet (Predicting Future N-gram for Sequence-to-Sequence Pre-training)
- reformer
- Reformer: The Efficient Transformer (Kitaev et al.)
- roberta
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al.)
- t5
- T5 (Text-To-Text Transfer Transformer)
- xlnet
- XLNet (Generalized Autoregressive Pretraining for Language Understanding)
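
Each of these modules exposes the base model, task-specific heads and configuration for the corresponding architecture. As a minimal sketch (not taken from this page), the snippet below builds a BERT masked-LM model from a local configuration and weights file. The file paths are placeholders, and the `rust_bert::models::bert` import path assumes the module layout listed above; older releases expose the same types at `rust_bert::bert`.

```rust
use rust_bert::models::bert::{BertConfig, BertForMaskedLM};
use rust_bert::Config; // trait providing `from_file`
use tch::{nn, Device};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = Device::cuda_if_available();
    let mut vs = nn::VarStore::new(device);

    // Model hyper-parameters are read from the usual Hugging Face config.json
    // (placeholder path).
    let config = BertConfig::from_file("path/to/config.json");

    // Build BERT with a masked-LM head inside the variable store.
    let model = BertForMaskedLM::new(vs.root(), &config);

    // Load pre-trained weights converted to the tch var-store format (.ot file).
    vs.load("path/to/model.ot")?;

    let _ = model; // ready for a forward pass on tokenized inputs
    Ok(())
}
```

For end-to-end usage (tokenization, generation, classification), the ready-made pipelines in `rust_bert::pipelines` wrap these model modules and handle resource download and pre-processing.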