Large-scale pre-trained language models like OpenAI GPT and BERT have achieved strong performance on a variety of language tasks using generic model architectures. This simple yet powerful approach to NLP does not require labeled data for pre-training, allowing us to scale up training as far as our resources permit.
Table of contents
- CoVe
- ELMo
- Cross-View Training
- ULMFiT
- GPT
- BERT
- ALBERT
- GPT-2
- RoBERTa
- T5
- GPT-3
- XLNet
- BART
- ELECTRA
- Summary
- Metric: Perplexity
- Common Tasks and Datasets
- Reference