Large-scale pre-trained language models like OpenAI GPT and BERT have achieved great performance on a variety of language tasks using generic model architectures. This simple and powerful approach in NLP does not require labeled data for pre-training, allowing us to experiment with increased training scale, up to our very limit.

Table of contents

- CoVe
- ELMo
- Cross-View Training
- ULMFiT
- GPT
- BERT
- ALBERT
- GPT-2
- RoBERTa
- T5
- GPT-3
- XLNet
- BART
- ELECTRA
- Summary
- Metric: Perplexity
- Common Tasks and Datasets
- Reference
