Large-scale pre-trained language models like OpenAI GPT and BERT have achieved strong performance on a variety of language tasks using generic model architectures. This simple yet powerful approach to NLP does not require labeled data for pre-training, allowing us to scale up training as far as our resources permit.
Table of contents
- CoVe
- ELMo
- Cross-View Training
- ULMFiT
- GPT
- BERT
- ALBERT
- GPT-2
- RoBERTa
- T5
- GPT-3
- XLNet
- BART
- ELECTRA
- Summary
- Metric: Perplexity
- Common Tasks and Datasets
- Reference