The development of modern large language models (LLMs) has expanded from pre-training alone to a pipeline that combines pre-training with post-training. Four recent state-of-the-art models, Alibaba’s Qwen 2, Apple’s Apple Intelligence Foundation Models (AFM), Google’s Gemma 2, and Meta AI’s Llama 3.1, exemplify different approaches to these training stages. They employ multi-stage pre-training and distinct post-training strategies such as supervised instruction fine-tuning, Direct Preference Optimization (DPO), and reinforcement learning from human feedback (RLHF). Each model emphasizes different techniques, and all of them focus on data quality and the use of synthetic data to improve performance.
Table of contents
1.1 Qwen 2 Overview
1.2 Qwen 2 Pre-training
1.3 Qwen 2 Post-training
1.4 Conclusion
2.1 AFM Overview
2.2 AFM Pre-training
2.3 AFM Post-training
2.4 Conclusion
3.1 Gemma 2 Overview
3.2 Gemma 2 Pre-training
3.3 Gemma 2 Post-training
3.4 Conclusion
4.1 Llama 3.1 Overview
4.2 Llama 3.1 Pre-training
4.3 Llama 3.1 Post-training
4.4 Conclusion