Best of Deep Learning — 2025

1
Video
Fireship·1y
OpenAI o3 tries to curb stomp DeepSeek...
Recent restrictions have seen DeepSeek banned by countries such as Italy, the US, Australia, and Taiwan. Meanwhile, OpenAI has launched the new o3-mini model and a Deep Research feature for Pro users, aiming to remain competitive. These developments are part of a broader trend in the AI landscape, with open-source solutions making rapid progress. Despite corporate efforts, some AI tools face performance issues, and Google's Gemini already offers features similar to OpenAI's new ones.
998
27
2
Article
Machine Learning Mastery·1y
The Roadmap for Mastering Machine Learning in 2025
Machine learning (ML) is integral to many sectors, making it a valuable skill in 2025. This guide offers a step-by-step roadmap for mastering ML, starting with prerequisites in mathematics and programming, followed by core ML concepts, deep learning, and specialization in fields like computer vision or NLP. It also covers model deployment and building a portfolio to showcase projects. The emphasis is on practical learning through projects and continuous skill enhancement.
531
2
3
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
10 MCP, AI Agents, and RAG projects for AI Engineers
Explore 10 AI-focused projects including building an MCP-powered Agentic RAG, a multi-agent book writer, and a RAG system that understands audio content. Learn how to build and fine-tune AI models like DeepSeek-R1 and create applications using open-source tools like Llama 4 and ColPali.
473
3
4
Article
Sebastian Raschka·50w
Coding LLMs from the Ground Up: A Complete Course
Sebastian Raschka shares a comprehensive video course series on building Large Language Models from scratch using Python and PyTorch. The course covers seven key areas: environment setup, text data preprocessing and tokenization, attention mechanisms implementation, LLM architecture coding, pretraining on unlabeled data, classification fine-tuning, and instruction fine-tuning. The content serves as supplementary material to his book 'Build a Large Language Model (From Scratch)' and emphasizes hands-on learning through implementation rather than using pre-built frameworks.
420
5
5
Article
Data Engineer Things·1y
10 minutes are all you need to understand how Transformers work in LLM
How transformers work in large language models (LLMs) can be grasped quickly by breaking the process into steps. Tokenization first converts the input into tokens, which are then embedded as numerical representations the model can process. These embeddings pass through multiple transformer layers that use attention mechanisms to weigh the importance of each token relative to the others. Finally, the output is projected back onto the vocabulary to predict the next token in the sequence. This foundation helps when exploring the finer details of models like GPT-2.
270
4
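As a rough illustration of the attention step described in item 5, here is a minimal sketch of scaled dot-product attention in plain Python. It is not taken from the linked article: the toy embeddings are made-up numbers, and the sketch assumes a single head with no learned query/key/value projections and no causal masking, all of which a real transformer layer would include.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Score the query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy 2-dimensional embeddings for three tokens (illustrative values only).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(embeddings, embeddings, embeddings)
```

Each row of `mixed` is a blend of all three token vectors, weighted by how strongly that token's query matches each key. In a full model, a final linear layer would then project such vectors onto the vocabulary to score candidate next tokens.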
6
Article
ByteByteGo·49w
EP167: Top 20 AI Concepts You Should Know
A comprehensive overview of 20 essential AI concepts including machine learning, deep learning, neural networks, NLP, computer vision, and transformers. Also covers the AI application stack for building RAG applications, featuring components like large language models, frameworks, vector databases, data extraction tools, and text embeddings. Additionally includes insights into Shopify's tech stack architecture and job opportunities in AI and software engineering.
204
1
7
Article
Sebastian Raschka·21w
The State Of LLMs 2025: Progress, Problems, and Predictions
A comprehensive 2025 review of large language model developments highlights reinforcement learning with verifiable rewards (RLVR) and the GRPO algorithm as the year's dominant training paradigm, following DeepSeek R1's breakthrough. Key trends include inference-time scaling, tool use integration, and architectural efficiency tweaks like mixture-of-experts and linear attention mechanisms. The analysis addresses benchmarking challenges ("benchmaxxing"), discusses practical LLM usage for coding and writing, and examines the shift toward domain-specific models with proprietary data. Predictions for 2026 emphasize RLVR expansion beyond math/code, increased inference optimization, and the emergence of diffusion models for low-latency tasks.
176
1
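The two ideas named in item 7, verifiable rewards and GRPO's group-relative scoring, can be sketched in a few lines. This is a simplified illustration, not the article's code: `verifiable_reward` is a hypothetical exact-match checker, the sampled answers are invented, and the sketch stops at the advantage computation, leaving out the policy update, clipping, and any KL term. It uses the population standard deviation; implementations may differ on that detail.

```python
import statistics

def verifiable_reward(answer, reference):
    # Hypothetical checker: 1.0 for an exact match with the reference, else 0.0.
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt whose reference answer is "42".
sampled_answers = ["42", "41", "42", "7"]
rewards = [verifiable_reward(a, "42") for a in sampled_answers]
advantages = group_relative_advantages(rewards)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, so the comparison is made within the group of samples rather than against a separately trained value model.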
8
Video
YouTube·48w
STOP Taking Random AI Courses - Read These Books Instead
A comprehensive guide to learning AI and machine learning through structured resources rather than random courses. Covers five key areas: programming fundamentals with Python, mathematics and statistics foundations, traditional machine learning concepts, deep learning and LLMs, and AI engineering for production deployment. Emphasizes practical application over theoretical study, recommending specific books like 'Hands-On ML with Scikit-Learn and TensorFlow' and courses like Andrew Ng's specializations. Highlights the importance of understanding both foundational concepts and modern deployment practices for current AI engineering roles.
164
4
9
Video
Fireship·1y
This free Chinese AI just crushed OpenAI's $200 o1 model...
A free and open-source AI model called DeepSeek R1 has been released by the Chinese company DeepSeek, rivaling OpenAI's $200 o1 model in performance. Using direct reinforcement learning instead of supervised fine-tuning, DeepSeek R1 has shown impressive benchmark results, especially in math and software engineering. The model includes features for advanced problem-solving and is available on platforms like Hugging Face or for local download.
147
7
10
Article
Sebastian Raschka·28w
Recommendations for Getting the Most Out of a Technical Book
A structured five-step approach to learning from technical books: start with an offline read-through to grasp the big picture, follow with hands-on coding by retyping examples, complete exercises to solidify understanding, review notes and explore additional resources, and finally apply concepts in personal projects. The method emphasizes focused reading sessions, active engagement with code, and practical application over passive consumption.
141
7
11
Article
Claudette·46w
Python For Everything
122
14
12
Article
Machine Learning Mastery·1y
3 Easy Ways to Fine-Tune Language Models
The post discusses three methods to fine-tune language models: full fine-tuning, parameter-efficient fine-tuning (PEFT), and instruction tuning. Full fine-tuning updates all model parameters, offering state-of-the-art performance but requiring significant computational power. PEFT, including techniques like LoRA, updates only a small portion of parameters, making it resource-efficient. Instruction tuning uses diverse task instructions, enhancing the model's ability to generalize. Code examples and detailed steps are provided for each method.
121
1
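To make the resource gap between full fine-tuning and LoRA in item 12 concrete, the sketch below counts trainable parameters for a single weight matrix. The layer size and rank are assumed values chosen for illustration, not figures from the post.

```python
def full_params(d_out, d_in):
    # Full fine-tuning updates every entry of the d_out x d_in weight matrix.
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    # LoRA freezes W and trains two low-rank factors:
    # B (d_out x rank) and A (rank x d_in).
    return rank * (d_out + d_in)

# Assumed example: one 4096 x 4096 projection with LoRA rank 8.
d_out, d_in, rank = 4096, 4096, 8
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
fraction = lora / full
```

With these numbers, `full` is 16,777,216 and `lora` is 65,536, so LoRA trains 1/256 (about 0.4%) of the parameters for that layer.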
13
Article
Machine Learning Mastery·1y
Roadmap to Python in 2025
Python remains a cornerstone for data science and machine learning in 2025. The post provides a roadmap for learning Python, from basics to advanced machine learning applications, tailored to different proficiency levels. It emphasizes the importance of mastering modern Python features, foundational data science libraries such as NumPy and Pandas, and machine learning frameworks like TensorFlow and PyTorch. The roadmap also highlights specialized tracks for data engineering, AI, web development, and emerging technologies. Staying updated with Python's evolution and leveraging AI tools can further enhance development efficiency and effectiveness.
113
1
14
Article
Daily Dose of Data Science | Avi Chawla | Substack·44w
4 Stages of Training LLMs from Scratch
Training large language models from scratch involves four key stages: pre-training on massive text corpora to learn language basics, instruction fine-tuning to make models conversational and follow commands, preference fine-tuning using human feedback (RLHF) to align with human preferences, and reasoning fine-tuning for mathematical and logical tasks using correctness as a reward signal. Each stage builds upon the previous one to create increasingly capable and aligned AI systems.
104
2
15
Article
freeCodeCamp·37w
How to Fine-Tune Large Language Models
A comprehensive course covering fine-tuning techniques for large language models, including supervised fine-tuning, reinforcement learning from human feedback (RLHF), and QLoRA methodology. The course explains the differences between fine-tuning, pre-training, and prompt engineering, with practical applications and case studies for specializing LLMs for specific domains.
92
3
16
Article
MIT News·23w
Deep-learning model predicts how fruit flies form, cell by cell
MIT researchers developed a deep-learning model that predicts cell-by-cell development during fruit fly embryo formation with 90% accuracy. The model uses a dual-graph structure representing cells as both point clouds and foam-like bubbles, tracking properties like position, division, and folding minute-by-minute during gastrulation. The approach could eventually predict development in more complex organisms and identify early disease patterns in conditions like asthma and cancer, though high-quality video data remains the primary limitation for broader applications.
90
7
17
Video
freeCodeCamp·1y
Understanding Deep Learning Research Tutorial - Theory, Code and Math
This tutorial provides a comprehensive guide to understanding and implementing deep learning research. It breaks down the essential skills needed: reading research papers, understanding dense mathematical notation, and navigating complex codebases. Using examples such as QHAdam and a segmentation model from Meta, the tutorial offers practical steps to demystify the subject. By the end, you should be better prepared to tackle advanced AI research projects.
77
18
Video
freeCodeCamp·1y
DeepSeek-R1 Crash Course
Andrew Brown's crash course introduces DeepSeek and shows how to run its large language models (LLMs), such as DeepSeek R1 and V3, on local hardware. He demonstrates downloading and setting up the models using tools like Ollama, LM Studio, and Hugging Face, stressing the importance of capable hardware such as an Intel Lunar Lake AI PC dev kit or a workstation with an RTX 4080 GPU. Troubleshooting tips and the potential for running models with distributed computing are also discussed.
69
19
Video
freeCodeCamp·1y
Essential Machine Learning and AI Concepts Animated
Learn essential machine learning and AI concepts in an easy and visual way with this course from Vladimir of Turing Time Machine. Key topics covered include variance, unsupervised learning, time series analysis, transfer learning, gradient descent, logistic regression, and neural networks, among others. The focus is on simplifying complex ideas with animations, avoiding jargon, and making learning accessible and engaging.
65
1
20
Article
Towards Data Science·29w
We Didn’t Invent Attention — We Just Rediscovered It
Attention mechanisms in AI transformers aren't novel inventions but rediscoveries of fundamental optimization principles. The same mathematical pattern—selective amplification combined with normalization—emerges independently across evolution (500+ million years of neural systems), chemistry (autocatalytic reactions), and AI (gradient descent). This convergence suggests attention represents a universal solution to information processing under energy constraints. Reframing attention as amplification rather than selection offers practical insights for improving AI architectures: decoupling amplification from normalization, exploring non-content-based amplification, implementing local normalization pools, and designing systems that operate at critical dynamics for optimal information processing.
58
3
21
Article
DigitalOcean Community·1y
olmOCR and RolmOCR: The Latest in Open-Source OCR
DigitalOcean's post highlights olmOCR and RolmOCR, two innovative open-source OCR models developed by Allen AI and Reducto. olmOCR features Document Anchoring for improved text extraction, while RolmOCR builds on it with enhancements such as shorter prompts and robust off-angle handling. The integration of advanced Vision Language Models and fine-tuning techniques enables these models to offer scalable, cost-efficient solutions for document digitization.
57
22
Article
Hacker News·50w
Fine-Tuning LLMs is a Huge Waste of Time
Fine-tuning advanced LLMs for knowledge injection is counterproductive because it overwrites existing valuable information stored in densely interconnected neurons. Instead of adding knowledge, fine-tuning risks destroying the carefully built ecosystem of an already trained model. Better alternatives include retrieval-augmented generation (RAG), adapter modules like LoRA, and contextual prompting, which inject new information without damaging the underlying model's knowledge base. These modular approaches preserve the integrity of pre-trained networks while achieving the desired knowledge enhancement goals.
56
2
23
Article
Daily Dose of Data Science | Avi Chawla | Substack·38w
8 Key LLM Development Skills for AI Engineers
Outlines eight essential skills for AI engineers working with Large Language Models in production environments: prompt engineering, context engineering, fine-tuning, RAG systems, agents, deployment, optimization, and observability. Each skill covers practical techniques from crafting structured prompts to implementing monitoring systems, with emphasis on moving beyond basic prompting to building scalable, production-grade LLM applications.
53
24
Article
Sebastian Raschka·1y
The State of LLM Reasoning Models
The post explores recent research advancements in reasoning-optimized large language models (LLMs), focusing on inference-time compute scaling methods. It discusses how various techniques, such as chain-of-thought reasoning and test-time preference optimization, improve the reasoning abilities of LLMs without altering underlying model weights. The article highlights the importance of increasing computational resources during inference to enhance performance, making even smaller models more capable. It also touches on other methods like reinforcement learning and supervised fine-tuning that contribute to improved reasoning in LLMs.
53
25
Article
Daily Dose of Data Science | Avi Chawla | Substack·44w
Prompting vs. RAG vs. Finetuning
A decision framework for choosing between prompt engineering, RAG, and fine-tuning when building LLM applications. The choice depends on two key factors: the amount of external knowledge required and the level of model adaptation needed. RAG works best for custom knowledge bases without behavior changes, fine-tuning modifies model structure and behavior, prompt engineering suffices for basic adjustments, and hybrid approaches combine RAG with fine-tuning for complex requirements.
44
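The two-factor framework summarized in item 25 can be written as a small lookup. This is a simplified sketch: the function name and the reduction of each factor to a yes/no flag are assumptions made for illustration, whereas the summary describes both factors as matters of degree.

```python
def choose_approach(needs_external_knowledge: bool, needs_behavior_change: bool) -> str:
    """Map the two factors from the summary to a suggested approach."""
    if needs_external_knowledge and needs_behavior_change:
        # Both custom knowledge and adapted behavior: combine the two.
        return "RAG + fine-tuning"
    if needs_external_knowledge:
        # Custom knowledge base, no change in model behavior.
        return "RAG"
    if needs_behavior_change:
        # Behavior must change, no external knowledge needed.
        return "fine-tuning"
    # Basic adjustments only.
    return "prompt engineering"

# Example: a support bot that must cite internal docs but keep default behavior.
suggestion = choose_approach(needs_external_knowledge=True, needs_behavior_change=False)
```

Here `suggestion` is `"RAG"`. A real decision would also weigh cost, latency, and data availability, which this sketch leaves out.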
