Best of PyTorch 2025

  1. Article
    Sebastian Raschka · 46w

    Coding LLMs from the Ground Up: A Complete Course

    Sebastian Raschka shares a comprehensive video course series on building Large Language Models from scratch using Python and PyTorch. The course covers seven key areas: environment setup, text data preprocessing and tokenization, attention mechanisms implementation, LLM architecture coding, pretraining on unlabeled data, classification fine-tuning, and instruction fine-tuning. The content serves as supplementary material to his book 'Build a Large Language Model (From Scratch)' and emphasizes hands-on learning through implementation rather than using pre-built frameworks.

  2. Video
    YouTube · 45w

    STOP Taking Random AI Courses - Read These Books Instead

    A comprehensive guide to learning AI and machine learning through structured resources rather than random courses. Covers five key areas: programming fundamentals with Python, mathematics and statistics foundations, traditional machine learning concepts, deep learning and LLMs, and AI engineering for production deployment. Emphasizes practical application over theoretical study, recommending specific books like 'Hands-On ML with Scikit-Learn and TensorFlow' and courses like Andrew Ng's specializations. Highlights the importance of understanding both foundational concepts and modern deployment practices for current AI engineering roles.

  3. Article
    Sebastian Raschka · 24w

    Recommendations for Getting the Most Out of a Technical Book

    A structured five-step approach to learning from technical books: start with an offline read-through to grasp the big picture, follow with hands-on coding by retyping examples, complete exercises to solidify understanding, review notes and explore additional resources, and finally apply concepts in personal projects. The method emphasizes focused reading sessions, active engagement with code, and practical application over passive consumption.

  4. Article
    Machine Learning Mastery · 1y

    3 Easy Ways to Fine-Tune Language Models

    The post discusses three methods to fine-tune language models: full fine-tuning, parameter-efficient fine-tuning (PEFT), and instruction tuning. Full fine-tuning updates all model parameters, offering state-of-the-art performance but requiring significant computational power. PEFT, including techniques like LoRA, updates only a small portion of parameters, making it resource-efficient. Instruction tuning uses diverse task instructions, enhancing the model's ability to generalize. Code examples and detailed steps are provided for each method.
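
    To make the PEFT idea concrete, here is a minimal LoRA sketch using the Hugging Face peft library; the base model and adapter settings are illustrative choices, not the post's exact configuration.

    ```python
    # Minimal LoRA sketch: wrap a base model so that only small low-rank
    # adapters are trainable (model and hyperparameters are illustrative).
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], lora_dropout=0.05)
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # a small fraction of the full parameter count
    ```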

  5. Article
    Hacker News · 27w

    character-ai/Ovi

    Ovi is an open-source audio-video generation model that simultaneously creates synchronized 5-second videos and audio from text or text+image inputs. The 11B parameter model supports flexible resolutions (720×720 to 960×960), multiple aspect ratios, and includes a custom-trained 5B audio branch. It offers inference options for single or multi-GPU setups, includes memory optimization features like fp8 quantization and CPU offloading for 24GB GPUs, and provides integration with Gradio UI and ComfyUI. The model is based on research from Character AI and builds upon Wan2.2 for video and MMAudio for audio processing.

  6. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 18w

    [Hands-on] Deploy and Run LLMs on your Phone!

    Fine-tune and deploy LLMs directly on iOS and Android devices using UnslothAI, TorchAO, and ExecuTorch. The tutorial walks through loading Qwen3-0.6B, preparing reasoning and chat datasets, training with quantization-aware methods, exporting to mobile-ready .pte format, and running the model locally on iPhone at ~25 tokens/second. The resulting model is ~470MB and runs 100% on-device without requiring cloud connectivity.
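
    The .pte export step follows ExecuTorch's standard export flow; a rough sketch, in which the tiny module stands in for the tutorial's fine-tuned LLM:

    ```python
    # Rough sketch of exporting a PyTorch module to ExecuTorch's .pte format.
    # The toy model is a stand-in for the tutorial's quantization-aware-trained LLM.
    import torch
    import torch.nn as nn
    from executorch.exir import to_edge

    model = nn.Sequential(nn.Linear(8, 8), nn.ReLU()).eval()  # stand-in module
    example_input = torch.randn(1, 8)

    exported = torch.export.export(model, (example_input,))
    et_program = to_edge(exported).to_executorch()
    with open("model.pte", "wb") as f:
        f.write(et_program.buffer)  # the on-device runtime loads this file
    ```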

  7. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 36w

    Implement "Attention is all you need"

    A comprehensive tutorial on implementing the Transformer architecture from the groundbreaking "Attention is All You Need" paper using PyTorch. Covers the complete implementation including multi-head attention mechanisms, encoder-decoder structure, positional encoding, and feed-forward networks. Explains key components like self-attention with the Q, K, V formula, masked attention for decoders, and the training process using teacher forcing. Demonstrates how the architecture works for sequence-to-sequence tasks like machine translation, with detailed explanations of both training and inference phases.
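
    As a reference point, here is a minimal sketch of the paper's scaled dot-product attention, softmax(QKᵀ/√d_k)V, with optional masking for the decoder; a pared-down illustration, not the tutorial's full multi-head module.

    ```python
    # Scaled dot-product attention as in "Attention is All You Need".
    import math
    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, mask=None):
        # scores: (..., seq_q, seq_k), scaled by sqrt(d_k) as in the paper
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        if mask is not None:
            # masked (decoder) attention: block positions where mask == 0
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        return weights @ v

    q = k = v = torch.randn(2, 4, 8)  # (batch, seq, d_k) toy shapes
    out = scaled_dot_product_attention(q, k, v)
    ```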

  8. Article
    Sebastian Raschka · 34w

    Understanding and Implementing Qwen3 From Scratch

    A comprehensive guide to implementing Qwen3, one of the leading open-source large language models, from scratch using pure PyTorch. The article explores why Qwen3 is popular among developers, including its Apache License v2.0, strong performance rankings, and variety of model sizes from 0.6B to 480B parameters. It provides hands-on code implementation to understand the architecture's inner workings.
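
    As one example of the components such an implementation walks through, here is a hedged sketch of RMSNorm, the normalization layer used in Qwen3-style architectures; an illustration, not the article's exact code.

    ```python
    # RMSNorm sketch: normalize by the root-mean-square of the features,
    # then apply a learned per-feature scale (no mean subtraction or bias).
    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x):
            rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
            return self.weight * (x * rms)
    ```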

  9. Article
    Medium · 1y

    Mathematical Foundation Underpinning Reinforcement Learning

Reinforcement learning (RL) is inspired by learning from experience, and Soft Actor-Critic (SAC) is a popular algorithm in the field. This post covers the mathematical foundation of SAC agents, detailing the actor (policy) and critic networks: the actor uses a neural network to estimate actions and their probabilities, while the critic estimates the expected return of state-action pairs. PyTorch code snippets demonstrate how to implement these networks and integrate them into an RL agent.
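
    A hedged sketch of the two networks the post describes, for a SAC agent with continuous actions (dimensions and hidden sizes are illustrative, not the post's exact code):

    ```python
    # Actor and critic sketches for a SAC-style agent (illustrative sizes).
    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Policy network: outputs mean and log-std of a Gaussian over actions."""
        def __init__(self, state_dim, action_dim, hidden=256):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.mu = nn.Linear(hidden, action_dim)
            self.log_std = nn.Linear(hidden, action_dim)

        def forward(self, state):
            h = self.body(state)
            return self.mu(h), self.log_std(h).clamp(-20, 2)  # clamp for stability

    class Critic(nn.Module):
        """Q-network: estimates the expected return of a state-action pair."""
        def __init__(self, state_dim, action_dim, hidden=256):
            super().__init__()
            self.q = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, state, action):
            return self.q(torch.cat([state, action], dim=-1))
    ```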

  10. Article
    Sebastian Raschka · 21w

    From Random Forests to RLVR: A Short History of ML/AI Hello Worlds

    A chronological overview traces the evolution of beginner-friendly ML/AI examples from 2013 to 2025. Starting with Random Forests on Iris datasets and XGBoost on Kaggle competitions, it progresses through neural networks (MLPs, AlexNet), transformer models (DistilBERT, Llama 2 with LoRA), and culminates with reasoning models using RLVR on mathematical datasets. Each milestone reflects when methods became mainstream and accessible, often lagging years behind their initial publication due to tooling maturity and community adoption.

  11. Article
    Sebastian Raschka · 46w

    Understanding and Coding the KV Cache in LLMs from Scratch

    KV cache is a critical optimization technique for LLM inference that stores previously computed key and value vectors to avoid redundant calculations during text generation. The technique provides significant speed improvements (up to 5x in examples) by caching intermediate attention computations and reusing them for subsequent tokens. Implementation involves modifying the attention mechanism to store and retrieve cached values, though it increases memory usage and code complexity. The article provides a complete from-scratch implementation with performance comparisons and optimization strategies for production use.
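
    The core trick can be sketched in a few lines: the cache grows along the sequence dimension, and each decode step appends only the new token's key/value projections instead of recomputing the full history; a simplified illustration, not the article's complete implementation.

    ```python
    # Minimal KV-cache sketch: store past keys/values and append new ones.
    import torch

    class KVCache:
        def __init__(self):
            self.k = None  # (batch, heads, seq, head_dim)
            self.v = None

        def update(self, k_new, v_new):
            # Concatenate the new token's K/V onto the cached sequence so
            # attention can reuse all earlier projections without recomputation.
            self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
            self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
            return self.k, self.v

    cache = KVCache()
    for _ in range(3):  # one decode step per new token
        k_step = torch.randn(1, 4, 1, 16)
        v_step = torch.randn(1, 4, 1, 16)
        k, v = cache.update(k_step, v_step)
    print(k.shape)  # torch.Size([1, 4, 3, 16]) — cache grew along the seq dim
    ```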

  12. Article
    Hugging Face · 22w

    Transformers v5: Simple model definitions powering the AI ecosystem

    Hugging Face releases Transformers v5, marking five years since v4 with daily installs growing from 20,000 to 3 million. The library now supports over 400 model architectures and 750,000 community checkpoints. Version 5 focuses on simplicity through modular design, improved training support for both pre-training and fine-tuning, enhanced inference capabilities with continuous batching and a new serving API, and first-class quantization support. The release emphasizes interoperability across the ecosystem, enabling seamless integration with inference engines like vLLM and SGLang, local deployment tools like llama.cpp and MLX, and training frameworks like Unsloth and Axolotl.

  13. Article
    Hacker News · 21w

    Tongyi-MAI/Z-Image

Z-Image is a 6B parameter image generation model featuring three variants: Z-Image-Turbo (distilled for sub-second inference with 8 NFEs on H800 GPUs), Z-Image-Base (a foundation model for fine-tuning), and Z-Image-Edit (specialized for image editing). Built on a Scalable Single-Stream DiT architecture, it excels at photorealistic generation, bilingual text rendering (English/Chinese), and instruction following. The model uses the Decoupled-DMD distillation algorithm and DMDR (DMD combined with reinforcement learning) to optimize few-step generation. Available on Hugging Face and ModelScope with PyTorch and Diffusers support.
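
    Loading the model presumably follows Diffusers' generic pipeline pattern; a speculative sketch, in which the checkpoint id is inferred from the repo name and is not confirmed by the summary above.

    ```python
    # Speculative sketch of loading Z-Image through Diffusers' generic loader;
    # the checkpoint id "Tongyi-MAI/Z-Image-Turbo" is an assumption.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")
    image = pipe("a photorealistic street scene at dusk, bilingual signage").images[0]
    image.save("z_image_sample.png")
    ```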

  14. Video
    Sam Witteveen · 43w

    Kyutai STT & TTS - A Perfect Local Voice Solution?

Kyutai has released separate speech-to-text and text-to-speech models that offer low-latency voice processing for English and French. The TTS model is only 1.6B parameters yet performs competitively with commercial solutions like ElevenLabs. While the models support voice cloning through embeddings, the voice-embedding model itself isn't released for ethical reasons: users can blend existing voice embeddings to create new voices, but cannot generate embeddings from custom audio samples. The models show promise for local voice applications but are currently limited by language support and the restricted voice cloning capability.

  15. Article
    Hugging Face · 30w

    SOTA OCR with Core ML and dots.ocr

    A detailed walkthrough of converting the dots.ocr model (a 3B parameter OCR model from RedNote) to run on Apple devices using Core ML and MLX. The guide covers the conversion process from PyTorch to Core ML, including simplifying the model architecture, debugging common conversion errors, and initial benchmarking. Key challenges addressed include handling attention implementations, fixing dtype mismatches, removing dynamic control flow, and dealing with variable-length sequence masking. The converted model initially runs on GPU in FLOAT32 precision, with future parts promising Neural Engine optimization and quantization techniques.
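
    The PyTorch-to-Core ML step generally looks like the following coremltools sketch; shapes and the toy module are illustrative, and the article's actual dots.ocr conversion is considerably more involved.

    ```python
    # Hedged sketch of the generic PyTorch -> Core ML flow with coremltools;
    # the toy model stands in for the simplified dots.ocr module.
    import torch
    import torch.nn as nn
    import coremltools as ct

    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()  # stand-in module
    example = torch.randn(1, 3, 224, 224)
    traced = torch.jit.trace(model, example)  # dynamic control flow must be removed first

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="pixel_values", shape=example.shape)],
        compute_precision=ct.precision.FLOAT32,  # matches the article's initial FP32 runs
    )
    mlmodel.save("model.mlpackage")
    ```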

  16. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 38w

    The Full MLOps/LLMOps Blueprint

    A comprehensive crash course covering MLOps and LLMOps fundamentals, from foundational concepts to hands-on implementations. The series explores ML system lifecycle, data pipelines, model training, deployment, and monitoring. Part 3 focuses specifically on reproducibility and versioning using tools like Git, DVC, and MLflow, emphasizing that ML systems require extensive infrastructure beyond just the ML code itself.
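
    For the experiment-tracking side of reproducibility, a minimal MLflow sketch (run name, parameters, and metric values are illustrative, not the series' exact code):

    ```python
    # Minimal MLflow tracking sketch: record parameters and metrics per run
    # so experiments stay reproducible and comparable.
    import mlflow

    with mlflow.start_run(run_name="baseline"):
        mlflow.log_params({"lr": 3e-4, "epochs": 10, "seed": 42})
        # ... training loop would go here ...
        mlflow.log_metric("val_accuracy", 0.91)
    ```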

  17. Article
    Towards AI · 1y

    PyTorch vs PyTorch Lightning: A Practical Exploration

    PyTorch is a popular framework for deep learning, known for its dynamic computational graph, flexibility, and extensive community support, but requires writing a lot of boilerplate code. PyTorch Lightning is a high-level interface built on top of PyTorch that automates many low-level details like training loops, logging, and distributed learning, making it ideal for production and team projects. Lightning enhances code readability, reproducibility, and speeds up development while preserving PyTorch’s flexibility.
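
    The contrast is easiest to see in code; a minimal LightningModule sketch in which the training step, optimizer wiring, and logging replace a hand-written loop (illustrative, not from the article):

    ```python
    # Minimal LightningModule sketch: Lightning supplies the training loop,
    # device handling, and logging that plain PyTorch requires by hand.
    import torch
    import torch.nn as nn
    import pytorch_lightning as pl

    class LitClassifier(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
            self.loss_fn = nn.CrossEntropyLoss()

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = self.loss_fn(self.model(x), y)
            self.log("train_loss", loss)  # logging handled by Lightning
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    # trainer = pl.Trainer(max_epochs=3); trainer.fit(LitClassifier(), train_loader)
    ```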

  18. Article
    Hacker News · 1y

    Jiayi-Pan/TinyZero

TinyZero is a reproduction of DeepSeek R1 Zero built on the veRL framework. Using reinforcement learning, it demonstrates how a 3B base LM can develop self-verification and search abilities on its own. The full experiment can be run for less than $30.

  19. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 36w

    Data and Pipeline Engineering for ML Systems (With Implementation)

    A comprehensive MLOps crash course covering data and pipeline engineering for ML systems. The series explores data sources, ETL pipelines, model training, deployment, versioning, and reproducibility. It includes hands-on implementations using tools like PyTorch, MLflow, Git, DVC, and Weights & Biases, providing both foundational concepts and practical system-level thinking for production ML environments.
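
    On the data-versioning side, a small sketch of reading a DVC-tracked dataset from Python; the repo URL, path, and revision tag are placeholders, not from the series.

    ```python
    # Hedged sketch of pulling a DVC-versioned file from a Git repo at a
    # pinned revision (URL, path, and tag are placeholders).
    import dvc.api

    with dvc.api.open(
        "data/train.csv",
        repo="https://github.com/example/ml-project",  # placeholder repo
        rev="v1.0",                                    # pin an exact data version
    ) as f:
        header = f.readline()
        print(header)
    ```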

  20. Article
    Hugging Face · 45w

    (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware

    QLoRA enables fine-tuning of FLUX.1-dev diffusion models on consumer hardware with under 10GB VRAM by combining 4-bit quantization with Low-Rank Adaptation. The approach uses bitsandbytes for quantization, 8-bit AdamW optimizer, gradient checkpointing, and cached latents to dramatically reduce memory usage from ~120GB to ~9GB. Training on RTX 4090 takes 41 minutes for 700 steps, while FP8 training with torchao on H100 reduces time to 20 minutes. The technique maintains high-quality results while making advanced model customization accessible to developers without enterprise-grade hardware.
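
    A hedged sketch of how these ingredients combine: NF4 4-bit loading via bitsandbytes, LoRA adapters on the attention projections, gradient checkpointing, and an 8-bit optimizer. Module names follow Diffusers' FLUX implementation; hyperparameters are illustrative, not the post's exact recipe.

    ```python
    # QLoRA ingredients sketch: 4-bit base transformer + LoRA adapters +
    # gradient checkpointing + 8-bit AdamW (hyperparameters illustrative).
    import torch
    import bitsandbytes as bnb
    from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
    from peft import LoraConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    transformer = FluxTransformer2DModel.from_pretrained(
        "black-forest-labs/FLUX.1-dev", subfolder="transformer",
        quantization_config=quant,
    )
    transformer.add_adapter(LoraConfig(
        r=16, lora_alpha=16,
        target_modules=["to_q", "to_k", "to_v", "to_out.0"],
    ))
    transformer.enable_gradient_checkpointing()
    optimizer = bnb.optim.AdamW8bit(
        (p for p in transformer.parameters() if p.requires_grad), lr=1e-4,
    )
    ```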

  21. Article
    Towards Data Science · 48w

    May Must-Reads: Math for Machine Learning Engineers, LLMs, Agent Protocols, and More

    A monthly roundup of popular machine learning and data science articles covering essential math skills for ML engineers, beginner guides to LLMs and RAG, software engineering concepts like inheritance, agent communication protocols, Model Context Protocol, PyTorch applications, healthcare ML projects, and time series forecasting techniques. The collection also introduces new authors contributing to the data science community.

  22. Video
    Community Picks · 44w

    Train a Convolutional Neural Network from Scratch: PyTorch, Next.js, React, Tailwind, Python (2025)

    A comprehensive tutorial covering the complete process of building a convolutional neural network from scratch using PyTorch to classify audio files. The guide starts with neural network fundamentals including neurons, activation functions, and training concepts like forward pass, backpropagation, and loss optimization. It then dives deep into CNN theory, explaining kernels, feature maps, spatial information preservation, and how CNNs extract hierarchical features from images. The practical implementation includes converting audio to spectrograms, training on serverless GPUs with Modal, achieving 83% accuracy, and building a Next.js frontend to visualize the model's convolutional layer outputs and feature extraction process.
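
    The audio-to-image step the tutorial relies on can be sketched with torchaudio; the synthetic sine wave stands in for a real audio clip, and the parameters are illustrative.

    ```python
    # Audio -> log-mel spectrogram sketch: turn a waveform into the 2D
    # "image" a CNN classifies (synthetic clip and parameters illustrative).
    import torch
    import torchaudio

    sample_rate = 16_000
    t = torch.arange(0, 1.0, 1 / sample_rate)
    waveform = torch.sin(2 * torch.pi * 440 * t).unsqueeze(0)  # 1 s, 440 Hz stand-in

    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=64)(waveform)
    spec = torchaudio.transforms.AmplitudeToDB()(mel)  # log scale, CNN-friendly
    print(spec.shape)  # (channels, n_mels, time) -> input to Conv2d layers
    ```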