Best of PyTorch — February 2026

1
Article
Netflix TechBlog·14w
Scaling LLM Post-Training at Netflix
Netflix built an internal post-training framework to scale LLM fine-tuning from experimentation to production. The framework abstracts infrastructure complexity across four dimensions: data (streaming, sequence packing, loss masking), model (sharding, LoRA, architecture support), compute (distributed job orchestration, checkpointing, MFU monitoring), and workflow (supporting both SFT and on-policy RL). Key engineering decisions include staying Hugging Face-compatible for interoperability, maintaining optimized internal model implementations for performance, and evolving from SPMD-only execution to hybrid orchestration for RL workflows. The system enables researchers to focus on modeling rather than distributed systems plumbing.
32
2
Article
Real Python·15w
pandas 3.0 Lands Breaking Changes and Other Python News for February 2026 – Real Python
Python 3.15 alpha releases show JIT compiler performance gains of 7-8% on some platforms. pandas 3.0 introduces breaking changes including Copy-on-Write semantics, dedicated string dtype, and requires Python 3.11+. The PSF received $1.5M from Anthropic for security infrastructure improvements. PyTorch 2.10 deprecated TorchScript in favor of torch.export. PEP 822 proposes d-strings for cleaner multiline string handling. Black 26.1.0 stabilized its 2026 formatting style, and the Python Developers Survey 2026 is now open.
22
3
Article
Hugging Face·14w
Custom Kernels for All from Codex and Claude
HuggingFace built an agent skill that teaches AI coding agents (Claude, Codex) to write production-ready CUDA kernels with PyTorch bindings. The skill packages domain expertise about GPU architectures, memory patterns, and library integration into ~550 tokens of structured guidance. Testing on LTX-Video (diffusers) and Qwen3-8B (transformers) showed the agent-generated RMSNorm kernels achieved 1.88-1.94x speedup over PyTorch baselines, with 6% end-to-end improvement in video generation. The skill integrates with HuggingFace's Kernel Hub for distribution, enabling developers to generate, benchmark, and publish optimized kernels without deep CUDA expertise.
21
2

See all PyTorch archives