Best of Machine Learning
February 2026

  1. Video
    ForrestKnight · 10w

    I Can't Believe Rust is Replacing Java

    xAI completely rewrote X's recommendation algorithm, replacing dozens of Java/Scala microservices with four Rust components and a Grok-based transformer. The old system used hand-engineered features and explicit weights spread across interconnected services, while the new architecture moves the intelligence into an ML model, with Rust handling orchestration. The shift sidesteps JVM garbage-collection pauses that conflict with sub-millisecond latency requirements, and reflects a broader industry trend of pairing Rust with AI-driven systems for performance-critical infrastructure.

  2. Article
    Code Like A Girl · 9w

    I analyzed 50,000 Dating Profiles to Decipher the Myths of Love in Algorithm

    A data scientist analyzes 50,000 dating app profiles to debunk common dating myths using Python and machine learning. The analysis reveals that urban users get 40% more successful relationships but double the catfish rate, picky swipers (right-swipe <25%) perform better than desperate ones, and spending more time on apps doesn't increase matches. A logistic regression model achieves 99.15% accuracy in predicting compatibility based on swipe behavior, app usage patterns, and shared interests—proving that behavioral alignment matters more than common hobbies.
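
    The core model here is easy to sketch. Below is a minimal pure-Python logistic regression trained by gradient descent on synthetic "swipe behavior" data; the feature names, the hidden data-generating rule, and all numbers are invented for illustration and are unrelated to the article's dataset.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative features: [right-swipe rate, hours/day in app, behavioral-alignment score].
# Label: 1 = the match led to a lasting conversation (toy definition).
def synth_profile():
    swipe_rate = random.random()      # fraction of profiles right-swiped
    hours = random.random() * 4       # daily time in app (irrelevant by design)
    overlap = random.random()         # behavioral-alignment score
    # Hidden rule for the toy data: selective swipers with aligned behavior do best.
    logit = -1.0 - 3.0 * swipe_rate + 4.0 * overlap
    return [swipe_rate, hours, overlap], 1 if random.random() < sigmoid(logit) else 0

data = [synth_profile() for _ in range(2000)]

# Full-batch gradient descent on the log-loss.
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.5
for _ in range(300):
    gw, gb = [0.0, 0.0, 0.0], 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for i in range(3):
            gw[i] += (p - y) * x[i]
        gb += p - y
    for i in range(3):
        w[i] -= lr * gw[i] / len(data)
    b -= lr * gb / len(data)

accuracy = sum(
    (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5) == (y == 1)
    for x, y in data
) / len(data)
```

    Swapping the toy generator for real labeled profiles (and the hand-rolled loop for scikit-learn's LogisticRegression) gives the production version of the same idea; the labels here are deliberately noisy, so even a perfect model cannot reach the article's headline accuracy on this toy data.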

  3. Article
    Salesforce Engineering · 9w

    How Agentic Memory Enables Reliable AI Agents Across Enterprise Users

    Salesforce developed Agentic Memory to overcome limitations of stateless AI agents with small context windows. The system uses a structured data layer that separates short-term session context from long-term persistent memory anchored to profile graphs. Key innovations include write and read gates with confidence scoring, hybrid semantic validation to prevent duplication, and episodic memory that preserves event sequences. The architecture treats memory as inspectable, governable data rather than prompt text, enabling agents to maintain continuity across sessions while meeting enterprise requirements for auditability, access control, and compliance at scale.
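
    The write-gate idea can be sketched in a few lines. This is an illustrative toy, not Salesforce's implementation: the MemoryStore class, the threshold value, and the key-based dedup are all invented for the example (a real system would use semantic similarity, not exact keys).

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy long-term store with a confidence-scored write gate."""
    write_threshold: float = 0.7
    facts: dict = field(default_factory=dict)   # key -> (text, confidence)

    def write(self, key: str, text: str, confidence: float) -> bool:
        # Write gate 1: drop low-confidence candidates instead of persisting them.
        if confidence < self.write_threshold:
            return False
        # Write gate 2: naive dedup -- keep the higher-confidence version of a fact.
        existing = self.facts.get(key)
        if existing and existing[1] >= confidence:
            return False
        self.facts[key] = (text, confidence)
        return True

    def read(self, key: str, min_confidence: float = 0.0):
        # Read gate: callers can demand a confidence floor.
        fact = self.facts.get(key)
        if fact and fact[1] >= min_confidence:
            return fact[0]
        return None

store = MemoryStore()
store.write("user.role", "data engineer", confidence=0.9)     # accepted
store.write("user.role", "maybe a manager?", confidence=0.4)  # rejected by gate 1
```

    Because memory is plain data rather than prompt text, the store can be inspected, audited, and access-controlled, which is the enterprise point the article makes.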

  4. Article
    ByteByteGo · 10w

    How Grab Built a Vision LLM to Scan Images

    Grab built a custom 1B-parameter Vision LLM to extract information from Southeast Asian documents for eKYC verification. Starting with Qwen2-VL 2B, they progressed from LoRA fine-tuning to full parameter training, then built a lightweight model from scratch combining Qwen2-VL's vision encoder with Qwen2.5's compact language decoder. The four-stage training process included projector alignment, vision enhancement, language-specific visual training on synthetic OCR data, and task-specific fine-tuning. The final model achieved comparable accuracy to the 2B version while delivering 48-56% faster latency, addressing challenges with non-Latin scripts and diverse document formats across the region.
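
    Staged curricula like this are usually expressed as a freeze/unfreeze schedule over model components. The sketch below mirrors the four stages the article names; the component names and the exact freeze pattern per stage are assumptions about how such curricula are commonly structured, not Grab's published recipe.

```python
# Components of a vision-language model, in a fixed order.
COMPONENTS = ["vision_encoder", "projector", "language_decoder"]

# (stage name, components that receive gradients, training data) per the
# article's four stages; the freeze pattern is an illustrative assumption.
STAGES = [
    ("projector_alignment",      {"projector"},                      "image-caption pairs"),
    ("vision_enhancement",       {"vision_encoder", "projector"},    "general visual data"),
    ("language_visual_training", {"vision_encoder", "projector",
                                  "language_decoder"},               "synthetic OCR data"),
    ("task_finetuning",          {"vision_encoder", "projector",
                                  "language_decoder"},               "eKYC documents"),
]

def plan(stages):
    """Return, per stage, which components train and which stay frozen."""
    schedule = []
    for name, trainable, data in stages:
        frozen = [c for c in COMPONENTS if c not in trainable]
        schedule.append({"stage": name, "train": sorted(trainable),
                         "freeze": frozen, "data": data})
    return schedule

schedule = plan(STAGES)
```

    Stage 1 training only the projector is the standard way to align a pretrained vision encoder with a pretrained language decoder before touching either one's weights.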

  5. Video
    The Coding Gopher · 10w

    Meta now has the most insane AI agent

    Meta's AI agent represents a shift from large language models to large action models (LAMs) that can interact with computers through visual understanding and mouse/keyboard control. The system uses vision transformers to parse screen pixels, DOM annotation for web interaction, and operates in ephemeral sandboxed microVMs for security. By working at the UI layer rather than requiring APIs, it enables probabilistic automation of complex workflows across legacy systems, marking a transition from text-to-text models to multimodal input-to-executable-action systems.

  6. Article
    Towards Data Science · 7w

    Is the AI and Data Job Market Dead?

    Despite recurring claims that data science is dying, job postings grew 130% year-over-year after bottoming out in mid-2023, and salaries continue to rise. The field has evolved from a generalist 'Swiss Army Knife' role into three distinct specializations: analysis (data analyst), engineering (ML engineer), and infrastructure (data engineer). A 2025 study of 285,000 companies shows senior hiring is still growing while junior hiring has plateaued—not disappeared—making entry-level competition fiercer. To stand out, candidates should specialize in areas like GenAI or time series forecasting, build strong professional networks, develop soft skills AI can't replace, and consider starting in analyst roles before moving up.

  7. Article
    Machine Learning Mastery · 9w

    The 7 Biggest Misconceptions About AI Agents (and Why They Matter)

    AI agents are conditional automation systems, not truly autonomous entities. Common misconceptions lead to production failures: agents require explicit boundaries and guardrails, prototypes differ vastly from production-ready systems, more tools and context often degrade performance, behavior is non-stationary requiring continuous monitoring, most failures stem from system design rather than model limitations, and evaluation must focus on behavioral metrics like tool-selection accuracy rather than text quality. Successful deployments treat agents as engineered systems with constraints, not intelligent entities that self-regulate.
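
    Behavioral evaluation is easy to make concrete. Below is a minimal tool-selection accuracy metric over labeled traces; the trace format (lists of chosen/expected tool pairs) is invented for the example.

```python
def tool_selection_accuracy(traces):
    """Fraction of agent steps where the chosen tool matches the expected one.

    Each trace is a list of (chosen_tool, expected_tool) pairs, e.g. produced
    by replaying logged conversations against a hand-labeled test set.
    """
    steps = [pair for trace in traces for pair in trace]
    if not steps:
        return 0.0
    return sum(chosen == expected for chosen, expected in steps) / len(steps)

traces = [
    [("search", "search"), ("calculator", "calculator")],
    [("search", "sql_query"), ("sql_query", "sql_query")],
]
score = tool_selection_accuracy(traces)  # 3 of 4 steps correct
```

    Because agent behavior is non-stationary, a metric like this only means something when tracked continuously over fresh traces, not measured once at launch.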

  8. Article
    Tech Lead Digest · 9w

    AI Fluency Leveling

    A 7-level framework for assessing AI fluency in knowledge workers, from casual consumers to AI pioneers. The levels progress from basic prompt usage through context engineering and RAG implementation to system architecture and platform development. The critical transition occurs at Level 4, where practitioners shift from prompt-based approaches to deterministic code for managing AI's probabilistic nature. Each level includes hiring criteria, required skillsets, and practical guidance for career development and organizational assessment.
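
    The Level 4 transition — wrapping a probabilistic model in deterministic code — looks like this in miniature. The model call is stubbed out with a hard-coded string, and the schema, retry count, and function names are all invented for illustration.

```python
import json

def flaky_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real one varies run to run."""
    return 'Sure! Here is the JSON: {"sentiment": "positive", "score": 0.9}'

def call_with_validation(prompt: str, retries: int = 3) -> dict:
    """Deterministic shell around a probabilistic call: extract, validate, retry."""
    for _ in range(retries):
        raw = flaky_model(prompt)
        # Extract the first {...} span; models often wrap JSON in chatter.
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end == -1:
            continue
        try:
            data = json.loads(raw[start:end + 1])
        except json.JSONDecodeError:
            continue
        # Schema check: reject structurally wrong answers instead of passing them on.
        if data.get("sentiment") in {"positive", "negative", "neutral"}:
            return data
    raise ValueError(f"no valid response after {retries} attempts")

result = call_with_validation("Classify: 'great product'")
```

    The pattern — parse, validate against a schema, retry, fail loudly — is what separates a prompt user from someone engineering around the model's probabilistic nature.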

  9. Video
    bycloud · 9w

    LLM’s Billion Dollar Problem

    Token consumption in LLMs has exploded with thinking models and AI agents, creating scalability challenges. Standard attention mechanisms scale quadratically with context length, making long contexts prohibitively expensive. Three approaches attempt to solve this: sparse attention (restricts which tokens interact), linear attention (accumulates information in shared memory), and compressed attention (compresses tokens before comparison). While sparse and compressed attention help, only linear attention can theoretically scale past 1M context windows. Recent developments show hybrid approaches combining linear attention with standard or compressed attention achieving promising results, with Google's Gemini 3 Flash demonstrating breakthrough performance at 1M context length.
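
    The cost difference between standard and linear attention comes down to where the parentheses go. Standard attention materializes the n×n score matrix in (QKᵀ)V; linear attention (softmax omitted, with a feature map applied to Q and K) computes Q(KᵀV), keeping only a d×d state whose size is independent of sequence length. A pure-Python sketch on tiny unnormalized matrices, where associativity makes the two orderings agree exactly:

```python
def matmul(a, b):
    """Naive matrix product of row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Toy setting: n = 3 tokens, d = 2 dims, no softmax, identity feature map.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Quadratic order: (Q K^T) V builds an n x n score matrix first -- O(n^2 d).
quadratic = matmul(matmul(Q, transpose(K)), V)

# Linear order: Q (K^T V) keeps only a d x d state -- O(n d^2).
linear = matmul(Q, matmul(transpose(K), V))
```

    The softmax is what blocks this reordering in standard attention; linear-attention variants replace it with feature maps precisely so the KᵀV state can be accumulated once and reused, which is why only this family scales gracefully past million-token contexts.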

  10. Article
    The Palindrome · 9w

    The Palindrome in 2026

    Tivadar Danka outlines his 2026 plans for The Palindrome newsletter: finishing his Machine Learning From Zero book with from-scratch algorithm implementations, creating more explainer videos, launching monthly live workshops for paid subscribers (starting with Mathematics of Machine Learning on March 7th), building a team of contributors inspired by distill.pub, and developing nb2wb—an open-source tool for converting Jupyter Notebooks to web publishing platforms. The newsletter has grown from 16,835 to 39,663 subscribers since May 2025.

  11. Article
    Windsurf · 10w

    Windsurf Tab v2: 25-75% more accepted chars with Variable Aggression

    Windsurf Tab v2 introduces a completely rewritten autocomplete model that increases accepted characters by 25-75% through improved context engineering and a new "variable aggression" feature. The team optimized the system prompt (76% reduction in length), refined the data pipeline, and used reinforcement learning to train models that predict more code per suggestion while maintaining acceptance rates. Users can now choose between different aggression levels to match their preferences, from conservative suggestions to bolder multi-line predictions. The update focuses on maximizing total keystrokes saved rather than just optimizing for acceptance rate alone.
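
    The trade-off in that last sentence is just an expected-value calculation. The numbers below are invented for illustration, not Windsurf's, but they show why a bolder model can win on total characters saved even while losing on acceptance rate:

```python
def chars_saved(suggestions, acceptance_rate, avg_chars_per_suggestion):
    """Expected accepted characters over a batch of suggestions."""
    return suggestions * acceptance_rate * avg_chars_per_suggestion

# Conservative setting: high acceptance, short completions.
conservative = chars_saved(suggestions=1000, acceptance_rate=0.30,
                           avg_chars_per_suggestion=20)
# Aggressive setting: lower acceptance, much longer multi-line predictions.
aggressive = chars_saved(suggestions=1000, acceptance_rate=0.25,
                         avg_chars_per_suggestion=45)
```

    Optimizing acceptance rate alone would pick the conservative model; optimizing total keystrokes saved picks the aggressive one, which is the metric shift the post describes.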

  12. Article
    Netflix TechBlog · 9w

    Scaling LLM Post-Training at Netflix

    Netflix built an internal post-training framework to scale LLM fine-tuning from experimentation to production. The framework abstracts infrastructure complexity across four dimensions: data (streaming, sequence packing, loss masking), model (sharding, LoRA, architecture support), compute (distributed job orchestration, checkpointing, MFU monitoring), and workflow (supporting both SFT and on-policy RL). Key engineering decisions include staying Hugging Face-compatible for interoperability, maintaining optimized internal model implementations for performance, and evolving from SPMD-only execution to hybrid orchestration for RL workflows. The system enables researchers to focus on modeling rather than distributed systems plumbing.
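
    Sequence packing and loss masking, two of the data-dimension features named above, fit in one small function. This is a generic sketch of the technique, not Netflix's code; real pipelines also mask prompt tokens and handle sequences longer than the buffer.

```python
def pack(sequences, max_len, pad_id=0):
    """Greedily pack short sequences into fixed-length training buffers.

    Returns (tokens, loss_mask) pairs; the mask is 0 on padding so padded
    positions contribute nothing to the loss.
    """
    buffers, current = [], []
    for seq in sequences:
        if len(current) + len(seq) > max_len:
            buffers.append(current)
            current = []
        current.extend(seq)
    if current:
        buffers.append(current)
    packed = []
    for buf in buffers:
        pad = max_len - len(buf)
        packed.append((buf + [pad_id] * pad, [1] * len(buf) + [0] * pad))
    return packed

batches = pack([[5, 6, 7], [8, 9], [10, 11, 12, 13], [14]], max_len=6)
```

    Packing matters because fine-tuning corpora are full of short examples: without it, most of each fixed-length batch is padding and measured MFU drops accordingly.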

  13. Article
    sean goedecke · 8w

    LLM-generated skills work, if you generate them afterwards

    LLM-generated "skills" (explanatory prompts for specific tasks) work better when created after solving a problem rather than before. A recent paper found that pre-generated skills provide no benefit because they bake in incorrect assumptions from training data. The effective approach is to have the LLM solve the problem through iteration first, then distill that learned experience into a reusable skill document. This captures knowledge gained from millions of tokens of problem-solving rather than just regurgitating existing training data.

  14. Article
    SD Times · 9w

    This week in AI updates: GPT-5.3-Codex-Spark, GitHub Agentic Workflows, and more (February 13, 2026)

    OpenAI released GPT-5.3-Codex-Spark, a lightweight coding model delivering 1,000+ tokens per second through a Cerebras partnership. GitHub launched Agentic Workflows for repository automation using plain Markdown descriptions. Google added Automated Reviews to Conductor in Gemini CLI and upgraded Gemini 3 Deep Think mode for improved reasoning. GitHub Copilot testing for .NET reached general availability in Visual Studio 2026. Anthropic raised $30 billion in Series G funding at a $380 billion valuation, with run-rate revenue hitting $14 billion.

  15. Article
    Max Woolf's Blog · 7w

    An AI agent coding skeptic tries AI agent coding, in excessive detail

    A self-described AI agent skeptic documents their journey from dismissing agentic coding to becoming a cautious convert after working with Claude Opus 4.5 and OpenAI Codex. The author shares detailed real-world experiments: building a YouTube scraper, a FastAPI webapp, Rust packages with Python bindings (icon rendering, word clouds, a terminal MIDI DAW, a physics simulator), and ultimately developing high-performance Rust implementations of ML algorithms (UMAP, HDBSCAN, GBDT) that outperform existing C/C++ libraries by 2-100x. Key insights include the importance of a well-crafted AGENTS.md file for controlling agent behavior, chaining Codex and Opus for iterative optimization, and the value of having approximate domain knowledge to audit agent output. The author remains measured—acknowledging real productivity gains while resisting hype—and open-sources all projects.

  16. Article
    mlflow · 10w

    Introducing DeepEval, RAGAS, and Phoenix Judges in MLflow

    MLflow 3.9.0 integrates over 50 evaluation metrics from DeepEval, RAGAS, and Arize Phoenix frameworks into a unified API. This integration enables developers to evaluate LLM agents and RAG systems using multiple judge frameworks simultaneously, compare results side-by-side in MLflow UI, and access specialized metrics for conversational agents, retrieval quality, hallucination detection, and safety. The unified interface eliminates the need for custom wrappers and provides visualization, filtering, and iteration tools for improving agent quality before production deployment.

  17. Article
    mlflow · 7w

    Multi-turn Evaluation & Simulation: Enhancing AI Observability with MLflow for Chatbots

    MLflow 3.10 introduces multi-turn evaluation and conversation simulation for chatbots and AI agents. The release adds built-in session-level scorers like ConversationCompleteness and UserFrustration that assess entire conversations rather than individual responses. A ConversationSimulator lets developers define persona-based test scenarios with goals and guidelines, generate reproducible multi-turn conversations, and automatically extract test cases from production traces. Scorers can run on-demand against existing sessions or be registered to evaluate new sessions automatically. The workflow enables A/B comparison of agent versions—demonstrated by a prompt improvement that boosted completeness 50% and cut frustration 75%.
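
    Session-level scoring differs from per-response scoring mainly in its unit of analysis: the scorer sees the whole conversation. The sketch below illustrates that idea generically — it is not MLflow's API, and a real scorer like UserFrustration would use an LLM judge rather than keyword rules.

```python
# Crude keyword stand-in for an LLM-judged frustration signal (illustrative only).
FRUSTRATION_MARKERS = {"that's not what i asked", "you already said that", "this is wrong"}

def user_frustration(session):
    """Fraction of user turns showing frustration markers, over a full session."""
    user_turns = [t["text"].lower() for t in session if t["role"] == "user"]
    if not user_turns:
        return 0.0
    flagged = sum(any(m in turn for m in FRUSTRATION_MARKERS) for turn in user_turns)
    return flagged / len(user_turns)

session = [
    {"role": "user", "text": "How do I reset my password?"},
    {"role": "assistant", "text": "Click 'Forgot password' on the login page."},
    {"role": "user", "text": "That's not what I asked, I'm locked out entirely."},
    {"role": "assistant", "text": "For a locked account, contact support."},
]
score = user_frustration(session)
```

    Running a scorer like this over two agent versions' simulated sessions is exactly the A/B workflow the release demonstrates.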

  18. Article
    Product Hunt · 10w

    Polyvia: Pinecone for visual data - Visual Knowledge Index for Agents

    Polyvia is a Visual Knowledge Index that enables AI agents to reason across visual data like charts, diagrams, and infographics. Unlike traditional tools that only extract or index text, Polyvia uses VLM-OCR extraction to convert visual content into structured logic, creates a graph-based facts ontology to disambiguate information, and provides agentic visual reasoning with audit-ready citations. It's available via API, MCP Server (for Claude, Cursor), and Polyvia Studio interface, targeting developers building multimodal agents and knowledge-work teams in research, finance, and healthcare.

  19. Article
    BigData Boutique blog · 9w

    Everything You Need to Know Before Building AI Agents

    AI agents differ from chatbots by autonomously controlling workflow through iterative problem-solving rather than single-shot responses. They consist of three core components: memory (short-term, long-term, and working), planning/reasoning (decomposing goals into subtasks), and tools (APIs and external integrations). Autonomy levels range from operator assistance to full independence, with Levels 2-3 being optimal for most production systems. Common pitfalls include building agents for deterministic workflows, poor tool definitions, lack of evaluation frameworks, and missing observability. Most production agents struggle with quality issues, and the engineering challenge lies in moving from demo to reliable production system.
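
    The three components map directly onto a minimal loop. Below is a toy agent with a scripted "planner" standing in for the LLM; the tools, the goal, and the memory slots are all invented for illustration.

```python
# Tool registry: the agent's external capabilities.
TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def toy_planner(goal, memory):
    """Scripted stand-in for an LLM planner: decompose the goal, then stop."""
    if "sum" not in memory:
        return ("add", (2, 3), "sum")
    if "answer" not in memory:
        return ("mul", (memory["sum"], 10), "answer")
    return None  # goal reached

def run_agent(goal, max_steps=5):
    memory = {}  # working memory shared across iterations
    for _ in range(max_steps):             # bounded loop = a basic guardrail
        step = toy_planner(goal, memory)
        if step is None:
            return memory
        tool, args, slot = step
        memory[slot] = TOOLS[tool](*args)  # tool call, result written back
    raise RuntimeError("step budget exhausted")

result = run_agent("compute (2 + 3) * 10")
```

    The loop is what distinguishes an agent from a chatbot: each iteration reads memory, plans one subtask, calls a tool, and feeds the result back in — and the step budget is the simplest form of the guardrails the article argues every production agent needs.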

  20. Video
    EO · 10w

    What you must know before AGI arrives | Carnegie Mellon University Po-Shen Loh

    Carnegie Mellon math professor Po-Shen Loh discusses the impending arrival of AGI and its implications for education and human capability. He argues that as AI surpasses human abilities in creativity and problem-solving (already solving 4 of 6 International Math Olympiad problems), the critical skill becomes independent thinking and synthesis rather than rote learning. Loh warns against students using AI for homework, comparing it to driving instead of running for exercise—it atrophies mental fitness. He advocates for teaching students to "grade homework" rather than just do it, emphasizing creativity, communication, and the ability to solve novel problems. His educational approach focuses on building autonomous thinking through live instruction combining math expertise with acting/communication skills, creating a scalable ecosystem where high schoolers teach middle schoolers. He stresses that future success depends on authentic collaboration, empathy, and the ability to create value for others, as AI will handle routine tasks.