Best of Machine Learning · April 2026

  1.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 4w

    Google solved an Old RNN Problem

    Google Research introduces 'Memory Caching,' a technique that addresses a long-standing limitation of RNNs: they lose information over long sequences. Instead of relying on a single fixed-size memory state, the approach splits a sequence into segments and saves the RNN's memory state at each segment boundary. During generation, each token attends to all saved checkpoints, giving O(NL) complexity (reading L as the sequence length and N as the number of cached segments), a middle ground between RNNs' O(L) and Transformers' O(L²). Four variants are proposed: Residual Memory, Gated Residual Memory (GRM), Memory Soup, and Sparse Selective Caching (SSC), with GRM performing best. The technique significantly closes the recall gap between RNNs and Transformers and shows that hybrid architectures are implicitly a special case of Memory Caching. Experiments are at academic scale (up to 1.3B params), so frontier-scale performance remains an open question.
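
The checkpoint-and-attend idea in this summary can be sketched in plain Python. This is a toy illustration under loud assumptions: the linear recurrence `rnn_step`, the dot-product scoring, and the softmax mix are stand-ins chosen for brevity, not the paper's formulation or any of its four variants. It only shows why per-token work grows with the number of cached segments rather than with the full sequence length.

```python
import math

def rnn_step(state, x, decay=0.9):
    """Toy linear recurrence (an assumption): new_state = decay * state + x."""
    return [decay * s + xi for s, xi in zip(state, x)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def run_with_memory_cache(tokens, segment_len):
    """Return (per-token outputs, cached segment-boundary states)."""
    dim = len(tokens[0])
    state = [0.0] * dim
    cache = []    # one saved state per finished segment
    outputs = []
    for t, x in enumerate(tokens):
        state = rnn_step(state, x)
        # Each token scores every cached checkpoint plus the live state,
        # so per-token cost is proportional to len(cache) + 1.
        candidates = cache + [state]
        weights = softmax([dot(x, c) for c in candidates])
        outputs.append([
            sum(w * c[d] for w, c in zip(weights, candidates))
            for d in range(dim)
        ])
        if (t + 1) % segment_len == 0:
            cache.append(list(state))  # checkpoint at the segment boundary
    return outputs, cache

tokens = [[1.0, 0.0], [0.0, 1.0]] * 4   # 8 toy tokens of dimension 2
outputs, cache = run_with_memory_cache(tokens, segment_len=4)
```

With 8 tokens and a segment length of 4, the sketch stores two checkpoints; a real model would learn the scoring and gating rather than use raw dot products.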

  2.
    Article
    ByteByteGo · 5w

    How LinkedIn Feed Uses LLMs to Serve 1.3 Billion Users

    LinkedIn replaced five separate Feed retrieval systems with a single LLM-powered dual encoder model to serve 1.3 billion users. Key engineering decisions include: converting raw numerical features into percentile buckets (boosting popularity-embedding correlation 30x), filtering training data to only positively-engaged posts (2.6x faster training, 15% better recall), using both easy and hard negatives for contrastive learning, and building a Generative Recommender with causal transformer attention and Multi-gate Mixture-of-Experts heads for multi-task ranking. Infrastructure innovations include shared context batching, a custom Flash Attention variant (GRMIS) for 2x speedup, disaggregated CPU/GPU serving, and continuously running embedding refresh pipelines.
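
The percentile-bucket step mentioned in this summary can be illustrated with a minimal sketch. The helper names `fit_percentile_edges` and `to_bucket`, the bucket count, and the sample values are all hypothetical; the summary does not describe LinkedIn's actual implementation. The point is only that heavy-tailed raw counts map to small, evenly populated integer buckets.

```python
from bisect import bisect_right

def fit_percentile_edges(values, num_buckets=100):
    """Pick edges so each bucket holds roughly the same share of the values."""
    ordered = sorted(values)
    n = len(ordered)
    # Edge k sits at the k/num_buckets quantile, for k = 1 .. num_buckets - 1.
    return [ordered[min(n - 1, (k * n) // num_buckets)]
            for k in range(1, num_buckets)]

def to_bucket(value, edges):
    """Map a raw value to its bucket index in [0, len(edges)]."""
    return bisect_right(edges, value)

# Hypothetical heavy-tailed view counts: one viral post dwarfs the rest.
view_counts = [1, 2, 3, 5, 8, 13, 21, 10_000]
edges = fit_percentile_edges(view_counts, num_buckets=4)
buckets = [to_bucket(v, edges) for v in view_counts]
```

Here the outlier of 10,000 views lands in the same top bucket as 21 views, so the model sees rank rather than raw magnitude.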

  3.
    Article
    Hacker News · 4w

    A 3D Body from Eight Questions — No Photo, No GPU

    Clad built a small MLP that predicts 58 body shape parameters from just 8 questionnaire inputs, achieving 0.3 cm height MAE and 0.3–0.5 kg mass MAE — outperforming both a photo-based pipeline and height+weight regression on circumferences. The key innovation is a differentiable physics loss: the MLP's outputs are passed through the Anny body model's forward pass (blendshapes → vertices → volume → mass), so mass errors backpropagate through all volume-related parameters jointly. The model is tiny (~85 KB), trains in ~60 minutes on a laptop, and runs in milliseconds on CPU. Major lessons include fixing body density calculations per gender using the Siri two-component model, discovering that a training/inference distribution mismatch on ancestry blendshapes caused a 3 kg noise floor, and finding that dataset quality and evaluation rigor mattered far more than model architecture.
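
The volume-to-mass step of the physics loss can be sketched with the Siri two-component relation, which treats the body as fat (about 0.9 g/cm³) plus fat-free mass (about 1.1 g/cm³). The function names, the example fat fraction, and the squared-error loss are assumptions made for illustration; the summary does not give the article's actual code or the per-gender values it uses. The hand-written gradient stands in for what an autodiff framework would compute.

```python
def density_from_fat_fraction(fat_fraction):
    """Whole-body density in g/cm^3 (equivalently kg/L) from the Siri equation.

    Siri: fat_fraction = 4.95 / density - 4.50, inverted here.
    """
    return 4.95 / (fat_fraction + 4.50)

def mass_kg(volume_litres, fat_fraction):
    """Mass implied by a body volume and an assumed fat fraction."""
    return density_from_fat_fraction(fat_fraction) * volume_litres

def mass_loss_and_grad(volume_litres, fat_fraction, target_mass_kg):
    """Squared mass error and its derivative with respect to volume.

    In a differentiable pipeline this gradient would keep flowing back
    through the volume computation into the shape parameters.
    """
    density = density_from_fat_fraction(fat_fraction)
    error = density * volume_litres - target_mass_kg
    return error ** 2, 2.0 * error * density

# Hypothetical example: 70 L body volume, 20% body fat, 75 kg reported mass.
loss, grad = mass_loss_and_grad(70.0, 0.20, 75.0)
```

A negative gradient here means the predicted volume is too small for the reported mass, so an optimizer would push every volume-related parameter upward together.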

  4.
    Article
    Hugging Face · 4w

    Any Custom Frontend with Gradio's Backend

    gradio.Server is a new Gradio class that extends FastAPI, allowing developers to pair any custom frontend (React, Svelte, plain HTML/JS) with Gradio's backend infrastructure including queuing, concurrency management, ZeroGPU support, and gradio_client compatibility. The @app.api() decorator wraps Python functions with Gradio's queuing engine, preventing GPU contention when multiple users hit an endpoint simultaneously. A working demo builds a 'Text Behind Image' editor: a ~50-line Python backend loads a BiRefNet segmentation model, while a 1300-line vanilla HTML/CSS/JS frontend handles canvas layering, drag-and-drop, and client-side PNG export. The frontend communicates via the Gradio JS Client to benefit from queue management. Future posts will cover MCP tool registration, SSE streaming, batch processing, and multi-page app patterns.
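
The queuing behaviour described in this summary, where simultaneous requests are serialized so they do not contend for the GPU, can be approximated with a standard-library sketch. This is not Gradio's implementation and does not use the `gradio.Server` API; `run_queued`, `fake_gpu_call`, and the concurrency limit are hypothetical names and values used only to show the general pattern of bounding in-flight work.

```python
import asyncio

async def run_queued(jobs, concurrency_limit=1):
    """Run zero-argument coroutine factories with a cap on in-flight jobs.

    Returns (results in submission order, peak number running at once).
    """
    gate = asyncio.Semaphore(concurrency_limit)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with gate:              # wait here until a slot is free
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak

async def fake_gpu_call(x):
    """Stand-in for model inference (an assumption, not a real model)."""
    await asyncio.sleep(0.01)
    return x * 2

# Five "users" hit the endpoint at once; at most two run concurrently.
jobs = [lambda i=i: fake_gpu_call(i) for i in range(5)]
results, peak = asyncio.run(run_queued(jobs, concurrency_limit=2))
```

The remaining callers wait at the semaphore rather than failing, which is the property a request queue provides to a shared accelerator.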