Best of Deep Learning, March 2026

  1. Article
    freeCodeCamp · 11w

    Learn how to fine-tune LLMs in 12 hours

    A 12-hour freeCodeCamp course covering LLM fine-tuning from foundations to enterprise applications. The curriculum spans four major areas: Parameter-Efficient Fine-Tuning (PEFT) with LoRA and QLoRA for consumer hardware; advanced alignment techniques, including RLHF and Direct Preference Optimization (DPO); high-performance tooling such as Unsloth, Axolotl, and Llama Factory; and enterprise and multimodal AI, covering Vision Transformers, multimodal architectures, and APIs from OpenAI and Google Cloud Vertex AI.

  2. Article
    MIT News · 11w

    Neurons receive precisely tailored teaching signals as we learn

    MIT neuroscientists have found the first biological evidence that the brain sends individualized, vectorized error signals to specific neurons during learning, similar to backpropagation in artificial neural networks. Using a brain-computer interface that linked the activity of 8–10 neurons in mice directly to rewards, researchers observed that neurons requiring increased activity and those requiring decreased activity received opposing instructive signals at their dendrites. Blocking these dendritic signals prevented learning. The findings bridge neuroscience and machine learning, suggesting that the brain uses a targeted, cell-specific feedback mechanism rather than only the broad, neuromodulator-based reinforcement described previously.

  3. Article
    Hacker News · 9w

    GitHub - MoonshotAI/Attention-Residuals

    Attention Residuals (AttnRes) is a drop-in replacement for standard residual connections in Transformer architectures, developed by the Kimi team at MoonshotAI. Instead of uniformly accumulating all layer outputs with fixed unit weights, AttnRes uses softmax attention over preceding layer outputs with a learned pseudo-query per layer, enabling selective, content-aware aggregation across depth. A practical Block AttnRes variant reduces memory from O(Ld) to O(Nd) by grouping layers into blocks and applying attention only at block boundaries. Evaluated on a 48B MoE model trained on 1.4T tokens, AttnRes consistently outperforms the baseline across benchmarks, with notable gains on GPQA-Diamond (+7.5) and HumanEval (+3.1). Scaling law experiments show Block AttnRes matches the loss of a baseline trained with 1.25x more compute.
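
As a rough illustration of the aggregation described above, the sketch below contrasts a standard residual sum (every earlier layer output added with unit weight) with a softmax-weighted mix driven by a per-layer pseudo-query. It is not the MoonshotAI implementation: the function names, the 1/sqrt(d) score scaling, the use of raw layer outputs as keys, and the toy vectors are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def standard_residual(layer_outputs):
    # Baseline: accumulate all preceding layer outputs with fixed unit weights.
    d = len(layer_outputs[0])
    return [sum(h[j] for h in layer_outputs) for j in range(d)]

def attn_residual(layer_outputs, pseudo_query):
    # Hypothetical AttnRes-style aggregation: score each preceding layer
    # output against this layer's pseudo-query (learned in a real model,
    # fixed here), then mix the outputs with the softmax weights.
    # The 1/sqrt(d) scaling is an assumption borrowed from standard attention.
    d = len(pseudo_query)
    scores = [
        sum(q * x for q, x in zip(pseudo_query, h)) / math.sqrt(d)
        for h in layer_outputs
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * h[j] for w, h in zip(weights, layer_outputs))
        for j in range(d)
    ]
    return mixed, weights

# Toy example: three preceding layer outputs of width 2.
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero pseudo-query scores every layer equally, so the weights are uniform.
mixed, weights = attn_residual(outputs, [0.0, 0.0])

# A query aligned with the first dimension favors outputs that are large there.
mixed_sel, weights_sel = attn_residual(outputs, [5.0, 0.0])
```

With the zero query, the mix is a plain average of the layer outputs; with the second query, the first and third outputs receive most of the weight. That content-dependent reweighting is the behavior the summary contrasts with fixed unit weights.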