...that even LLMs like GPTs and LLaMAs use.

Daily Dose of Data Science | Avi Chawla | Substack

Activation checkpointing is a memory-saving technique used in training large models like GPT and LLaMA. It works by dividing the network into segments and, during the forward pass, storing only the activations of the first layer in each segment. The intermediate activations inside a segment are discarded and recomputed during backpropagation, only when they are needed. This can reduce memory usage to roughly sqrt(M), where M is the memory needed to store all activations, at the cost of a 15–25% increase in runtime caused by the extra forward computation. Because the saved memory makes room for a larger batch size, some of this overhead can be offset. The post also briefly covers AI agent memory types (short-term, and long-term: semantic, episodic, procedural) and mentions open-source tools like Cognee and Anton.
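To make the segment-and-recompute idea concrete, here is a minimal pure-Python sketch. It is not the post's code and not how any framework implements it: the toy scalar layer `tanh(w * x)`, the function names (`backward_full`, `backward_checkpointed`), and the `peak` counter (a rough count of stored values, not bytes) are all assumptions made for illustration.

```python
import math

def layer(w, x):
    """One toy layer (an assumption for this sketch): y = tanh(w * x)."""
    return math.tanh(w * x)

def layer_backward(w, x, grad_y):
    """Return (dL/dx, dL/dw) for y = tanh(w * x), given dL/dy."""
    y = math.tanh(w * x)
    local = (1.0 - y * y) * grad_y
    return local * w, local * x

def backward_full(weights, x0):
    """Baseline: keep every layer input alive until the backward pass."""
    inputs, x = [], x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    peak = len(inputs)  # one stored value per layer
    grad, grad_w = 1.0, [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad, grad_w[i] = layer_backward(weights[i], inputs[i], grad)
    return grad, grad_w, peak

def backward_checkpointed(weights, x0, segment_size):
    """Keep only each segment's input; recompute the rest during backward."""
    n = len(weights)
    checkpoints, x = [], x0
    for i, w in enumerate(weights):
        if i % segment_size == 0:
            checkpoints.append(x)  # store only the segment's first input
        x = layer(w, x)            # other activations are dropped
    grad, grad_w, peak = 1.0, [0.0] * n, 0
    for seg in reversed(range(len(checkpoints))):
        start = seg * segment_size
        end = min(start + segment_size, n)
        # Re-run the forward pass for this one segment from its checkpoint.
        inputs, x = [], checkpoints[seg]
        for i in range(start, end):
            inputs.append(x)
            x = layer(weights[i], x)
        # Rough peak: all checkpoints plus one segment's recomputed inputs.
        peak = max(peak, len(checkpoints) + len(inputs))
        for i in reversed(range(start, end)):
            grad, grad_w[i] = layer_backward(weights[i], inputs[i - start], grad)
    return grad, grad_w, peak

if __name__ == "__main__":
    weights = [0.5 + 0.1 * i for i in range(16)]
    _, _, peak_full = backward_full(weights, 0.3)
    _, _, peak_ckpt = backward_checkpointed(weights, 0.3, segment_size=4)
    print("stored values, full:", peak_full, "checkpointed:", peak_ckpt)
```

With 16 layers and a segment size of 4 (about the square root of the depth), the checkpointed version holds far fewer values at once while producing the same gradients, and it pays for that with one extra forward pass per segment. In practice, a framework utility such as PyTorch's `torch.utils.checkpoint` would handle this bookkeeping.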

A Memory-efficient Technique to Train Large Models