Activation checkpointing is a memory-saving technique used when training large models such as GPT and LLaMA. It divides the network into segments, stores only the first layer's activations in each segment during the forward pass, and recomputes the intermediate activations during backpropagation when they are needed. This can reduce activation memory to roughly sqrt(M), where M is the memory required to store all activations, at the cost of a 15–25% increase in runtime. Because the freed memory allows a larger batch size, part of this overhead can be offset. The post also briefly covers the types of memory in AI agents (short-term and long-term, with long-term split into semantic, episodic, and procedural) and mentions the open-source tools Cognee and Anton.
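To make the segment-and-recompute idea concrete, here is a minimal sketch in plain Python. It is not taken from the post and is not how any particular framework implements checkpointing. It assumes a toy network in which each "layer" is a scalar `tanh(w * x)`, and all function names (`forward_checkpointed`, `backward_checkpointed`, `backward_full`) are hypothetical, chosen for illustration only.

```python
import math


def layer_forward(w, x):
    """Toy layer: y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_backward(w, x, grad_out):
    """Return (grad wrt input x, grad wrt weight w) for the toy layer."""
    y = math.tanh(w * x)
    local = 1.0 - y * y  # derivative of tanh at w * x
    return grad_out * w * local, grad_out * x * local


def forward_checkpointed(weights, x, segment_size):
    """Forward pass that keeps only the input to each segment."""
    checkpoints = []
    for start in range(0, len(weights), segment_size):
        checkpoints.append(x)  # the only activation stored for this segment
        for w in weights[start:start + segment_size]:
            x = layer_forward(w, x)
    return x, checkpoints


def backward_checkpointed(weights, checkpoints, segment_size, grad_out=1.0):
    """Backward pass that recomputes each segment's activations on demand."""
    grad_w = [0.0] * len(weights)
    starts = list(range(0, len(weights), segment_size))
    for seg_idx in reversed(range(len(starts))):
        start = starts[seg_idx]
        segment = weights[start:start + segment_size]
        # Recompute the input to every layer in this segment from its checkpoint.
        inputs = [checkpoints[seg_idx]]
        for w in segment[:-1]:
            inputs.append(layer_forward(w, inputs[-1]))
        # Ordinary backpropagation through the segment; `inputs` is then discarded.
        for offset in reversed(range(len(segment))):
            grad_out, grad_w[start + offset] = layer_backward(
                segment[offset], inputs[offset], grad_out
            )
    return grad_w, grad_out


def backward_full(weights, x, grad_out=1.0):
    """Reference: store every layer input, then backpropagate."""
    inputs = [x]
    for w in weights[:-1]:
        inputs.append(layer_forward(w, inputs[-1]))
    grad_w = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad_out, grad_w[i] = layer_backward(weights[i], inputs[i], grad_out)
    return grad_w, grad_out


# Example: 16 layers split into segments of sqrt(16) = 4 layers each.
weights = [0.5 + 0.1 * i for i in range(16)]
segment_size = math.isqrt(len(weights))
output, checkpoints = forward_checkpointed(weights, 0.3, segment_size)
grads_ckpt, grad_x_ckpt = backward_checkpointed(weights, checkpoints, segment_size)
grads_full, grad_x_full = backward_full(weights, 0.3)
```

With 16 layers and a segment size of 4, the checkpointed version holds 4 checkpoints plus at most 4 recomputed activations at any time, while the reference version holds all 16. The gradients are the same, and the extra forward work inside each segment is where the runtime overhead comes from.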

5m read time. From blog.dailydoseofds.com
Table of contents
An Open-Source Autonomous BI Agent
A Memory-efficient Technique to Train Large Models
Types of Memory in AI Agents
