Adam-mini: A Memory-Efficient Optimizer Revolutionizing Large Language Model Training with Reduced Memory Usage and Enhanced Performance



Adam-mini is a newly introduced optimizer that significantly reduces memory usage when training large language models while maintaining or improving performance. The standard Adam optimizer is memory-hungry: it stores two extra values for every parameter, the first-moment and second-moment estimates, so its optimizer state alone takes twice the memory of the model weights. Adam-mini addresses this by partitioning model parameters into blocks based on the Hessian structure of transformers and assigning a single effective learning rate to each block, instead of one per parameter. Because the per-parameter second-moment values are replaced by one value per block, this partitioning reduces memory usage by 45% to 50% and improves throughput by nearly 50%, making the training of large models more efficient and accessible, especially for researchers with limited GPU resources.
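To make the block-wise idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the single per-block learning rate comes from one second-moment scalar per block, computed as the mean of the squared gradients in that block, and it takes the block boundaries as a given input (`blocks` is a hypothetical list of index ranges; the Hessian-based partitioning itself is not shown). The function names `init_state`, `adam_mini_step`, and `state_size` are illustrative only.

```python
import math


def init_state(n_params, blocks):
    # First moment: one value per parameter (same as Adam).
    # Second moment: one value per block (assumption: this is where memory is saved).
    return {"t": 0, "m": [0.0] * n_params, "v": [0.0] * len(blocks)}


def state_size(state):
    # Number of scalars held in optimizer state.
    return len(state["m"]) + len(state["v"])


def adam_mini_step(params, grads, state, blocks,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place update over flat lists of floats.

    `blocks` is a list of (start, end) half-open index ranges that
    together cover all parameters (hypothetical partition).
    """
    state["t"] += 1
    t = state["t"]
    m, v = state["m"], state["v"]
    for b, (start, end) in enumerate(blocks):
        block_grads = grads[start:end]
        # Assumed rule: average the squared gradients over the block.
        mean_sq = sum(g * g for g in block_grads) / len(block_grads)
        v[b] = beta2 * v[b] + (1 - beta2) * mean_sq
        v_hat = v[b] / (1 - beta2 ** t)          # bias correction
        denom = math.sqrt(v_hat) + eps           # shared by the whole block
        for i in range(start, end):
            m[i] = beta1 * m[i] + (1 - beta1) * grads[i]
            m_hat = m[i] / (1 - beta1 ** t)      # bias correction
            params[i] -= lr * m_hat / denom
    return params


# Toy usage: 6 parameters split into two blocks of sizes 4 and 2.
blocks = [(0, 4), (4, 6)]
params = [0.0] * 6
grads = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
state = init_state(len(params), blocks)
adam_mini_step(params, grads, state, blocks)
```

In this toy setup the optimizer state holds 8 scalars (6 first-moment values plus 2 block-level second-moment values), where standard Adam would hold 12. With realistic block sizes the second-moment storage becomes negligible, which is consistent with the roughly 50% reduction described above.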