LoRA is a parameter-efficient finetuning approach for large language models (LLMs) that reduces the computational and memory cost of finetuning while maintaining performance comparable to full finetuning. It achieves this by freezing the pretrained weights and injecting a learnable low-rank weight update into each layer of the model, drastically reducing the number of trainable parameters. QLoRA extends LoRA by combining it with model quantization to further reduce memory usage during finetuning.
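As a rough sketch of the idea (with made-up dimensions and a chosen rank, using NumPy rather than an actual training framework): the pretrained weight `W` stays frozen, and only two small factors `A` and `B` are trained, so the layer's output becomes `W x + B A x`.

```python
import numpy as np

d_in, d_out, r = 1024, 1024, 8  # hypothetical layer size and LoRA rank

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # zero-initialized, so the update starts at 0

x = rng.standard_normal(d_in)
# Forward pass: frozen path plus the low-rank update B @ A
y = W @ x + B @ (A @ x)

full_params = W.size                # parameters a full finetune would train
lora_params = A.size + B.size       # parameters LoRA actually trains
print(lora_params / full_params)    # fraction of trainable parameters
```

With these example dimensions, LoRA trains only about 1.6% as many parameters as a full finetune, and because `B` starts at zero, the model's behavior is unchanged before training begins.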

33m read time · From cameronrwolfe.substack.com
Table of contents

- Background Information
- Adaptation of Foundation Models
- Finetuning LLMs More Efficiently
- Takeaways
