LoRA is a parameter-efficient finetuning approach for large language models (LLMs) that reduces the computational and memory burden of finetuning while maintaining performance comparable to full finetuning. It achieves this by freezing the pretrained weights and injecting learnable low-rank weight updates into selected weight matrices of the model, which greatly reduces the number of trainable parameters. QLoRA is an extension of LoRA that combines it with quantization of the pretrained model to further reduce memory usage during finetuning.
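The low-rank update idea can be sketched in a few lines. This is a minimal NumPy illustration, not a real training setup: the dimensions, rank `r`, and scaling factor `alpha` are hypothetical, and a frozen weight matrix `W` is augmented with trainable factors `B` and `A` whose product has rank at most `r`.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 512, 512, 8               # hypothetical layer sizes and LoRA rank
alpha = 16                                 # hypothetical LoRA scaling hyperparameter

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight (not trained)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, initialized to zero

def forward(x):
    # Base output plus the scaled low-rank update; B @ A has rank <= r.
    return W @ x + (alpha / r) * (B @ (A @ x))

# Only A and B are trained: 2 * r * 512 parameters instead of 512 * 512.
fraction_trainable = (A.size + B.size) / W.size
```

Because `B` starts at zero, the model's output is unchanged at initialization; finetuning then only updates `A` and `B`, here about 3% of the layer's parameters.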
Table of contents

- Background Information
- Adaptation of Foundation Models
- Finetuning LLMs More Efficiently
- Takeaways