Are you wondering what low-rank adaptation of large language models (LoRA) is? Or maybe you want to learn more about when LoRA should be used? Well, either way, you are in the right place! In this article, we tell you everything you need to know to determine when you should and should not use LoRA.
We start out by discussing what LoRA is and how it works. Next, we discuss whether any specific type of data is required in order to apply LoRA. After that, we discuss some of the main advantages and disadvantages of LoRA. This will provide useful context for our final section on when you should and should not use LoRA.
What is LoRA?
What is low-rank adaptation of large language models (LoRA)? LoRA is a fine tuning scheme that enables you to fine tune large language models faster and on smaller compute resources. The high-level intuition behind how it works is that it freezes the original model parameters and trains only a small set of additional parameters that represent low-rank updates to the frozen weights. Since there are far fewer parameters to update, there are far fewer computations to run.
When LoRA is used, the parameter updates made during fine tuning are represented by small matrices that can be combined with the original parameter matrices to produce the final parameter values. In addition to making fine tuning faster and more efficient, LoRA also reduces the amount of storage required for your fine tuned models. That is because you do not need to store a full copy of the model for every fine tuning run. Instead, you can store one instance of the base model you are fine tuning plus a collection of small update matrices, and combine the update matrices with the base model's parameter matrices to calculate the final parameters for each fine tuned model. This combination can even be done on the fly right before inference so that you never need to store the full fine tuned model.
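To make this concrete, here is a minimal NumPy sketch of how a LoRA update is represented and merged into a base weight matrix. The dimensions, rank `r`, and scaling factor `alpha` are made-up illustrative values, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4    # layer dimensions and LoRA rank (hypothetical values)
alpha = 8                     # LoRA scaling hyperparameter (hypothetical value)

# Frozen base weight matrix (d_out x d_in) -- never updated during fine tuning.
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors: B (d_out x r) and A (r x d_in).
# Only these r * (d_out + d_in) values are updated by the optimizer.
B = rng.standard_normal((d_out, r)) * 0.01
A = rng.standard_normal((r, d_in)) * 0.01

# The full-rank update is never stored explicitly; it is the product B @ A,
# scaled by alpha / r as in the original LoRA formulation.
delta_W = (alpha / r) * (B @ A)

# Merging the update with the base weights gives the final parameters.
W_merged = W + delta_W

x = rng.standard_normal(d_in)
# Applying the merged weights is equivalent to applying the base weights
# plus the low-rank path separately.
h_merged = W_merged @ x
h_two_path = W @ x + (alpha / r) * (B @ (A @ x))
print(np.allclose(h_merged, h_two_path))  # True: the two formulations agree
```

Note that `B` and `A` together hold far fewer values than `W`, which is where the compute and storage savings come from.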
What data is required for LoRA?
Is there any special data that is required in order to use LoRA? No, LoRA is simply a model training scheme that allows you to fine tune a model faster and on smaller compute resources. That means that there is no special data that is required in order to use LoRA. All you need is the dataset that you would use in order to fine tune your model the standard way if you were not using LoRA.
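As a quick illustration, a fine tuning dataset for LoRA looks exactly like one for standard fine tuning. The snippet below sketches a toy instruction-style dataset; the `prompt`/`completion` field names are hypothetical, since the exact schema depends on your training framework:

```python
import json

# A toy fine tuning dataset -- identical to what you would use without LoRA.
# The "prompt"/"completion" field names are hypothetical; your training
# framework defines the exact schema it expects.
examples = [
    {"prompt": "Summarize: LoRA freezes base weights and trains small update matrices.",
     "completion": "LoRA fine tunes models by training low-rank updates only."},
    {"prompt": "Translate to French: hello",
     "completion": "bonjour"},
]

# Datasets like this are commonly stored as JSON Lines, one record per line.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.count("\n") + 1)  # 2 records
```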
Advantages and disadvantages of LoRA
What are some of the main advantages and disadvantages of using LoRA? In this section, we will describe some of the main advantages and disadvantages of LoRA.
Advantages of LoRA
Here are some of the main advantages of using LoRA.
- Smaller compute requirements. One of the main advantages of using a technique like LoRA is that it allows you to fine tune a large language model on smaller machines with smaller compute resources. This can be a huge advantage if you work in an environment where large compute resources are not available to you. If you find yourself in this situation, then LoRA can spell the difference between being able to fine tune a model and not being able to fine tune one at all.
- Cheaper training. Another advantage of being able to train a model on smaller compute resources is that model training is cheaper. If you are using compute resources from a vendor like AWS or Azure, then you will pay less to use those resources. If you are training a model on compute resources owned by your company, then you will likely still incur lower energy costs.
- Faster training. Another advantage of reducing the number of parameters that need to be updated and therefore the number of calculations that need to be run is that model training runs will be faster. This can be a large advantage if you are in a situation where you need to rapidly iterate on models as you learn from previous model training runs.
- Performance gains of fine tuned models. In addition to improvements in operational metrics like speed and cost, fine tuning models with LoRA generally gives you the same kinds of performance gains seen when fine tuning a model the old-fashioned way. That means there is not much to lose by using LoRA.
- Reduced storage requirements. Another advantage of fine tuning a model with LoRA is that you are not required to store many different versions of fine tuned models. Instead, all you need to store is one copy of the base model that you are fine tuning and the small update matrices that were created for each fine tuning run. We will note that you can combine the base model parameter matrices with the update matrices before saving the model and save the full fine tuned models if you want, but this is not strictly required.
- Mitigates catastrophic forgetting. Another advantage of fine tuning a model with LoRA is that it helps avert catastrophic forgetting. Catastrophic forgetting is a phenomenon where a model loses capabilities it learned during pretraining as it is fine tuned on new data. This is a particularly important benefit if you need to fine tune the model on a large amount of data.
- Can be combined with other PEFT techniques. LoRA is also a great option to reach for because it can be combined with a range of other parameter efficient fine tuning (PEFT) techniques to further reduce the resources required for model training.
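To put rough numbers on the compute and storage savings above, here is a small back-of-the-envelope calculation comparing trainable parameter counts for full fine tuning versus LoRA on a single weight matrix. The layer size and rank are hypothetical choices, not recommendations:

```python
d_out, d_in = 4096, 4096   # a single square projection matrix (hypothetical size)
r = 8                      # LoRA rank (hypothetical choice)

full_params = d_out * d_in           # parameters updated by full fine tuning
lora_params = r * (d_out + d_in)     # parameters in the trainable B and A factors

print(full_params)                   # 16777216
print(lora_params)                   # 65536
print(full_params // lora_params)    # 256 -- 256x fewer trainable parameters
```

With these (made-up) numbers, each fine tuned variant adds only the small `B` and `A` factors to storage, while the large base matrix is stored once.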
Disadvantages of LoRA
Here are some of the main disadvantages of LoRA.
- Slow inference if full models are not stored. One disadvantage of LoRA is that if you store only the parameter update matrices and combine them with the base model parameter matrices on the fly, this will introduce some latency into your system when it comes time to make predictions. This is because additional computations have to be made at inference time to combine the parameter matrices. We will note that this disadvantage can be circumvented if you combine the parameter matrices before saving your model and store the full fine tuned model rather than just the update matrices. There is a trade-off here between inference speed and storage requirements.
- Does not speed up inference. Even if you combine the update matrices with the base model parameter matrices before saving the model, inference will not be any faster than it would be if you did not use LoRA. LoRA is a technique for speeding up model training, not a technique for speeding up model inference.
- Configuration required. Probably the largest disadvantage of LoRA is that it introduces a lot of configuration and potentially some additional debugging into the model training process. There are multiple parameters that need to be configured when you use LoRA and the parameters that you choose can impact the quality of your final model.
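To give a sense of the configuration surface mentioned in the last point, here is a sketch of the kinds of knobs a typical LoRA setup exposes. The names are modeled loosely on common LoRA implementations; the exact parameter names and defaults depend on the library you use:

```python
# Hypothetical LoRA configuration -- parameter names vary by library, but
# most implementations expose knobs along these lines.
lora_config = {
    "r": 8,                   # rank of the update matrices; higher = more capacity
    "alpha": 16,              # scaling factor applied to the update (alpha / r)
    "dropout": 0.05,          # dropout applied to the LoRA path during training
    "target_modules": [       # which weight matrices receive LoRA updates
        "q_proj",
        "v_proj",
    ],
}

# Each choice above can affect final model quality, which is the extra
# configuration and debugging burden described in the text.
print(sorted(lora_config))
```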
When to use LoRA
When does it make sense to use LoRA to fine tune your large language model? Here are some examples of situations where it does make sense to use LoRA with large language models.
- When you cannot fine tune a model on available compute resources. The main situation where you should reach for LoRA is when you cannot fine tune a model on the available compute resources because those resources are too small. This situation is especially common if your company does not use cloud computing platforms and instead maintains its own compute resources. If no one has needed large compute resources to fine tune a model before, then those resources might simply not be available.
- When you are fine tuning a language model on a lot of data. Another situation where it makes sense to use LoRA is when you plan to fine tune a large language model on a lot of data. There are a few benefits that you stand to gain in these situations. For one, your model training runs will be faster and cheaper. In addition to that, your model will be less likely to display catastrophic forgetting.
When not to use LoRA
When does it not make sense to use LoRA to fine tune your large language model? Here are some examples of situations where it does not make sense to use LoRA with large language models.
- When you want to reduce inference time. As we mentioned above, LoRA is a technique for speeding up model training times. It does not speed up model inference times, and can even have a negative impact on model inference times depending on how it is implemented. If you want to speed up model inference times, you are better off using a technique like model quantization.