Are you wondering when it makes sense to fine tune a Large Language Model (LLM) rather than using it out of the box? Then you are in the right place! In this article, we tell you everything you need to know to decide when you should fine tune an LLM and when you might be better off pursuing another strategy.
We start out by discussing what types of data you need in order to fine tune an LLM. Next, we cover some of the main advantages and disadvantages of fine tuning LLMs, which provides useful context for the discussions that follow. After that, we walk through examples of situations where it makes sense to fine tune an LLM. Finally, we discuss some examples of situations where it does not make sense to fine tune an LLM.
What does it mean to fine tune an LLM?
What does fine tuning mean in the context of LLMs? In general, when you fine tune an LLM you take a large model that has already been trained on outside data and continue to train it on your own data. That means that you are modifying the parameters of the model itself so that it better understands the context within which you are operating.
What data do you need to fine tune an LLM?
What data do you need to fine tune an LLM? As the name Large Language Model suggests, LLMs are models that are intended to be used with text data. In order to fine tune an LLM, you need two different pieces of data. First, you need some examples of text snippets that you would expect a user to feed into your model. These should closely resemble the types of prompts that users will type when the model is in use. In addition to this, you also need examples of what the model output should look like. Specifically, you should have one example of an appropriate output for each text snippet that is used as an input.
There are many different ways that you can generate this type of data, especially the output data that represents what an appropriate response to a text snippet should look like. In some cases, teams will provide subject matter experts with examples of text snippets that a user may feed into the model and have them manually write appropriate responses to those snippets. In other cases, teams will use synthetic data that has been generated by another LLM to create examples of appropriate responses. Other times, teams use a combination of both of these strategies.
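Once you have collected these prompt and response pairs, you typically serialize them in whatever file format your fine tuning provider expects. As a minimal sketch, the snippet below writes hypothetical expert-labeled examples to a JSONL file in the chat-style format used by OpenAI's fine tuning API; the ticket text, labels, and file name are all invented for illustration, and other providers expect different schemas.

```python
import json

# Hypothetical examples pairing user prompts with expert-written responses.
examples = [
    {
        "prompt": "Summarize this support ticket: my invoice total is wrong.",
        "response": "Category: billing. Suggested action: escalate to the billing team.",
    },
    {
        "prompt": "Summarize this support ticket: I can't log in after the update.",
        "response": "Category: authentication. Suggested action: send password-reset steps.",
    },
]

def to_chat_jsonl(examples, path):
    """Write (prompt, response) pairs as one chat-format JSON object per line."""
    with open(path, "w") as f:
        for ex in examples:
            record = {
                "messages": [
                    {"role": "user", "content": ex["prompt"]},
                    {"role": "assistant", "content": ex["response"]},
                ]
            }
            f.write(json.dumps(record) + "\n")

to_chat_jsonl(examples, "train.jsonl")
```

The key point is the pairing: every input snippet in the file is matched with exactly one example of an appropriate output.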
Advantages and disadvantages of fine tuning LLMs
In this section, we will talk about some of the main advantages and disadvantages of fine tuning LLMs. This will provide useful context to inform conversations around when to fine tune LLMs and when not to.
Advantages of fine tuning LLMs
We will start off by discussing some of the main advantages of fine tuning LLMs. Specifically, we will focus on advantages that fine tuning has over other methods like prompt engineering.
- Lower latency. One of the largest advantages of fine tuning LLMs is not actually related to improving the predictive performance of the model. Instead, it is related to latency and getting your users an answer as fast as possible. Many alternative methods that are used to enhance the performance of LLMs, such as prompt chaining and prompt engineering, introduce additional overhead and steps into the process that the model has to go through in order to return a response. Sometimes these steps and this additional overhead can introduce a lot of lag and make your model slow. With fine tuning, this is not the case. This is a great option if speed is a large priority for you.
- Fewer tokens needed for prompting. Another large advantage of fine tuning LLMs is that you can hugely reduce the number of tokens and the amount of context that you need to feed into your model. If you do not fine tune your model, then you likely need to include a bunch of extra information in your prompts. This may include things like specifications for the output format, examples of what a good response looks like, and details about how to handle edge cases. When you fine tune a model, you can bake a lot of information into the model itself.
- Provide domain-specific context. Another advantage of fine tuning a model is that it can provide your model with more domain specific context. If you work in a domain where your model does not need highly specific domain knowledge in order to produce a reasonable response, then this is not a concern. However, if you work in a domain with very technical terminology and domain knowledge then you may be in a different scenario. This is especially true if the knowledge your model needs is not freely available on the internet, such as when there is a large base of internal knowledge and terminology that is specific to your company. In some cases, you can solve these problems by feeding some context into the model during prompting, but in other cases this is not sufficient.
- Improved predictive performance for specific tasks. Finally, there are some cases where you may get improved predictive performance from fine tuning a model. This is especially true if you want the model to execute on a very narrow and specific task. That being said, fine tuning does not always improve predictive performance so it is often a better avenue to turn to when you also have operational constraints you need to solve for, like high latency or high token counts.
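The token-savings advantage above can be made concrete with a quick comparison. The snippet below contrasts a prompt for a base model, which must carry the format spec, edge-case rules, and a worked example on every request, with the prompt you might send to a hypothetical fine tuned classifier that has those instructions baked in. The word counts are a crude stand-in for real tokenizer counts, but the ratio is the point.

```python
# Without fine tuning: format spec, edge-case rules, and a worked example
# all ride along in every single request.
base_prompt = (
    "You are a support-ticket classifier. Respond with exactly one word, "
    "lowercase, chosen from: billing, authentication, other. If the ticket "
    "mentions both billing and login issues, answer billing. "
    "Example: 'My card was charged twice' -> billing. "
    "Ticket: I can't log in after the update."
)

# With fine tuning: the rules live in the model's weights, so only the
# new ticket needs to be sent.
tuned_prompt = "Ticket: I can't log in after the update."

# Crude whitespace-based size comparison; real tokenizers differ.
print(len(base_prompt.split()), "vs", len(tuned_prompt.split()))
```

Since you usually pay per token and latency grows with prompt length, this reduction compounds across every request your system serves.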
Disadvantages of fine tuning LLMs
What are some of the main disadvantages of fine tuning LLMs? In this section, we will walk through them one by one.
- Training data may be difficult to obtain. One of the main disadvantages of fine tuning LLMs is that you may find yourself in a situation where training data is difficult to obtain. It is a common problem in the domain of natural language processing that you do not always have a naturally labeled training data set that you can fall back on. The quality of your training dataset directly dictates the quality of your model, and if you train on a haphazardly thrown together dataset then your model will not perform well. That means you have to take a lot of care to make sure that you include the right examples.
- Fine tuned models are more expensive to use. If you are using a hosted solution, such as OpenAI, to fine tune and host your models then fine tuned models may be more expensive to use. It is not uncommon for vendors to charge much more money to make a prediction with a fine tuned model than they do to make a prediction with a model that has not been fine tuned.
- Fine tuning on sensitive data may introduce privacy concerns. Whenever you fine tune a model on internal data, you run the risk of exposing sensitive data to the model. The model may then go back and spit that sensitive information out to another user. This is a particularly large concern if your data contains personally identifiable information (PII) about users of your product.
- May require large computational resources. Another disadvantage of fine tuning large language models is that it may require large computational resources, which can be expensive to use, if you have access to them at all.
- May not be possible for state of the art models. In some cases, fine tuning is simply not an option. It is common for vendors to publish state of the art models that have been trained on external data without providing the option to fine tune those models. Fine tuning capabilities often come to those models later on, but if you need a fine tuned model right away then you may not be able to use the state of the art model.
- Fine tuned models may be stripped of guardrails that ensure they behave in a reasonable way. Many vendors like OpenAI put guardrails on their models to ensure that they behave in a reasonable way. When you fine tune a model, these guardrails may or may not make sense for your use case. As a result, vendors often remove these guardrails from fine tuned models.
- Requires a lot of iteration. Finally, it generally takes a lot of effort and iteration to ensure that your model is behaving appropriately. This typically requires much more labor than other techniques like prompt engineering.
When to fine tune LLMs
Now that we have discussed some of the main advantages and disadvantages of fine tuning LLMs, we will discuss some examples of situations where it does make sense to fine tune an LLM. Specifically, we will discuss situations where you may get better results from fine tuning an LLM than using another strategy like prompt engineering.
- When there is too much context to include in the prompt. One great example of a situation where it may make sense to fine tune an LLM is if you have a long list of specific instructions for the model and you cannot fit all of the context that you want to provide in the prompt window. For example, if you want to provide specific instructions about output format, context on how to handle certain edge cases, and a few examples of what an appropriate response looks like, you may find that you need more context than the model is willing to accept. In these cases, you can bake some of this context into a dataset and use that dataset to fine tune your model. For example, if you are looking for a specific output format, you can make sure that every response you feed to the fine tuning job uses that format. The model should implicitly learn that you want responses in that format.
- When other methods introduce too much latency. Many of the other methods that are used to improve LLM performance introduce extra steps and computation into the process of generating a response. Each of these additional steps introduces additional time and increases the amount of time it takes for your system to generate a response. If you are in a situation where introducing additional latency into your system has high costs, then this is an example of a situation where fine tuning may be a good avenue to pursue.
- When you are trying to execute a very specific task. In general, fine tuned models perform better when you are trying to execute a very specific task with a very clear definition of success. If you are in a case where you are looking for a general model that can assist with multiple related tasks, then you may be better off sticking with a model that has not been fine tuned. If you have a very narrowly defined task in mind, then this is the type of situation where it makes sense to look into fine tuning a model.
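The format-baking idea from the first situation above can be sketched in code. Assuming you have raw labels from subject matter experts (the tickets and labels below are invented), you can render every training response in the one structure you want the model to emit, rather than describing that structure in each prompt.

```python
import json

# Hypothetical raw labels gathered from subject matter experts.
labeled = [
    ("My invoice total is wrong.", "billing", "high"),
    ("The dark-mode toggle is broken.", "ui", "low"),
]

def format_response(category, priority):
    """Render a training response in the single JSON shape we want back."""
    return json.dumps({"category": category, "priority": priority})

# Every assistant response in the fine tuning set uses the same structure,
# so the tuned model learns to emit it without being told in the prompt.
training_pairs = [
    {"prompt": text, "response": format_response(cat, pri)}
    for text, cat, pri in labeled
]
```

Because every response in the fine tuning set shares one shape, the format specification no longer needs to occupy space in the prompt window at inference time.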
When not to fine tune LLMs
Now that we have discussed some situations where it makes sense to fine tune a model, we will take some time to discuss situations where it does not necessarily make sense to fine tune a model. In these situations, you may be better off looking into other techniques like prompt engineering.
- When your data contains sensitive private information. It may make sense to avoid fine tuning a model if you find yourself in a situation where the data that you want to use contains sensitive private information. There are strategies that you can take to mask the data in these situations, but there is always a risk that you will miss something and some sensitive information will find its way into your training data. The main risk here is that the model might spit out some of the sensitive information it was trained on to another user. In these situations, you may be better off using different strategies that do not risk exposing private data to your model.
- When you have not exhausted your options with other alternatives. Fine tuning an LLM is more of an art than a science and it can require a lot of iteration. That means that it can take more time than simply introducing a few additional lines into your prompt or introducing another step into a chain of prompts. It is generally a good idea to try out simpler methods that do not require as much effort first, then turn to fine tuning a model only if those other methods do not suit your needs.
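For the masking strategies mentioned in the first point above, a minimal sketch looks like the following. It redacts email addresses and US-style phone numbers with simple regular expressions; real redaction pipelines use dedicated PII-detection tools, and naive patterns like these will miss things, which is exactly the residual risk described above.

```python
import re

# Simple patterns for two common kinds of PII. These are intentionally
# basic and will not catch every variant.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def mask_pii(text):
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(mask_pii("Contact jane.doe@example.com or 555-867-5309 for a refund."))
```

Running a pass like this over your training set before fine tuning reduces, but does not eliminate, the chance that sensitive values end up in the model's weights.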
Related articles
- How to improve LLM performance
- When to use retrieval augmented generation for LLMs
- When to use prompt chaining for LLMs
- When to use basic prompt engineering for LLMs
- When to use few shot learning for LLMs
- When to use function calling for LLMs