A Definitive Guide to Fine-Tuning LLMs Using Axolotl and LLaMA-Factory
Training or fine-tuning a Large Language Model (LLM) is a task that most enterprises eventually need to undertake once they decide to deploy an open-source LLM in production.
Fine-tuning improves a model's performance on, and alignment with, specific tasks and business needs. Open-source LLMs are trained on massive amounts of generic data, and fine-tuning tailors the model to an enterprise's unique domain, terminology, and style. Typically this is achieved by training the model on a smaller, more relevant dataset. Ultimately, this process improves accuracy, reduces irrelevant or misleading responses, and allows for better control over the model's output, making it more reliable for mission-critical applications.
In this guide, we walk through the steps to fine-tune Mistral 7B with two different open-source tools: Axolotl and LLaMA-Factory.
Axolotl is a versatile open-source tool specifically designed for fine-tuning LLMs. It supports popular training methods like LoRA and full fine-tuning, and offers easy integration with performance-boosting technologies like xFormers.
Similarly, LLaMA-Factory is another open source tool that simplifies the fine-tuning process. It also offers diverse methods like LoRA, full fine-tuning, and reinforcement learning, along with options for memory-efficient quantization techniques, making LLM customization accessible on different hardware setups.
We will use Mistral 7B as our model; it has rapidly become a favorite of the AI community because it delivers strong accuracy for its small size.
Prerequisite (Optional)
There are two key prerequisites: a server with an Ampere GPU, such as an A100, and a working conda setup.
For the A100 GPU, you can use any of the cloud platforms like Google Cloud, AWS, Lambda Labs or E2E Networks, depending on which region you are in and what works for your budget. For this particular guide, we used Google Cloud startup credits (thank you Google).
In many cloud GPU setups, conda is pre-installed. However, if it isn’t, follow the steps below:
The last step installs conda in your home directory. Once this is done, you need to restart your shell.
Then move to the next step below.
Steps to Fine-Tuning Mistral 7B Using Axolotl
Axolotl is a great tool, but its documentation is not easy to follow. Also, if you don’t follow the steps below exactly, you may encounter hard-to-debug errors.
Let’s start.
Installation of Axolotl
To begin with, we need to create a conda environment.
base_model: this is the pre-trained model you are fine-tuning
model_type: Mistral is a causal LLM, so keep the value as is
tokenizer: the tokenizer format is the one used by Llama, so we can keep it as is
load_in_8bit: false if you don’t want to quantize the model to 8 bit
load_in_4bit: false if you don’t want to quantize the model to 4 bit
datasets: this contains the dataset and the type
output_dir: set this to the output directory
wandb_*: if you want to use wandb for tracking your model training, you can set the values here
epochs: number of epochs you want to train for
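The fields above can be sanity-checked before launching a run. The sketch below mirrors them in a plain Python dictionary and flags obvious mistakes. The key names follow the list above, the values are illustrative placeholders, the "path" and "type" keys inside datasets are assumptions, and check_config is a hypothetical helper rather than anything Axolotl provides. Axolotl itself reads its configuration from a YAML file, so treat this only as a way to reason about the settings.

```python
# Illustrative values only; key names follow the field list above.
config = {
    "base_model": "mistralai/Mistral-7B-v0.1",  # assumed Hugging Face model ID
    "model_type": "MistralForCausalLM",         # assumed value for a causal LM
    "tokenizer": "LlamaTokenizer",              # assumed Llama-style tokenizer
    "load_in_8bit": False,
    "load_in_4bit": False,
    # "path" and "type" are assumed key names for "the dataset and the type"
    "datasets": [{"path": "mhenrichsen/alpaca_2k_test", "type": "alpaca"}],
    "output_dir": "./out",
    "epochs": 1,
}

REQUIRED_KEYS = (
    "base_model", "model_type", "tokenizer",
    "datasets", "output_dir", "epochs",
)


def check_config(cfg):
    """Return a list of problems; an empty list means none were found."""
    problems = []

    for key in REQUIRED_KEYS:
        if key not in cfg:
            problems.append(f"missing key: {key}")

    # Assumption: only one quantization mode should be enabled at a time.
    if cfg.get("load_in_8bit") and cfg.get("load_in_4bit"):
        problems.append("load_in_8bit and load_in_4bit are both enabled")

    datasets = cfg.get("datasets") or []
    if "datasets" in cfg and not datasets:
        problems.append("datasets is empty")
    for i, entry in enumerate(datasets):
        if not isinstance(entry, dict) or "path" not in entry or "type" not in entry:
            problems.append(f"datasets[{i}] needs both 'path' and 'type'")

    if "epochs" in cfg:
        epochs = cfg["epochs"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(epochs, bool) or not isinstance(epochs, int) or epochs < 1:
            problems.append("epochs must be a positive integer")

    return problems


for problem in check_config(config):
    print("config problem:", problem)
```

With the placeholder values shown, the check reports nothing. Setting both quantization flags to True, or removing output_dir, produces one message each.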
Dataset
To train using Axolotl, the dataset should be in JSONL format. Each record looks like this:
{"conversations": [{"from": "Customer", "value": "\"<Customer>: Who is the Founder of Apple\""}, {"from": "gpt", "value": "\"<Chatbot>: The founder of Apple is Steve Jobs\""}]}
{"conversations": [{"from": "Customer", "value": "\"<Customer>: What is the capital of France?\""}, {"from": "gpt", "value": "\"<Chatbot>: The capital of France is Paris.\""}]}
{"conversations": [{"from": "Customer", "value": "\"<Customer>: How far is the Moon from Earth?\""}, {"from": "gpt", "value": "\"<Chatbot>: The Moon is approximately 384,400 kilometers from Earth.\""}]}
…and so on.
Essentially, jsonl stores each JSON object as a separate line within a file, making it ideal for streaming large datasets.
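As a minimal sketch of the format described above, the following Python snippet writes a couple of conversation records to a JSONL file and reads them back one line at a time. The file name sample.jsonl and the helper names write_jsonl and read_jsonl are illustrative assumptions, not part of Axolotl; only the standard library is used.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield records one at a time, so the whole file never sits in memory."""
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Check the shape used in the examples above.
            turns = record.get("conversations") if isinstance(record, dict) else None
            if not isinstance(turns, list) or not turns:
                raise ValueError(f"line {line_number}: missing 'conversations' list")
            for turn in turns:
                if not isinstance(turn, dict) or "from" not in turn or "value" not in turn:
                    raise ValueError(f"line {line_number}: each turn needs 'from' and 'value'")
            yield record


records = [
    {"conversations": [
        {"from": "Customer", "value": "What is the capital of France?"},
        {"from": "gpt", "value": "The capital of France is Paris."},
    ]},
    {"conversations": [
        {"from": "Customer", "value": "How far is the Moon from Earth?"},
        {"from": "gpt", "value": "The Moon is approximately 384,400 kilometers from Earth."},
    ]},
]

# Write to a temporary directory so the example leaves no files behind in the project.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
write_jsonl(sample_path, records)
loaded = list(read_jsonl(sample_path))
print(f"loaded {len(loaded)} records from {sample_path}")
```

Because read_jsonl is a generator, a training script could iterate over a very large file without loading it all at once, which is the streaming property mentioned above.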
For this training, we will use mhenrichsen/alpaca_2k_test which has been preformatted in jsonl format.
Works on various setups: 32-bit full-tuning, 16-bit freeze-tuning, 16-bit LoRA, and 2/4/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8. This lets you choose a method that fits your hardware configuration.
Integrates with training monitors: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
It also has a UI, and points to a future where LLM fine-tuning will be mostly a no-code affair.
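To see why the precision options listed above matter for hardware choice, here is a back-of-the-envelope calculation of the memory needed just to hold the model weights. It assumes roughly 7 billion parameters for a 7B-class model and ignores gradients, optimizer state, activations, and any quantization overhead, so real training needs more than these figures. The function name weight_memory_gb is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: a 7B-class model has roughly 7 billion parameters.
NUM_PARAMS = 7_000_000_000

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights take about 28 GB at 32-bit, 14 GB at 16-bit, 7 GB at 8-bit, and 3.5 GB at 4-bit, which is why quantized methods make fine-tuning feasible on smaller GPUs.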
cd /home/(username)/workspace/
git clone https://github.com/hiyouga/LLaMA-Factory.git
conda create -n llama_factory python=3.10
conda activate llama_factory
cd LLaMA-Factory
pip install -r requirements.txt
This completes the installation step.
Pre-Training
You can pre-train a model with LLaMA-Factory using a single command. We will assume that you are already inside the LLaMA-Factory directory.
First, authenticate with your Hugging Face account. This will enable you to download gated models from Hugging Face, such as Mistral, Gemma or Llama 2 (make sure you have gone to the model repo, requested access and received it).
For pre-training using LoRA, the following command does the job:
You will have to wait a bit while the model downloads (you won’t see any output while that is happening). Once the download finishes, you will start seeing the training steps:
As you can see, both tools let you pre-train or fine-tune a range of LLMs. Axolotl is a little harder to set up because its installation steps and dependencies are not clearly defined, whereas LLaMA-Factory works out of the box. In either case, make sure you use an A100 or another Ampere GPU.
At Superteams.ai, we offer fully-managed fractional AI teams to solve business problems in a variety of domains. Leverage the power of AI today by reaching out to us at info@superteams.ai.