GenAI
Apr 16, 2024

Guide to Fine-Tune Your LLM for Building Your Own Financial Advisor

Learn how to fine-tune an LLM on a financial dataset to build a financial advisor


Large language models (LLMs) are advancing rapidly and garnering significant attention in the field of generative AI. Companies are not just curious; they are actively seeking ways to incorporate LLMs into their operations, and enormous sums have recently been invested in LLM research and development. As the frontier of LLMs continues to expand, staying well informed is crucial: the benefits LLMs can offer your business hinge on your understanding of the technology.

Now let’s get to our use case. Many legal professionals aspire to transition into financial advisory roles but lack the specialized skills and knowledge necessary to excel in this field. While they may possess a strong foundation in law, they often struggle to adapt their expertise to the intricacies of financial advising. This knowledge gap can hinder their ability to effectively advise clients on complex financial matters and may limit their career advancement opportunities.

Reference Colab Notebook:


Financial-Advisor Tutorial (LLAMA Finetune).ipynb

What Is Fine-Tuning?

In deep learning, fine-tuning is a form of transfer learning. It involves adjusting the internal parameters of a model that was pre-trained on a broad dataset for a general task, such as image recognition or natural language understanding, so that it performs well on a specific, related task without having to train from scratch.

Why Fine-Tuning?

Streamlining the process

Building a large language model from the ground up demands a significant investment of time and computational resources. Fine-tuning, by contrast, lets us build on a pre-existing model, substantially reducing the time and resources needed to reach good results. Starting with a model that has already absorbed relevant features, we can skip the initial training stages and concentrate on tailoring the model to the task at hand.

Enhanced effectiveness

Pre-trained models have undergone extensive training on expansive datasets for general tasks. This signifies that they have already acquired valuable features and patterns that can prove advantageous for related tasks. Through fine-tuning a pre-trained model, we can harness this reservoir of knowledge and representations, resulting in heightened performance on our specific task.

Optimizing data utilization

In numerous real-world scenarios, procuring labeled data for a particular task can pose challenges and consume significant time. Fine-tuning serves as a solution by enabling us to efficiently train models even when faced with limited labeled data. By initiating with a pre-trained model and adjusting it to our specific task, we can maximize the utility of available labeled data, achieving commendable results with reduced effort.

How to Fine-Tune a Model?

Fine-tuning a model involves taking a pre-trained model and adapting it to a specific task or domain by adjusting its parameters on a smaller, task-specific dataset. This process typically involves freezing some layers of the model to retain the general knowledge learned during pre-training while updating others to specialize in the new task. Fine-tuning aims to optimize the model's performance for the specific task by iteratively adjusting hyperparameters, such as learning rate and batch size, and evaluating the model's performance on a validation set. Careful consideration of the trade-offs between overfitting and underfitting is crucial, often requiring regularization techniques and monitoring of performance metrics. Finally, the fine-tuned model is evaluated on a separate test dataset to assess its generalization ability before deployment.
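
As a minimal illustration of this freeze-and-update idea (separate from the Llama pipeline used later in this tutorial), the sketch below freezes a generic pre-trained backbone and trains only a small task head; the backbone checkpoint and the classification head are placeholders chosen purely for illustration.

import torch
import torch.nn as nn
from transformers import AutoModel

# Hypothetical example: freeze a pre-trained backbone and train only a new head.
backbone = AutoModel.from_pretrained("bert-base-uncased")

for param in backbone.parameters():
    param.requires_grad = False  # keep the general-purpose representations fixed

# A small task-specific head; only these parameters are updated during fine-tuning.
head = nn.Linear(backbone.config.hidden_size, 2)

trainable = [p for p in head.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)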

Cloud Resources Required

E2E Cloud offers a comprehensive suite of services, including advanced cloud GPUs featuring HGX 8xH100, A100, L40S, and more. Businesses can harness the power of high-performance computing to propel their AI and ML workloads forward. 

Let’s Code

Installing important libraries:


!pip install -qqq datasets accelerate bitsandbytes peft transformers evaluate trl

import torch
import time
import evaluate
import pandas as pd
import numpy as np
from datasets import Dataset, load_dataset
import random

Finding the right dataset:

You can refer to https://huggingface.co/datasets for related datasets. In this tutorial, we're using https://huggingface.co/datasets/nihiluis/financial-advisor-100, which is a Q&A dataset related to financial advisory.


huggingface_dataset_name = "nihiluis/financial-advisor-100"

dataset = load_dataset(huggingface_dataset_name)

dataset

Output:

DatasetDict({
    train: Dataset({
        features: ['Sub-Concept', 'Question'],
        num_rows: 871
    })
})

Making Instruction Dataset:

Fine-tuning a custom LLM with your own data can bridge this gap, and data preparation is the first step in this process. It is also a crucial step that can significantly influence your fine-tuned model’s performance.


def format_instruction(question: str, answer: str):
  instruction_template = """
### Instruction:
Task: Provide detailed answers corresponding to the given questions.
Topic: {question}

### Guidelines:
1. Contextual Understanding: Ensure your answers are relevant and based on the context of the question.
2. Clarity and Detail: Elaborate on your responses to provide comprehensive explanations.
3. Accuracy: Verify the accuracy of your answers before submission.
4. Language Quality: Maintain a clear and concise writing style, free of grammatical errors.

Example:

### Question:
{question}

### Answer:
{answer}
""".strip()
  # Fill the template with the actual question and answer and return the prompt text.
  return instruction_template.format(question=question, answer=answer)

def generate_instruction_dataset(data_point):
  return {
    "question": data_point["question"],
    "answer": data_point["answer"],
    "text": format_instruction(data_point["question"], data_point["answer"])
  }

def process_dataset(data: Dataset):
  return (
    data.shuffle(seed=42)
    .map(generate_instruction_dataset)
    .remove_columns(['id'])
  )
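
As a quick sanity check, you can print the prompt produced for a single example; the question and answer below are invented purely to preview the format.

# Purely illustrative question/answer pair to preview the prompt format.
print(format_instruction(
    "What is dollar-cost averaging?",
    "Dollar-cost averaging means investing a fixed amount at regular intervals regardless of price."
))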

Train-test Split:

Train-test split is a common technique used in machine learning to evaluate the performance of a model. It involves dividing a dataset into two subsets: one for training the model and the other for testing its performance. Typically, the majority of the data (e.g., 70-80%) is allocated to the training set, while the remaining portion is allocated to the test set. 

The training set is used to train the model by adjusting its parameters to minimize a chosen objective function, such as minimizing prediction error or maximizing accuracy. Once the model is trained, it is evaluated on the test set to assess its performance on unseen data. This evaluation helps estimate how well the model will generalize to new, unseen examples.


dataset["train"] = process_dataset(dataset["train"])

train_data = dataset['train'].shuffle(seed=42).select([i for i in range(80)]) #Depends on the size of dataset

test_data = dataset['train'].shuffle(seed=42).select([i for i in range(80,100)]) #80-20 split


train_data,test_data
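
If you prefer not to slice indices manually, the datasets library also offers a built-in train_test_split helper; the following sketch is an equivalent way to obtain the same 80-20 split.

# Alternative to the manual index selection above: let the datasets library do the 80-20 split.
split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_data, test_data = split["train"], split["test"]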

Loading the LLM:

Loading the large language model (LLM) for fine-tuning refers to initializing a pre-trained model (such as GPT, BERT, Llama, Mistral, etc.) and preparing it for further training on a specific task or dataset. This typically involves loading the pre-trained weights and architecture into memory and configuring the model for fine-tuning.

The steps involved in loading an LLM for fine-tuning might include:

1. Selecting the Pre-trained Model: Choose the pre-trained language model that best suits your task and requirements.

2. Loading Pre-trained Weights: Load the pre-trained weights of the chosen model into memory. These weights contain the learned parameters of the model, capturing the knowledge gained during pre-training on large-scale text data.

3. Configuring Model Architecture: Initialize the model architecture according to the specifications of the pre-trained model. This may involve setting up the layers, activation functions, and other architectural components.

4. Adjusting Hyperparameters: Fine-tuning may involve adjusting hyperparameters such as learning rate, batch size, and optimizer settings. These hyperparameters affect how the model learns from the new dataset during fine-tuning.

5. Freezing or Adjusting Layers: Decide whether to freeze certain layers of the pre-trained model to preserve their learned representations or allow them to be updated during fine-tuning. For example, in transfer learning scenarios, earlier layers are often frozen to retain general features, while later layers may be fine-tuned to adapt to task-specific nuances.

By following these steps, you can load a pre-trained language model and fine-tune it to achieve high performance on your target task or dataset.

We’ll be using togethercomputer/Llama-2-7B-32K-Instruct from Hugging Face. Llama-2-7B-32K-Instruct is an open-source, long-context chat model fine-tuned from Llama-2-7B-32K, over high-quality instruction and chat data.


from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id =  "togethercomputer/Llama-2-7B-32K-Instruct"
# model_id = "meta-llama/Llama-2-13b-chat-hf"
bnb_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_use_double_quant=True,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Printing trainable parameters:

Trainable parameters refer to the parameters within a machine learning model that are updated or "learned" during the training process. These parameters are adjusted through optimization algorithms such as gradient descent to minimize a chosen loss function, leading to improved model performance on the training data.

In neural network models, trainable parameters typically include:

1. Weights: These are the parameters that govern the strength of connections between neurons in different layers of the network. Each connection between neurons has an associated weight that determines the influence of the input on the output.

2. Biases: Biases are additional parameters in neural network layers that allow the model to capture shifts or offsets in the data. They are added to the weighted sum of inputs before passing through an activation function.

During the training process, the values of these trainable parameters are adjusted iteratively to minimize the difference between the model's predictions and the actual target values in the training data. This adjustment is achieved by computing gradients of the loss function with respect to the trainable parameters and updating the parameters in the direction that decreases the loss.

It's worth noting that not all parameters in a model may be trainable. In transfer learning or fine-tuning scenarios, for example, some parameters may be frozen and kept fixed, while only a subset of parameters is updated during training to adapt the model to a new task or domain.


def print_trainable_parameters(model):
  """
  Prints the number of trainable parameters in the model.
  """
  trainable_params = 0
  all_param = 0
  for _, param in model.named_parameters():
    all_param += param.numel()
    if param.requires_grad:
      trainable_params += param.numel()
  print(
    f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
  )

Using PEFT and LoRA Config:

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method that represents the weight update of a large matrix as the product of two much smaller low-rank matrices, typically applied to the attention projections. This drastically reduces the number of parameters that need to be trained.
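
To make the saving concrete, here is a back-of-the-envelope count for a single 4096×4096 attention projection in Llama-2-7B, using the rank r=16 configured below.

# Rough parameter count for one 4096 x 4096 attention projection in Llama-2-7B.
d, r = 4096, 16                  # hidden size of the projection; LoRA rank used below

full_update = d * d              # updating W directly: 16,777,216 parameters
lora_update = d * r + r * d      # LoRA factors A (d x r) and B (r x d): 131,072 parameters

print(full_update // lora_update)  # 128x fewer trainable parameters per adapted matrix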


from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
   r=16,
   lora_alpha=64,
   # target_modules=["query_key_value"],
   target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], #specific to Llama models.
   lora_dropout=0.1,
   bias="none",
   task_type="CAUSAL_LM"
)
   
model = get_peft_model(model, lora_config)
print_trainable_parameters(model)

Setting up training arguments:

Training arguments, in the context of machine learning, are the configuration settings and parameters that control the training process of a model. These arguments dictate various aspects of the training procedure, such as optimization algorithms, learning rates, batch sizes, number of epochs, regularization techniques, and more. Here's a brief overview of some common training arguments:

1. Optimization Algorithm: Specifies the algorithm used to minimize the loss function during training, such as Stochastic Gradient Descent (SGD), Adam, RMSProp, etc.

2. Learning Rate: Determines the step size taken during optimization. It controls how much the model parameters are updated in each iteration of the training process.

3. Batch Size: Defines the number of data samples used in each iteration of training. Larger batch sizes typically lead to faster training but may require more memory.

4. Number of Epochs: Specifies the number of times the entire dataset is passed forward and backward through the model during training.

5. Loss Function: Defines the objective function used to measure the difference between the model's predictions and the actual target values. Common loss functions include Mean Squared Error (MSE), Cross-Entropy Loss, etc.

6. Regularization: Controls techniques used to prevent overfitting, such as L1 or L2 regularization, dropout, early stopping, etc.

7. Validation Split: Determines the fraction of the training data reserved for validation during training to monitor the model's performance and prevent overfitting.

8. Metrics: Specifies the evaluation metrics used to assess the model's performance during training, such as accuracy, precision, recall, F1-score, etc.

9. Initialization: Controls how model parameters are initialized before training, such as random initialization, pre-trained weights from transfer learning, etc.

10. Data Augmentation: Specifies techniques used to artificially increase the diversity of the training data, such as rotation, flipping, scaling, etc.

These training arguments are crucial in determining the behavior and performance of the trained model and are often tuned empirically to achieve the best results for a given task or dataset.


from transformers import TrainingArguments

OUTPUT_DIR = "llama2-financial-advisor"

training_arguments = TrainingArguments(
   per_device_train_batch_size=4,
   gradient_accumulation_steps=4,
   optim="paged_adamw_32bit",
   logging_steps=1,
   learning_rate=1e-4,
   fp16=True,
   max_grad_norm=0.3,
   num_train_epochs=2,
   evaluation_strategy="steps",
   eval_steps=0.2,
   warmup_ratio=0.05,
   save_strategy="epoch",
   group_by_length=True,
   output_dir=OUTPUT_DIR,
   report_to="tensorboard",
   save_safetensors=True,
   lr_scheduler_type="cosine",
   seed=42,
)
   
model.config.use_cache = False  # disable the KV cache during training; it conflicts with gradient checkpointing
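
Note that with per_device_train_batch_size=4 and gradient_accumulation_steps=4, gradients are accumulated over four forward/backward passes before each optimizer step, so the effective batch size is 4 × 4 = 16 per GPU.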

Ready for training and pushing this to Hub:

An SFTTrainer instance from the trl library is initialized for supervised fine-tuning (SFT) of the model on the train dataset, evaluated against the test dataset, using the LoRA peft_config and the tokenizer for text processing. The maximum sequence length for tokenization is set to 1024 tokens. The trainer also receives training_arguments, which carry parameters such as learning rate, batch size, and number of epochs. Finally, trainer.train() is invoked to start supervised fine-tuning on the specified datasets.


from trl import SFTTrainer
trainer = SFTTrainer(
   model=model,
   train_dataset=train_data,
   eval_dataset=test_data,
   peft_config=lora_config,
   dataset_text_field="text",
   max_seq_length=1024,
   tokenizer=tokenizer,
   args=training_arguments,
)

trainer.train()

from huggingface_hub import notebook_login

notebook_login()

model.push_to_hub(
 "your-repo-name/sample-name", use_auth_token=True
)
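
The before/after comparison shown next can be reproduced with a short generation script. Below is a minimal sketch that loads the base model, attaches the pushed LoRA adapter, and generates an answer; the repository name is the placeholder from above and the sample question is purely illustrative.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "togethercomputer/Llama-2-7B-32K-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype=torch.float16)

# Attach the fine-tuned LoRA adapter; replace the placeholder with the repo you pushed above.
tuned_model = PeftModel.from_pretrained(base_model, "your-repo-name/sample-name")

prompt = "### Question:\nHow should I start planning for retirement?\n\n### Answer:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    output = tuned_model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))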

Before Fine-Tuning

After Fine-Tuning

Conclusion

In conclusion, fine-tuning large language models (LLMs) offers a powerful approach to adapt pre-trained models to specific domains and tasks. By leveraging a pre-trained model and fine-tuning it on a smaller, task-specific dataset, you can achieve improved performance and generate more relevant and accurate responses. 

The tutorial demonstrated the key steps involved in fine-tuning Llama2, including preparing the dataset, loading the pre-trained model, configuring the model architecture with techniques like LoRA, setting up training arguments, and performing the actual fine-tuning using the SFTTrainer from the TRL library.

The comparison of responses before and after fine-tuning highlights the effectiveness of this approach. The fine-tuned model generates more detailed, contextually relevant, and domain-specific answers compared to the pre-trained model. This showcases how fine-tuning allows the model to specialize and provide higher-quality outputs for the target task.