This repo holds a collection of Jupyter notebooks for fine-tuning Large Language Models on specific tasks.
Research shows that the pattern-recognition abilities of foundation language models are so powerful that they often require relatively little additional training to learn specific tasks. This additional training, called fine-tuning, helps the model make better predictions on a specific task and unlocks an LLM's practical side.
Read more about the fine-tuning process here: View.
- 🌟 Notebook: Fine-tune the GPT-2 (Small) 125-million-parameter model to classify spam messages.
- 🌻 Notebook: Fine-tune the TinyStories 19M model to summarize stories.
- 🦙 Notebook: Fine-tune the Llama 3.2 1-billion-parameter model for a summarization task using LoRA.
We use the LoRA technique to fine-tune Meta's Llama 3.2 1-billion-parameter model for the summarization task. LoRA lets us avoid training all 1 billion parameters of the model: since the model is pre-trained and already has a decent understanding of language, we can attach small additional layers and train only those while keeping the rest of the model weights frozen.
- Frozen model parameters = 1,235,814,400
- Trainable model parameters = 13,357,056
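A split like the one above can be checked by counting parameters by their `requires_grad` flag. The helper below is a generic sketch (the function name and the toy model are illustrative, not from the notebook; the real numbers depend on the Llama checkpoint and LoRA rank):

```python
import torch

def count_parameters(model):
    """Return (trainable, frozen) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen

# Toy example: freeze a "base" layer, leave an "adapter" trainable.
base = torch.nn.Linear(100, 100)    # 100*100 + 100 = 10,100 params
for p in base.parameters():
    p.requires_grad = False
adapter = torch.nn.Linear(100, 10)  # 100*10 + 10 = 1,010 params
model = torch.nn.Sequential(base, adapter)
print(count_parameters(model))      # (1010, 10100)
```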
What is LoRA?
LoRA (Low-Rank Adaptation of Large Language Models) is a popular and lightweight training technique that significantly reduces the number of trainable parameters. It works by inserting a small number of new weights into the model and training only those.
For our use case, I have inserted additional LoRA layers alongside all the linear layers present in the Llama model. The LoRA layer looks like this:
```python
import math

import torch

class LoRALayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        self.A = torch.nn.Parameter(torch.empty(in_dim, rank, dtype=torch.bfloat16))
        torch.nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))
        # B starts at zero, so the LoRA update is a no-op before any training.
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim, dtype=torch.bfloat16))
        self.alpha = alpha
        self.rank = rank

    def forward(self, x):
        # Scaled low-rank update: (alpha / rank) * x @ A @ B
        return (self.alpha / self.rank) * (x @ self.A @ self.B)
```

I have combined the LoRA layer shown above with all the linear layers present in the Llama model. The model architecture before and after the LoRA layer integration is shown below.
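One common way to combine a LoRA layer with every linear layer is a small wrapper module plus a recursive replacement pass. The sketch below illustrates the idea; the names `LinearWithLoRA` and `replace_linear_with_lora` are assumptions for illustration, not the notebook's actual code, and it uses float32 (rather than bfloat16) so it runs anywhere:

```python
import math

import torch

class LoRALayer(torch.nn.Module):
    """Low-rank update, as above (float32 here for portability)."""
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        self.A = torch.nn.Parameter(torch.empty(in_dim, rank))
        torch.nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha
        self.rank = rank

    def forward(self, x):
        return (self.alpha / self.rank) * (x @ self.A @ self.B)

class LinearWithLoRA(torch.nn.Module):
    """Wraps a (frozen) Linear layer and adds a trainable LoRA update."""
    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(linear.in_features, linear.out_features, rank, alpha)

    def forward(self, x):
        # Frozen base output plus the trainable low-rank correction.
        return self.linear(x) + self.lora(x)

def replace_linear_with_lora(model, rank=16, alpha=16):
    """Recursively swap every nn.Linear in `model` for LinearWithLoRA."""
    for name, module in model.named_children():
        if isinstance(module, torch.nn.Linear):
            setattr(model, name, LinearWithLoRA(module, rank, alpha))
        else:
            replace_linear_with_lora(module, rank, alpha)
```

After freezing the base model's parameters and running `replace_linear_with_lora`, only the `lora.A` and `lora.B` matrices remain trainable, and because `B` is zero-initialized, the wrapped model initially produces exactly the same outputs as the original.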


