A complete implementation of a GPT-2 style Large Language Model (124M parameters) built from scratch in PyTorch. This project demonstrates the core components of a Transformer-based language model, including Multi-Head Attention, Feed-Forward Networks, and Positional Embeddings, trained on sample text data.
- 124M Parameter Architecture: Configuration matching the GPT-2 Small model.
- Tokenization: Uses `tiktoken` for efficient BPE tokenization.
- Training Loop: Complete training pipeline with validation and checkpointing.
- Data Pipeline: Efficient data loading with a sliding-window context.
- Text Generation: Sampling with temperature and top-k filtering (see the sketch below).
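To illustrate the generation feature, here is a minimal sketch of temperature-scaled, top-k sampling with a `tiktoken` encoder. The names used here (`generate`, `model`, `max_new_tokens`) are illustrative assumptions: they presume a model whose forward pass returns logits of shape `(batch, seq_len, vocab_size)`, and the actual implementation in this repository may differ.

```python
import torch
import tiktoken

def generate(model, idx, max_new_tokens, context_length, temperature=1.0, top_k=50):
    """Autoregressively sample tokens with temperature and top-k filtering (illustrative sketch)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_length:]            # crop to the model's context window
        with torch.no_grad():
            logits = model(idx_cond)[:, -1, :]         # logits for the last position only
        if top_k is not None:
            top_vals, _ = torch.topk(logits, top_k)
            logits[logits < top_vals[:, [-1]]] = float("-inf")  # mask everything outside the top-k
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)    # sample one token per sequence
        idx = torch.cat([idx, next_token], dim=1)
    return idx

# Usage (assumes a trained `model` is available):
tokenizer = tiktoken.get_encoding("gpt2")
prompt_ids = torch.tensor([tokenizer.encode("Every effort moves you")])
# out_ids = generate(model, prompt_ids, max_new_tokens=50, context_length=256)
# print(tokenizer.decode(out_ids[0].tolist()))
```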
```
llm-scratch/
├── model/                    # Model architecture
│   ├── gpt_model.py          # Main GPT class
│   ├── attention.py          # Multi-Head Attention
│   ├── transformer_block.py
│   └── transformer_layers.py
├── training/                 # Training utilities
│   ├── training.py           # Training loop
│   └── utils.py              # Loss calculation & plotting
├── data/                     # Data handling
│   ├── dataset.py            # PyTorch Dataset
│   └── dataloader.py         # DataLoader factory
├── main.py                   # Entry point for training
└── gpt_124m_config.json      # Model configuration
```
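For orientation, the following is a minimal sketch of the kind of causal multi-head attention module that `model/attention.py` implements. The class and argument names (`MultiHeadAttention`, `d_in`, `d_out`, etc.) are illustrative assumptions, not the repository's exact API.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Causal multi-head self-attention (illustrative sketch, not the repo's exact code)."""
    def __init__(self, d_in, d_out, context_length, n_heads, dropout=0.1):
        super().__init__()
        assert d_out % n_heads == 0, "d_out must be divisible by n_heads"
        self.n_heads = n_heads
        self.head_dim = d_out // n_heads
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask blocks attention to future positions
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()
        )

    def forward(self, x):
        b, t, _ = x.shape
        # Project and split into heads: (b, n_heads, t, head_dim)
        q = self.W_query(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))

        out = (weights @ v).transpose(1, 2).contiguous().view(b, t, self.n_heads * self.head_dim)
        return self.out_proj(out)
```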
- Clone the repository.
- Install dependencies: `pip install torch tiktoken matplotlib`
- Prepare your data: The project defaults to `the-verdict.txt`. Ensure this file is present or update `main.py` to point to your dataset (a sketch of the sliding-window loading it relies on appears after these steps).
- Configure the model: Hyperparameters are stored in `gpt_124m_config.json`.
- Start training: Run `python main.py`. The script will train the model, save the best weights to `gpt_124m.pt`, and generate loss plots.
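As a reference for the data-preparation step, here is a minimal sketch of a sliding-window dataset and DataLoader factory of the kind `data/dataset.py` and `data/dataloader.py` provide. The class and function names (`GPTDataset`, `create_dataloader`) and the stride default are illustrative assumptions rather than the repository's exact API.

```python
import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader

class GPTDataset(Dataset):
    """Chunks a token stream into overlapping (input, target) windows (illustrative sketch)."""
    def __init__(self, text, tokenizer, context_length, stride):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        # Slide a fixed-size window over the token stream; targets are the inputs shifted by one.
        for i in range(0, len(token_ids) - context_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + context_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + context_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

def create_dataloader(text, context_length=256, stride=128, batch_size=2, shuffle=True):
    tokenizer = tiktoken.get_encoding("gpt2")
    dataset = GPTDataset(text, tokenizer, context_length, stride)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, drop_last=True)

# Usage:
# with open("the-verdict.txt", "r", encoding="utf-8") as f:
#     train_loader = create_dataloader(f.read())
```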
The `gpt_124m_config.json` file controls the model hyperparameters:

- `vocab_size`: 50257 (GPT-2 tokenizer)
- `context_length`: 256
- `emb_dim`: 768
- `n_layers`: 12
- `n_heads`: 12
- `batch_size`: 2
- `epochs`: 10
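Based on the values listed above, the configuration file likely looks roughly as follows; the exact key names and any additional fields in the real `gpt_124m_config.json` may differ.

```json
{
  "vocab_size": 50257,
  "context_length": 256,
  "emb_dim": 768,
  "n_layers": 12,
  "n_heads": 12,
  "batch_size": 2,
  "epochs": 10
}
```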
Created for educational purposes to understand LLM internals.