Tiny-LLM

This project implements a minimal Large Language Model (LLM) using PyTorch. It includes a complete training pipeline with custom tokenization, transformer architecture, and text generation.

Results

With ~51.2M parameters, the model converges to a validation loss of 3.5 and perplexity of 34 on WikiText-103 after 50,000 iterations.

Example of generated text:

World War II was a global conflict of interest since it did not have the support of many major political parties such as the National Socialist Workers' Union for Germany, the Social Workers' Union for the United Nations and the United Nations, and the Warsaw Pact. The war was the setting for the war and the installation of a national election. The war was not resolved until the end of World War II, when the Soviet Union had already begun to face a crisis.

= = = Critical reception = = =

After the war, the Navy adopted the National Guard as its Director of Naval Artillery. It was replaced by the Army of the Navy's Navy, and was reduced to a Navy Reserve Training class, a naval career in the Navy Reserve. In 1946, the Navy was renamed the Navy Reserve Service Medal, with the Navy's rank of major. She was reclassified as a ship on 1 July 1947, and was renamed to the Navy Reserve Service Medal. In 1949, she was converted into a storage hulk for the Navy's Army Reserve Fleet, and renamed to the Naval Reserve Fleet. She was converted into an aircraft carrier in 1950.

Architecture

The model implements a standard decoder-only transformer architecture (src/core/model.py):

1. Token + Positional Embeddings (512d)
2. Dropout (0.1)
3. Transformer Blocks (x8):
   a. LayerNorm → Multi-Head Attention (8 heads) → Residual
   b. LayerNorm → Feed Forward (2048d) → GELU → Residual
4. Final LayerNorm
5. Linear Projection (512d → vocab_size)

Key Implementation Details

Attention Mechanism: Multi-head scaled dot-product attention with causal masking
Normalization: Pre-norm architecture using LayerNorm before attention and feed-forward blocks
Activation: GELU activation functions in feed-forward networks
Position Encoding: Learnable positional embeddings up to context length

Training Pipeline

Dataset Processing

The system uses WikiText-103 with preprocessing in src/core/dataset.py:

Tokenization: GPT-2 BPE encoding via tiktoken
Chunking: Sequences grouped into 512-token blocks
Splitting: 99%/1% train/validation split with shuffling

Training Configuration

TrainingConfig(
    max_iterations=100000,
    betas=(0.9, 0.95),           # AdamW optimizer
    max_learning_rate=6e-4,
    min_learning_rate=6e-5,      # Cosine annealing schedule
    gradient_accumulation_steps=4,
    max_gradient_norm=1.0,
    weight_decay=0.1
)

Optimization Features

Mixed Precision: Automatic mixed precision with bfloat16
Gradient Scaling: CUDA gradient scaler for numerical stability
Learning Rate Scheduling: Cosine annealing from max to min learning rate
Gradient Clipping: Global norm clipping at 1.0 to prevent gradient explosion
Model Compilation: PyTorch 2.0 compilation for inference acceleration

Installation and Usage

Setup

git clone https://github.com/RobinLmn/tiny-llm
cd tiny-llm
pip install -r requirements.txt

Training

python src/main.py

The training script automatically handles dataset download, tokenization, and model checkpointing every 5,000 iterations to models/tiny/.

Inference

from core.utils import load_model
from core.tokenizer import SubWordTokenizer
from core.generation import GenerationConfig, generate_text

# Load trained checkpoint
tokenizer = SubWordTokenizer("gpt2")
model = load_model(model_config, tokenizer, "models/tiny/tiny-llm.pth")

# Generate text with temperature sampling
config = GenerationConfig(maximum_length=200, temperature=0.8)
text = generate_text(model, tokenizer, "World War II was", config)

Dependencies

PyTorch >= 2.0.0 (required for scaled_dot_product_attention)
tiktoken (GPT-2 tokenizer)
datasets (HuggingFace data loading)
rich (progress visualization)

License

See LICENSE for details. Inspired by nanoGPT by Andrej Karpathy. Check his work for more details and examples!

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
docs		docs
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tiny-LLM

Results

Architecture

Key Implementation Details

Training Pipeline

Dataset Processing

Training Configuration

Optimization Features

Installation and Usage

Setup

Training

Inference

Dependencies

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Tiny-LLM

Results

Architecture

Key Implementation Details

Training Pipeline

Dataset Processing

Training Configuration

Optimization Features

Installation and Usage

Setup

Training

Inference

Dependencies

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages