This repository contains a reproduction of the TinyStories language models described in the paper "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?" by Ronen Eldan and Yuanzhi Li.
The goal of this project is to demonstrate that a very small transformer model, when trained on a simplified, synthetic dataset, can generate fluent, grammatically correct, and consistent short stories.
Start the Flask server:

```shell
python app.py
```

Then open the local server endpoint in your browser to talk to the model.
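As a rough illustration, `app.py` presumably wraps the trained model in a small Flask endpoint. The sketch below is a guess at that shape, not the repository's actual code: the `/generate` route, the `prompt`/`story` JSON fields, and the `generate_story` helper are all hypothetical names.

```python
# Hypothetical sketch of a Flask inference endpoint; route and field
# names are assumptions, and generate_story stands in for the real
# TinyStories model call.
from flask import Flask, request, jsonify

app = Flask(__name__)

def generate_story(prompt: str) -> str:
    # Placeholder: the real app would run the trained model here.
    return prompt + " Once upon a time..."

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.get_json().get("prompt", "")
    return jsonify({"story": generate_story(prompt)})

# To serve locally: app.run()  (Flask defaults to http://127.0.0.1:5000)
```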
```shell
cd tiny-stories-with-hf
python inference.py "<YOUR_PROMPT_HERE>"
```

> "We introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters)... yet still produce fluent and consistent stories." — Eldan & Li (2023)
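`inference.py` presumably generates a story token by token from the prompt. The snippet below sketches one common decoding step, temperature plus top-k sampling over a logits vector; the function name is illustrative and is not taken from the repository, which may decode differently.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=40, rng=random):
    """Illustrative temperature + top-k sampling step (not the
    repository's actual decoding code)."""
    scaled = [l / temperature for l in logits]
    # Keep only the top_k highest-scoring tokens; mask the rest out.
    kth = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
    masked = [s if s >= kth else float("-inf") for s in scaled]
    # Softmax over the surviving logits.
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to the resulting distribution.
    return rng.choices(range(len(probs)), weights=probs)[0]
```

With `top_k=1` this reduces to greedy decoding; larger `top_k` and higher `temperature` trade coherence for variety.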
This model is a decoder-only Transformer (GPT-style) designed to fit within 10M trainable parameters.
Configuration 1: Model Size S (3.6M) - HuggingFace
| Hyperparameter | Value |
|---|---|
| Parameters | ~3.6 Million |
| Attention Layers | 8 |
| Hidden Dimension (Embedding Dimensions) | 64 |
| Attention Heads per Layer | 16 |
| Context Window | 512 tokens |
| Vocab Size | ~50,257 (GPT-Neo tokenizer) |
| Dropout | 0.1 |
| Learning Rate | 5e-4 |
Configuration 2: Model Size M (19.3M) - HuggingFace
| Hyperparameter | Value |
|---|---|
| Parameters | ~19.3 Million |
| Attention Layers | 8 |
| Hidden Dimension (Embedding Dimensions) | 256 |
| Attention Heads per Layer | 16 |
| Context Window | 512 tokens |
| Vocab Size | ~50,257 (GPT-Neo tokenizer) |
| Dropout | 0.1 |
| Learning Rate | 5e-4 |
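The stated sizes can be sanity-checked from the tables. Assuming a standard GPT-style block (roughly 4·d² attention weights plus 8·d² MLP weights, ignoring biases and LayerNorms), the counts work out to about 3.6M and 19.3M:

```python
def approx_params(n_layers, d_model, vocab_size=50257, ctx_len=512):
    """Rough GPT-style parameter count: token + position embeddings,
    plus ~12*d_model^2 weights per transformer block (biases and
    LayerNorms ignored)."""
    embeddings = vocab_size * d_model + ctx_len * d_model
    per_block = 12 * d_model * d_model  # 4d^2 attention + 8d^2 MLP
    return embeddings + n_layers * per_block

# Size S: 8 layers, d_model=64  -> ~3.6M parameters
# Size M: 8 layers, d_model=256 -> ~19.3M parameters
```

Note that for Size S the ~3.2M-parameter embedding table dominates the total, a consequence of keeping the full ~50K GPT-Neo vocabulary at a 64-dimensional hidden size.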
The model was trained from scratch on an NVIDIA T4 GPU for around 3 hours, reaching a loss of 2.17 after 0.22 epochs (~55K steps). We used the EleutherAI/gpt-neo-125M tokenizer for both training and inference.
Training regime:
- Epochs: 0.22
- Loss: 2.17
- GPU: NVIDIA T4
- Training Steps: 55,000
- Training Time: ~3 hours
The model was trained from scratch on an NVIDIA A100 GPU for around 4 hours 40 minutes, reaching a loss of 1.40 after 1 epoch (~265K steps). We used the EleutherAI/gpt-neo-125M tokenizer for both training and inference.
Training regime:
- Epochs: 1
- Loss: 1.40
- GPU: NVIDIA A100
- Training Steps: 264,965
- Training Time: ~4 hours 40 minutes
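Assuming the reported losses are mean token-level cross-entropy (the usual language-modeling objective), they can be read as perplexities via exp(loss), which makes the two runs easier to compare:

```python
import math

# Converting the reported cross-entropy losses to perplexity,
# assuming they are mean per-token cross-entropy.
ppl_t4 = math.exp(2.17)    # Size S, T4 run
ppl_a100 = math.exp(1.40)  # Size M, A100 run
print(f"T4 run:   loss 2.17 -> perplexity {ppl_t4:.2f}")
print(f"A100 run: loss 1.40 -> perplexity {ppl_a100:.2f}")
```

So the A100 run's model is, on average, about twice as "certain" about each next token as the T4 run's model.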
The model was trained on the TinyStories dataset, which consists of synthetic short stories generated by GPT-3.5 and GPT-4. The stories use a restricted vocabulary typical of a 3 to 4-year-old child.
- Source: Hugging Face Datasets (roneneldan/TinyStories)
- Size: ~2 GB of text data
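A common way to prepare such a corpus (e.g., after `load_dataset("roneneldan/TinyStories")`) is to concatenate stories with an end-of-text marker and slice the stream into fixed 512-token windows. The sketch below shows that packing step; a whitespace split stands in for the GPT-Neo BPE tokenizer, and the function name is illustrative rather than taken from the repository.

```python
def pack_into_windows(stories, ctx_len=512, eos="<|endoftext|>"):
    """Concatenate stories (separated by an EOS marker) and slice the
    token stream into fixed-length training windows. A whitespace split
    stands in for the GPT-Neo BPE tokenizer used in the real pipeline."""
    stream = []
    for story in stories:
        stream.extend(story.split())
        stream.append(eos)
    # Drop the ragged tail so every window is exactly ctx_len tokens.
    n_windows = len(stream) // ctx_len
    return [stream[i * ctx_len:(i + 1) * ctx_len] for i in range(n_windows)]
```

Packing avoids padding waste: every position in every 512-token window contributes a real next-token prediction to the loss.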