SauravP97/tiny-stories-hf


Reproducing TinyStories small language models (SLM)

License: MIT · Python · PyTorch

This repository contains a reproduction of the TinyStories language models described in the paper "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?" by Ronen Eldan and Yuanzhi Li.


The goal of this project is to demonstrate that a very small transformer model, when trained on a simplified, synthetic dataset, can generate fluent, grammatically correct, and consistent short stories.

🌟 Inference

On UI


Execute the Flask server:

python app.py

Head over to the local server endpoint (by default http://127.0.0.1:5000 for Flask) in your browser to talk to the model.

In Terminal

cd tiny-stories-with-hf
python inference.py "<YOUR_PROMPT_HERE>"
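The generation step behind `inference.py` can be sketched as below. This is a minimal illustration, not the repo's actual script: it uses a randomly initialized GPT-Neo-style model with the Size-S shape as a stand-in, since the trained checkpoint path is not given here, and placeholder token ids in place of a tokenizer.

```python
import torch
from transformers import GPTNeoConfig, GPTNeoForCausalLM

# Stand-in: a randomly initialized GPT-Neo-style model with the Size-S shape.
# The real script would load the trained checkpoint via
# AutoModelForCausalLM.from_pretrained(<checkpoint path>) instead.
config = GPTNeoConfig(
    vocab_size=50257,
    hidden_size=64,
    num_layers=2,                       # 8 in the real Size-S model; 2 keeps this demo fast
    num_heads=16,
    max_position_embeddings=512,
    attention_types=[[["global"], 2]],  # one attention pattern entry per layer
)
model = GPTNeoForCausalLM(config).eval()

prompt_ids = torch.tensor([[1, 2, 3]])  # placeholder ids; a tokenizer would produce these
with torch.no_grad():
    out = model.generate(prompt_ids, max_new_tokens=10,
                         do_sample=False, pad_token_id=0)
print(out.shape)  # (1, 13): the 3 prompt tokens plus 10 generated tokens
```

With a trained checkpoint, the same `generate` call (usually with `do_sample=True` and a temperature) produces the story continuations shown in the screenshots.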

📄 Abstract

"We introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters)... yet still produce fluent and consistent stories." — Eldan & Li (2023)

🧩 Model Architecture

This model is a decoder-only Transformer (GPT-style) designed to fit within 10M trainable parameters.

Configuration 1: Model Size S (3.6M) - HuggingFace

| Hyperparameter | Value |
| --- | --- |
| Parameters | ~3.6 million |
| Attention layers | 8 |
| Hidden dimension (embedding size) | 64 |
| Attention heads per layer | 16 |
| Context window | 512 tokens |
| Vocab size | ~50,257 (GPT-Neo tokenizer) |
| Dropout | 0.1 |
| Learning rate | 5e-4 |

Configuration 2: Model Size M (19.3M) - HuggingFace

| Hyperparameter | Value |
| --- | --- |
| Parameters | ~19.3 million |
| Attention layers | 8 |
| Hidden dimension (embedding size) | 256 |
| Attention heads per layer | 16 |
| Context window | 512 tokens |
| Vocab size | ~50,257 (GPT-Neo tokenizer) |
| Dropout | 0.1 |
| Learning rate | 5e-4 |
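The parameter counts in both tables can be sanity-checked with a back-of-the-envelope formula. The sketch below assumes a standard GPT-style block (tied input/output embeddings, learned positional embeddings, 4x MLP expansion, LayerNorm with weight and bias); the repo's exact layer layout may differ slightly, but the totals land on ~3.65M and ~19.32M, matching the ~3.6M and ~19.3M figures above.

```python
# Rough parameter count for a GPT-style decoder, under the assumptions
# stated above (tied embeddings, 4x MLP, biased LayerNorm).
def gpt_param_count(vocab, d_model, n_layers, ctx):
    tok_emb = vocab * d_model                       # token embedding (tied with LM head)
    pos_emb = ctx * d_model                         # learned positional embedding
    attn = 4 * d_model * d_model + 4 * d_model      # Q, K, V, O weights + biases
    mlp = 2 * d_model * 4 * d_model + 5 * d_model   # two linears + their biases
    ln = 2 * 2 * d_model                            # two LayerNorms per block
    final_ln = 2 * d_model
    return tok_emb + pos_emb + n_layers * (attn + mlp + ln) + final_ln

print(gpt_param_count(50257, 64, 8, 512))   # Size S: 3,649,216 ≈ 3.6M
print(gpt_param_count(50257, 256, 8, 512))  # Size M: 19,315,456 ≈ 19.3M
```

Note that for Size S the token embedding alone (50,257 × 64 ≈ 3.2M) dominates the budget; the transformer blocks contribute only ~0.4M parameters.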

🚀 Training Procedure

Configuration 1: Model Size S (3.6M)

The model was trained from scratch on an NVIDIA T4 GPU for around 3 hours, reaching a loss of 2.17. Training ran for 0.22 epochs, roughly 55K steps. We used the EleutherAI/gpt-neo-125M tokenizer for both training and inference.

  • Training regime:
    • Epochs: 0.22
    • Loss: 2.17
    • GPU: NVIDIA T4
    • Training steps: 55,000
    • Training time: ~3 hours

Configuration 2: Model Size M (19.3M)

The model was trained from scratch on an NVIDIA A100 GPU for around 4 hours 40 minutes, reaching a loss of 1.40. Training ran for 1 epoch, roughly 265K steps. We used the EleutherAI/gpt-neo-125M tokenizer for both training and inference.

  • Training regime:
    • Epochs: 1
    • Loss: 1.40
    • GPU: NVIDIA A100
    • Training steps: 264,965
    • Training time: ~4 hours 40 minutes
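The training loop for both configurations is standard next-token prediction. The sketch below shows one such step at the stated learning rate (5e-4); a tiny embedding-plus-linear model stands in for the transformer, and the batch is random token ids, so this illustrates the loss computation rather than the repo's actual trainer.

```python
import torch
import torch.nn.functional as F

# One causal-LM training step: shift the sequence by one token and minimize
# cross-entropy between predicted logits and the next token at each position.
torch.manual_seed(0)
vocab, d_model = 50257, 64
model = torch.nn.Sequential(torch.nn.Embedding(vocab, d_model),
                            torch.nn.Linear(d_model, vocab))  # stand-in for the transformer
opt = torch.optim.AdamW(model.parameters(), lr=5e-4)

tokens = torch.randint(0, vocab, (4, 33))        # dummy batch of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one for next-token prediction
logits = model(inputs)                           # (batch, seq, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()
opt.step()
opt.zero_grad()
print(f"{loss.item():.2f}")  # near ln(50257) ≈ 10.8 for an untrained model
```

The reported losses (2.17 and 1.40) are this same cross-entropy after training, far below the ~10.8 of a random initialization.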

🏄 Playing with the 19M-parameter model


📚 Dataset

The model was trained on the TinyStories dataset, which consists of synthetic short stories generated by GPT-3.5/4. The stories use a restricted vocabulary typical of a 3 to 4-year-old child.
