word_language_model

Word-level Language Modeling using RNN and Transformer

This example trains a multi-layer RNN (Elman, GRU, or LSTM) or Transformer on a language modeling task. By default, the training script uses the provided Wikitext-2 dataset. The trained model can then be used by the generate.py script to generate new text.

python main.py --accel --epochs 6           # Train an LSTM on Wikitext-2.
python main.py --accel --epochs 6 --tied    # Train a tied LSTM on Wikitext-2.
python main.py --accel --tied               # Train a tied LSTM on Wikitext-2 for 40 epochs.
python main.py --accel --epochs 6 --model Transformer --lr 5
                                            # Train a Transformer model on Wikitext-2.
python main.py --accel --epochs 6 --model Transformer --use-optimizer --lr 0.001
                                            # Train a Transformer model with AdamW optimizer on Wikitext-2.

python generate.py --accel                  # Generate samples from the default model checkpoint.

Note

This example supports running on accelerator devices (CUDA, MPS, XPU).

The model uses the nn.RNN module (and its sister modules nn.GRU and nn.LSTM) or the Transformer modules (nn.TransformerEncoder and nn.TransformerEncoderLayer), which automatically use the cuDNN backend when run on CUDA with cuDNN installed.

During training, if a keyboard interrupt (Ctrl-C) is received, training is stopped and the current model is evaluated against the test dataset.
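The interrupt behavior described above is a common pattern: wrap the epoch loop in a try/except for KeyboardInterrupt, then run the final evaluation regardless of how the loop ended. The following is a minimal sketch of that pattern, not the code from main.py; run_training, train_one_epoch, and evaluate are hypothetical names, and the interrupt is simulated so the snippet can run unattended.

```python
def run_training(epochs, train_one_epoch, evaluate):
    """Run up to `epochs` epochs; on Ctrl-C, stop early and still evaluate."""
    completed = 0
    try:
        for epoch in range(1, epochs + 1):
            train_one_epoch(epoch)
            completed = epoch
    except KeyboardInterrupt:
        # Ctrl-C raises KeyboardInterrupt in the main thread.
        print("Exiting from training early")
    # Evaluation runs whether training finished or was interrupted.
    return completed, evaluate()


def fake_epoch(epoch):
    # Hypothetical stand-in for one epoch of training;
    # it simulates the user pressing Ctrl-C during epoch 3.
    if epoch == 3:
        raise KeyboardInterrupt


completed, result = run_training(5, fake_epoch, lambda: "evaluated on test set")
print(completed, result)
```

Here two epochs complete before the simulated interrupt, and the evaluation callback still runs.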

The main.py script accepts the following arguments:

optional arguments:
  -h, --help            show this help message and exit
  --data DATA           location of the data corpus
  --model MODEL         type of network (RNN_TANH, RNN_RELU, LSTM, GRU, Transformer)
  --emsize EMSIZE       size of word embeddings
  --nhid NHID           number of hidden units per layer
  --nlayers NLAYERS     number of layers
  --lr LR               initial learning rate
  --clip CLIP           gradient clipping
  --epochs EPOCHS       upper epoch limit
  --batch_size N        batch size
  --bptt BPTT           sequence length
  --dropout DROPOUT     dropout applied to layers (0 = no dropout)
  --tied                tie the word embedding and softmax weights
  --seed SEED           random seed
  --accel               use accelerator
  --log-interval N      report interval
  --save SAVE           path to save the final model
  --onnx-export ONNX_EXPORT
                        path to export the final model in onnx format
  --nhead NHEAD         the number of heads in the encoder/decoder of the transformer model
  --dry-run             verify the code and the model
  --use-optimizer       specify whether to use an AdamW optimizer
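To make --batch_size and --bptt more concrete, here is a rough sketch of how a word-level corpus is often prepared for this kind of training: the token stream is cut into batch_size parallel streams, and each training step takes a chunk of at most bptt tokens, with targets shifted one position ahead of the inputs. This uses plain Python lists and is an illustration under those assumptions, not the tensor code in main.py; the function names and the row-per-stream layout are chosen for this example.

```python
def batchify(tokens, batch_size):
    """Split a flat token list into `batch_size` equal-length streams."""
    stream_len = len(tokens) // batch_size
    tokens = tokens[: stream_len * batch_size]  # drop the leftover tail
    return [tokens[i * stream_len:(i + 1) * stream_len] for i in range(batch_size)]


def get_batch(streams, start, bptt):
    """Return inputs and next-word targets for one chunk of up to `bptt` tokens."""
    seq_len = min(bptt, len(streams[0]) - 1 - start)
    inputs = [s[start:start + seq_len] for s in streams]
    targets = [s[start + 1:start + 1 + seq_len] for s in streams]
    return inputs, targets


tokens = list(range(26))                 # stand-in for word ids
streams = batchify(tokens, batch_size=4)
inputs, targets = get_batch(streams, start=0, bptt=3)
print(inputs[0], targets[0])             # [0, 1, 2] [1, 2, 3]
```

A larger bptt gives the model longer contexts per step at the cost of more memory; the last chunk of each stream may be shorter than bptt.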

With these arguments, a variety of models can be tested. For example, the following arguments train more slowly but produce better models:

python main.py --accel --emsize 650 --nhid 650 --dropout 0.5 --epochs 40
python main.py --accel --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied
python main.py --accel --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40
python main.py --accel --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied
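The --tied variants above pair equal --emsize and --nhid values. Tying reuses the embedding matrix as the softmax weight matrix, so the two must have the same shape. The sketch below does the parameter arithmetic for those two layers only. The vocabulary size is a hypothetical placeholder, and the softmax layer is assumed to have one bias term per word.

```python
def embedding_and_softmax_params(vocab_size, emsize, nhid, tied):
    """Count parameters in the embedding and output softmax layers."""
    embedding = vocab_size * emsize      # one emsize-vector per word
    softmax_bias = vocab_size            # assumed: one bias per word
    if tied:
        if emsize != nhid:
            raise ValueError("tying needs emsize == nhid so the shapes match")
        return embedding + softmax_bias  # softmax weights are shared
    softmax_weight = vocab_size * nhid
    return embedding + softmax_weight + softmax_bias


VOCAB = 10_000  # hypothetical vocabulary size, for illustration only
untied = embedding_and_softmax_params(VOCAB, 650, 650, tied=False)
tied = embedding_and_softmax_params(VOCAB, 650, 650, tied=True)
print(untied - tied)  # 6500000 parameters saved by tying
```

Under these assumptions, tying removes vocab_size * nhid parameters from the model.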
