Potential 0.4.1 Memory Leak for a Fairseq Model #9942

@hmc-cs-mdrissi

Description

Yesterday, on 0.4.0, I was training a fairseq model (fconv_self_att_wp) and could train it fine with a batch size of around 4 (technically I control the number of tokens fed into the model). After upgrading to 0.4.1, training the same model with the same arguments runs out of memory. Even after decreasing the batch size, it still runs out of memory after a while, with memory usage increasing every couple of batches.

There's an issue in fairseq that mentions this as well: facebookresearch/fairseq#232. That issue reports the memory leak with a different fairseq model, so I'd guess multiple fairseq models now have this problem. The command I've been using to train the fairseq model is:

python3.6 train.py data-bin/wikitext_outline_to_target -a fconv_self_att_wp --lr 0.25 --clip-norm 0.1 --max-tokens 4000 --lr-scheduler reduce_lr_on_plateau --source-lang wikitext_outline --target-lang wikitext_target --max-epoch 25 --no-epoch-checkpoints --save-dir model4_checkpoints/

You'll need to replace the --source-lang/--target-lang and data-bin arguments with whatever dataset you end up using. The README at https://github.com/pytorch/fairseq/tree/master/examples/stories describes the commands in more detail to cover that part and trains the same architecture (with some variation in exact arguments, but I don't think they'll matter).

Edit: more specifically, the error was a CUDA out-of-memory error, and running nvidia-smi I could see the memory usage increasing over time. I had also upgraded to CUDA 9.2 / cuDNN 7.1.4, so the issue might be there. The OS was Ubuntu 16.04.
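To make the "memory increasing every couple of batches" observation reproducible, one way is to log torch.cuda.memory_allocated() (available since PyTorch 0.4) after each optimizer step and flag a sustained upward trend. This is a minimal sketch, not part of the fairseq training script; the leak_suspected helper and the window size are my own illustrative choices, kept in pure Python so it works on any sequence of byte counts:

```python
# Sketch: detect a steadily rising memory trend across batches.
# In a real training loop you would append torch.cuda.memory_allocated()
# (PyTorch >= 0.4) to `readings` after each optimizer step; the analysis
# below is pure Python and only looks at the recorded byte counts.

def leak_suspected(readings, window=5):
    """Return True if the last `window` readings are strictly increasing."""
    if len(readings) < window:
        return False  # not enough data points to judge a trend
    tail = readings[-window:]
    return all(b > a for a, b in zip(tail, tail[1:]))

# Example: in the loop, something like
#   readings.append(torch.cuda.memory_allocated())
#   if leak_suspected(readings):
#       print("warning: allocated GPU memory rising for 5 straight batches")
```

A steady climb here (matching the nvidia-smi observation) would distinguish a genuine leak from normal allocator caching, since memory_allocated reports tensors actually held, not memory merely cached by the allocator.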
