Closed
Description
Environment info
- `transformers` version: 3.3.1
- Platform: Darwin-19.6.0-x86_64-i386-64bit
- Python version: 3.7.7
- PyTorch version (GPU?): 1.6.0 (False)
- Tensorflow version (GPU?): 2.2.0 (False)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
@sshleifer (from activity on Adafactor PRs)
Information
Model I am using (Bert, XLNet ...): T5
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The tasks I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
The Adafactor docs recommend the following for T5: `Adafactor(model.parameters(), lr=1e-3, relative_step=False, warmup_init=True)`

However, the init code then has:

```python
if lr is not None and relative_step:
    raise ValueError("Cannot combine manual lr and relative_step options")
if warmup_init and not relative_step:
    raise ValueError("warmup_init requires relative_step=True")
```
which makes the recommended setting impossible. Merely switching to `relative_step=True` does not help either, because the first check then rejects the manual `lr`. So something seems to be missing either in the recommendation or in the implementation.
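To make the conflict concrete, here is a minimal sketch that reproduces just the validation logic quoted above (the helper name `check_adafactor_args` is hypothetical, not part of `transformers`), showing which option combinations pass and which raise:

```python
# Hypothetical standalone reproduction of the two checks in
# Adafactor.__init__, with the same defaults (lr=None, relative_step=True,
# warmup_init=False). Not the actual transformers implementation.
def check_adafactor_args(lr=None, relative_step=True, warmup_init=False):
    if lr is not None and relative_step:
        raise ValueError("Cannot combine manual lr and relative_step options")
    if warmup_init and not relative_step:
        raise ValueError("warmup_init requires relative_step=True")

# The documented recommendation fails the second check:
#   check_adafactor_args(lr=1e-3, relative_step=False, warmup_init=True)
#   -> ValueError: warmup_init requires relative_step=True
# Flipping to relative_step=True then fails the first check:
#   check_adafactor_args(lr=1e-3, relative_step=True, warmup_init=True)
#   -> ValueError: Cannot combine manual lr and relative_step options
# The combinations that do pass both checks:
check_adafactor_args(lr=1e-3, relative_step=False, warmup_init=False)  # manual lr
check_adafactor_args(lr=None, relative_step=True, warmup_init=True)    # relative step
```

So with this validation in place, one must either drop `warmup_init` and keep the manual `lr`, or drop the manual `lr` and use relative steps, contrary to the documented combination.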
Thanks!