Closed
Description
Environment info
- `transformers` version: 3.3.1
- Platform: Darwin-19.6.0-x86_64-i386-64bit
- Python version: 3.7.7
- PyTorch version (GPU?): 1.6.0 (False)
- Tensorflow version (GPU?): 2.2.0 (False)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
@sshleifer (from activity on Adafactor PRs)
Information
Model I am using (Bert, XLNet ...): T5
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The tasks I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
The Adafactor docs recommend the following for T5: `Adafactor(model.parameters(), lr=1e-3, relative_step=False, warmup_init=True)`

However, the init code then has:

```python
if lr is not None and relative_step:
    raise ValueError("Cannot combine manual lr and relative_step options")
if warmup_init and not relative_step:
    raise ValueError("warmup_init requires relative_step=True")
```
which makes the recommended setting impossible. Merely switching to `relative_step=True` does not help either, because the first check then rejects the manual `lr`. So something seems to be missing either in the recommendation or in the implementation.
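To make the conflict concrete, here is a minimal sketch that reproduces just the validation logic quoted above (the helper name `check_adafactor_args` is hypothetical, not part of `transformers`), showing which option combinations pass and which raise:

```python
# Hypothetical standalone reproduction of the two checks in
# Adafactor.__init__, with the same defaults (lr=None, relative_step=True,
# warmup_init=False). Not the actual transformers implementation.
def check_adafactor_args(lr=None, relative_step=True, warmup_init=False):
    if lr is not None and relative_step:
        raise ValueError("Cannot combine manual lr and relative_step options")
    if warmup_init and not relative_step:
        raise ValueError("warmup_init requires relative_step=True")

# The documented recommendation fails the second check:
#   check_adafactor_args(lr=1e-3, relative_step=False, warmup_init=True)
#   -> ValueError: warmup_init requires relative_step=True
# Flipping to relative_step=True then fails the first check:
#   check_adafactor_args(lr=1e-3, relative_step=True, warmup_init=True)
#   -> ValueError: Cannot combine manual lr and relative_step options
# The combinations that do pass both checks:
check_adafactor_args(lr=1e-3, relative_step=False, warmup_init=False)  # manual lr
check_adafactor_args(lr=None, relative_step=True, warmup_init=True)    # relative step
```

So with this validation in place, one must either drop `warmup_init` and keep the manual `lr`, or drop the manual `lr` and use relative steps, contrary to the documented combination.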
Thanks!