
Recommended Adafactor settings for T5 cause error #7789

@OyvindTafjord


Environment info

  • transformers version: 3.3.1
  • Platform: Darwin-19.6.0-x86_64-i386-64bit
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.6.0 (False)
  • Tensorflow version (GPU?): 2.2.0 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

@sshleifer (from activity on Adafactor PRs)

Information

Model I am using (Bert, XLNet ...): T5

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

The Adafactor docs recommend the following settings for T5:

        Adafactor(model.parameters(), lr=1e-3, relative_step=False, warmup_init=True)

However, the init code then has:

        if lr is not None and relative_step:
            raise ValueError("Cannot combine manual lr and relative_step options")
        if warmup_init and not relative_step:
            raise ValueError("warmup_init requires relative_step=True")

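which makes this setting impossible: with relative_step=False, warmup_init=True triggers the second error, and simply switching to relative_step=True while keeping lr=1e-3 triggers the first one instead. So something seems to be missing either in the recommendations or in the implementation.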
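To make the conflict concrete, here is a minimal sketch that copies only the two checks quoted above into a standalone function and tries every combination of lr, relative_step and warmup_init. The function names check_adafactor_args and classify_combinations are hypothetical, for illustration only; they are not part of transformers, and the sketch does not exercise any other validation or step-time behavior of the real optimizer.

```python
from itertools import product


def check_adafactor_args(lr=None, relative_step=True, warmup_init=False):
    """Hypothetical stand-in: copies only the two checks quoted above from
    Adafactor.__init__ (transformers 3.3.1). It is not the library API."""
    if lr is not None and relative_step:
        raise ValueError("Cannot combine manual lr and relative_step options")
    if warmup_init and not relative_step:
        raise ValueError("warmup_init requires relative_step=True")


def classify_combinations():
    """Try all 8 combinations of the three arguments and record the outcome:
    "ok" if both checks pass, otherwise the ValueError message."""
    results = {}
    for lr, relative_step, warmup_init in product((None, 1e-3), (True, False), (True, False)):
        key = (lr, relative_step, warmup_init)
        try:
            check_adafactor_args(lr=lr, relative_step=relative_step, warmup_init=warmup_init)
            results[key] = "ok"
        except ValueError as err:
            results[key] = str(err)
    return results


if __name__ == "__main__":
    for (lr, relative_step, warmup_init), outcome in classify_combinations().items():
        print(f"lr={lr}, relative_step={relative_step}, warmup_init={warmup_init}: {outcome}")
```

Under these two checks, the recommended combination (lr=1e-3, relative_step=False, warmup_init=True) fails with the warmup_init error, and flipping relative_step to True fails with the manual-lr error. With lr=1e-3 set, only relative_step=False, warmup_init=False passes; warmup_init=True passes only with lr=None and relative_step=True.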

Thanks!
