Parsing error with --lr-scheduler-kwargs: invalid dict value #41296

@stephankoe

System Info

  • transformers version: 4.57.0.dev0
  • Platform: Linux-5.10.134-008.18.kangaroo.al8.x86_64-x86_64-with-glibc2.39
  • Python version: 3.12.7+gc
  • Huggingface_hub version: 1.0.0.rc2
  • Safetensors version: 0.5.3
  • Accelerate version: 1.7.0
  • Accelerate config: not found
  • DeepSpeed version: 0.16.9+ali
  • PyTorch version (accelerator?): 2.6.0+ali.7.post2.ppu1.5.2.cu126 (CUDA)
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: PPU-ZW810E

Who can help?

When using the --lr-scheduler-kwargs option with a dict argument, such as '{"min_lr": 1e-06}', the parser reports the following error:

test.py: error: argument --lr_scheduler_kwargs/--lr-scheduler-kwargs: invalid dict value: '{"min_lr": 1e-06}'

It appears that this error was introduced in a61fc6a (title: "Fix typing of train_args"). Parsing works as expected when the type of lr_scheduler_kwargs is reverted to Optional[Union[dict[str, Any], str]].
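For illustration, the same error message can be produced with plain argparse when an option uses dict itself as its type converter. This is only a sketch of a plausible mechanism, not a confirmed description of what HfArgumentParser does internally; the helper name try_parse is hypothetical, and exit_on_error requires Python 3.9 or later.

```python
import argparse


def try_parse(type_func, raw):
    """Parse `--lr-scheduler-kwargs raw` with the given converter.

    Returns (value, None) on success or (None, error_message) on failure.
    `try_parse` is a hypothetical helper, not part of transformers.
    """
    # exit_on_error=False makes argparse raise ArgumentError instead of exiting.
    parser = argparse.ArgumentParser(prog="test.py", exit_on_error=False)
    parser.add_argument(
        "--lr_scheduler_kwargs", "--lr-scheduler-kwargs", type=type_func
    )
    try:
        namespace = parser.parse_args(["--lr-scheduler-kwargs", raw])
        return namespace.lr_scheduler_kwargs, None
    except argparse.ArgumentError as err:
        return None, str(err)


# dict('{"min_lr": 1e-06}') raises ValueError, which argparse reports
# as an "invalid dict value" error for the option.
value, error = try_parse(dict, '{"min_lr": 1e-06}')
print(error)
```

If the type annotation is narrowed so that dict becomes the converter, every JSON string passed on the command line would be rejected this way, which is consistent with the behavior reported here.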

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

You can simply reproduce it with a short script, such as:

from transformers import TrainingArguments, HfArgumentParser

parser = HfArgumentParser((TrainingArguments,))
training_args, = parser.parse_args_into_dataclasses()

and then run it with python test.py --lr-scheduler-kwargs '{"min_lr": 1e-06}'

It will report the following error:

usage: test.py [-h] [--output_dir OUTPUT_DIR] [--overwrite_output_dir [OVERWRITE_OUTPUT_DIR]] [--do_train [DO_TRAIN]] [--do_eval [DO_EVAL]]
            [--do_predict [DO_PREDICT]] [--eval_strategy {no,steps,epoch}] [--prediction_loss_only [PREDICTION_LOSS_ONLY]]
            [--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE] [--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE]
            [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--eval_accumulation_steps EVAL_ACCUMULATION_STEPS] [--eval_delay EVAL_DELAY]
            [--torch_empty_cache_steps TORCH_EMPTY_CACHE_STEPS] [--learning_rate LEARNING_RATE] [--weight_decay WEIGHT_DECAY] [--adam_beta1 ADAM_BETA1]
            [--adam_beta2 ADAM_BETA2] [--adam_epsilon ADAM_EPSILON] [--max_grad_norm MAX_GRAD_NORM] [--num_train_epochs NUM_TRAIN_EPOCHS]
...
            [--eval_use_gather_object [EVAL_USE_GATHER_OBJECT]] [--average_tokens_across_devices [AVERAGE_TOKENS_ACROSS_DEVICES]]
            [--no_average_tokens_across_devices]
test.py: error: argument --lr_scheduler_kwargs/--lr-scheduler-kwargs: invalid dict value: '{"min_lr": 1e-06}'

Expected behavior

The parser should parse the string '{"min_lr": 1e-06}' into a dictionary.
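As a rough sketch of the expected behavior, a JSON-decoding converter passed to plain argparse turns the string into a dict. The converter name json_dict is hypothetical and is not part of transformers; this only illustrates the desired result, not how the library should implement the fix.

```python
import argparse
import json


def json_dict(raw):
    """Hypothetical argparse converter: decode a JSON object string into a dict."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as err:
        raise argparse.ArgumentTypeError(f"not valid JSON: {err}") from err
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return value


parser = argparse.ArgumentParser(prog="test.py")
parser.add_argument(
    "--lr_scheduler_kwargs", "--lr-scheduler-kwargs", type=json_dict, default=None
)

# Same argument as in the reproduction command above.
args = parser.parse_args(["--lr-scheduler-kwargs", '{"min_lr": 1e-06}'])
print(args.lr_scheduler_kwargs)  # {'min_lr': 1e-06}
```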
