System Info
- transformers version: 4.57.0.dev0
- Platform: Linux-5.10.134-008.18.kangaroo.al8.x86_64-x86_64-with-glibc2.39
- Python version: 3.12.7+gc
- Huggingface_hub version: 1.0.0.rc2
- Safetensors version: 0.5.3
- Accelerate version: 1.7.0
- Accelerate config: not found
- DeepSpeed version: 0.16.9+ali
- PyTorch version (accelerator?): 2.6.0+ali.7.post2.ppu1.5.2.cu126 (CUDA)
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: PPU-ZW810E
Who can help?
When using the --lr-scheduler-kwargs option with a dict argument, such as '{"min_lr": 1e-06}', the parser reports the following error:
test.py: error: argument --lr_scheduler_kwargs/--lr-scheduler-kwargs: invalid dict value: '{"min_lr": 1e-06}'
This error appears to have been introduced in a61fc6a ("Fix typing of train_args"). Parsing works as expected after reverting the type of lr_scheduler_kwargs to Optional[Union[dict[str, Any], str]].
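A minimal sketch of one plausible mechanism, using plain argparse only. It assumes the dict annotation ends up being handed to argparse as the type converter; that is an assumption about HfArgumentParser internals, not something verified here. The helper name try_parse is hypothetical.

```python
import argparse


def try_parse(arg_type, argv):
    """Parse argv with the given type converter and return the value or the error text."""
    # exit_on_error=False (Python 3.9+) makes argparse raise instead of calling sys.exit.
    parser = argparse.ArgumentParser(prog="test.py", exit_on_error=False)
    parser.add_argument("--lr-scheduler-kwargs", type=arg_type, default=None)
    try:
        return parser.parse_args(argv).lr_scheduler_kwargs
    except argparse.ArgumentError as exc:
        return f"error: {exc}"


# Assumption: the converter is the builtin dict. dict("<json string>") raises
# ValueError, which argparse reports as "invalid dict value".
result = try_parse(dict, ["--lr-scheduler-kwargs", '{"min_lr": 1e-06}'])
print(result)
```

If that assumption holds, the message matches the one above because argparse builds it from the converter's __name__, which is "dict" here.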
Information
Tasks
Reproduction
The error can be reproduced with a short script saved as test.py:
from transformers import TrainingArguments, HfArgumentParser
parser = HfArgumentParser((TrainingArguments,))
training_args, = parser.parse_args_into_dataclasses()
Then run it with:
python test.py --lr-scheduler-kwargs '{"min_lr": 1e-06}'
The parser reports the following error:
usage: test.py [-h] [--output_dir OUTPUT_DIR] [--overwrite_output_dir [OVERWRITE_OUTPUT_DIR]] [--do_train [DO_TRAIN]] [--do_eval [DO_EVAL]]
[--do_predict [DO_PREDICT]] [--eval_strategy {no,steps,epoch}] [--prediction_loss_only [PREDICTION_LOSS_ONLY]] [--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE] [--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--eval_accumulation_steps EVAL_ACCUMULATION_STEPS] [--eval_delay EVAL_DELAY] [--torch_empty_cache_steps TORCH_EMPTY_CACHE_STEPS] [--learning_rate LEARNING_RATE] [--weight_decay WEIGHT_DECAY] [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2] [--adam_epsilon ADAM_EPSILON] [--max_grad_norm MAX_GRAD_NORM] [--num_train_epochs NUM_TRAIN_EPOCHS]
...
[--eval_use_gather_object [EVAL_USE_GATHER_OBJECT]] [--average_tokens_across_devices [AVERAGE_TOKENS_ACROSS_DEVICES]]
[--no_average_tokens_across_devices]
test.py: error: argument --lr_scheduler_kwargs/--lr-scheduler-kwargs: invalid dict value: '{"min_lr": 1e-06}'
Expected behavior
The parser should parse the string '{"min_lr": 1e-06}' into a dictionary.
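The expected behavior can be illustrated with plain argparse and a JSON-based converter. This is a sketch of the desired result, not the transformers implementation. The converter json_dict is a hypothetical helper introduced for illustration.

```python
import argparse
import json


def json_dict(value: str) -> dict:
    """Hypothetical converter: decode a JSON object passed on the command line."""
    parsed = json.loads(value)  # JSONDecodeError is a ValueError, so argparse reports it
    if not isinstance(parsed, dict):
        raise argparse.ArgumentTypeError(
            f"expected a JSON object, got {type(parsed).__name__}"
        )
    return parsed


parser = argparse.ArgumentParser(prog="test.py")
parser.add_argument(
    "--lr-scheduler-kwargs",
    "--lr_scheduler_kwargs",
    dest="lr_scheduler_kwargs",
    type=json_dict,
    default=None,
)

args = parser.parse_args(["--lr-scheduler-kwargs", '{"min_lr": 1e-06}'])
print(args.lr_scheduler_kwargs)  # {'min_lr': 1e-06}
```

With a converter like this, the same command-line string that currently fails would yield a dict with a float value for min_lr.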