
GenerationConfig.save_pretrained raises ValueError after training; maybe raise it earlier? #29665

@YiqunChen1999

System Info

  • transformers version: 4.38.2
  • Platform: Linux-4.18.0-305.3.1.el8.x86_64-x86_64-with-glibc2.28
  • Python version: 3.10.13
  • Huggingface_hub version: 0.21.4
  • Safetensors version: 0.4.2
  • Accelerate version: 0.28.0
  • Accelerate config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: no
    - use_cpu: False
    - debug: False
    - num_processes: 8
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: all
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
  • PyTorch version (GPU?): 2.2.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@gante @pacman100 @muellerzr

Raise errors as early as possible: I noticed that GenerationConfig.save_pretrained in transformers/generation/configuration_utils.py raises a ValueError if the config fails validation. I think it would be better to raise this error earlier (e.g., right after self.validate in __init__) instead of only when Trainer._save runs. Otherwise, users can train for several hours only to find that the model checkpoint was never saved.
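One way the proposal could look, sketched with a hypothetical ToyGenerationConfig rather than the real class: the warning-to-error escalation that save_pretrained performs is also applied in __init__, behind an opt-in strict flag (an invented parameter, not an existing transformers option). The single rule checked here, a temperature set while do_sample is False, is a simplified assumption standing in for the full validate().

```python
import warnings


class ToyGenerationConfig:
    """Hypothetical stand-in for GenerationConfig (not the transformers class)."""

    def __init__(self, do_sample: bool = False, temperature: float = 1.0, strict: bool = False) -> None:
        self.do_sample = do_sample
        self.temperature = temperature
        if strict:
            # Fail at construction time, long before any checkpoint is saved.
            self.validate_or_raise()
        else:
            # Warn-only path: the object is created and problems surface later.
            self.validate()

    def validate(self) -> None:
        # Simplified single check: a sampling parameter set while sampling is disabled.
        if not self.do_sample and self.temperature != 1.0:
            warnings.warn(
                f"do_sample is False but temperature is {self.temperature}", UserWarning
            )

    def validate_or_raise(self) -> None:
        # Record every warning emitted by validate() and turn any of them into an error.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            self.validate()
        if caught:
            raise ValueError("; ".join(str(w.message) for w in caught))


ToyGenerationConfig(do_sample=True, temperature=0.9, strict=True)  # valid, no error
try:
    ToyGenerationConfig(do_sample=False, temperature=0.9, strict=True)
except ValueError as exc:
    print("rejected at construction:", exc)
```

Making the check opt-in is a deliberate choice in this sketch: a strict-by-default constructor would also reject existing configs that currently load with only a warning.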

For example, fine-tuning LLaVA raises this error, and issues #1252 and #1144 report the same behavior.

Please correct me if I am wrong. Thanks!

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

A dedicated reproduction is probably unnecessary; the error appears when fine-tuning LLaVA:

  1. Install LLaVA by following its guide.
  2. Train and fine-tune the model by following the same guide. Issues #1252 and #1144 also include the training script.

Expected behavior

Raise the ValueError before training starts if the GenerationConfig fails validation.
