can't resume lora training due to wandb logging num params #33320

@mdabbah-deci

Description

System Info

Hi,
I have some trained LoRA checkpoints that I'd like to resume training from, but when resuming I get the following error in the Trainer:

 trainer.train(resume_from_checkpoint=script_args.resume_from_checkpoint)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2202, in _inner_training_loop
    self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 460, in on_train_begin
    return self.call_event("on_train_begin", args, state, control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 507, in call_event
    result = getattr(callback, event)(
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 900, in on_train_begin
    self.setup(args, state, model, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 853, in setup
    self._wandb.config["model/num_parameters"] = model.num_parameters()
  File "/root/.local/lib/python3.10/site-packages/wandb/sdk/wandb_config.py", line 149, in __setitem__
    key, val = self._sanitize(key, val)
  File "/root/.local/lib/python3.10/site-packages/wandb/sdk/wandb_config.py", line 258, in _sanitize
    raise config_util.ConfigError(
wandb.sdk.lib.config_util.ConfigError: Attempted to change value of key "model/num_parameters" from 0 to 266240

I assume the fact that this is LoRA training is relevant, because the error describes a change in the number of parameters (which shouldn't have been logged as 0 in the first place).

And even though at integration_utils.py#L838 the wandb config dict is set with allow_val_change=True, I still get the above error at integration_utils.py#L853.

Any idea how to solve this?
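To make the failure mode concrete, here is a minimal stand-in for the check wandb performs (a sketch; the real logic lives in wandb/sdk/wandb_config.py, and MiniConfig is just an illustrative name):

```python
class ConfigError(Exception):
    """Stand-in for wandb.sdk.lib.config_util.ConfigError."""


class MiniConfig:
    """Minimal mimic of wandb's Config value-change guard."""

    def __init__(self):
        self._items = {}

    def __setitem__(self, key, val):
        # Plain item assignment (what WandbCallback.setup does at L853)
        # rejects changing an existing value.
        if key in self._items and self._items[key] != val:
            raise ConfigError(
                f'Attempted to change value of key "{key}" '
                f"from {self._items[key]} to {val}"
            )
        self._items[key] = val

    def update(self, d, allow_val_change=False):
        # update(..., allow_val_change=True) (what setup uses at L838)
        # is allowed to overwrite existing values.
        for key, val in d.items():
            if not allow_val_change:
                self[key] = val
            else:
                self._items[key] = val


# A resumed run restores the previously logged value (0 in my case),
# then the callback's plain assignment of the real count blows up:
cfg = MiniConfig()
cfg["model/num_parameters"] = 0  # restored from the resumed run
try:
    cfg["model/num_parameters"] = 266240  # what setup() does on resume
except ConfigError as e:
    print(e)
```

Note that the same write through update(..., allow_val_change=True) would succeed, which is why the mismatch between L838 and L853 looks like the culprit.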

Thanks

Who can help?

@muellerzr @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Step 1: train a small model with DPO + LoRA.
Step 2: try to resume with trainer.train(resume_from_checkpoint=True) while setting:

os.environ["WANDB_RESUME"] = "allow"
os.environ["WANDB_RUN_ID"] = script_args.run_id  # same run_id as previous run

DPOConfig(
    output_dir=script_args.output_dir,  # same output dir as previous run
    run_name=script_args.run_name,  # same run_name as previous run
    ...
)

...

trainer.train(resume_from_checkpoint=True)

Expected behavior

Being able to resume the previous training run.
