[RT-DETR] No norm freezing for R18

### System Info

## Issue

In the original implementation, the RT-DETR R18 config [doesn't freeze its norms](https://github.com/lyuwenyu/RT-DETR/blob/bc0cf9f16c1ae98e925a7495e32c81319a624088/rtdetr_pytorch/configs/rtdetr/rtdetr_r18vd_6x_coco.yml#L16).

I was wondering whether this is expected, and whether I can submit a PR to parameterize it.

I don't know if this was intentional, because in the same config the author also explicitly removes the weight decay with an [optimizer group](https://github.com/lyuwenyu/RT-DETR/blob/bc0cf9f16c1ae98e925a7495e32c81319a624088/rtdetr_pytorch/configs/rtdetr/rtdetr_r18vd_6x_coco.yml#L36-L41). And since it is hard (at least I believe) to force the user to use a param group by default in transformers, freezing the norms would keep the user from having to specify one manually.
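
To make the "optimizer group" idea concrete, here is a minimal, framework-free sketch of splitting parameters into a group with weight decay and a group without it, based on their names. The parameter names, the regex, and the `split_param_groups` helper are all hypothetical and only for illustration; they are not copied from the linked config or from any checkpoint. In a real setup the names would come from `model.named_parameters()` and the groups would hold tensors passed to a PyTorch optimizer.

```python
import re

# Hypothetical parameter names, for illustration only (not from a real checkpoint).
PARAM_NAMES = [
    "backbone.conv1.weight",
    "backbone.norm1.weight",
    "backbone.norm1.bias",
    "encoder.layer0.linear.weight",
    "encoder.layer0.linear.bias",
    "encoder.layer0.norm.weight",
]

# Assumed rule: norm parameters and biases get no weight decay.
NO_DECAY_PATTERN = re.compile(r"(norm|\.bias$)")


def split_param_groups(names, weight_decay=1e-4):
    """Return two optimizer-style groups: one with decay, one without."""
    no_decay = [n for n in names if NO_DECAY_PATTERN.search(n)]
    decay = [n for n in names if not NO_DECAY_PATTERN.search(n)]
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


groups = split_param_groups(PARAM_NAMES)
for group in groups:
    print(group["weight_decay"], group["params"])
```

If the norms are frozen instead, their parameters never reach the optimizer, so the second group would only need to cover the biases.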

## Specs

- `transformers` version: 4.43.3
- Platform: Linux-6.5.0-45-generic-x86_64-with-glibc2.35
- Python version: 3.11.9
- Huggingface_hub version: 0.24.3
- Safetensors version: 0.4.3
- Accelerate version: 0.32.1
- Accelerate config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: DEEPSPEED
        - mixed_precision: fp16
        - use_cpu: False
        - debug: False
        - num_processes: 2
        - machine_rank: 0
        - num_machines: 1
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - enable_cpu_affinity: False
        - deepspeed_config: {'gradient_accumulation_steps': 1, 'offload_optimizer_device': 'cpu', 'offload_param_device': 'cpu', 'zero3_init_flag': False, 'zero_stage': 2}
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []
        - dynamo_config: {'dynamo_backend': 'INDUCTOR'}
- PyTorch version (GPU?): 2.4.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>
- Using GPU in script?: <fill in>
- GPU type: NVIDIA GeForce RTX 3090

### Who can help?

@amyeroberts

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

```python
from transformers import RTDetrConfig

# Load and print the config of the R18 checkpoint
print(RTDetrConfig.from_pretrained("PekingU/rtdetr_r18vd_coco_o365"))
# model config...
```

### Expected behavior

```python
from transformers import RTDetrConfig

print(RTDetrConfig.from_pretrained("PekingU/rtdetr_r18vd_coco_o365"))
# model config with `freeze_norm` parameter
```
