System Info
- `transformers` version: 4.28.0.dev0 (656e869)
- Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.35
- Python version: 3.10.10
- Huggingface_hub version: 0.13.4
- Safetensors version: 0.3.0
- PyTorch version (GPU?): 2.0.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: False
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I have a benchmark script that measures the generation speed of different LLaMA models. Before commit 7dcd870, my generation speed averaged around 48 tokens/s in the ideal case on an RTX 3090; after that commit, it averages 43 tokens/s.
The specific issue seems to be the change to `apply_rotary_pos_emb`. My guess is that the culprit is the change from a rather simple slicing of the two cos/sin tensors to `position_ids`-based indexing, which is effectively a gather.
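For reference, this is roughly what the function looks like after that commit (paraphrased from `modeling_llama.py` from memory, so minor details may differ):

```python
import torch


def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # The cos/sin caches come in as [1, 1, seq_len, dim]; squeeze the broadcast dims.
    cos = cos.squeeze(1).squeeze(0)  # [seq_len, dim]
    sin = sin.squeeze(1).squeeze(0)  # [seq_len, dim]
    # Advanced indexing with position_ids is a gather, launched once for cos
    # and once for sin in every layer of every decode step.
    cos = cos[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    sin = sin[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```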
To test this theory, I patched `apply_rotary_pos_emb` back to its pre-7dcd870 state and minimally modified `LlamaAttention` accordingly, with no other changes. Speed jumped back to 48 tokens/s.
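A minimal sketch of that revert, assuming the old `offset`-based signature (`apply_rotary_pos_emb_sliced` is my name for the restored function, not from the library):

```python
import transformers.models.llama.modeling_llama as modeling_llama


def apply_rotary_pos_emb_sliced(q, k, cos, sin, offset: int = 0):
    # Pre-7dcd870 behaviour: a cheap slice of the cos/sin caches instead of a
    # gather. Assumes positions are contiguous starting at `offset`.
    cos = cos[..., offset : q.shape[-2] + offset, :]
    sin = sin[..., offset : q.shape[-2] + offset, :]
    q_embed = (q * cos) + (modeling_llama.rotate_half(q) * sin)
    k_embed = (k * cos) + (modeling_llama.rotate_half(k) * sin)
    return q_embed, k_embed


modeling_llama.apply_rotary_pos_emb = apply_rotary_pos_emb_sliced
# LlamaAttention.forward also needs a small edit so the call site passes an
# offset (derived from the past key/value length) instead of position_ids.
```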
The problem should apply generally, but the specific script I'm using is: https://github.com/fpgaminer/GPTQ-triton/blob/99ec4a3adb7fad9de33ff026bbfb64cbb3bab2f8/benchmark_generate.py
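For a rough self-contained timing check without that repo, something like the following measures tokens/s (the checkpoint path and prompt are placeholders, not taken from my benchmark script):

```python
import time

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_path = "path/to/llama-7b"  # placeholder: any local LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).cuda()

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")

# Warm-up pass so CUDA init and cache allocation don't skew the timing.
model.generate(**inputs, max_new_tokens=16)

torch.cuda.synchronize()
start = time.time()
out = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```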
Expected behavior
I would not expect a ~10% drop in generation performance from this change.