Remove unnecessary views of position_ids #26059
ArthurZucker merged 2 commits into huggingface:main
Conversation
For context, I work in the Torch-MLIR project, which converts PyTorch models into MLIR. Handling the
ArthurZucker
left a comment
Fine with me. Could you confirm that the slow tests pass using `RUN_SLOW pytest tests/models/llama`?
pinging @gante for a second look
Running
@gante, bumping this :)
He is off for a while, so he won't be able to review this! Let me ping @fxmarty for a second look! 👁️
fxmarty
left a comment
LGTM, I saw the same thing. I believe other architectures have the same issue; it could be nice to fix them in this PR as well. Some that may be impacted: codegen, gpt2, gpt_neo, gpt_neox, gptj, gpt_bigcode, imagegpt, llama.

@ramiro050 Beware that the `if position_ids is None:` code path should not be trusted for batched generation, as it assumes that all generated sequences have the same length.
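To illustrate the caveat above, here is a small sketch (the tensors and helper names are illustrative, not code from the library): with a left-padded batch, a single shared `torch.arange` gives every row the same positions, while batched generation needs per-row positions derived from the attention mask.

```python
import torch

# Hypothetical batch of two left-padded sequences (0 = padding).
# Row 0 has 3 real tokens, row 1 has 5.
attention_mask = torch.tensor([
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
])

# What the `position_ids is None` fallback effectively does: one arange
# shared by every row, assuming all sequences start at the same position.
fallback = torch.arange(5).unsqueeze(0).expand(2, -1)

# What batched generation actually needs: positions computed per row from
# the attention mask, so padding does not shift the real tokens' positions.
correct = (attention_mask.cumsum(-1) - 1).clamp(min=0)

print(fallback[0])  # tensor([0, 1, 2, 3, 4])  -- wrong for the padded row
print(correct[0])   # tensor([0, 0, 0, 1, 2])
```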
Done! PTAL
Title changed: "view of position_ids in modeling_llama" → "views of position_ids"
When `position_ids` is `None`, its value is generated using `torch.arange`, which creates a tensor of size `(seq_length + past_key_values_length) - past_key_values_length = seq_length`. The tensor is then unsqueezed, resulting in a tensor of shape `(1, seq_length)`. This means that the last `view` to a tensor of shape `(-1, seq_length)` is a no-op. This commit removes the unnecessary view.
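The shape argument above can be checked with a few lines of PyTorch (a minimal sketch of the code path in question; variable names are illustrative, not the exact ones from the modeling files):

```python
import torch

seq_length = 5
past_key_values_length = 3

# The `position_ids is None` fallback: an arange of length seq_length...
position_ids = torch.arange(
    past_key_values_length,
    seq_length + past_key_values_length,
    dtype=torch.long,
)
# ...then unsqueezed to shape (1, seq_length).
position_ids = position_ids.unsqueeze(0)

# The subsequent view to (-1, seq_length) cannot change that shape,
# so it is a no-op.
viewed = position_ids.view(-1, seq_length)

assert position_ids.shape == (1, seq_length)
assert viewed.shape == position_ids.shape
```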
Force-pushed: 8ce4d78 → 8d3b4a7
Force-pushed: 8d3b4a7 → a0887a6
@ArthurZucker @fxmarty, bumping this
ArthurZucker
left a comment
Hey! Just wondering why we removed the cast to long, which I think is required for backward compatibility!
```python
if position_ids is not None:
    position_ids = position_ids.view(-1, input_shape[-1]).long()
```
Hey! Not sure why this was removed as well? Casting to long is probably required / might be breaking if removed!
The docstring mentions that `position_ids` should be a tensor of type `torch.LongTensor`, so there should not be a need for the cast. (`position_ids` is also marked as having type `Optional[torch.LongTensor]` in the signature of `forward`.)
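A quick sketch of why the cast is redundant when callers honor the documented type: in PyTorch, `.long()` on a tensor that is already `torch.long` returns the same tensor object unchanged, so removing it does not alter behavior for well-typed inputs.

```python
import torch

# torch.arange over Python ints already produces an int64 (long) tensor,
# matching the documented Optional[torch.LongTensor] type of position_ids.
position_ids = torch.arange(4).unsqueeze(0)
assert position_ids.dtype == torch.long

# .long() returns self when the dtype already matches: a no-op, not a copy.
assert position_ids.long() is position_ids
```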
@ArthurZucker, bumping this
ArthurZucker
left a comment
Thanks for the fix
When `position_ids` is `None`, its value is generated using `torch.arange`, which creates a tensor of size `(seq_length + initial_val) - initial_val = seq_length`. The tensor is then unsqueezed, resulting in a tensor of shape `(1, seq_length)`. This means that the last `view` to a tensor of shape `(-1, seq_length)` is a no-op.

Similarly, when `position_ids` is not `None`, the documentation for the models specifies that it should be of shape `(batch_size, seq_length)`, making the `view` in this case also unnecessary.

This commit removes the unnecessary views.
@gante @ArthurZucker