
Remove unnecessary views of position_ids#26059

Merged
ArthurZucker merged 2 commits into huggingface:main from ramiro050:llama-remove-view on Oct 6, 2023

Conversation

@ramiro050
Contributor

@ramiro050 ramiro050 commented Sep 8, 2023

When position_ids is None, its value is generated using torch.arange, which creates a tensor of size (seq_length + past_key_values_length) - past_key_values_length = seq_length. The tensor is then unsqueezed, resulting in a tensor of shape (1, seq_length). This means that the final view to a tensor of shape (-1, seq_length) is a no-op.

Similarly, when position_ids is not None, the documentation for the models specifies that it should already be of shape (batch_size, seq_length), making the view in this case unnecessary as well.

This commit removes the unnecessary views.
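The no-op claim above can be checked with a small sketch (variable names are illustrative, following the commit message, where `past_key_values_length` plays the role of the initial value):

```python
import torch

seq_length, past_key_values_length = 5, 3  # illustrative values

# Mirrors the code path described above:
position_ids = torch.arange(
    past_key_values_length,
    seq_length + past_key_values_length,
    dtype=torch.long,
)
position_ids = position_ids.unsqueeze(0)    # shape (1, seq_length)
viewed = position_ids.view(-1, seq_length)  # the view this PR removes

# Same shape before and after the view:
assert viewed.shape == position_ids.shape == (1, seq_length)
# `view` also shares storage with its input, so no copy happens either:
assert viewed.data_ptr() == position_ids.data_ptr()
```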

@gante @ArthurZucker

@ramiro050
Contributor Author

For context, I work in the Torch-MLIR project, which converts PyTorch models into MLIR. Handling the view op in its most general form (especially the use of -1s) has been a big challenge for us for a while now (there's some work we need to do to improve the shape information propagation and dynamic shape support in our project). While the goal is to support this as is in the future, in the short term, removing this view from the computation would unblock us to run LLaMA in our compiler.

Collaborator

@ArthurZucker ArthurZucker left a comment


Fine with me, could you confirm that the slow tests pass using RUN_SLOW=1 pytest tests/models/llama?

@ArthurZucker
Collaborator

pinging @gante for a second look

@ArthurZucker ArthurZucker requested a review from gante September 8, 2023 20:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@ramiro050
Contributor Author

Running RUN_SLOW=1 pytest tests/models/llama, I get 172 passed, 47 skipped, 35 warnings. The warnings are either due to deprecation or from using PyTorch's tracer; they don't seem related to the changes in this PR. The skipped tests are these:

test_modeling_llama.py::LlamaModelTest::test_cpu_offload SKIPPED (test requires CUDA)
test_modeling_llama.py::LlamaModelTest::test_disk_offload SKIPPED (test requires CUDA)
test_modeling_llama.py::LlamaModelTest::test_equivalence_flax_to_pt SKIPPED (test is PT+FLAX test)
test_modeling_llama.py::LlamaModelTest::test_equivalence_pt_to_flax SKIPPED (test is PT+FLAX test)
test_modeling_llama.py::LlamaModelTest::test_model_parallel_beam_search SKIPPED (test requires multiple GPUs)
test_modeling_llama.py::LlamaModelTest::test_model_parallel_equal_results SKIPPED (test requires multiple GPUs)
test_modeling_llama.py::LlamaModelTest::test_model_parallelism SKIPPED (test requires multiple GPUs)
test_modeling_llama.py::LlamaModelTest::test_model_parallelization SKIPPED (test requires multiple GPUs)
test_modeling_llama.py::LlamaModelTest::test_multi_gpu_data_parallel_forward SKIPPED (test requires multiple GPUs)
test_modeling_llama.py::LlamaModelTest::test_pipeline_audio_classification SKIPPED (LlamaModelTest::test_pipeline_audio_classification is skipped: `audio-classification` is not in `self.pipeline_model_mapping` for `...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_automatic_speech_recognition SKIPPED (LlamaModelTest::test_pipeline_automatic_speech_recognition is skipped: `automatic-speech-recognition` is not in `self.pipel...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_conversational SKIPPED (LlamaModelTest::test_pipeline_conversational is skipped: `conversational` is not in `self.pipeline_model_mapping` for `LlamaModelTest`.)
test_modeling_llama.py::LlamaModelTest::test_pipeline_depth_estimation SKIPPED (LlamaModelTest::test_pipeline_depth_estimation is skipped: `depth-estimation` is not in `self.pipeline_model_mapping` for `LlamaModelTe...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_document_question_answering SKIPPED (LlamaModelTest::test_pipeline_document_question_answering is skipped: `document-question-answering` is not in `self.pipeline...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_fill_mask SKIPPED (LlamaModelTest::test_pipeline_fill_mask is skipped: `fill-mask` is not in `self.pipeline_model_mapping` for `LlamaModelTest`.)
test_modeling_llama.py::LlamaModelTest::test_pipeline_image_classification SKIPPED (LlamaModelTest::test_pipeline_image_classification is skipped: `image-classification` is not in `self.pipeline_model_mapping` for `...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_image_segmentation SKIPPED (LlamaModelTest::test_pipeline_image_segmentation is skipped: `image-segmentation` is not in `self.pipeline_model_mapping` for `LlamaM...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_image_to_text SKIPPED (LlamaModelTest::test_pipeline_image_to_text is skipped: `image-to-text` is not in `self.pipeline_model_mapping` for `LlamaModelTest`.)
test_modeling_llama.py::LlamaModelTest::test_pipeline_mask_generation SKIPPED (`run_pipeline_test` is currently not implemented.)
test_modeling_llama.py::LlamaModelTest::test_pipeline_object_detection SKIPPED (LlamaModelTest::test_pipeline_object_detection is skipped: `object-detection` is not in `self.pipeline_model_mapping` for `LlamaModelTe...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_question_answering SKIPPED (LlamaModelTest::test_pipeline_question_answering is skipped: `question-answering` is not in `self.pipeline_model_mapping` for `LlamaM...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_summarization SKIPPED (LlamaModelTest::test_pipeline_summarization is skipped: `summarization` is not in `self.pipeline_model_mapping` for `LlamaModelTest`.)
test_modeling_llama.py::LlamaModelTest::test_pipeline_table_question_answering SKIPPED (LlamaModelTest::test_pipeline_table_question_answering is skipped: `table-question-answering` is not in `self.pipeline_model_ma...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_text2text_generation SKIPPED (LlamaModelTest::test_pipeline_text2text_generation is skipped: `text2text-generation` is not in `self.pipeline_model_mapping` for `...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_text_to_audio SKIPPED (LlamaModelTest::test_pipeline_text_to_audio is skipped: `text-to-audio` is not in `self.pipeline_model_mapping` for `LlamaModelTest`.)
test_modeling_llama.py::LlamaModelTest::test_pipeline_token_classification SKIPPED (LlamaModelTest::test_pipeline_token_classification is skipped: `token-classification` is not in `self.pipeline_model_mapping` for `...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_translation SKIPPED (LlamaModelTest::test_pipeline_translation is skipped: `translation` is not in `self.pipeline_model_mapping` for `LlamaModelTest`.)
test_modeling_llama.py::LlamaModelTest::test_pipeline_video_classification SKIPPED (LlamaModelTest::test_pipeline_video_classification is skipped: `video-classification` is not in `self.pipeline_model_mapping` for `...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_visual_question_answering SKIPPED (LlamaModelTest::test_pipeline_visual_question_answering is skipped: `visual-question-answering` is not in `self.pipeline_model...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_zero_shot_audio_classification SKIPPED (LlamaModelTest::test_pipeline_zero_shot_audio_classification is skipped: `zero-shot-audio-classification` is not in `self...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_zero_shot_image_classification SKIPPED (LlamaModelTest::test_pipeline_zero_shot_image_classification is skipped: `zero-shot-image-classification` is not in `self...)
test_modeling_llama.py::LlamaModelTest::test_pipeline_zero_shot_object_detection SKIPPED (LlamaModelTest::test_pipeline_zero_shot_object_detection is skipped: `zero-shot-object-detection` is not in `self.pipeline_mo...)
test_modeling_llama.py::LlamaModelTest::test_pt_tf_model_equivalence SKIPPED (test is PT+TF test)
test_modeling_llama.py::LlamaModelTest::test_save_load_fast_init_from_base SKIPPED (LLaMA buffers include complex numbers, which breaks this test)
test_modeling_llama.py::LlamaIntegrationTest::test_model_13b_greedy_generation SKIPPED (Model is curently gated)
test_modeling_llama.py::LlamaIntegrationTest::test_model_13b_logits SKIPPED (Logits are not exactly the same, once we fix the instabalities somehow, will update!)
test_modeling_llama.py::LlamaIntegrationTest::test_model_13bf_logits SKIPPED (Logits are not exactly the same, once we fix the instabalities somehow, will update!)
test_modeling_llama.py::LlamaIntegrationTest::test_model_70b_logits SKIPPED (Logits are not exactly the same, once we fix the instabalities somehow, will update! Also it is gonna be a `too_slow` test)
test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits SKIPPED (Logits are not exactly the same, once we fix the instabalities somehow, will update!)
test_modeling_llama.py::CodeLlamaIntegrationTest::test_model_7b_logits SKIPPED (test requires CUDA)
test_tokenization_llama.py::LlamaTokenizationTest::test_batch_encode_plus_tensors SKIPPED (test is PT+TF test)
test_tokenization_llama.py::LlamaTokenizationTest::test_pickle_subword_regularization_tokenizer SKIPPED (worker 'gw4' crashed on CI, passing locally.)
test_tokenization_llama.py::LlamaTokenizationTest::test_save_pretrained SKIPPED (Let's wait for the fast tokenizer!)
test_tokenization_llama.py::LlamaTokenizationTest::test_save_slow_from_fast_and_reload_fast SKIPPED (Unfortunately way too slow to build a BPE with SentencePiece.)
test_tokenization_llama.py::LlamaTokenizationTest::test_subword_regularization_tokenizer SKIPPED (worker 'gw4' crashed on CI, passing locally.)
test_tokenization_llama.py::LlamaTokenizationTest::test_tf_encode_plus_sent_to_model SKIPPED (test requires TensorFlow)
test_tokenization_llama.py::LlamaIntegrationTest::test_integration_test_xnli SKIPPED (RUN_TOKENIZER_INTEGRATION=1 to run tokenizer integration tests)

@ramiro050
Contributor Author

@gante, bumping this :)

@ArthurZucker
Copy link
Collaborator

He is off for a while so he won't be able to review this! Let me ping @fxmarty for a second look! 👁️

Contributor

@fxmarty fxmarty left a comment


LGTM, saw the same thing. I believe other architectures have the same issue, e.g.

position_ids = position_ids.unsqueeze(0).view(-1, input_shape[-1])

It could be nice to fix it in this PR as well. Some that may be impacted: codegen, gpt2, gpt_neo, gpt_neox, gptj, gpt_bigcode, imagegpt, llama.

@ramiro050 Beware that the if position_ids is None: code path should not be trusted for batched generation, as it assumes that all generated sequences have the same length.
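To illustrate the caveat: for a left-padded batch, correct position ids depend on each sequence's own padding, so they cannot come from a single torch.arange. A sketch of a common cumsum-over-attention-mask pattern (similar to what several transformers models do in prepare_inputs_for_generation; exact code varies by model):

```python
import torch

# Left-padded batch: two sequences with different real lengths.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

# Derive per-sequence position ids from the mask, so padding does not
# shift the positions of the real tokens:
position_ids = attention_mask.long().cumsum(-1) - 1
# Padded slots get a dummy (but valid) position id:
position_ids.masked_fill_(attention_mask == 0, 1)

print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```

Note how the first real token of each row starts at position 0, regardless of how much padding precedes it.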

@ramiro050
Contributor Author

It could be nice to fix it in this PR as well. Some that may be impacted: codegen, gpt2, gpt_neo, gpt_neox, gptj, gpt_bigcode, imagegpt, llama.

Done! PTAL

@ramiro050 ramiro050 changed the title from "Remove unnecessary view of position_ids in modeling_llama" to "Remove unnecessary views of position_ids" on Sep 20, 2023
When `position_ids` is `None`, its value is generated using
`torch.arange`, which creates a tensor of size `(seq_length +
past_key_values_length) - past_key_values_length = seq_length`. The
tensor is then unsqueezed, resulting in a tensor of shape `(1,
seq_length)`. This means that the last `view` to a tensor of shape
`(-1, seq_length)` is a no-op.

This commit removes the unnecessary view.
@ramiro050
Contributor Author

@ArthurZucker @fxmarty, bumping this

Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! Just wondering why we removed the casting to long, which I think is required for backward compatibility!

Comment on lines -478 to -480
if position_ids is not None:
position_ids = position_ids.view(-1, input_shape[-1]).long()

Collaborator


hey! Not sure why this was removed as well? Casting to long is probably required / might be breaking if removed!

Contributor Author

@ramiro050 ramiro050 Sep 28, 2023


The docstring mentions that position_ids should be a tensor of type torch.LongTensor, so there should not be a need for the cast. (position_ids is also marked as having type Optional[torch.LongTensor] in the signature of forward)
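A quick check of the internally generated path supports this (a minimal sketch): torch.arange with integer bounds already yields torch.long, so the removed cast was redundant there, and user-supplied tensors are documented as torch.LongTensor.

```python
import torch

# torch.arange with integer arguments defaults to int64 (torch.long):
position_ids = torch.arange(0, 5).unsqueeze(0)
assert position_ids.dtype == torch.long

# Casting again is therefore a no-op on this path:
assert position_ids.long().dtype == position_ids.dtype
```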

@ramiro050
Contributor Author

@ArthurZucker, bumping this

Contributor

@fxmarty fxmarty left a comment


LGTM

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks for the fix

@ArthurZucker ArthurZucker merged commit 8878eb1 into huggingface:main Oct 6, 2023
@ramiro050 ramiro050 deleted the llama-remove-view branch October 9, 2023 18:09