Closed
Description
Feature request
We would like to wrap image feature selection in LLaVa in a separate function so that it is easier to override for custom use cases (e.g. applying a layer norm to the image features before projection).
transformers/src/transformers/models/llava/modeling_llava.py
Lines 452 to 461 in 52daf4e
if pixel_values is not None:
    image_outputs = self.vision_tower(pixel_values, output_hidden_states=True)
    # this is not memory efficient at all (output_hidden_states=True) will save all the hidden stated.
    selected_image_feature = image_outputs.hidden_states[vision_feature_layer]
    if vision_feature_select_strategy == "default":
        selected_image_feature = selected_image_feature[:, 1:]
    elif vision_feature_select_strategy == "full":
        selected_image_feature = selected_image_feature
    else:
        raise ValueError(f"Unexpected select feature strategy: {self.config.vision_feature_select_strategy}")
Currently, we need to override the entire forward pass for any custom processing of the image features. Having a separate function would make it a lot easier!
Motivation
To make it easier to implement custom image feature selection in LLaVa.
Your contribution
A custom function for image feature selection:
def _get_selected_image_features(
    self, pixel_values: torch.FloatTensor, vision_feature_layer: int, vision_feature_select_strategy: str
):
    image_outputs = self.vision_tower(pixel_values, output_hidden_states=True)
    # This is not memory efficient at all: output_hidden_states=True will save all the hidden states.
    selected_image_feature = image_outputs.hidden_states[vision_feature_layer]
    if vision_feature_select_strategy == "default":
        # Drop the first token of each sequence.
        selected_image_feature = selected_image_feature[:, 1:]
    elif vision_feature_select_strategy == "full":
        selected_image_feature = selected_image_feature
    else:
        # Report the strategy that was actually passed in, not the config default.
        raise ValueError(f"Unexpected select feature strategy: {vision_feature_select_strategy}")
    return selected_image_feature
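To illustrate why the hook helps, here is a minimal, dependency-free sketch of the override pattern. It is not the actual transformers code: `ToyVisionTower`, `ToyLlava`, and `CenteredLlava` are hypothetical stand-ins, plain Python lists replace tensors, doubling replaces the projector, and mean-centering stands in for a layer norm. The only point is that a subclass customizes `_get_selected_image_features` and inherits `forward` unchanged.

```python
from types import SimpleNamespace


class ToyVisionTower:
    """Hypothetical stand-in for the vision tower (not the real model)."""

    def __call__(self, pixel_values, output_hidden_states=True):
        # One token sequence per image; token 0 plays the role of the first (CLS) token.
        layer0 = [[0.0, 1.0, 2.0] for _ in pixel_values]
        layer1 = [[9.0, 2.0, 4.0] for _ in pixel_values]
        return SimpleNamespace(hidden_states=(layer0, layer1))


class ToyLlava:
    """Hypothetical model whose forward pass delegates feature selection to a hook."""

    def __init__(self):
        self.vision_tower = ToyVisionTower()

    def _get_selected_image_features(self, pixel_values, vision_feature_layer, vision_feature_select_strategy):
        image_outputs = self.vision_tower(pixel_values, output_hidden_states=True)
        selected = image_outputs.hidden_states[vision_feature_layer]
        if vision_feature_select_strategy == "default":
            # List equivalent of selected[:, 1:]: drop the first token of each sequence.
            return [tokens[1:] for tokens in selected]
        if vision_feature_select_strategy == "full":
            return selected
        raise ValueError(f"Unexpected select feature strategy: {vision_feature_select_strategy}")

    def forward(self, pixel_values, vision_feature_layer=-1, vision_feature_select_strategy="default"):
        features = self._get_selected_image_features(
            pixel_values, vision_feature_layer, vision_feature_select_strategy
        )
        # Stand-in for the projection step: scale every feature by 2.
        return [[2.0 * t for t in tokens] for tokens in features]


class CenteredLlava(ToyLlava):
    """Custom variant: only the hook is overridden, forward() is inherited as-is."""

    def _get_selected_image_features(self, *args, **kwargs):
        features = super()._get_selected_image_features(*args, **kwargs)
        # Stand-in for a layer norm: subtract the per-image mean before projection.
        return [[t - sum(tokens) / len(tokens) for t in tokens] for tokens in features]


base_out = ToyLlava().forward([None])
custom_out = CenteredLlava().forward([None])
print(base_out, custom_out)
```

In a real model the subclass would presumably apply something like a `torch.nn.LayerNorm` module inside the overridden hook; that detail is an assumption here and depends on how the hook ends up being exposed.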