Refactor image features selection in LlaVa

### Feature request

We would like to move image feature selection in LLaVa into a separate method so that it is easier to override for custom use cases (e.g. applying a layer norm to the image features before projection).

https://github.com/huggingface/transformers/blob/52daf4ec768fb9ffe84a0c373834172a7c54aecc/src/transformers/models/llava/modeling_llava.py#L452-L461

Currently, any custom processing of the image features requires overriding the entire forward pass. A separate method would make this much easier!
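To illustrate the pattern being requested, here is a minimal sketch in plain Python. The classes `ToyLlava` and `ToyLlavaWithNorm`, the `project` method, and the `layer_norm` helper are hypothetical stand-ins that use nested lists instead of tensors; they are not the transformers implementation. The only point is that once the forward pass delegates to a hook, a subclass can customize feature selection without copying `forward`.

```python
import math


class ToyLlava:
    """Toy stand-in for the model (hypothetical, not the transformers class)."""

    def _get_selected_image_features(self, pixel_values):
        # Pretend each row is already a feature vector and drop the first
        # token, mirroring what the "default" strategy does.
        return pixel_values[1:]

    def forward(self, pixel_values):
        # The forward pass delegates feature selection to the hook, so
        # subclasses never need to copy this method.
        features = self._get_selected_image_features(pixel_values)
        return self.project(features)

    def project(self, features):
        # Placeholder projection: sum each feature vector.
        return [sum(vec) for vec in features]


def layer_norm(vec, eps=1e-5):
    """Normalize one vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


class ToyLlavaWithNorm(ToyLlava):
    def _get_selected_image_features(self, pixel_values):
        # Only the hook is overridden; forward() is inherited unchanged.
        features = super()._get_selected_image_features(pixel_values)
        return [layer_norm(vec) for vec in features]
```

In the real model the hook would operate on `torch` tensors and the custom step could be something like an `nn.LayerNorm` module, but the override structure would be the same.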

### Motivation

A dedicated method would make it easier to implement custom image feature selection in LLaVa.

### Your contribution

A proposed helper method for image feature selection:
```python
    def _get_selected_image_features(
        self, pixel_values: torch.FloatTensor, vision_feature_layer: int, vision_feature_select_strategy: str
    ) -> torch.FloatTensor:
        # Note: output_hidden_states=True is not memory efficient, since it keeps every layer's hidden states.
        image_outputs = self.vision_tower(pixel_values, output_hidden_states=True)
        selected_image_feature = image_outputs.hidden_states[vision_feature_layer]
        if vision_feature_select_strategy == "default":
            # Drop the first token of each sequence.
            selected_image_feature = selected_image_feature[:, 1:]
        elif vision_feature_select_strategy == "full":
            # Keep every token unchanged.
            pass
        else:
            # Report the strategy that was actually passed in, not the config value.
            raise ValueError(f"Unexpected select feature strategy: {vision_feature_select_strategy}")
        return selected_image_feature
```
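As a rough illustration of what the two strategies return, the sketch below reimplements the selection step on nested lists rather than tensors. The function name `select_image_features` and the toy data are assumptions made for this example only; the indexing mirrors `hidden_states[vision_feature_layer]` followed by `[:, 1:]` for `"default"`.

```python
def select_image_features(hidden_states, vision_feature_layer, strategy):
    """Toy version of the selection step using nested lists instead of tensors.

    hidden_states: one entry per layer, each shaped [batch][tokens][dim].
    """
    selected = hidden_states[vision_feature_layer]
    if strategy == "default":
        # Drop the first token of every sequence (the equivalent of [:, 1:]).
        return [tokens[1:] for tokens in selected]
    if strategy == "full":
        # Keep every token.
        return selected
    raise ValueError(f"Unexpected select feature strategy: {strategy}")


# Two layers, batch of 1, three tokens per sequence, feature dimension 2.
hidden_states = [
    [[[0, 0], [1, 1], [2, 2]]],
    [[[9, 9], [3, 3], [4, 4]]],
]
default_features = select_image_features(hidden_states, -1, "default")
full_features = select_image_features(hidden_states, -1, "full")
```

With `"default"` the sequence is one token shorter than with `"full"`, which is the main difference a custom override would need to account for.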

Refactor image features selection in LlaVa #33695