LLaVA torch.compile implementation #29891

@sheepymeh

Feature request

As per #28981, LLaVA is planned to receive torch.compile support. Since LLaVA is composed of a vision tower and an LLM, each of which can be compiled on its own with fullgraph=True (once support has been added, which is not yet the case for Mistral), it seems much easier to compile the two parts separately rather than the model as a whole.
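
A minimal sketch of what compiling the two parts separately could look like. `ToyVisionTower`, `ToyLanguageModel`, `compile_parts`, and `forward_pipeline` are hypothetical stand-ins invented for illustration, not the actual LLaVA modules or any transformers API; the only real calls are `torch.compile` with its `fullgraph` and `backend` arguments. The glue step here is a plain concatenation, which is an assumption standing in for the real merge logic.

```python
import torch
from torch import nn


class ToyVisionTower(nn.Module):
    """Hypothetical stand-in for the vision tower: patches -> features."""

    def __init__(self, patch_dim: int = 16, hidden: int = 32):
        super().__init__()
        self.proj = nn.Linear(patch_dim, hidden)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(patches))


class ToyLanguageModel(nn.Module):
    """Hypothetical stand-in for the LLM: embeddings -> logits."""

    def __init__(self, hidden: int = 32, vocab: int = 100):
        super().__init__()
        self.head = nn.Linear(hidden, vocab)

    def forward(self, inputs_embeds: torch.Tensor) -> torch.Tensor:
        return self.head(inputs_embeds)


def compile_parts(vision: nn.Module, llm: nn.Module, backend: str = "inductor"):
    """Compile each half on its own; fullgraph=True raises on any graph break."""
    return (
        torch.compile(vision, fullgraph=True, backend=backend),
        torch.compile(llm, fullgraph=True, backend=backend),
    )


def forward_pipeline(vision, llm, patches, text_embeds):
    """Run the two compiled halves; the glue between them stays in eager mode."""
    image_features = vision(patches)  # compiled region 1
    # Eager glue: a simple concatenation stands in for the real merge step.
    merged = torch.cat([image_features, text_embeds], dim=1)
    return llm(merged)  # compiled region 2
```

With this split, a graph break in the glue code cannot make either `fullgraph=True` region fail, because the glue is never traced. Passing `backend="eager"` to `compile_parts` is a way to exercise the tracing path without generating kernels.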

Motivation

The _merge_input_ids_with_image_features function that connects the two parts is difficult to compile: many of the operations it uses require dynamic input sizes, which PyTorch does not yet support under compilation. Dynamic sizes are unavoidable here because the number of input image tokens varies from one input to the next.
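
To illustrate why the sizes are dynamic, here is a simplified pure-Python model of a merge step. It is not the transformers implementation: `IMAGE_TOKEN` and `merge_ids_with_image_features` are hypothetical names, and the assumption is that each image placeholder in the token sequence expands into however many feature vectors that image produced.

```python
IMAGE_TOKEN = -1  # hypothetical placeholder id marking an image slot


def merge_ids_with_image_features(input_ids, image_features):
    """Replace each IMAGE_TOKEN with the feature vectors of the next image.

    input_ids: list of int token ids.
    image_features: one list of feature vectors per image, in order.
    """
    merged = []
    images = iter(image_features)
    for tok in input_ids:
        if tok == IMAGE_TOKEN:
            # One placeholder expands into a variable number of entries.
            merged.extend(("img", feat) for feat in next(images))
        else:
            merged.append(("txt", tok))
    return merged


# Same prompt length, different image sizes -> different output lengths.
short = merge_ids_with_image_features([1, IMAGE_TOKEN, 2, 3], [["a", "b", "c"]])
longer = merge_ids_with_image_features([1, IMAGE_TOKEN, 2, 3], [["a"] * 5])
print(len(short), len(longer))  # 6 8
```

The output length depends on the input values (where the placeholders are and how many features each image has), not only on the input shape, which is the kind of data-dependent size that a static graph cannot express.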

Your contribution

I'd love to try submitting a PR if possible, but I'm not sure of the best way to do so given the current circumstances.
