
Conversation

@kunal-vaishnavi (Contributor) commented May 18, 2024

Description

This PR adds fusions for OpenAI's CLIP model. Here is an example of how to run the ORT transformer optimizer for the linked CLIP model.

$ git clone https://github.com/microsoft/onnxruntime
$ cd onnxruntime/onnxruntime/python/tools/transformers
$ python3 optimizer.py --input /path/to/model.onnx --output /path/to/model_opt.onnx --model_type clip --num_heads 16 --hidden_size 1024 --use_external_data_format --opt_level 0
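For reference, the same optimization can also be run from Python. The sketch below is a minimal, assumed equivalent of the command above: it presumes the `clip` model type added by this PR is accepted by `optimize_model`, and the paths are placeholders.

```python
# Minimal sketch: programmatic equivalent of the optimizer.py command above.
# Assumes the "clip" model type added by this PR is available to optimize_model.
from onnxruntime.transformers.optimizer import optimize_model

optimized = optimize_model(
    "/path/to/model.onnx",      # placeholder input path
    model_type="clip",
    num_heads=16,
    hidden_size=1024,
    opt_level=0,
)
optimized.save_model_to_file(
    "/path/to/model_opt.onnx",  # placeholder output path
    use_external_data_format=True,
)
```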

Motivation and Context

This PR helps optimize multi-modal models that use CLIP for the vision encoder.

["Expand", "Unsqueeze", "Unsqueeze", "Where", "Less"],
[causal_mask_input_index, 0, 0, 0, 0],
["Concat", "Expand", "Unsqueeze", "Unsqueeze", "Where", "Less"],
[causal_mask_input_index, 0, 0, 0, 0, 0],

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable

Local variable 'causal_mask_input_index' may be used before it is initialized.
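A common way to resolve this class of warning is to initialize the variable to a sentinel before the branch that assigns it and skip the fusion if it was never set. The sketch below is illustrative only; apart from causal_mask_input_index, the node and variable names are placeholders rather than the actual fusion code.

```python
# Hypothetical guard for the CodeQL finding; names other than
# causal_mask_input_index are placeholders.
causal_mask_input_index = None
for i, input_name in enumerate(where_node.input):  # where_node: placeholder
    if input_name in causal_mask_outputs:          # causal_mask_outputs: placeholder
        causal_mask_input_index = i
        break

if causal_mask_input_index is None:
    # No causal-mask input was found, so bail out instead of using an
    # uninitialized index in the match_parent_path calls above.
    return
```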
hanbitmyths previously approved these changes May 18, 2024
@hanbitmyths merged commit ca22a5a into microsoft:main May 18, 2024
kunal-vaishnavi added a commit that referenced this pull request Jan 15, 2025
### Description

This PR adds unit tests for [fusing the vision
components](#20721) of
Phi-3 vision and Phi-3.5 vision.

### Motivation and Context

Many multi-modal models use a CLIP encoder or a variant of CLIP as part
of their encoders. These fusion unit tests will ensure that the vision
components of Phi-3 vision and Phi-3.5 vision can still be fused when
existing fusions are modified to support more models.
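For context, a fusion unit test of this kind typically runs the optimizer on a small exported vision graph and asserts that fused attention operators appear. The sketch below is illustrative and assumes placeholder file names and settings; it is not the actual test added by the commit.

```python
# Illustrative only: the model path and expected operators are assumptions.
from onnxruntime.transformers.optimizer import optimize_model

def test_vision_encoder_fusion():
    optimized = optimize_model(
        "phi3_vision_encoder.onnx",  # placeholder path to an exported vision encoder
        model_type="clip",
        num_heads=16,
        hidden_size=1024,
        opt_level=0,
    )
    # After fusion, attention subgraphs should be collapsed into fused nodes.
    fused_ops = [
        node.op_type
        for node in optimized.model.graph.node
        if node.op_type in ("Attention", "MultiHeadAttention")
    ]
    assert len(fused_ops) > 0
```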
guschmue pushed a commit that referenced this pull request Mar 6, 2025 (same description as above).
ashrit-ms pushed a commit that referenced this pull request Mar 17, 2025 (same description as above).