Checklist
Motivation
Currently, the vision model in mllama4 is imported from transformers, which may result in poor performance. We could reimplement these modules using sglang's own vision attention implementation.
Related resources
No response