ShutterMuse

ShutterMuse is a Qwen3-VL based multimodal large language model for capture-time photography guidance, associated with the paper ShutterMuse: Capture-Time Photography Guidance with MLLMs.

Model Details

  • Architecture: Qwen3VLForConditionalGeneration
  • Model type: qwen3_vl
  • Processor: Qwen3VLProcessor
  • Precision: bfloat16
  • Format: safetensors sharded checkpoint

Usage

from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model_id = "ShutterMuse/ShutterMuse"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

Citation

If you use this model, please cite the associated paper:

@misc{li2026shuttermuse,
  title         = {ShutterMuse: Capture-Time Photography Guidance with MLLMs},
  author        = {Li, Jiayu and Fang, Yixiao and Hu, Tianyu and Cheng, Wei and Huang, Ping and Fan, Zheheng and Yu, Gang and Ma, Xingjun},
  year          = {2026},
  eprint        = {2606.25763},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2606.25763}
}
Downloads last month
20
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ShutterMuse/ShutterMuse

Finetuned
(332)
this model

Space using ShutterMuse/ShutterMuse 1

Paper for ShutterMuse/ShutterMuse