[diffusion] optimize: default to in-memory loading for URL/base64 image inputs #23118

Merged
mickqian merged 8 commits into main from mick/mmgen-srt-image-reuse on Apr 20, 2026

@mickqian (Collaborator) commented Apr 18, 2026

This PR aligns multimodal_gen image input handling with sglang.srt semantics and removes unnecessary remote-image persistence in the OpenAI entrypoints.

What Changed

  • Defaulted remote image inputs (URL / data URI) to in-memory loading in OpenAI image/video paths.
  • Reused the shared srt-style image loading flow (get_image_bytes-based) in multimodal_gen vision loading.
  • Added lightweight retry logic for URL image download to handle transient network/HTTP failures (e.g. timeout, 429, 5xx).
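The first and third changes above can be illustrated with a minimal, stdlib-only sketch. It is not the code from this PR: `load_image_bytes`, `fetch_with_retry`, `RETRYABLE_STATUS`, the attempt count, and the backoff values are all hypothetical names and numbers chosen for illustration, and the real `get_image_bytes` helper may behave differently. The sketch decodes data URIs directly in memory and downloads URLs with retries on timeouts, 429, and 5xx responses, without writing a temp file.

```python
import base64
import socket
import time
import urllib.error
import urllib.request

# Assumption: status codes treated as transient (429 and common 5xx codes).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def fetch_with_retry(url, fetch=None, max_attempts=3, backoff_s=0.5, sleep=time.sleep):
    """Download bytes from `url`, retrying transient failures with exponential backoff."""
    if fetch is None:
        def fetch(u):
            # Read the whole response body into memory; nothing is written to disk.
            with urllib.request.urlopen(u, timeout=10) as resp:
                return resp.read()

    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as exc:
            # Non-transient HTTP errors (e.g. 404) are re-raised immediately.
            if exc.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
        except (urllib.error.URLError, TimeoutError, socket.timeout):
            # Connection problems and timeouts are retried until attempts run out.
            if attempt == max_attempts:
                raise
        # Wait backoff_s, 2*backoff_s, 4*backoff_s, ... before the next attempt.
        sleep(backoff_s * 2 ** (attempt - 1))


def load_image_bytes(source, fetch=None):
    """Return raw image bytes for a base64 data URI or an http(s) URL, in memory."""
    if source.startswith("data:"):
        header, _, payload = source.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64 data URIs are handled in this sketch")
        return base64.b64decode(payload)
    if source.startswith(("http://", "https://")):
        return fetch_with_retry(source, fetch=fetch)
    raise ValueError(f"unsupported image source: {source[:32]!r}")
```

The returned bytes can be wrapped in `io.BytesIO` and handed to an image decoder directly, which is what makes the save-then-load step unnecessary. The `fetch` and `sleep` parameters exist only so the retry path can be exercised without network access.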

Why

  • Avoid save-then-load overhead for remote image inputs.
  • Reduce disk I/O pressure and temp-file dependency in hot paths.
  • Improve robustness for unstable remote image fetches.

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@mickqian (Collaborator, Author)

/tag-and-rerun-ci

@github-actions (bot) added the Multi-modal (multi-modal language model), diffusion (SGLang Diffusion), and run-ci labels Apr 18, 2026

@mickqian (Collaborator, Author)

Ignoring the multimodal_gen B200 test, since it is already broken on main.

@mickqian mickqian merged commit 9a0fd2f into main Apr 20, 2026
84 of 104 checks passed
@mickqian mickqian deleted the mick/mmgen-srt-image-reuse branch April 20, 2026 15:29
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026