[diffusion] disaggregated diffusion #21701
Conversation
Code Review
This pull request introduces a disaggregated diffusion pipeline architecture, allowing the Encoder, Denoiser, and Decoder roles to run on independent GPU instances. It includes a central DiffusionServer for request routing, a P2P transfer engine for tensor movement, and updated scheduler logic to support these roles. My review identified several areas for improvement, including fixing an incorrect average latency calculation, cleaning up redundant code and dataclass initializations, and addressing potential issues with the transfer protocol's handling of result frames.
```python
total = self._completed + self._failed
avg_latency = self._total_latency / total if total > 0 else 0.0
```
The average latency calculation seems to include failed requests in the denominator (`total = self._completed + self._failed`), but `_total_latency` is only updated for completed requests. This will result in an incorrect (underestimated) average latency. The denominator should probably just be `self._completed`.
Suggested change:
```python
total = self._completed
avg_latency = self._total_latency / total if total > 0 else 0.0
```
```python
else:
    raise TypeError(f"Cannot encode transfer message: {type(msg)}")

d.pop("result_frames", None)
```
The `result_frames` field is popped from the dictionary before JSON serialization in `encode_transfer_msg`. This means that `msg.get("result_frames")` in `diffusion_server.py`'s `_handle_transfer_done` will always be `None`, and the branch that handles `result_frames` (and calls `_transfer_return_to_client`) is effectively dead code. If the intention is to transfer result frames via this message, they should not be popped here.
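To make the dead-code path concrete, here is a minimal, self-contained sketch; the function bodies are simplified stand-ins for `encode_transfer_msg` and `_handle_transfer_done`, not the actual implementations:

```python
import json

def encode_transfer_msg(msg: dict) -> bytes:
    # Simplified: the real encoder strips result_frames before JSON serialization.
    d = dict(msg)
    d.pop("result_frames", None)
    return json.dumps(d).encode()

def handle_transfer_done(raw: bytes) -> None:
    msg = json.loads(raw.decode())
    if msg.get("result_frames"):          # always falsy because of the pop above
        print("would return frames to client")  # effectively a dead branch
    else:
        print("result_frames missing:", msg)

handle_transfer_done(encode_transfer_msg({"request_id": "r1", "result_frames": ["<frame>"]}))
```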
```python
manifest: dict = None
scalar_fields: dict = None
receiver_session_id: str = ""
receiver_pool_ptr: int = 0
receiver_slot_offset: int = 0
sender_instance: int = -1
receiver_instance: int = -1
prealloc_slot_id: int | None = None

def __post_init__(self):
    if self.manifest is None:
        self.manifest = {}
    if self.scalar_fields is None:
        self.scalar_fields = {}
```
The `__post_init__` method is used here to initialize `manifest` and `scalar_fields` to empty dictionaries if they are `None`. A more idiomatic way to handle mutable default arguments in dataclasses is to use `field(default_factory=dict)`. This also simplifies other parts of the code; for example, line 748 could become `scalar_fields = dict(p2p.scalar_fields)` without the conditional check.
Suggested change:
```python
manifest: dict = field(default_factory=dict)
scalar_fields: dict = field(default_factory=dict)
receiver_session_id: str = ""
receiver_pool_ptr: int = 0
receiver_slot_offset: int = 0
sender_instance: int = -1
receiver_instance: int = -1
prealloc_slot_id: int | None = None
```
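For reference, a small self-contained sketch of the follow-on simplification mentioned above (the dataclass name `P2PTransferSpec` is hypothetical; only the two dict fields matter here):

```python
from dataclasses import dataclass, field

@dataclass
class P2PTransferSpec:  # hypothetical stand-in for the dataclass above
    manifest: dict = field(default_factory=dict)
    scalar_fields: dict = field(default_factory=dict)

p2p = P2PTransferSpec()
# With default_factory, the call site no longer needs a None check:
scalar_fields = dict(p2p.scalar_fields)
print(scalar_fields)  # {}
```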
| "pool_size": msg.get("pool_size", 0), | ||
| } | ||
| prealloc = msg.get("preallocated_slots", []) | ||
| info["free_preallocated_slots"] = list(prealloc) if prealloc else [] |
There was a problem hiding this comment.
```python
    encode_transfer_msg(alloc_msg)
)

def _transfer_return_to_client(self, request_id: str, result_frames: list) -> None:
```
The function `_transfer_return_to_client` appears to be dead code. It is only called from `_handle_transfer_done` if `result_frames` is present in the message from the decoder. However, `encode_transfer_msg` in `protocol.py` explicitly removes `result_frames` before serialization, so this path is never taken. The main result path for decoders seems to be `_handle_decoder_result_frames` for non-transfer messages.
```python
manifest: dict = None
session_id: str = ""
pool_ptr: int = 0
slot_offset: int = 0

def __post_init__(self):
    if self.manifest is None:
        self.manifest = {}
```
The `__post_init__` method is used to initialize `manifest` to an empty dictionary if it is `None`. A more idiomatic way to handle mutable default arguments in dataclasses is to use `field(default_factory=dict)`. This makes the `__post_init__` method unnecessary.
Suggested change:
```python
manifest: dict = field(default_factory=dict)
session_id: str = ""
pool_ptr: int = 0
slot_offset: int = 0
```
```python
manifest: dict = None
slot_offset: int = 0
scalar_fields: dict = None

def __post_init__(self):
    if self.manifest is None:
        self.manifest = {}
    if self.scalar_fields is None:
        self.scalar_fields = {}
```
The `__post_init__` method is used here to initialize `manifest` and `scalar_fields` to empty dictionaries if they are `None`. A more idiomatic way to handle mutable default arguments in dataclasses is to use `field(default_factory=dict)`. This makes the `__post_init__` method unnecessary.
Suggested change:
```python
manifest: dict = field(default_factory=dict)
slot_offset: int = 0
scalar_fields: dict = field(default_factory=dict)
```
Force-pushed from 8cd36d9 to 85bdc28
/tag-and-rerun-ci

/rerun-failed-ci
| `--disagg-role` | What it runs |
|-----------------|--------------|
| `monolithic` | (Default) Standard single-server mode |
| `encoder` | Encoder role instance (text/image encoding) |
Better to specify the other details here as well, e.g., `TimestepPrepationStage` is also included in this role.
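For example, the encoder row could be made more specific along these lines (a sketch only; the exact stage list is taken from the comment above and may not match the code):

| `--disagg-role` | What it runs |
|-----------------|--------------|
| `encoder` | Encoder role instance (text/image encoding; also runs timestep preparation, e.g. `TimestepPrepationStage`) |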
```
@@ -0,0 +1,235 @@
# Disaggregated Diffusion Pipeline
```
Consider moving this to docs/diffusion; the current folder is deprecated.
```python
    return order

@staticmethod
def _next_power_of_2(n: int) -> int:
```
Could we cache this with `functools.lru_cache`?
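A minimal sketch of that suggestion (the class name and function body are placeholders, not the code in this PR); `lru_cache` wraps the function, with `staticmethod` applied on the outside so attribute access still works:

```python
from functools import lru_cache

class _Example:
    @staticmethod
    @lru_cache(maxsize=None)
    def _next_power_of_2(n: int) -> int:
        """Smallest power of two >= n (for n >= 1)."""
        return 1 << (n - 1).bit_length()

print(_Example._next_power_of_2(5))   # 8
print(_Example._next_power_of_2(16))  # 16
```

Since the helper is a pure function of a small integer there is no risk of pinning `self` in the cache, which is the usual caveat when applying `lru_cache` to methods.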
```python
ib_device = getattr(sa, "disagg_ib_device", None)
engine = create_transfer_engine(
    hostname=hostname,
    gpu_id=self.gpu_id,
```
Are we expecting something like a physical device index (`local_rank`) here? `gpu_id` is the rank number within the process.
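To illustrate the distinction the comment is drawing, a self-contained toy (the 2-node x 4-GPU topology and the helper are assumptions, not code from this PR):

```python
# Toy example: a global/process rank is not the same as the local CUDA device index.
# On a 2-node x 4-GPU job, global ranks 0..7 map onto local device indices 0..3.
def local_device_index(global_rank: int, gpus_per_node: int = 4) -> int:
    return global_rank % gpus_per_node

for rank in range(8):
    print(f"rank {rank} -> cuda:{local_device_index(rank)}")
```

Whichever of the two `create_transfer_engine` actually expects determines whether `self.gpu_id` or the local device index is the right value to pass here.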
```python
transfer_state: _TransferRequestState | None = None


class DiffusionServer:
```
Should we rename this file to something like orchestrator too?
```python
pass

# Fast path: use pre-allocated slot if available and large enough
peer_info = self._denoiser_peers.get(denoiser_idx, {})
```
- `_denoiser_peers` follows the order of registration in `_handle_transfer_register`
- `_denoiser_pushes` follows the order of the passed `--denoiser-urls` (from `parse_url_string` -> the endpoints list)

As a result, when we do something like:

```python
peer_info = self._denoiser_peers.get(denoiser_idx, {})
self._denoiser_pushes[denoiser_idx].send_multipart(encode_transfer_msg(alloc_msg))
```

we are mixing two instances: sending the control msg to A while using the session id / pool_ptr from B as the RDMA address. This could be fixed by forcing `_denoiser_pushes` to follow the order in which the `--denoiser-urls` are passed.
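A minimal, self-contained sketch of one way to keep the two structures consistent (the setup code and the registration message layout are assumptions): key both `_denoiser_peers` and `_denoiser_pushes` by the index of each URL in `--denoiser-urls`, so index i always refers to the same instance regardless of registration order.

```python
# Hypothetical sketch: key both maps by the position of the URL in --denoiser-urls.
denoiser_urls = ["tcp://host-a:5001", "tcp://host-b:5001"]   # assumed CLI order
url_to_idx = {url: i for i, url in enumerate(denoiser_urls)}

_denoiser_pushes = {i: f"push-socket({url})" for i, url in enumerate(denoiser_urls)}
_denoiser_peers: dict[int, dict] = {}

def handle_transfer_register(msg: dict) -> None:
    """Register a denoiser under the index derived from its URL, not arrival order."""
    idx = url_to_idx[msg["url"]]
    _denoiser_peers[idx] = {"session_id": msg["session_id"], "pool_ptr": msg["pool_ptr"]}

# Registrations may arrive in any order; index i still refers to one instance.
handle_transfer_register({"url": "tcp://host-b:5001", "session_id": "B", "pool_ptr": 0xB})
handle_transfer_register({"url": "tcp://host-a:5001", "session_id": "A", "pool_ptr": 0xA})
for i in _denoiser_pushes:
    print(i, _denoiser_pushes[i], _denoiser_peers[i]["session_id"])
```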
Also, nit: we should extract the logic after these fast paths into dedicated helper functions.
Force-pushed from 04592b6 to 01df50d
/tag-and-rerun-ci

/rerun-failed-ci
- Add ready event to DiffusionServer so launch waits for sockets to bind before starting the HTTP server (fixes race condition)
- Handle encoder/denoiser error results that arrive as non-transfer messages instead of silently dropping them (fixes silent timeout)
- Log response body on HTTP errors and dump role log tails in tearDownClass for CI debugging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
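A minimal, self-contained sketch of the ready-event pattern described in the first bullet (class and attribute names are placeholders, not the actual DiffusionServer API):

```python
import threading

class ToyServer:
    def __init__(self):
        self.ready = threading.Event()

    def run(self):
        # ... bind control/transfer sockets here ...
        self.ready.set()          # signal that sockets are bound
        # ... enter the serve loop ...

def launch():
    server = ToyServer()
    threading.Thread(target=server.run, daemon=True).start()
    # Launch waits for the bind before starting the HTTP front end,
    # avoiding the race where requests arrive before sockets exist.
    server.ready.wait(timeout=30)

launch()
```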
Fixes CI lint failure flagged by ruff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ults, and setattr routing
Motivation
The performance gains are uncertain.
Modifications
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci