[Feature] Fully support DP load balance for PD-Disaggregation mode. #10174

@hnyls2002

In the current implementation of PD-Disaggregation, the decode server needs to know the request's prefill DP rank during bootstrapping. However, if the DP rank can only be decided in the dp_controller module, then under the current design the decode server has no way to obtain the prefill request's DP rank.

There are three methods to fix the issue:

  • Change the bootstrap server's route API to use bootstrap_room as the identifier, instead of the engine_rank and target_dp_group used in the current request URL:
url = f"http://{self.bootstrap_addr}/route?engine_rank={engine_rank}&target_dp_group={target_dp_group}"

    To achieve this, we have to resolve the ordering between the PUT, where prefill registers the info, and the GET, where decode fetches it.

  • Notify the decode server after dp_controller has determined the DP rank.
  • Remove dp_controller's load balance functionality completely, so that every DP rank is decided by the external router.
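
To make the PUT/GET ordering concern in the first option concrete, here is a minimal sketch of one possible way to handle it: a table keyed by bootstrap_room where a GET blocks until the matching PUT has arrived or a timeout expires. Everything here is an assumption for illustration. `BootstrapRoomRegistry`, `put`, `get`, and the timeout values are hypothetical names, not part of the existing bootstrap server, and a real implementation would sit behind the HTTP route and also need entry cleanup.

```python
import threading
from typing import Dict, Optional


class BootstrapRoomRegistry:
    """Hypothetical in-memory table: bootstrap_room -> prefill DP rank."""

    def __init__(self) -> None:
        self._dp_rank_by_room: Dict[int, int] = {}
        self._cond = threading.Condition()

    def put(self, bootstrap_room: int, dp_rank: int) -> None:
        # Prefill side: register the DP rank once dp_controller has chosen it.
        with self._cond:
            self._dp_rank_by_room[bootstrap_room] = dp_rank
            self._cond.notify_all()  # wake any decode-side GET that is waiting

    def get(self, bootstrap_room: int, timeout: float = 5.0) -> Optional[int]:
        # Decode side: wait until the room is registered, or give up after
        # `timeout` seconds and return None so the caller can retry or fail.
        with self._cond:
            registered = self._cond.wait_for(
                lambda: bootstrap_room in self._dp_rank_by_room,
                timeout=timeout,
            )
            if not registered:
                return None
            return self._dp_rank_by_room[bootstrap_room]


if __name__ == "__main__":
    registry = BootstrapRoomRegistry()
    fetched = []

    # The decode-side GET is issued before the prefill-side PUT.
    decode = threading.Thread(
        target=lambda: fetched.append(registry.get(bootstrap_room=42, timeout=2.0))
    )
    decode.start()
    registry.put(bootstrap_room=42, dp_rank=3)
    decode.join()
    print(fetched)
```

With this shape, the decode server does not need to know the prefill DP rank up front: it only needs the bootstrap_room, and the lookup succeeds regardless of whether the PUT or the GET arrives first, as long as the PUT lands within the timeout.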
