In the current implementation of PD-Disaggregation, the decode server needs to know the request's prefill DP rank during bootstrapping. However, if the DP rank can only be decided in the `dp_controller` module, then under the current design the decode server has no way to obtain the prefill request's DP rank.
There are three methods to fix the issue:
- Change the `route` API of the bootstrap server to use `bootstrap_room` as the identifier instead of the current
  `url = f"http://{self.bootstrap_addr}/route?engine_rank={engine_rank}&target_dp_group={target_dp_group}"`
  To achieve this, we have to solve the ordering of the `PUT` and `GET` requests, i.e. prefill registering the info and decode fetching it.
- After `dp_controller` has determined the DP rank, notify the decode server.
- Remove `dp_controller`'s load-balancing functionality completely, so that every DP rank is decided by the external router.
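
To make the `PUT`/`GET` ordering concern in the first option concrete, here is a minimal sketch of one way it could be handled: a table keyed by `bootstrap_room` in which a `GET` that arrives early simply waits until the matching `PUT` lands. The class name `BootstrapRoomRegistry`, its methods, and the timeout behaviour are all hypothetical and are not part of the existing bootstrap server; this only illustrates the idea under those assumptions.

```python
import threading


class BootstrapRoomRegistry:
    """Hypothetical in-memory table mapping bootstrap_room -> prefill DP rank."""

    def __init__(self):
        self._cond = threading.Condition()
        self._dp_rank_by_room = {}

    def put(self, bootstrap_room, dp_rank):
        """Called on behalf of prefill once the DP rank has been decided."""
        with self._cond:
            self._dp_rank_by_room[bootstrap_room] = dp_rank
            # Wake any decode-side callers already waiting on this room.
            self._cond.notify_all()

    def get(self, bootstrap_room, timeout=None):
        """Called on behalf of decode; blocks until the room is registered.

        Returns the DP rank, or None if the timeout expires first.
        Cleanup of finished rooms is omitted in this sketch.
        """
        with self._cond:
            found = self._cond.wait_for(
                lambda: bootstrap_room in self._dp_rank_by_room,
                timeout=timeout,
            )
            if not found:
                return None
            return self._dp_rank_by_room[bootstrap_room]
```

With this shape, the decode side can issue its lookup before prefill has registered and still receive the right rank, at the cost of holding the request open. An alternative under the same assumptions would be for `get` to return immediately and have decode retry.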