R-Fork (Tensor Remote Fork) is a novel weight-loading methodology that leverages an efficient inter-node GPU-to-GPU data transfer path to load tensors from a running SGLang instance into a new instance with zero-copy. It can significantly reduce SGLang instance boot-up time, cutting model weight loading from several minutes to mere seconds.

For more details about R-Fork, see the **[R-Fork blog](https://lmsys.org/blog/2025-12-10-rfork/)**.

| Argument | Usage |
| --- | --- |
| load-format | Set to `remote_instance` to enable R-Fork. |
| remote-instance-weight-loader-backend | Either `nccl` or `transfer_engine`; defaults to `nccl`. |
| remote-instance-weight-loader-seed-instance-ip | IP address of the seed instance that provides the model weights. |
| remote-instance-weight-loader-seed-instance-service-port | Port on which the seed instance's HTTP server listens. |
| remote-instance-weight-loader-send-weights-group-ports | List of available ports on the seed instance used to build NCCL communication groups between the seed and client instances. Only required by the `nccl` backend. |
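
As a hedged sketch of how these arguments fit together, the Python helper below assembles the command line for a client instance that loads its weights from a seed instance. It assumes that each argument in the table maps to a `--`-prefixed flag of `python -m sglang.launch_server`, and that the group ports are passed as a JSON-style list such as `[50000,50001]`; both are assumptions, not confirmed here. The function name `build_rfork_client_args`, the model path, and the IP and port values are hypothetical.

```python
import json
import shlex
from typing import List, Optional, Sequence

# Backends listed in the table above.
SUPPORTED_BACKENDS = ("nccl", "transfer_engine")


def build_rfork_client_args(
    model_path: str,
    seed_ip: str,
    seed_service_port: int,
    backend: str = "nccl",
    send_weights_group_ports: Optional[Sequence[int]] = None,
) -> List[str]:
    """Return a hypothetical argv for a client instance that loads weights via R-Fork.

    Assumption: every table entry maps to a `--`-prefixed flag of
    `python -m sglang.launch_server`, and the group ports are written
    as a JSON-style list.
    """
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unsupported backend {backend!r}; expected one of {SUPPORTED_BACKENDS}"
        )

    args = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        # `remote_instance` is what enables R-Fork.
        "--load-format", "remote_instance",
        "--remote-instance-weight-loader-backend", backend,
        "--remote-instance-weight-loader-seed-instance-ip", seed_ip,
        "--remote-instance-weight-loader-seed-instance-service-port", str(seed_service_port),
    ]

    # Per the table, the group ports are only needed by the nccl backend,
    # so they are required for nccl and simply not passed otherwise.
    if backend == "nccl":
        if not send_weights_group_ports:
            raise ValueError("the nccl backend requires send_weights_group_ports")
        args += [
            "--remote-instance-weight-loader-send-weights-group-ports",
            json.dumps(list(send_weights_group_ports), separators=(",", ":")),
        ]
    return args


if __name__ == "__main__":
    argv = build_rfork_client_args(
        model_path="/path/to/model",  # hypothetical model path
        seed_ip="10.0.0.1",           # hypothetical seed instance IP
        seed_service_port=30000,      # hypothetical seed HTTP port
        send_weights_group_ports=[50000, 50001],  # hypothetical free ports on the seed
    )
    print(shlex.join(argv))
```

Validating the backend-specific requirement in one place mirrors the table: the seed IP and HTTP port are always needed, while the group ports only matter when NCCL communication groups are built between the seed and the client.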