Checklist
Motivation
In the current SGLang router implementation (written in Rust), we support:
- Regular routing strategies: cache-aware, random, and round-robin
- Prefill-decode (PD) disaggregated routing: random and power-of-two (Po2) based
Previously, incoming requests were deserialized from raw bytes into dictionaries (maps) to extract a minimal set of fields (e.g., stream). With the addition of PD routing, however, fields such as bootstrap_host, bootstrap_port, and bootstrap_room must be injected into the request object. As a result, the router now deserializes the full request into a fully typed struct.
This shift raises performance concerns regarding deserialization overhead, especially under high QPS.
Goal
Evaluate and implement an optimized solution that balances:
- Performance overhead
- Code maintainability
- Flexibility for routing logic extensions
Task
Related resources
sample bootstrap injection
fn inject_bootstrap_fields(
    &self,
    json: &mut serde_json::Value,
    prefill: &EngineInfo,
    batch_size: Option<usize>,
) -> Result<(), String> {
    let obj = json
        .as_object_mut()
        .ok_or("Request body is not a JSON object")?;

    // Generate bootstrap room
    let room_id = rand::random::<u64>();

    match batch_size {
        Some(n) => {
            // Batch format
            obj.insert(
                "bootstrap_host".to_string(),
                serde_json::json!(vec![prefill.url.as_str(); n]),
            );
            obj.insert(
                "bootstrap_port".to_string(),
                serde_json::json!(vec![prefill.bootstrap_port; n]),
            );
            obj.insert(
                "bootstrap_room".to_string(),
                serde_json::json!(vec![room_id; n]),
            );
        }
        None => {
            // Single format
            obj.insert(
                "bootstrap_host".to_string(),
                serde_json::json!(prefill.url.as_str()),
            );
            obj.insert(
                "bootstrap_port".to_string(),
                serde_json::json!(prefill.bootstrap_port),
            );
            obj.insert("bootstrap_room".to_string(), serde_json::json!(room_id));
        }
    }
    Ok(())
}
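For comparison with the serde_json::Value-based sample above, here is a rough sketch of one candidate worth evaluating: splicing the bootstrap fields directly into the raw request bytes, so the body is never deserialized at all. This is an illustration under assumptions, not the router's implementation. The names splice_bootstrap_fields and json_string are hypothetical, only the single (non-batch) format is covered, the port is assumed to fit in a u16, and the body is assumed to be a JSON object that does not already contain bootstrap_* keys.

```rust
/// Hypothetical helper: escape a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            // Control characters must be escaped in JSON strings.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical helper: append the bootstrap fields to a JSON object body
/// without parsing it. Returns None if the body is not delimited by `{` and `}`.
/// Assumption: the body is otherwise valid JSON; this function does not check.
fn splice_bootstrap_fields(body: &[u8], host: &str, port: u16, room: u64) -> Option<Vec<u8>> {
    // Locate the first and last non-whitespace bytes.
    let start = body.iter().position(|b| !b.is_ascii_whitespace())?;
    let end = body.iter().rposition(|b| !b.is_ascii_whitespace())?;
    if body[start] != b'{' || body[end] != b'}' {
        return None;
    }

    // An empty object needs no separating comma before the new fields.
    let is_empty = body[start + 1..end].iter().all(|b| b.is_ascii_whitespace());

    let mut out = Vec::with_capacity(body.len() + 96);
    // Copy everything up to (but not including) the closing brace.
    out.extend_from_slice(&body[..end]);
    if !is_empty {
        out.push(b',');
    }
    let fields = format!(
        "\"bootstrap_host\":{},\"bootstrap_port\":{},\"bootstrap_room\":{}}}",
        json_string(host),
        port,
        room
    );
    out.extend_from_slice(fields.as_bytes());
    Some(out)
}
```

Calling splice_bootstrap_fields on a body such as {"text":"hi","stream":true} yields the same object with the three bootstrap fields appended before the closing brace. The trade-off is the one named in the Goal: this removes the deserialization cost but gives up validation and typed access, so routing logic that needs to read fields (for example stream, or the batch size) would still need some form of parsing.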