Here is the development roadmap for 2025 H1. Contributions and feedback are welcome (join the bi-weekly development meeting). The previous 2024 Q4 roadmap can be found in #1487.
Focus
- Throughput-oriented large-scale deployment similar to the DeepSeek inference system
- Long-context optimizations
- Low-latency speculative decoding
- Reinforcement learning training framework integration
- Kernel optimizations
Parallelism
Attention Backend
Caching
Kernel
Quantization
RL Framework integration
Core refactor
- scheduler.py and model_runner.py: make them more modular
Speculative decoding
Multi-LoRA serving
Hardware
Model coverage
Function Calling
Others