## Speculative Decoding Development Roadmap (2026 Q2)

- adaptive speculative decoding #23705
- spec v2 #11762
  - [x] enable v2 by default #21062
  - [ ] fully overlap (remove `wait_for_verify` sync)
    - [ ] initial PR #23452
    - [ ] avoid using CPU metadata in attention backends
  - [ ] constrained decoding #15623
  - [ ] topk > 1 (optional)
- piecewise-cuda-graph
  - [ ] make spec decoding compatible with piecewise-cuda-graph #22128
- dflash @dcw02
  - [x] initial PR #22077
  - [ ] support spec v2 #23000
  - [ ] ... (todo)
- refactor
  - rename some ambiguous variable names and concepts @Qiaolin-Yu
    - #21058
- parallel spec decoding
  - [ ] initial PR #22272
- ngram #21052 @kpham-sgl