[eagle] overlap scheduler for eagle and non-eagle #8490

Merged: zhyncs merged 18 commits into lianmin/overlap-spec from hanming/overlap-spec-w-batch on Aug 15, 2025

@hanming-lu
Collaborator

@hanming-lu hanming-lu commented Jul 29, 2025

Summary

  • race-free overlap scheduler skeleton for both eagle and non-eagle
  • passes gsm8k for both eagle and non-eagle
  • supports batch size (BS) > 1
  • supports memory allocation and deallocation

TODO

  • optimize future resolve: it currently takes 0.1 ms (~1%) and could be close to 0
  • overlap the draft decode plan: it currently takes 0.12 ms (~1%) and could be overlapped with the previous batch's draft extend. Doing this requires changing batch.seq_lens to be two batches ahead, so that filter/merge/allocate_for_eagle no longer needs to wait for verify_done.
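
For readers unfamiliar with the term, the "future resolve" step in the first TODO item can be pictured with a toy sketch. This is only an illustration under assumptions: the PR description does not show how futures are represented, so the negative-placeholder scheme and every name below (`FutureTokenMap`, `reserve`, `store`, `resolve`) are hypothetical and are not taken from the SGLang code. The idea is that the next batch is prepared against placeholder ids while the previous batch is still running, and the placeholders are swapped for real token ids once those are available.

```python
from typing import List


class FutureTokenMap:
    """Toy placeholder table. Hypothetical; not SGLang's actual class."""

    def __init__(self, capacity: int) -> None:
        # Slot 0 is unused so that every placeholder (-slot) is strictly negative.
        self.slots: List[int] = [0] * (capacity + 1)
        self.next_slot = 1

    def reserve(self, n: int) -> List[int]:
        """Hand out n negative placeholder ids for tokens not yet sampled."""
        if self.next_slot + n > len(self.slots):
            raise RuntimeError("future map is full")
        ids = [-(self.next_slot + i) for i in range(n)]
        self.next_slot += n
        return ids

    def store(self, placeholders: List[int], tokens: List[int]) -> None:
        """Record real token ids once the producing batch has finished."""
        for p, t in zip(placeholders, tokens):
            self.slots[-p] = t

    def resolve(self, input_ids: List[int]) -> List[int]:
        """Swap negative placeholders for stored tokens; keep real ids as-is."""
        return [self.slots[-x] if x < 0 else x for x in input_ids]


fmap = FutureTokenMap(capacity=8)
pending = fmap.reserve(2)        # batch N is launched; its outputs are unknown
next_inputs = [101] + pending    # batch N+1 is prepared against placeholders
fmap.store(pending, [7, 9])      # batch N finishes and publishes its tokens
resolved = fmap.resolve(next_inputs)
print(resolved)
```

In a real scheduler the resolve step would presumably run on the GPU as a single vectorized operation, which is where the 0.1 ms cost mentioned above would come from; the list comprehension here only shows the logic.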

Tests

python test/srt/test_eagle_infer_refactor.py TestEagleLargeBS

python test/srt/test_eagle_infer_refactor.py TestEagleLargeBSNoSD

@hanming-lu hanming-lu force-pushed the hanming/overlap-spec-w-batch branch from 94cf15d to a965ecb Compare August 2, 2025 01:46
@hanming-lu hanming-lu changed the title [SD] support spec dec v2 batch size > 1 [SD] spec dec v2 batch size > 1 and mem alloc Aug 2, 2025
Comment thread python/sglang/srt/managers/scheduler_output_processor_mixin.py Outdated
@hanming-lu hanming-lu force-pushed the hanming/overlap-spec-w-batch branch from 02fb61d to 2189686 Compare August 6, 2025 22:45
@hanming-lu hanming-lu force-pushed the hanming/overlap-spec-w-batch branch from b07c033 to b71796a Compare August 12, 2025 23:48
@hanming-lu hanming-lu force-pushed the hanming/overlap-spec-w-batch branch from b71796a to a1b3d62 Compare August 13, 2025 00:39
@hanming-lu hanming-lu force-pushed the hanming/overlap-spec-w-batch branch from 1972f10 to eda557f Compare August 15, 2025 01:35
@hanming-lu hanming-lu force-pushed the hanming/overlap-spec-w-batch branch from e17aa12 to 73cc358 Compare August 15, 2025 02:13
@hanming-lu hanming-lu changed the title [SD] spec dec v2 batch size > 1 and mem alloc [eagle] race-free overlap scheduler for eagle and non-eagle Aug 15, 2025
@hanming-lu hanming-lu changed the title [eagle] race-free overlap scheduler for eagle and non-eagle [eagle] overlap scheduler for eagle and non-eagle Aug 15, 2025
@hanming-lu hanming-lu marked this pull request as ready for review August 15, 2025 05:55
@hanming-lu hanming-lu force-pushed the hanming/overlap-spec-w-batch branch from 9971436 to a496885 Compare August 15, 2025 06:00
@zhyncs
Collaborator

zhyncs commented Aug 15, 2025

Great work! Let's merge hanming/overlap-spec-w-batch to lianmin/overlap-spec first.

@zhyncs zhyncs merged commit fa3202b into lianmin/overlap-spec Aug 15, 2025
1 check passed
@zhyncs zhyncs deleted the hanming/overlap-spec-w-batch branch August 15, 2025 19:54
@zhyncs
Collaborator

zhyncs commented Aug 15, 2025

Hi @hanming-lu can you help merge latest main to lianmin/overlap-spec? Thanks!

@hanming-lu
Collaborator Author

> Hi @hanming-lu can you help merge latest main to lianmin/overlap-spec? Thanks!

@merrymercy could you please help with sorting out the plan? thanks!

@hnyls2002 hnyls2002 restored the hanming/overlap-spec-w-batch branch September 18, 2025 03:56
@zhyncs zhyncs deleted the hanming/overlap-spec-w-batch branch September 18, 2025 07:43