Skip to content

Supported Stats for Speculative Decoding for Chat API#1

Merged
FrankLeeeee merged 2 commits intonv_eagle3from
feature/chat-frontend
Jun 9, 2025
Merged

Supported Stats for Speculative Decoding for Chat API#1
FrankLeeeee merged 2 commits intonv_eagle3from
feature/chat-frontend

Conversation

@FrankLeeeee
Copy link
Copy Markdown
Collaborator

@FrankLeeeee FrankLeeeee commented Jun 9, 2025

Motivation

When we run benchmark/mtbench/bench_sglang_eagle.py, this will use the /generate API by default, however, it does not work well for models which require chat APIs such as Llama4, as a result, the acceptance length is extremely low for these models.

Thus, I updated this part of code for two purposes:

  1. enable chat api in SGLang frontend
  2. enable speculative decoding stats for chat api

Modifications

The results seem good.

image

Checklist

@FrankLeeeee FrankLeeeee merged commit 22a52b3 into nv_eagle3 Jun 9, 2025
1 check failed
FrankLeeeee added a commit that referenced this pull request Jun 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant