[Observability] Add pending token count to prefill log and get_load #22480
Merged
Conversation
Add a `#pending-token` metric to the prefill batch log and a `num_pending_tokens` field to the `GetLoadReqOutput` returned by `/get_load`. This shows the total number of tokens still waiting to be prefilled, including remaining tokens from the currently chunked request. This is particularly useful for load balancing long-context requests across engines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
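A sketch of how a router could consume the new field for load balancing (the engine URLs are placeholders, and the response shape beyond the `num_pending_tokens` field added here is an assumption):

```python
import requests

# Placeholder engine endpoints; /get_load is the endpoint extended by this PR.
ENGINES = ["http://engine-0:30000", "http://engine-1:30000"]

def pending_tokens(url: str) -> int:
    """Ask one engine how many tokens are still waiting to be prefilled."""
    resp = requests.get(f"{url}/get_load", timeout=2)
    resp.raise_for_status()
    # num_pending_tokens counts queued tokens plus the remaining tokens of a
    # partially prefilled (chunked) request, per this PR.
    return resp.json()["num_pending_tokens"]

def least_loaded_engine() -> str:
    """Route the next long-context request to the least-backlogged engine."""
    return min(ENGINES, key=pending_tokens)

if __name__ == "__main__":
    print(f"Routing to {least_loaded_engine()}")
```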
Removed outdated comments regarding `prefix_indices` and `chunk_deduct`.
Motivation
For prefill requests, the engine log only shows `#queue-req` and `#running-req`, which is insufficient for understanding load, especially for long-context requests where a single request can have hundreds of thousands of tokens pending prefill across multiple chunks.

This PR adds a `#pending-token` metric that shows the total number of tokens still waiting to be prefilled. The same count is exposed via `/get_load`, which is useful for load balancing long-context requests across engines.
Modifications
- `PrefillStats` (`scheduler_metrics_mixin.py`): Added a `num_pending_tokens` field, snapshotted at batch-scheduling time for correct reporting under the overlap scheduler.
- `_get_num_pending_tokens()` (`scheduler_metrics_mixin.py`): New shared helper that computes pending tokens from the waiting queue plus the remaining tokens of the currently chunked request. Accepts a `chunk_deduct` parameter to handle the timing difference between scheduling time (where `prefix_indices` has not yet been updated) and query time (`get_load`, where it has).
- Log line (`scheduler_metrics_mixin.py`): Added `#pending-token: {N}` to the prefill log line.
- `GetLoadReqOutput` (`io_struct.py`): Added a `num_pending_tokens` field returned by `/get_load`.
- `get_load()` (`scheduler_metrics_mixin.py`): Populates `num_pending_tokens` using the shared helper.
- `get_new_batch_prefill()` (`scheduler.py`): Snapshots `num_pending_tokens` into `PrefillStats` at scheduling time.

Example log output with a 30K-token chunked prefill (chunk size 8192):
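(Illustrative sketch; the companion fields and exact values are assumptions based on sglang's typical prefill log format, not a captured run.)

```
Prefill batch. #new-seq: 1, #new-token: 8192, #cached-token: 0, #queue-req: 0, #pending-token: 21808
Prefill batch. #new-seq: 1, #new-token: 8192, #cached-token: 0, #queue-req: 0, #pending-token: 13616
```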
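A minimal, self-contained sketch of the counting logic described in the helper bullet above, using simplified stand-ins for the scheduler's request objects (the real helper is a scheduler method, and the exact attribute semantics may differ):

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for sglang's request type; only the two attributes
# the counting logic needs are modeled here.
@dataclass
class Req:
    origin_input_ids: List[int]  # full prompt token ids
    prefix_indices: List[int] = field(default_factory=list)  # cached/prefilled prefix

def get_num_pending_tokens(
    waiting_queue: List[Req],
    chunked_req: Optional[Req],
    chunk_deduct: int = 0,
) -> int:
    """Total tokens still waiting to be prefilled.

    Sums the uncached tokens of every queued request, plus the remaining
    tokens of the currently chunked request. chunk_deduct bridges the timing
    gap: at scheduling time the chunk just scheduled is not yet reflected in
    prefix_indices, so the caller deducts it; at /get_load query time,
    prefix_indices is already updated and chunk_deduct is 0.
    """
    pending = sum(
        len(r.origin_input_ids) - len(r.prefix_indices) for r in waiting_queue
    )
    if chunked_req is not None:
        pending += (
            len(chunked_req.origin_input_ids)
            - len(chunked_req.prefix_indices)
            - chunk_deduct
        )
    return pending

# Example: a 30K-token request whose first 8192-token chunk was just scheduled.
req = Req(origin_input_ids=list(range(30_000)))
print(get_num_pending_tokens([], req, chunk_deduct=8192))  # 21808
```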
Accuracy Tests
N/A — observability-only change, no model output affected.
Speed Tests and Profiling
N/A — adds a lightweight sum over the waiting queue during logging and load queries.
Checklist