
[Spec][Ngram] Return token counts in list_external_corpora API #22471

Merged
hnyls2002 merged 3 commits into sgl-project:main from kpham-sgl:feat/list-corpus-token-counts
Apr 11, 2026

@kpham-sgl
Collaborator

@kpham-sgl kpham-sgl commented Apr 9, 2026

Motivation

Part of the Ngram series #21052

Admins manage the external corpora themselves, so we need to give them more insight into those corpora, specifically the token count of each corpus.

Modifications

Replace corpus_ids list with corpus_token_counts dict (corpus_id -> token_count) in the list corpora response. This exposes per-corpus token counts from the SuffixAutomaton's pos_ counter through the full stack: C++ -> FFI -> Python -> HTTP.
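As a rough illustration of the response change, the sketch below shows how a client might read the new field. Only the field names `corpus_ids` and `corpus_token_counts` come from this PR; the payload layout, the example corpus names and counts, and the helper `rank_corpora_by_size` are assumptions made for illustration, not the actual server schema.

```python
from typing import Any, Dict, List, Tuple


def rank_corpora_by_size(response: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (corpus_id, token_count) pairs, largest corpus first.

    Hypothetical helper: assumes the response carries a
    `corpus_token_counts` mapping of corpus_id -> token_count.
    """
    counts = response.get("corpus_token_counts", {})
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Assumed shape before this PR: only the IDs were listed.
old_style = {"corpus_ids": ["faq", "manual"]}

# Assumed shape after this PR: each ID maps to its token count.
new_style = {"corpus_token_counts": {"faq": 1200, "manual": 54000}}

ranked = rank_corpora_by_size(new_style)

# The old list of IDs is still recoverable from the dict keys.
corpus_ids = list(new_style["corpus_token_counts"])
```

Because the dict keys carry the same information as the old list, a caller that only needs IDs can keep working by iterating over the keys, while a caller that reads the removed `corpus_ids` field directly needs updating.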

Accuracy Tests

N/A

Speed Tests and Profiling

N/A

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

kpham-sgl and others added 2 commits April 9, 2026 16:24
Replace corpus_ids list with corpus_token_counts dict (corpus_id -> token_count)
in the list corpora response. This exposes per-corpus token counts from the
SuffixAutomaton's pos_ counter through the full stack: C++ -> FFI -> Python -> HTTP.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@kpham-sgl
Collaborator Author

/rerun-test registered/unit/spec/test_ngram_corpus.py

@github-actions
Contributor

github-actions Bot commented Apr 9, 2026

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py

@kpham-sgl
Collaborator Author

/rerun-test registered/unit/spec/test_ngram_corpus.py

@github-actions
Contributor

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py

@kpham-sgl
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@kpham-sgl
Collaborator Author

/tag-and-rerun-ci

…or FFI encoding

Sync TokenizerManager with renamed corpus_token_counts field to prevent
AttributeError, and switch FFI delimiter from comma to tab so corpus IDs
containing commas are not misparsed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
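
To illustrate why the delimiter matters, here is a minimal sketch of the misparse described in the commit message above. The record layout (`corpus_id<delim>token_count`) and the functions `encode_counts` and `decode_counts` are hypothetical; the actual FFI encoding in this PR may differ. The sketch also assumes corpus IDs never contain a tab.

```python
from typing import Dict, List


def encode_counts(counts: Dict[str, int], delim: str) -> List[str]:
    """Flatten corpus_id -> token_count into 'id<delim>count' records."""
    return [f"{corpus_id}{delim}{count}" for corpus_id, count in counts.items()]


def decode_counts(records: List[str], delim: str) -> Dict[str, int]:
    """Parse records back into a dict, rejecting ambiguous ones."""
    decoded: Dict[str, int] = {}
    for record in records:
        fields = record.split(delim)
        if len(fields) != 2:
            # A delimiter inside the ID produces extra fields.
            raise ValueError(f"ambiguous record: {record!r}")
        corpus_id, count = fields
        decoded[corpus_id] = int(count)
    return decoded


# Hypothetical corpus ID that contains a comma.
counts = {"docs,v2": 4096, "faq": 128}

# Tab-delimited records round-trip cleanly.
tab_round_trip = decode_counts(encode_counts(counts, "\t"), "\t")

# Comma-delimited records are ambiguous for "docs,v2".
try:
    decode_counts(encode_counts(counts, ","), ",")
    comma_failed = False
except ValueError:
    comma_failed = True
```

A tab only moves the problem to IDs that contain tabs; a length-prefixed or structured encoding would remove the ambiguity entirely, at the cost of a more involved FFI boundary.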
@kpham-sgl
Collaborator Author

/rerun-test registered/unit/spec/test_ngram_corpus.py

@github-actions
Contributor

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py

@hnyls2002 hnyls2002 merged commit 04bd8e1 into sgl-project:main Apr 11, 2026
110 of 119 checks passed
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
…roject#22471)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>