[Score API] Add SequenceClassification Model support#22118
hnyls2002 merged 4 commits into sgl-project:main
Conversation
Code Review
This pull request introduces support for SequenceClassification models within the scoring API, enabling both single-item and multi-item scoring (MIS) for these architectures. Key implementation details include a new score_and_pool utility in the pooler layer to extract hidden states at delimiter positions and updates to the tokenizer manager to process pooled classification logits. The changes also integrate this logic into Llama and Qwen classification models and add comprehensive test suites. Review feedback identifies a performance bottleneck in the pooler layer caused by GPU-CPU synchronizations when iterating over sequence lengths and suggests using dim=-1 for softmax operations in the tokenizer manager to ensure robustness against varying tensor shapes.
Three test placement issues to fix before merge:
The file was added via merge from main (sgl-project#22118) but not registered in the old test/srt/run_suite.py suite, causing a sanity check failure.
Add Scoring API support for SequenceClassification models
Summary
- Extends the `/v1/score` endpoint to support `SequenceClassification` models (e.g., `Qwen3ForSequenceClassification`, `Qwen2ForSequenceClassification`, `LlamaForClassification`) in addition to the existing CausalLM path. For classification models, the scoring API returns pooled class logits from the model's classification head; no `label_token_ids` are required.
- Adds multi-item scoring (MIS) via `--multi-item-scoring-delimiter`: all items are packed into a single sequence separated by a delimiter token and scored in one forward pass, with `score_and_pool()` extracting per-item scores at delimiter positions.

Changes
- `python/sglang/srt/layers/pooler.py`: new `score_and_pool()` function. Dynamically finds delimiter positions in `input_ids` via `get_global_server_args()`, pools logits at those positions for MIS, and falls back to normal pool-then-score for single-item requests.
- `python/sglang/srt/managers/tokenizer_manager_score_mixin.py`: routes classification models to `EmbeddingReqInput` (vs `GenerateReqInput` for CausalLM) and handles MIS for both model types. Text inputs are tokenized separately, then combined at the token level to avoid boundary artifacts.
- `python/sglang/srt/models/qwen3_classification.py`: uses `score_and_pool()`.
- `python/sglang/srt/models/qwen2_classification.py`: uses `score_and_pool()`.
- `python/sglang/srt/models/llama_classification.py`: uses `score_and_pool()`.
- `python/sglang/srt/entrypoints/engine_score_mixin.py`
- `python/sglang/srt/entrypoints/http_server.py`: `/v1/score` docstring updated.
- `.codespellrc`
- `test/registered/unit/test_pooler_score_and_pool.py`: unit tests for `score_and_pool` (single-item, MIS, and fallback paths).
- `test/registered/core/test_score_classification.py`
- `test/srt/test_multi_item_scoring.py`

Design
- Single-item scoring: each `query + item` pair is sent as a separate `EmbeddingReqInput`. The model's pooler extracts the last-token hidden state, the classification head produces logits, and those logits are returned as scores.
- Multi-item scoring: when `--multi-item-scoring-delimiter <token_id>` is configured, all items are packed into one sequence: `query<delim>item1<delim>item2<delim>...`. The `score_and_pool()` function applies the classification head to ALL hidden states, then extracts logits at the positions just before each delimiter, the same pattern CausalLM uses in `LogitsProcessor`.
- Delimiter positions are found with `(input_ids == delimiter_token).nonzero()` at forward time, avoiding the need to thread delimiter indices through the scheduling pipeline.

Test plan
Unit tests (CPU, no model needed)
```shell
source /workspace/venvs/sglang-repos/bin/activate
python test/registered/unit/test_pooler_score_and_pool.py -v
```

All 6 tests pass.
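The single-item and MIS extraction paths those tests exercise can be sketched in plain Python. This is only an illustration of the pattern described under Design: the real `score_and_pool` operates on torch tensors, and the function name, signature, and position-to-item mapping below are assumptions for the sketch.

```python
def extract_item_scores(input_ids, logits, delimiter_id=None):
    """Pick per-item class logits out of a packed sequence.

    input_ids:    token ids (query<delim>item1<delim>item2<delim>...)
    logits:       one row of class logits per token position
    delimiter_id: the --multi-item-scoring-delimiter token id, or None
    """
    if delimiter_id is None:
        # Single-item fallback: last-token pooling.
        return [logits[-1]]
    # Mirrors (input_ids == delimiter_token).nonzero() at forward time.
    delim_positions = [i for i, t in enumerate(input_ids) if t == delimiter_id]
    rows = [logits[p - 1] for p in delim_positions]
    # Assume the first delimiter terminates the query, so its row is not an
    # item score; how the real code maps positions to items may differ.
    return rows[1:]


# Packed sequence: query [7, 8], delimiter 0, items [5] and [6].
ids = [7, 8, 0, 5, 0, 6, 0]
rows = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [0.8, 0.9], [1.0, 1.1], [1.2, 1.3]]
print(extract_item_scores(ids, rows, delimiter_id=0))  # [[0.6, 0.7], [1.0, 1.1]]
```

Locating delimiters at forward time keeps the scheduling pipeline unchanged, at the cost of the GPU-CPU synchronizations the review comment flags.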
E2E tests (GPU required)
Uses `json_model_override_args` to override `Qwen/Qwen3-0.6B` to `Qwen3ForSequenceClassification` with a random classification head. All 14 tests pass.
Manual server test — single-item scoring
Launch server with a SequenceClassification model:
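The original command did not survive extraction; a hypothetical launch of this shape, assuming sglang's standard launcher (the model path and port are placeholders):

```shell
# Placeholder model path and port; substitute a real SequenceClassification checkpoint.
python -m sglang.launch_server \
  --model-path /path/to/sequence-classification-model \
  --port 30000
```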
Single-item curl:
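The curl command itself was lost in extraction; a hypothetical request of this shape, assuming the `query`/`items` fields of the score API (the query and item text are illustrative):

```shell
# Hypothetical request; field values are illustrative.
curl -s http://localhost:30000/v1/score \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the capital of France?", "items": ["Paris is the capital of France."]}'
```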
Returns class logits directly (12 labels for this model):
```json
{
  "scores": [
    [-0.44, 0.27, -0.85, -0.41, 0.69, 0.32, 0.59, 0.38, 0.17, 0.70, -0.07, -0.07]
  ],
  "prompt_tokens": 14
}
```

Manual server test — multi-item scoring
Launch server with MIS delimiter:
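Again hypothetical: the `--multi-item-scoring-delimiter` flag comes from this PR, while the model path, port, and delimiter token id are placeholders.

```shell
# Placeholder values; <delimiter_token_id> must be a token id valid for the model.
python -m sglang.launch_server \
  --model-path /path/to/sequence-classification-model \
  --multi-item-scoring-delimiter <delimiter_token_id> \
  --port 30000
```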
Multi-item curl:
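A hypothetical multi-item request with three items, matching the three score rows in the response below; `apply_softmax` is an assumption based on the probability-distribution output:

```shell
# Hypothetical request; field values are illustrative.
curl -s http://localhost:30000/v1/score \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the capital of France?", "items": ["Paris.", "London.", "Berlin."], "apply_softmax": true}'
```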
Returns one probability distribution per item (all items scored in a single forward pass):
```json
{
  "scores": [
    [0.05, 0.10, 0.03, 0.05, 0.15, 0.11, 0.14, 0.11, 0.09, 0.15, 0.07, 0.07],
    [0.05, 0.10, 0.03, 0.05, 0.16, 0.10, 0.14, 0.11, 0.09, 0.15, 0.07, 0.07],
    [0.05, 0.10, 0.03, 0.05, 0.15, 0.10, 0.14, 0.11, 0.09, 0.15, 0.07, 0.07]
  ],
  "prompt_tokens": 26
}
```

Pre-commit
```shell
SKIP=clang-format pre-commit run  # all hooks pass
```

Checklist
Review and Merge Process
`/tag-and-rerun-ci`, `/tag-run-ci-label`, `/rerun-failed-ci`