[Score API] Implement EngineScoreMixin for scoring functionality and refactor Tok…#21342

Merged
hnyls2002 merged 14 commits into sgl-project:main from sundar24295s:suramach/refactor
Apr 3, 2026

@sundar24295s
Collaborator

Refactor: Extract Scoring API into Dedicated Mixin Files + CODEOWNERS

Motivation

The Scoring API (/v1/score, Engine.score(), Engine.async_score()) is a self-contained feature spanning entrypoints, managers, and tests. This PR extracts scoring-specific logic into dedicated files and adds CODEOWNERS entries so that scoring contributors are automatically requested for review on scoring-related changes.

What Changed

1. New file: python/sglang/srt/entrypoints/engine_score_mixin.py

Extracted score() and async_score() from the Engine class into an EngineScoreMixin. The Engine class now inherits from this mixin — no behavior change, just cleaner separation.
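
The extraction pattern described above can be sketched in isolation. This is a minimal illustration, not the actual SGLang code: the method signatures, the `_score_backend` hook, and the uniform placeholder scores are all assumptions made for the example.

```python
import asyncio


class EngineScoreMixin:
    """Hypothetical stand-in for the mixin: it holds only the scoring entry points."""

    async def async_score(self, query, items, label_token_ids):
        # Delegate to a hook that the host class is expected to provide.
        # `_score_backend` is an illustrative name, not an SGLang attribute.
        return await self._score_backend(query, items, label_token_ids)

    def score(self, query, items, label_token_ids):
        # Synchronous wrapper around the async path.
        return asyncio.run(self.async_score(query, items, label_token_ids))


class Engine(EngineScoreMixin):
    """Toy host class; the real Engine has many more responsibilities."""

    async def _score_backend(self, query, items, label_token_ids):
        # Placeholder logic: one uniform score per label token for each item.
        uniform = 1.0 / len(label_token_ids)
        return [[uniform] * len(label_token_ids) for _ in items]


engine = Engine()
scores = engine.score("example query", ["item a", "item b"], [1, 2])
```

Callers keep using `engine.score(...)` exactly as before; only the location of the method definitions changes.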

2. Renamed: tokenizer_manager_multiitem_mixin.py → tokenizer_manager_score_mixin.py

Renamed the file and class (TokenizerManagerMultiItemMixin → TokenizerManagerScoreMixin) to better reflect its purpose. This file contains the core scoring logic: score_request(), score_prompts(), multi-item scoring helpers, and the ScoreResult dataclass.

3. Updated: python/sglang/srt/entrypoints/engine.py

  • Engine now inherits from EngineScoreMixin (MRO: Engine → EngineScoreMixin → EngineBase → ABC)
  • Removed the inline score() / async_score() methods (now provided by the mixin)
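
The MRO stated above can be reproduced with toy classes. This sketch assumes the mixin is listed before EngineBase in the class definition, which is what the stated order implies; the class bodies are empty placeholders rather than the real implementations.

```python
from abc import ABC


class EngineBase(ABC):
    """Placeholder for the abstract base class."""


class EngineScoreMixin:
    """Placeholder mixin that contributes score()."""

    def score(self):
        return "provided by mixin"


# Listing the mixin first places it ahead of EngineBase in the MRO.
class Engine(EngineScoreMixin, EngineBase):
    pass


mro_names = [cls.__name__ for cls in Engine.__mro__]
```

With the bases in the opposite order, EngineBase would come before the mixin in the MRO, so any same-named method defined on EngineBase would take precedence over the mixin's.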

4. Updated: .github/CODEOWNERS

Added per-file ownership for all scoring-specific files:

/python/sglang/srt/entrypoints/engine_score_mixin.py @sundar24295s @chanh @fortunecookiee
/python/sglang/srt/entrypoints/openai/serving_score.py @sundar24295s @chanh @fortunecookiee
/python/sglang/srt/managers/tokenizer_manager_score_mixin.py @sundar24295s @chanh @fortunecookiee
/test/registered/core/test_score_api.py @sundar24295s @chanh @fortunecookiee
/benchmark/prefill_only/bench_score.py @sundar24295s @chanh @fortunecookiee
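
To sanity-check entries like these, the lines can be read into a path-to-owners mapping. The helper below is a hypothetical sketch that handles only exact file paths, as used in this PR; it does not implement GitHub's glob matching or last-match-wins precedence.

```python
def parse_codeowners(text):
    """Map each pattern to its list of owner handles, skipping blanks and comments."""
    owners = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *handles = line.split()
        owners[pattern] = handles
    return owners


SAMPLE = """\
# Scoring API
/python/sglang/srt/entrypoints/engine_score_mixin.py @sundar24295s @chanh @fortunecookiee
/test/registered/core/test_score_api.py @sundar24295s @chanh @fortunecookiee
"""

owners = parse_codeowners(SAMPLE)
```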

5. Updated: test/registered/openai_server/basic/test_serving_rerank.py

Updated import path to reflect the renamed module.

Files NOT Touched (and why)

These files contain scoring-related infrastructure (GPU kernels, scheduler logprob processing, attention masking) that is shared with other features. They were intentionally left in place:

  • layers/logits_processor.py: compute_logprobs_for_multi_item_scoring() operates on GPU tensors using logits-processor-internal APIs
  • layers/attention/flashinfer_backend.py: MultiItemScoringParams and _process_multi_item_scoring() are flashinfer attention-level code
  • managers/scheduler_output_processor_mixin.py: multi-item scoring conditionals are interleaved with regular logprob processing
  • server_args.py: multi_item_scoring_delimiter is a server config flag

Validation

Server startup

python -m sglang.launch_server \
  --model-path /shared/public/sharing/job-rank/kbehdin/f389cde308efd4dbb8d9-2025-06-06-18-31-30/best_model/epoch=0-step=498-HF \
  --port 30000 --host 0.0.0.0 \
  --chunked-prefill-size -1 \
  --enable-torch-compile \
  --dtype float16 \
  --max-prefill-tokens 100000 \
  --mem-fraction-static 0.3 \
  --enable-tokenizer-batch-encode \
  --disable-radix-cache \
  --disable-cuda-graph \
  --multi-item-scoring-delimiter 128255

Server starts successfully with no import errors.

Single-item scoring

curl -X POST "http://localhost:30000/v1/score" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the capital of California? Answer Yes or No for each of the following options:",
    "items": ["Scaramento"],
    "label_token_ids": [9454, 2753],
    "model": "..."
  }'

Response:

{
    "scores": [[6.398421076647679e-06, 3.3389641633503636e-06]],
    "model": "...",
    "usage": {"prompt_tokens": 23, "total_tokens": 23},
    "object": "scoring"
}
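
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the fields of the curl call; the helper names `build_score_request` and `send` are made up for this example, and the `"..."` model value is kept as the elided placeholder from the original command.

```python
import json
import urllib.request


def build_score_request(base_url, query, items, label_token_ids, model, apply_softmax=None):
    """Build (but do not send) a POST request for the /v1/score endpoint."""
    payload = {
        "query": query,
        "items": items,
        "label_token_ids": label_token_ids,
        "model": model,
    }
    if apply_softmax is not None:
        payload["apply_softmax"] = apply_softmax
    return urllib.request.Request(
        url=f"{base_url}/v1/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response; requires a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_score_request(
    "http://localhost:30000",
    "What is the capital of California? Answer Yes or No for each of the following options:",
    ["Scaramento"],
    [9454, 2753],
    "...",
)
# result = send(request)  # uncomment when the server above is listening
```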

Multi-item scoring (3 items, softmax)

curl -X POST "http://localhost:30000/v1/score" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the capital of California? Answer Yes or No for each of the following options:",
    "items": ["Sacramento", "San Jose", "San Francisco"],
    "label_token_ids": [9454, 2753],
    "apply_softmax": true,
    "model": "..."
  }'

Response:

{
    "scores": [
        [0.4857216775417328, 0.5142783522605896],
        [0.6157812476158142, 0.3842187821865082],
        [0.5241511464118958, 0.475848913192749]
    ],
    "model": "...",
    "usage": {"prompt_tokens": 28, "total_tokens": 28},
    "object": "scoring"
}
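
Each row in this response sums to 1 up to floating-point rounding, which is consistent with apply_softmax normalizing across the label tokens of each item. The sketch below checks that property on the reported numbers and shows a plain softmax over two made-up log-probabilities; that the server computes exactly this is an assumption, not something the PR states.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Scores copied from the response above.
reported = [
    [0.4857216775417328, 0.5142783522605896],
    [0.6157812476158142, 0.3842187821865082],
    [0.5241511464118958, 0.475848913192749],
]
row_sums = [sum(row) for row in reported]

# Hypothetical log-probabilities for the two label tokens of one item.
example = softmax([-12.0, -12.6])
```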

Import verification

from sglang.srt.entrypoints.engine import Engine
assert hasattr(Engine, 'score')
assert hasattr(Engine, 'async_score')
# MRO: Engine → EngineScoreMixin → EngineBase → ABC → object

Test Plan

  • Server starts without import errors
  • /v1/score endpoint returns correct results for single-item scoring
  • /v1/score endpoint returns correct results for multi-item scoring with softmax
  • Engine.score() and Engine.async_score() are accessible via mixin inheritance
  • All Python imports resolve correctly (no circular imports)
  • No stale references to old file/class names remain in the codebase
  • Existing CI test test/registered/core/test_score_api.py passes
  • Existing CI test test/registered/openai_server/basic/test_serving_rerank.py passes
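
The stale-reference item in the list above can be checked mechanically. The script below is a hypothetical sketch, not part of the PR: it walks a directory tree and reports any Python line that still mentions the old module or class name. The demo runs against a temporary directory so it does not depend on a checkout.

```python
import os
import tempfile

OLD_NAMES = ("tokenizer_manager_multiitem_mixin", "TokenizerManagerMultiItemMixin")


def find_stale_references(root, names=OLD_NAMES, suffixes=(".py",)):
    """Return (path, line number, line) for every line that mentions an old name."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if any(name in line for name in names):
                        hits.append((path, lineno, line.strip()))
    return hits


# Demo on throwaway files: one clean import, one stale import.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "ok.py"), "w", encoding="utf-8") as f:
        f.write("from tokenizer_manager_score_mixin import TokenizerManagerScoreMixin\n")
    with open(os.path.join(tmp, "stale.py"), "w", encoding="utf-8") as f:
        f.write("import os\nfrom tokenizer_manager_multiitem_mixin import X\n")
    demo_hits = find_stale_references(tmp)
    demo_summary = [(os.path.basename(p), n) for p, n, _ in demo_hits]
```

Pointing `find_stale_references` at the repository root would give the same check for a real checkout; an empty result means no Python file still uses the old names.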

…enizerManager to use TokenizerManagerScoreMixin. Add new engine_score_mixin and tokenizer_manager_score_mixin files, and update CODEOWNERS to reflect new ownership. Include unit tests for scoring methods.

@sundar24295s
Collaborator Author

/rerun-failed-ci

@sundar24295s
Collaborator Author

/tag-and-rerun-ci here

@hnyls2002 hnyls2002 merged commit 90e8680 into sgl-project:main Apr 3, 2026
28 of 44 checks passed
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Fridge003 pushed a commit that referenced this pull request Apr 7, 2026
xiezhq-hermann pushed a commit to antgroup/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026