feat: support qwen3(-VL) rerank scoring&chat template by alphabetc1 · Pull Request #16403 · sgl-project/sglang

alphabetc1 · 2026-01-04T16:34:45Z

Motivation

This patch:

Support Qwen3-Reranker/Qwen3-VL-Reranker scoring for /v1/rerank
Introduced optional instruct、top_n、return_documents field for /v1/rerank request
Added a chat-template-based prompt template for reranking(auto-complete prefix, suffix, and instruct)

Usage(text rerank):

Launch sglang:

python -m sglang.launch_server \
  --model-path /root/models/Qwen/Qwen3-Reranker-0.6B \
  --served-model-name rerank \
  --disable-radix-cache \
  --host 0.0.0.0 \
  --port 8001 \
  --chat-template examples/chat_template/qwen3_reranker.jinja

Send a request(instruct、top_nandreturn_documentsare optional.):

curl -X POST http://127.0.0.1:8001/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "法国首都是哪里？",
    "documents": [
      "法国的首都是巴黎。",
      "德国的首都是柏林。",
      "香蕉是黄色的水果。"
    ],
    "instruct": "Given a web search query, retrieve relevant passages that answer the query.",
    "top_n": 3,
    "return_documents": true
  }'

response:

[
  {
    "score": 0.7981867777396212,
    "document": "法国的首都是巴黎。",
    "index": 0,
    "meta_info": null
  },
  {
    "score":0.0002034269780552065,
    "document": "德国的首都是柏林。",
    "index": 1,
    "meta_info": null
  },
  {
    "score": 4.637871638728972e-06,
    "document": "香蕉是黄色的水果。",
    "index": 2,
    "meta_info": null
  }
]

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-01-04T16:35:05Z

Summary of Changes

Hello @alphabetc1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reranking capabilities by integrating support for the Qwen3-Reranker model. It introduces a new mechanism for handling decoder-only rerankers, allowing them to leverage logprob scoring. The changes include updates to the API protocol to accept an optional instruction field, a new chat template for Qwen3, and core logic modifications to process these new reranker types, ensuring compatibility and proper scoring.

Highlights

Qwen3-Reranker Support: The system now supports scoring with the Qwen3-Reranker model, which uses a decoder-only logprob approach for relevance scoring.
Optional Instruct Field: An optional 'instruct' field has been added to the '/v1/rerank' API, allowing users to provide specific instructions or context for reranking queries.
score_prompts Utility: A new helper function, 'score_prompts', is introduced to facilitate scoring of pre-rendered prompts, particularly useful for models like Qwen3-Reranker that require specific input formats.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request adds support for the Qwen3-Reranker model, which uses decoder-only log-probability scoring. This involves a new chat template, a new score_prompts helper method, and significant logic changes in the /v1/rerank endpoint handler to support this new scoring mechanism alongside existing cross-encoder models. The changes also include adding an optional instruct field to the rerank API and improving how embedding scores are handled.

My review focuses on the correctness of the new logic path for the Qwen3 reranker and the test coverage for the new functionality. I've identified a potential issue in the model detection logic that could lead to incorrect behavior for other generation models. I've also suggested expanding the test suite to cover the new Qwen3 reranker functionality, which is currently untested.

alphabetc1 · 2026-01-06T02:53:48Z

/tag-and-rerun-ci

yhyang201 · 2026-01-08T10:23:55Z

/rerun-failed-ci

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 · 2026-01-08T15:35:40Z

Working on this PR to also support qwen3-vl-reranker

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 · 2026-01-08T18:23:49Z

Hi @alphabetc1, I have updated this PR to incorporate support for the newly released qwen3-vl-reranker. Could you help to review it once more and also ensure that it does not disrupt your usage? Thanks~

- Updated documentation to include detailed descriptions of supported rerank models, specifically highlighting the Qwen3-VL-Reranker and its multimodal capabilities. - Improved the Jinja template rendering logic for handling multimodal content. - Refactored token ID retrieval for 'yes' and 'no' responses to be dynamic based on the tokenizer, enhancing compatibility across different model sizes. - Added unit tests for the Qwen3-VL reranker to ensure correct handling of logprobs and scoring. Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

alphabetc1 · 2026-01-09T04:49:56Z

Hi @alphabetc1, I have updated this PR to incorporate support for the newly released qwen3-vl-reranker. Could you help to review it once more and also ensure that it does not disrupt your usage? Thanks~

Nice work, thanks for updating!
I did a quick test and noticed the scores are a bit different from before. Let me double‑check what’s going on...
before:

[{"score":0.9975845167424338,"document":"法国的首都是巴黎。","index":0,"meta_info":null},{"score":0.10230470789265304,"document":"德国的首都是柏林。","index":1,"meta_info":null},{"score":0.0017274569821434306,"document":"香蕉是黄色的水果。","index":2,"meta_info":null}]

now:

[{"score":0.7981867777396212,"document":"法国的首都是巴黎。","index":0,"meta_info":null},{"score":0.0002034269780552065,"document":"德国的首都是柏林。","index":1,"meta_info":null},{"score":4.637871638728972e-06,"document":"香蕉是黄色的水果。","index":2,"meta_info":null}]

alphabetc1 · 2026-01-09T05:02:43Z

Hi @alphabetc1, I have updated this PR to incorporate support for the newly released qwen3-vl-reranker. Could you help to review it once more and also ensure that it does not disrupt your usage? Thanks~

Nice work, thanks for updating! I did a quick test and noticed the scores are a bit different from before. Let me double‑check what’s going on... before:
[{"score":0.9975845167424338,"document":"法国的首都是巴黎。","index":0,"meta_info":null},{"score":0.10230470789265304,"document":"德国的首都是柏林。","index":1,"meta_info":null},{"score":0.0017274569821434306,"document":"香蕉是黄色的水果。","index":2,"meta_info":null}]
now:
[{"score":0.7981867777396212,"document":"法国的首都是巴黎。","index":0,"meta_info":null},{"score":0.0002034269780552065,"document":"德国的首都是柏林。","index":1,"meta_info":null},{"score":4.637871638728972e-06,"document":"香蕉是黄色的水果。","index":2,"meta_info":null}]

Ok, this is caused by the qwen3_reranker.jinja update. The current behavior is more reasonable — thanks for the fix!

alphabetc1 · 2026-01-09T10:05:09Z

@JustinTong0323 Just did some refactoring, PTLA

JustinTong0323 · 2026-01-13T20:47:48Z

/rerun-failed-ci

alphabetc1 · 2026-01-14T00:00:15Z

/rerun-failed-ci

mingfeima · 2026-01-14T02:00:44Z

@mingxu please help check xpu ci failure.

alphabetc1 · 2026-01-14T04:27:10Z

/rerun-failed-ci 1

JustinTong0323 · 2026-01-14T16:18:57Z

/rerun-failed-ci

) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

alphabetc1 requested review from CatherineSue, JustinTong0323, Ying1123, hnyls2002, ispobock, merrymercy, slin1237 and xiezhq-hermann as code owners January 4, 2026 16:34

github-actions Bot added the documentation Improvements or additions to documentation label Jan 4, 2026

gemini-code-assist Bot reviewed Jan 4, 2026

View reviewed changes

Comment thread python/sglang/srt/entrypoints/openai/serving_rerank.py Outdated

Comment thread test/registered/openai_server/basic/test_serving_rerank.py

github-actions Bot added the run-ci label Jan 6, 2026

feat: support qwen3 rerank&template

969ce39

alphabetc1 force-pushed the feat/support_reranker branch from 3be6eb6 to 969ce39 Compare January 7, 2026 16:13

qwen3vl-rerank works

f8a51ae

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 and others added 2 commits January 8, 2026 15:35

Merge branch 'main' into feat/support_reranker

14aa29a

fix up qwen3vl-reranker

abee0c4

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 added 2 commits January 8, 2026 19:02

fix qwen3-vl-reranker logprob process

5a43807

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Did some minor refactoring

240e072

alphabetc1 commented Jan 9, 2026

View reviewed changes

Comment thread examples/chat_template/qwen3_reranker.jinja Outdated

lint

4f087c3

alphabetc1 commented Jan 9, 2026

View reviewed changes

Comment thread python/sglang/srt/entrypoints/openai/serving_rerank.py Outdated

ispobock added the high priority label Jan 10, 2026

alphabetc1 and others added 3 commits January 12, 2026 12:46

ci

c6d4e48

fix ci

ae5f290

Merge branch 'main' into feat/support_reranker

cc02f61

JustinTong0323 changed the title ~~feat: support qwen3 rerank scoring&chat template~~ feat: support qwen3(-VL) rerank scoring&chat template Jan 12, 2026

JustinTong0323 and others added 4 commits January 12, 2026 16:34

Merge branch 'main' into feat/support_reranker

368ec7d

Merge branch 'main' into feat/support_reranker

8d10165

Merge branch 'main' into feat/support_reranker

dc71fd8

fix doc

7510f90

Merge branch 'main' into feat/support_reranker

a887b55

Fridge003 merged commit de94d79 into sgl-project:main Jan 14, 2026
814 of 849 checks passed

alphabetc1 deleted the feat/support_reranker branch January 15, 2026 02:54

zackyoray pushed a commit to zackyoray/sglang that referenced this pull request Jan 21, 2026

feat: support qwen3(-VL) rerank scoring&chat template (sgl-project#16403

228e76a

) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Conversation

alphabetc1 commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Usage(text rerank):

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

Uh oh!

gemini-code-assist Bot commented Jan 4, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

alphabetc1 commented Jan 6, 2026

Uh oh!

yhyang201 commented Jan 8, 2026

Uh oh!

JustinTong0323 commented Jan 8, 2026

Uh oh!

JustinTong0323 commented Jan 8, 2026

Uh oh!

alphabetc1 commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alphabetc1 commented Jan 9, 2026

Uh oh!

alphabetc1 commented Jan 9, 2026

Uh oh!

Uh oh!

Uh oh!

JustinTong0323 commented Jan 13, 2026

Uh oh!

alphabetc1 commented Jan 14, 2026

Uh oh!

mingfeima commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alphabetc1 commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JustinTong0323 commented Jan 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

alphabetc1 commented Jan 4, 2026 •

edited

Loading

alphabetc1 commented Jan 9, 2026 •

edited

Loading

mingfeima commented Jan 14, 2026 •

edited

Loading

alphabetc1 commented Jan 14, 2026 •

edited

Loading