Skip to content

feat: support qwen3(-VL) rerank scoring&chat template#16403

Merged
Fridge003 merged 16 commits intosgl-project:mainfrom
alphabetc1:feat/support_reranker
Jan 14, 2026
Merged

feat: support qwen3(-VL) rerank scoring&chat template#16403
Fridge003 merged 16 commits intosgl-project:mainfrom
alphabetc1:feat/support_reranker

Conversation

@alphabetc1
Copy link
Copy Markdown
Collaborator

@alphabetc1 alphabetc1 commented Jan 4, 2026

Motivation

This patch:

  1. Support Qwen3-Reranker/Qwen3-VL-Reranker scoring for /v1/rerank
  2. Introduced optional instructtop_nreturn_documents field for /v1/rerank request
  3. Added a chat-template-based prompt template for reranking(auto-complete prefix, suffix, and instruct)

Usage(text rerank):

  • Launch sglang:
python -m sglang.launch_server \
  --model-path /root/models/Qwen/Qwen3-Reranker-0.6B \
  --served-model-name rerank \
  --disable-radix-cache \
  --host 0.0.0.0 \
  --port 8001 \
  --chat-template examples/chat_template/qwen3_reranker.jinja
  • Send a request(instructtop_nandreturn_documentsare optional.):
curl -X POST http://127.0.0.1:8001/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "法国首都是哪里?",
    "documents": [
      "法国的首都是巴黎。",
      "德国的首都是柏林。",
      "香蕉是黄色的水果。"
    ],
    "instruct": "Given a web search query, retrieve relevant passages that answer the query.",
    "top_n": 3,
    "return_documents": true
  }'

response:

[
  {
    "score": 0.7981867777396212,
    "document": "法国的首都是巴黎。",
    "index": 0,
    "meta_info": null
  },
  {
    "score":0.0002034269780552065,
    "document": "德国的首都是柏林。",
    "index": 1,
    "meta_info": null
  },
  {
    "score": 4.637871638728972e-06,
    "document": "香蕉是黄色的水果。",
    "index": 2,
    "meta_info": null
  }
]

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jan 4, 2026
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @alphabetc1, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reranking capabilities by integrating support for the Qwen3-Reranker model. It introduces a new mechanism for handling decoder-only rerankers, allowing them to leverage logprob scoring. The changes include updates to the API protocol to accept an optional instruction field, a new chat template for Qwen3, and core logic modifications to process these new reranker types, ensuring compatibility and proper scoring.

Highlights

  • Qwen3-Reranker Support: The system now supports scoring with the Qwen3-Reranker model, which uses a decoder-only logprob approach for relevance scoring.
  • Optional Instruct Field: An optional 'instruct' field has been added to the '/v1/rerank' API, allowing users to provide specific instructions or context for reranking queries.
  • score_prompts Utility: A new helper function, 'score_prompts', is introduced to facilitate scoring of pre-rendered prompts, particularly useful for models like Qwen3-Reranker that require specific input formats.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for the Qwen3-Reranker model, which uses decoder-only log-probability scoring. This involves a new chat template, a new score_prompts helper method, and significant logic changes in the /v1/rerank endpoint handler to support this new scoring mechanism alongside existing cross-encoder models. The changes also include adding an optional instruct field to the rerank API and improving how embedding scores are handled.

My review focuses on the correctness of the new logic path for the Qwen3 reranker and the test coverage for the new functionality. I've identified a potential issue in the model detection logic that could lead to incorrect behavior for other generation models. I've also suggested expanding the test suite to cover the new Qwen3 reranker functionality, which is currently untested.

Comment thread python/sglang/srt/entrypoints/openai/serving_rerank.py Outdated
Comment thread test/registered/openai_server/basic/test_serving_rerank.py
@alphabetc1
Copy link
Copy Markdown
Collaborator Author

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label Jan 6, 2026
@alphabetc1 alphabetc1 force-pushed the feat/support_reranker branch from 3be6eb6 to 969ce39 Compare January 7, 2026 16:13
@yhyang201
Copy link
Copy Markdown
Collaborator

/rerun-failed-ci

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323
Copy link
Copy Markdown
Collaborator

Working on this PR to also support qwen3-vl-reranker

JustinTong0323 and others added 2 commits January 8, 2026 15:35
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323
Copy link
Copy Markdown
Collaborator

Hi @alphabetc1, I have updated this PR to incorporate support for the newly released qwen3-vl-reranker. Could you help to review it once more and also ensure that it does not disrupt your usage? Thanks~

- Updated documentation to include detailed descriptions of supported rerank models, specifically highlighting the Qwen3-VL-Reranker and its multimodal capabilities.
- Improved the Jinja template rendering logic for handling multimodal content.
- Refactored token ID retrieval for 'yes' and 'no' responses to be dynamic based on the tokenizer, enhancing compatibility across different model sizes.
- Added unit tests for the Qwen3-VL reranker to ensure correct handling of logprobs and scoring.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@alphabetc1
Copy link
Copy Markdown
Collaborator Author

alphabetc1 commented Jan 9, 2026

Hi @alphabetc1, I have updated this PR to incorporate support for the newly released qwen3-vl-reranker. Could you help to review it once more and also ensure that it does not disrupt your usage? Thanks~

Nice work, thanks for updating!
I did a quick test and noticed the scores are a bit different from before. Let me double‑check what’s going on...
before:

[{"score":0.9975845167424338,"document":"法国的首都是巴黎。","index":0,"meta_info":null},{"score":0.10230470789265304,"document":"德国的首都是柏林。","index":1,"meta_info":null},{"score":0.0017274569821434306,"document":"香蕉是黄色的水果。","index":2,"meta_info":null}]

now:

[{"score":0.7981867777396212,"document":"法国的首都是巴黎。","index":0,"meta_info":null},{"score":0.0002034269780552065,"document":"德国的首都是柏林。","index":1,"meta_info":null},{"score":4.637871638728972e-06,"document":"香蕉是黄色的水果。","index":2,"meta_info":null}]

@alphabetc1
Copy link
Copy Markdown
Collaborator Author

Hi @alphabetc1, I have updated this PR to incorporate support for the newly released qwen3-vl-reranker. Could you help to review it once more and also ensure that it does not disrupt your usage? Thanks~

Nice work, thanks for updating! I did a quick test and noticed the scores are a bit different from before. Let me double‑check what’s going on... before:

[{"score":0.9975845167424338,"document":"法国的首都是巴黎。","index":0,"meta_info":null},{"score":0.10230470789265304,"document":"德国的首都是柏林。","index":1,"meta_info":null},{"score":0.0017274569821434306,"document":"香蕉是黄色的水果。","index":2,"meta_info":null}]

now:

[{"score":0.7981867777396212,"document":"法国的首都是巴黎。","index":0,"meta_info":null},{"score":0.0002034269780552065,"document":"德国的首都是柏林。","index":1,"meta_info":null},{"score":4.637871638728972e-06,"document":"香蕉是黄色的水果。","index":2,"meta_info":null}]

Ok, this is caused by the qwen3_reranker.jinja update. The current behavior is more reasonable — thanks for the fix!

@alphabetc1
Copy link
Copy Markdown
Collaborator Author

@JustinTong0323 Just did some refactoring, PTLA

Comment thread examples/chat_template/qwen3_reranker.jinja Outdated
Comment thread python/sglang/srt/entrypoints/openai/serving_rerank.py Outdated
@JustinTong0323 JustinTong0323 changed the title feat: support qwen3 rerank scoring&chat template feat: support qwen3(-VL) rerank scoring&chat template Jan 12, 2026
@JustinTong0323
Copy link
Copy Markdown
Collaborator

/rerun-failed-ci

1 similar comment
@alphabetc1
Copy link
Copy Markdown
Collaborator Author

/rerun-failed-ci

@mingfeima
Copy link
Copy Markdown
Collaborator

mingfeima commented Jan 14, 2026

@mingxu please help check xpu ci failure.

@alphabetc1
Copy link
Copy Markdown
Collaborator Author

alphabetc1 commented Jan 14, 2026

/rerun-failed-ci 1

@JustinTong0323
Copy link
Copy Markdown
Collaborator

/rerun-failed-ci

@Fridge003 Fridge003 merged commit de94d79 into sgl-project:main Jan 14, 2026
814 of 849 checks passed
@alphabetc1 alphabetc1 deleted the feat/support_reranker branch January 15, 2026 02:54
zackyoray pushed a commit to zackyoray/sglang that referenced this pull request Jan 21, 2026
)

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation high priority run-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants