[feat] add grammar sessions #9156

Open
nathanrchn wants to merge 14 commits into sgl-project:main from nathanrchn:feat/grammar-sessions

@nathanrchn (Contributor) commented Aug 13, 2025

Motivation

Enables persistent grammars for multi-turn conversations in which the same grammar must be enforced across multiple requests. This supports agentic workflows that need consistent structured output formatting from turn to turn.

Modifications

  • Add /create_grammar and /delete_grammar HTTP endpoints for grammar lifecycle management
  • Add grammar_id sampling parameter to reference existing grammar sessions
  • Implement grammar session storage and retrieval logic
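
The lifecycle these three changes imply (create a grammar, reference it by grammar_id over several turns, delete it) can be sketched from the client side as below. This is a minimal sketch, not the PR's actual request schema: the helper functions and the body fields of /create_grammar and /delete_grammar (grammar_id, json_schema, ebnf, regex) are assumptions for illustration, and the generate body assumes the usual text plus sampling_params layout with grammar_id added as described above.

```python
import json
from typing import Any, Dict, Optional


def create_grammar_payload(
    grammar_id: str,
    json_schema: Optional[str] = None,
    ebnf: Optional[str] = None,
    regex: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a body for POST /create_grammar (field names are assumed)."""
    candidates = (("json_schema", json_schema), ("ebnf", ebnf), ("regex", regex))
    provided = {name: value for name, value in candidates if value is not None}
    # A grammar session needs exactly one grammar definition.
    if len(provided) != 1:
        raise ValueError("provide exactly one of json_schema, ebnf, regex")
    return {"grammar_id": grammar_id, **provided}


def generate_payload(text: str, grammar_id: str, max_new_tokens: int = 64) -> Dict[str, Any]:
    """Build a generate body that references an existing grammar session."""
    return {
        "text": text,
        "sampling_params": {
            "grammar_id": grammar_id,  # sampling parameter added by this PR
            "max_new_tokens": max_new_tokens,
        },
    }


def delete_grammar_payload(grammar_id: str) -> Dict[str, Any]:
    """Build a body for POST /delete_grammar (field name is assumed)."""
    return {"grammar_id": grammar_id}


# Hypothetical multi-turn flow: one grammar, two turns, then cleanup.
schema = json.dumps(
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
)
create_body = create_grammar_payload("trip-planner", json_schema=schema)
turn_1 = generate_payload("Where should I go first?", "trip-planner")
turn_2 = generate_payload("And after that?", "trip-planner")
delete_body = delete_grammar_payload("trip-planner")
```

Each body would be sent with an HTTP client of your choice. The point of the sketch is that the grammar definition is sent once and later turns carry only the grammar_id.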

Accuracy Tests

Benchmarking and Profiling

Checklist

- Introduced CreateGrammarReqInput and CreateGrammarReqOutput data classes for grammar creation requests and responses.
- Added DeleteGrammarReqInput data class for grammar deletion requests.
- Implemented create_grammar and delete_grammar methods in the Scheduler and TokenizerManager classes to handle grammar operations.
- Updated SamplingParams to include grammar_id so that requests can reference an existing grammar session.
- Enhanced grammar management with error handling and logging for existing grammar IDs and initialization checks.
- Implemented /create_grammar and /delete_grammar API routes for managing grammars.
- Integrated CreateGrammarReqInput and DeleteGrammarReqInput for request handling.
- Enhanced error handling for grammar creation and deletion processes.
- Introduced a new test suite for grammar management, covering creation, deletion, and error handling for grammars.
- Implemented tests for creating grammars with JSON schema and EBNF notation, including scenarios for duplicate IDs and missing content.
- Added tests for generating content using grammars and handling sessions for incremental generation.
- Ensured proper cleanup of grammars and sessions after tests to maintain isolation.
- Updated the grammar dictionary to include an owner request for better session handling.
- Added checks to prevent reusing grammar IDs while an owner request is still active.
- Adjusted the grammar assignment logic to accommodate the new structure.
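
The owner-request bookkeeping described in the last three notes can be pictured with a small in-memory store. This is only a sketch under stated assumptions: GrammarSession, GrammarSessionStore, and their method names are hypothetical and do not come from the PR, the compiled grammar is treated as an opaque object, and the real Scheduler code may structure this differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class GrammarSession:
    """One stored grammar plus the request that currently owns it (hypothetical)."""

    grammar: Any  # compiled grammar object, opaque in this sketch
    owner_rid: Optional[str] = None  # id of the active owner request, if any


class GrammarSessionStore:
    """Hypothetical grammar_id -> session mapping with an owner check."""

    def __init__(self) -> None:
        self._sessions: Dict[str, GrammarSession] = {}

    def create(self, grammar_id: str, grammar: Any) -> bool:
        # Reject duplicate IDs instead of silently overwriting a session.
        if grammar_id in self._sessions:
            return False
        self._sessions[grammar_id] = GrammarSession(grammar)
        return True

    def acquire(self, grammar_id: str, rid: str) -> Any:
        session = self._sessions.get(grammar_id)
        if session is None:
            raise KeyError(f"unknown grammar_id: {grammar_id}")
        # Refuse reuse while a different owner request is still active.
        if session.owner_rid is not None and session.owner_rid != rid:
            raise RuntimeError(f"grammar {grammar_id} is in use by {session.owner_rid}")
        session.owner_rid = rid
        return session.grammar

    def release(self, grammar_id: str, rid: str) -> None:
        session = self._sessions.get(grammar_id)
        # Only the current owner can release the session.
        if session is not None and session.owner_rid == rid:
            session.owner_rid = None

    def delete(self, grammar_id: str) -> bool:
        return self._sessions.pop(grammar_id, None) is not None
```

In this sketch a request calls acquire when it starts using the grammar and release when it finishes, so a second request that names the same grammar_id fails fast instead of sharing grammar state mid-generation.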

@gemini-code-assist (Bot, Contributor) left a comment

Summary of Changes

Hello @nathanrchn, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature that enables persistent grammar sessions for multi-turn conversations. This allows users to define and reuse grammars (JSON schema, regex, EBNF) across multiple API requests, ensuring consistent structured output. This is particularly useful for agentic workflows requiring strict output formatting.

Highlights

  • New API Endpoints: Added /create_grammar and /delete_grammar HTTP endpoints for managing the lifecycle of grammar sessions.
  • Persistent Grammar IDs: Introduced a grammar_id parameter in sampling settings, allowing users to reference and apply previously defined grammars during text generation.
  • Backend Grammar Management: Implemented core logic within the scheduler and tokenizer manager to store, retrieve, and validate grammar objects, ensuring efficient reuse and proper error handling for grammar sessions.
  • Comprehensive Testing: Included a new dedicated test suite to thoroughly validate the functionality of grammar session creation, usage, deletion, and error scenarios.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces grammar sessions, allowing for persistent grammars in multi-turn conversations. It adds /create_grammar and /delete_grammar endpoints and a grammar_id sampling parameter. The implementation involves changes in the HTTP server, scheduler, and tokenizer manager to handle the lifecycle of grammars. New tests are added to cover this functionality.

My review found a high-severity bug in the grammar creation logic that could lead to a server crash, and a minor typo in a variable name. I've provided suggestions to fix these issues. Overall, the changes are well-structured and the feature is a valuable addition.

Comment thread python/sglang/srt/managers/tokenizer_manager.py
Comment thread python/sglang/srt/managers/scheduler.py Outdated