Skip to content

grammar: optimize big grammar matching by using left-factorization / optimize grammar cloning#20961

Open
pwilkin wants to merge 1 commit into
ggml-org:masterfrom
pwilkin:grammar-optimize
Open

grammar: optimize big grammar matching by using left-factorization / optimize grammar cloning#20961
pwilkin wants to merge 1 commit into
ggml-org:masterfrom
pwilkin:grammar-optimize

Conversation

@pwilkin

@pwilkin pwilkin commented Mar 24, 2026

Copy link
Copy Markdown
Member

Overview

Reduce complexity of parsing for huge grammars by using left-factorization of rules + refactor cloning to use pointer offsets to avoid the four-level loop.

Additional information

Pre-factorizes grammar branches to reduce number of alternate routes. On big and ambiguous grammars, the speed gain is up to 50x. Uses Adds a performance test for grammar parsing.

Hopefully helps alleviate cases such as #20879, to be paired with a PR parametrizing MAX_REPETITION_THRESHOLD.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, told Claude to try various optimization passes at grammar, picked the ones with minimum code change + maximum effect.

@pwilkin pwilkin requested a review from ggerganov as a code owner March 24, 2026 19:34
@github-actions github-actions Bot added the testing Everything test related label Mar 24, 2026
@aldehir

aldehir commented Mar 24, 2026

Copy link
Copy Markdown
Contributor

This is a bit much. If we want to do this, we should parse to an AST and left-factor on that instead.

Then again, that would take more lines but would offer more opportunities to optimize the AST.

I'll defer to @ggerganov if he is OK with this PR. I can run a few test scenarios to check for regressions.

Edit: On second look, it's probably close to the minimal implementation for left factoring. However, I do prefer these optimizations be done against an AST. In general, I think many of the outstanding security issues related to grammars can be handled by analyzing an AST.

@pwilkin

pwilkin commented Mar 24, 2026

Copy link
Copy Markdown
Member Author

But why? This is a 120-line-of-code optimization, mostly enclosed in one transformation function, with a huge performance gain. Sure, we can go full AST mode, but that would mean a complete rewrite of the grammar engine and meanwhile, people are having real problems with grammar efficiency and OpenClaw (see the attached issue). I don't see a problem with reworking the grammar overall, but I also don't see why we should take the "all-or-nothing" approach.

@aldehir

aldehir commented Mar 24, 2026

Copy link
Copy Markdown
Contributor

It's not a rewrite, it is changing the parser to emit an AST and then visit to emit the PDA rules.

Like I said, I can offer testing for regressions if this is desirable as-is.

@pwilkin

pwilkin commented Mar 24, 2026

Copy link
Copy Markdown
Member Author

Sounds like a rewrite to me but if you want to do it then I don't see why not :) but let's see what @ggerganov has to say.

@TangLaoya

Copy link
Copy Markdown

Dear @pwilkin ,

I noticed your update, and I tried to use b8508 along with your modified source file and rebuild the code, then run the llama-server again, the gramma error still displayed.

Thanks,
Tang Laoya

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Everything test related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants