Skip to content

feat: Only evaluate grammar.js to grammar.json#4601

Merged
WillLillis merged 3 commits intotree-sitter:masterfrom
wetneb:evaluate_only
Jul 27, 2025
Merged

feat: Only evaluate grammar.js to grammar.json#4601
WillLillis merged 3 commits intotree-sitter:masterfrom
wetneb:evaluate_only

Conversation

@wetneb
Copy link
Contributor

@wetneb wetneb commented Jul 13, 2025

This adds an --evaluate-only option to tree-sitter generate so that it only does the evaluation of grammar.js to src/grammar.json, without continuing on with the generation of src/parser.c and related files.

It's a follow-up to #4580.

@clason
Copy link
Contributor

clason commented Jul 13, 2025

I'd prefer --json-only, as that makes it clearer what happens.

@wetneb
Copy link
Contributor Author

wetneb commented Jul 13, 2025

I thought it would cause confusion with --json, but as you like… Let's wait for others to chime in and I'll update it to whatever gathers consensus.

@clason
Copy link
Contributor

clason commented Jul 13, 2025

Yeah, that flag is a bit poorly named. The issue is that "evaluate" is a bit of a meaningless term as it doesn't express the output.

@WillLillis
Copy link
Member

WillLillis commented Jul 13, 2025

What if we pulled the different "output" types into a single flag expressed as an enum, maybe something like

enum GenerateArtifact {
    JSON,
    Parser,
    Lib,
}

The build flag could be removed and replaced with the Lib variant. The default generate behavior could stay the same, but a user could then do something like

ts g --artifact=json

to just create the grammar.json file.

@clason
Copy link
Contributor

clason commented Jul 13, 2025

Hmm. I like that, but if we rethink this, we should also include the input (grammar.js, grammar.json, parser.c) into the discussion. (And I'm not excited about the name "artifact"; I feel we can come up with a better term.)

(I could simplify the nvim-treesitter pipeline if we could ts b -o foo.so straight from whatever is available -- optionally even skipping writing the intermediary files to disk, for hopefully another performance boost.)

@wetneb
Copy link
Contributor Author

wetneb commented Jul 13, 2025

Perhaps "stage" could be a better term than "artifact"?

@ObserverOfTime
Copy link
Member

so that it only does the evaluation of grammar.js to src/grammar.json

And src/node-types.json?

@wetneb
Copy link
Contributor Author

wetneb commented Jul 13, 2025

src/node-types.json is generated alongside src/parser.c and other files (src/tree-sitter/*) at the moment. Would you like it to behave differently?

@clason
Copy link
Contributor

clason commented Jul 13, 2025

Ideally, yes.

@wetneb
Copy link
Contributor Author

wetneb commented Jul 13, 2025

Could you tell me how exactly? I can think of multiple ways:

  • generating it at the same stage as grammar.json
  • generating it in its own stage, between grammar.json and parser.c

@clason
Copy link
Contributor

clason commented Jul 13, 2025

I don't think it makes sense to add a separate step for this. It's closer to grammar.json, so it should be included in the same step.

@wetneb
Copy link
Contributor Author

wetneb commented Jul 17, 2025

I've changed the CLI option to --artifact=json/parser/lib as suggested by @WillLillis above. It required a bit more of a refactoring but I guess the diff is still not too daunting.

@clason
Copy link
Contributor

clason commented Jul 17, 2025

I like your suggestion of stage better -- if only because then I wouldn't have to worry about AE vs. BE spelling ;)

Have you tested if this behaves correctly when giving an argument (such as src/grammar.json) to generate?

@ObserverOfTime
Copy link
Member

Maybe generate shouldn't build the library at all since the build command exists.
Though other commands will still do so implicitly…

Copy link
Member

@WillLillis WillLillis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a few comments.

Copy link
Member

@WillLillis WillLillis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. When you get a chance, could you squash the last 4 commits into the appropriate changes from the first 3? Happy to merge once that's cleaned up, thanks for the PR!

wetneb added 3 commits July 25, 2025 09:21
This adds an `--evaluate-only` option to `tree-sitter generate`
so that it only does the evaluation of `grammar.js` to
`src/grammar.json`, without continuing on with the generation of
`src/parser.c` and related files.

It's a follow-up to tree-sitter#4580.
@WillLillis WillLillis merged commit 8e90799 into tree-sitter:master Jul 27, 2025
18 checks passed
@amaanq
Copy link
Member

amaanq commented Oct 1, 2025

@wetneb looking back at this (and noticing how the lib variant makes no sense....), is this even needed? I would recommend grammars commit grammar.json, because then a JS runtime isn't a requirement (until 0.26 is out with the native JS runtime). Even if you decide not to, I don't really get how this helps.

@wetneb
Copy link
Contributor Author

wetneb commented Oct 1, 2025

This PR was mostly designed to enable #4646. The use cases I have in mind are:

  • for a grammar user, make it possible to compile a grammar by just typing make, regardless of the files that are already included in the archive or repository (assuming they have a JS runtime)
  • for a grammar maintainer, in the case where the grammar.json is tracked in the repository, this also makes it possible to update it easily, without generating the parser.c in the same go (which can take quite some time).

@wetneb wetneb deleted the evaluate_only branch October 1, 2025 07:51
@clason
Copy link
Contributor

clason commented Oct 1, 2025

The makefile is its own thing, we don't really care about that (and the makefile can easily call generate then build, so no need to change the CLI).

Do you have benchmarks that show that skipping the parser generation makes a significant difference (order of minutes)?

@wetneb
Copy link
Contributor Author

wetneb commented Oct 1, 2025

Changing the Makefile only was indeed my first approach, in #4580. It was requested by @WillLillis (here) and @ObserverOfTime (here) that we first change the CLI to avoid the double generation of artifacts, which also makes sense to me.

If you would rather do it differently (or not at all), I'm sure you can revert my changes - no hard feelings about that on my side :)

@amaanq
Copy link
Member

amaanq commented Oct 1, 2025

Yeah I'd rather update the Makefile/CMake, I'm not totally against having an early exit for grammar.json/node-types.json, but I think if updating the Makefile solves the use cases you outlined I'd rather do that. And since the lib variant makes no sense, I'd say a simple flag would suffice for it (if need be).

@ObserverOfTime
Copy link
Member

Some parsers take a rather long time to generate. Regardless of where the bottleneck lies, running the entire generation process twice is not desirable. If we want to have separate steps in make/CMake (which I do like having), they need to be separate steps in the generate command as well.

@amaanq
Copy link
Member

amaanq commented Oct 1, 2025

Why not just unify it in the make/cmake files (why even generate twice)?

@wetneb
Copy link
Contributor Author

wetneb commented Oct 1, 2025

Can you give an example of the Makefile rules you would use? It would help me understand the solution you have in mind.

@ObserverOfTime
Copy link
Member

ObserverOfTime commented Oct 2, 2025

Why not just unify it in the make/cmake files (why even generate twice)?

We used to generate from src/grammar.json but this prevented people from generating directly from grammar.js.
And we can't properly support both without separating the generation steps.

@clason
Copy link
Contributor

clason commented Oct 31, 2025

I think we need to partially revert this; --stage was a (my) mistake.

@amaanq Note that even with a native JS runtime, working from evaluated JSON as an intermediate state is still very valuable for grammars that extend other grammars (and hence require npm -- a no-go for downstream users like nvim-treesitter). So I'd still like to push for (and support) committing those files.

@ObserverOfTime
Copy link
Member

We should soft deprecate imports from node_modules in 0.27 and recommend submodules or vendoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants