cli: Adapt default Makefile/CMakeLists to work without grammar.json #4580

Closed
wetneb wants to merge 1 commit into tree-sitter:master from wetneb:makefile

Conversation

@wetneb
Contributor

@wetneb wetneb commented Jul 9, 2025

This is a step towards making it easier for folks not to include tree-sitter generated files in grammar repos, as outlined in #930.

This makes it possible to run make or cmake in grammar repos where grammar.json hasn't been included, and still get the parser compilation to work. The build scripts still work if grammar.json is included.

One problem is that if grammar.json isn't included, the generation process will run twice:

  • once as tree-sitter generate grammar.js
  • a second time as tree-sitter generate src/grammar.json

But that's not actually necessary, as the first invocation already generates both src/grammar.json and src/parser.c (plus other files).

If that's an issue, one could introduce a parameter to tree-sitter generate so that it only does the evaluation from grammar.js to src/grammar.json and stops there. This would avoid work getting done twice.
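
To make the duplicated work concrete, here is a minimal dry-run sketch in shell. It assumes a fresh checkout where src/parser.c is never tracked, and plan_generate is a hypothetical helper name, not something from tree-sitter or this PR. It only prints the two commands listed above instead of running them.

```shell
# Dry-run sketch: prints the generate commands for a checkout rooted at $1.
# plan_generate is a hypothetical helper; nothing here invokes tree-sitter.
# Assumption: src/parser.c is never tracked, so the second rule always fires.
plan_generate() {
    repo=$1
    if [ ! -f "$repo/src/grammar.json" ]; then
        # Rule 1: grammar.js -> src/grammar.json. Per the description above,
        # this invocation also writes src/parser.c.
        echo "tree-sitter generate grammar.js"
    fi
    # Rule 2: src/grammar.json -> src/parser.c. After rule 1 this repeats work.
    echo "tree-sitter generate src/grammar.json"
}
```

Running `plan_generate .` in a repo without src/grammar.json prints both commands, while a repo that tracks the file gets only the second one.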

This change was suggested to me by @mavit in a grammar I maintain which doesn't track generated files. I'm far from a make expert so I hope this makes sense.

…json

This is a step towards making it easier for folks not to include
tree-sitter generated files in grammar repos, as outlined in tree-sitter#930.

This makes it possible to run `make` or `cmake` in grammar repos where
`grammar.json` hasn't been included, and still get the parser
compilation to work. The build scripts still work if `grammar.json` is
included.

One problem is that if `grammar.json` isn't included, the generation process
will run twice:
* once as `tree-sitter generate grammar.js`
* a second time as `tree-sitter generate src/grammar.json`

But that's not actually necessary, as the first invocation already
generates both `src/grammar.json` and `src/parser.c` (plus other files).

If that's an issue, one could introduce a parameter to `tree-sitter
generate` so that it only does the evaluation from `grammar.js` to
`src/grammar.json` and stops there. This would avoid work getting done
twice.
@clason
Contributor

clason commented Jul 10, 2025

> This change was suggested to me by @mavit in a grammar I maintain which doesn't track generated files.

Please do track the generated JSON files. This allows consumers to generate the grammar without having node (or npm) installed and also improves the performance.

(This is orthogonal to the change here, which makes sense.)

@clason
Contributor

clason commented Jul 10, 2025

> If that's an issue, one could introduce a parameter to tree-sitter generate so that it only does the evaluation from grammar.js to src/grammar.json and stops there. This would avoid work getting done twice.

That is also an interesting idea which would be helpful for a) grammars that track the json but not the parser (and want to enforce this in CI) and b) independently benchmarking the JSON and C generation, which might indicate more opportunities for optimizations. @WillLillis ?

@WillLillis
Member

> > If that's an issue, one could introduce a parameter to tree-sitter generate so that it only does the evaluation from grammar.js to src/grammar.json and stops there. This would avoid work getting done twice.
>
> That is also an interesting idea which would be helpful for a) grammars that track the json but not the parser (and want to enforce this in CI) and b) independently benchmarking the JSON and C generation, which might indicate more opportunities for optimizations. @WillLillis ?

This would definitely be a nice data point to work with!

@ObserverOfTime
Member

> If that's an issue, one could introduce a parameter to tree-sitter generate so that it only does the evaluation from grammar.js to src/grammar.json and stops there. This would avoid work getting done twice.

This should be done first.

@wetneb
Contributor Author

wetneb commented Jul 12, 2025

Nice, thank you all for your feedback! I will look into adding this feature to tree-sitter generate then.

Concerning the tracking of grammar.json (I hope it's okay that I reply here - if not, happy to move this elsewhere), I understand the advantages you mention, but I also like the benefits of not tracking generated files at all. Being able to make commits that only change grammar.js is a big help in terms of maintainability. I see that unlike parser.c, grammar.json is more concise and mergeable, but to me, the solution of running tree-sitter generate in the CI for the files to be included in releases and uploaded to package repositories seems fitting and shouldn't make any visible difference to users, AFAICT. For now I have only set up this workflow for Crates.io, but I imagine it would also work for PyPI and NPM. Golang will likely be harder. If users want to point to a particular git commit in the grammar repository, it's true that it won't work, but I see that as a helpful incentive to publish versions regularly (and it's still possible to make a GitHub/Codeberg fork of the repo where they generate the files and check them into git, I guess).

@clason
Contributor

clason commented Jul 12, 2025

> Being able to make commits that only change grammar.js is a big help in terms of maintainability.

I would argue not -- without running tree-sitter generate, you have no idea of the effect of these changes (especially on the queries you commit!) -- and consumers can't easily see them from the diff either. (As someone who maintains a lot of grammars, I can tell you that that is by far the least of the problems...)

But it's your project -- you do you. But be aware that not having a generated grammar.json in the repo is a hard blocker for including the grammar in "consolidation projects" like nvim-treesitter.

@clason
Contributor

clason commented Jul 12, 2025

> Nice, thank you all for your feedback! I will look into adding this feature to tree-sitter generate then.

Thank you! This makes this PR obsolete, so I'm taking the liberty of closing it.

@clason clason closed this Jul 12, 2025
@wetneb
Contributor Author

wetneb commented Jul 12, 2025

> Thank you! This makes this PR obsolete, so I'm taking the liberty of closing it.

I don't think it would make those changes obsolete, as I think those changes would still be needed even after tree-sitter generate is able to only generate grammar.json from grammar.js or parser.c from grammar.json. But I guess I could reopen this later if the tree-sitter generate changes work out.

> I would argue not -- without running tree-sitter generate, you have no idea of the effect of these changes (especially on the queries you commit!) -- and consumers can't easily see them from the diff either.

Of course I still run tree-sitter generate locally and in the CI. And I think the diff on grammar.js is more helpful for consumers as it is more concise and human-readable.

wetneb added a commit to wetneb/tree-sitter that referenced this pull request Jul 13, 2025
This adds an `--evaluate-only` option to `tree-sitter generate`
so that it only does the evaluation of `grammar.js` to
`src/grammar.json`, without continuing on with the generation of
`src/parser.c` and related files.

It's a follow-up to tree-sitter#4580.
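
A rough sketch of how the two stages could then be invoked separately, for example to time them independently. The `--evaluate-only` flag name is taken from the commit message above; its exact placement on the command line and its availability in a released CLI are assumptions, and `stage_command` is a hypothetical helper that only prints the command.

```shell
# Prints the command for one stage of a split pipeline; runs nothing.
# stage_command is a hypothetical helper, and the --evaluate-only placement
# is an assumption based on the commit message, not a verified CLI invocation.
stage_command() {
    case $1 in
        # Stage 1: evaluate grammar.js and stop after writing src/grammar.json.
        json)   echo "tree-sitter generate --evaluate-only grammar.js" ;;
        # Stage 2: generate src/parser.c and related files from the JSON.
        parser) echo "tree-sitter generate src/grammar.json" ;;
        *)      echo "unknown stage: $1" >&2; return 1 ;;
    esac
}
```

For instance, `time sh -c "$(stage_command json)"` would time the evaluation stage alone, assuming tree-sitter is on the PATH and accepts the flag as written.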
wetneb added a commit to wetneb/tree-sitter that referenced this pull request Jul 17, 2025
wetneb added a commit to wetneb/tree-sitter that referenced this pull request Jul 25, 2025
WillLillis pushed a commit that referenced this pull request Jul 27, 2025
This adds an `--evaluate-only` option to `tree-sitter generate`
so that it only does the evaluation of `grammar.js` to
`src/grammar.json`, without continuing on with the generation of
`src/parser.c` and related files.

It's a follow-up to #4580.
wetneb added a commit to wetneb/tree-sitter that referenced this pull request Jul 27, 2025
…json

This is a new version of tree-sitter#4580, to make it easier not to include
tree-sitter generated files in grammar repos, as outlined in tree-sitter#930.

This makes it possible to run `make` or `cmake --build .` in grammar repos where
`grammar.json` hasn't been included, and still get the parser
compilation to work. The build scripts still work if `grammar.json` is
included.