Intro
I think we probably all agree that it would be nice to have a proper BNF for the Nix language, and that ideally the "official" parser would be compiled from that BNF. Here's why:
- It could improve editor support (e.g.: we could have a
nix-parse executable that either pretty-prints the AST or outputs it in JSON/XML/s-expression format).
- It could make projects like hnix, cabal2nix, and nixfmt much easier.
- It would also serve as an official definition of the Nix syntax, which would be a stepping stone to a formal executable semantics for Nix.
- The Nix syntax would have improved documentation (I have often seen people complain about the Nix syntax being weird, so the ability to link to the full BNF and say "it's not that bad!" would be quite nice).
- Code quality in the Nix evaluator would likely increase, and we could decrease the incidence of bugs in our parser (admittedly I don't know if such bugs are common).
There is a tool that would allow for such a thing, called BNFC. It has a very BNF-like syntax and has many backends (C/C++ via Bison/Flex, Haskell via Happy/Alex, OCaml, LaTeX for documentation), and can even generate pretty-printers.
Obstacles
The obstacles I see to using BNFC for the Nix parser are:
- The BNF would have to be written. This doesn't seem too hard but someone has to do it. There are a few existing Haskell parsers that one could look at for inspiration, and the Nix expression syntax isn't so complicated anyway. The Nix source code will probably have to refactored to deal with the new intermediate representation, as well.
- The parser thus generated could be too slow for nixpkgs – we're already dealing with some major evaluation time problems.
- Naively, this would introduce GHC into the compile-time closure of Nix.
Obstacle 1
Regarding obstacle 1, this is unavoidable but I might be willing to do a large amount of the work here (though it would be nice to have a point of contact on IRC for questions about the Nix internals).
Obstacle 2
Regarding obstacle 2: I think we mainly need an answer to the question "How much time is currently spent in the Nix parser versus other parts of evaluation?". If the answer is "not very much", then the speed of the generated parser will probably not be much of an issue. Otherwise, I would be willing to write a BNFC grammar and compare its speed to the current parser, assuming we agree on a solution for obstacle 3.
Obstacle 3
Regarding obstacle 3, I have thought of a few different options for mitigating the issue (I've put the "cons" associated with each option in bold, and any notable "pros" in italics):
Mitigations
- We could have the BNFC-based parser in addition to an in-tree parser that is used as a fallback in case BNFC is not available.
- Could lead to the BNFC parser falling out of sync with the in-tree parser.
- We now need to maintain two syntax definitions.
- Could have test code that uses QuickCheck or similar to generate definitions with the BNFC parser and tests if both parsers output the same AST, though this would not cover cases where syntax was added to the non-BNFC parser but not to the BNFC-based parser.
- We could add a commit hook and Travis CI test that forces pre-compilation of the BNFC syntax, and check in the generated parser C++ source as well as the BNF file.
- Adds nothing to the build/runtime closure of Nix.
- Could be annoying for developers of the Nix package manager.
- We could say "screw it" and have the compiled syntax as a
nativeBuildInput.
- Breaks native builds of Nix on platforms where GHC doesn't build.
- We could compile BNFC to C via JHC or GHC
-fvia-C, check that into a repo, and have Nix depend on the built version of that code.
- Adds nothing to the build/runtime closure of Nix.
- JHC seems unmaintained (last commit is 1 year old).
-fvia-C may not support the required extensions.
- There isn't yet an unregistered build of GHC in nixpkgs, AFAICT.
- Adds minor maintenance hassle.
- We could compile BNFC to JavaScript using GHCJS, check that into a repo, and have Nix depend on that code plus Node.
- Adds node.js to the build-time closure of Nix.
- Adds minor maintenance hassle.
- We could compile BNFC to LLVM assembly using GHC's LLVM backend, check that into a repo, and have Nix depend on that plus LLVM. This would probably be a good opportunity to switch Nix's build over to Clang, since we are depending on LLVM anyway. Preferably, we'd at least disassemble the LLVM bitcode before checking into a git repo, so that we can diff changes.
- Adds LLVM and potentially Clang to the build closure of Nix.
- Adds minor maintenance hassle.
- Could remove GCC from Nix build closure if we switch to Clang.
- Clang is a native cross-compiler, so this might speed up cross-compilation of Nix (though I don't think anyone does this?).
- We could precompile BNFC (with static linking) and then make a derivation that fetches the binary and
patchelfs in the libraries needed for GHC's runtime (GMP, libc, and libpthread, mainly).
- Adds a non-source package to the build closure of Nix.
- Adds maintenance hassle.
- Potentially annoying to make portable.
Personally, I would say that:
- Option 1 is the best idea iff BNFC generates a parser that is too slow to use and there's no workarounds for that fact.
- Options 3 and 7 seem like bad ideas.
- Options 6 and 5 are great, if adding LLVM or Node to the Nix build closure is palatable.
- Option 4 is great, assuming that BNFC can actually compile with
-fvia-C or JHC.
- Option 2 is probably the most realistic, assuming there is buy-in from you all.
Summary
I might be willing to make the BNF for Nix and do the preliminary work to compare efficiency with the existing parser, iff people think this is a good idea.
The main open questions are:
- Is this a good idea?
- How much time is spent in the Nix parser versus other parts of evaluation?
- Which of the above mitigations is best for addressing the Nix build closure issue?
- Can BNFC compile under
-fvia-C or JHC?
- How hard is it to compile Node for non-x86_64 platforms, and how big is its closure?
- How hard is it to compile LLVM for non-x86_64 platforms, and how big is its closure?
- Would the developers of Nix be willing to deal with installing BNFC and a commit hook?
Intro
I think we probably all agree that it would be nice to have a proper BNF for the Nix language, and that ideally the "official" parser would be compiled from that BNF. Here's why:
nix-parseexecutable that either pretty-prints the AST or outputs it in JSON/XML/s-expression format).There is a tool that would allow for such a thing, called BNFC. It has a very BNF-like syntax and has many backends (C/C++ via Bison/Flex, Haskell via Happy/Alex, OCaml, LaTeX for documentation), and can even generate pretty-printers.
Obstacles
The obstacles I see to using BNFC for the Nix parser are:
Obstacle 1
Regarding obstacle 1, this is unavoidable but I might be willing to do a large amount of the work here (though it would be nice to have a point of contact on IRC for questions about the Nix internals).
Obstacle 2
Regarding obstacle 2: I think we mainly need an answer to the question "How much time is currently spent in the Nix parser versus other parts of evaluation?". If the answer is "not very much", then the speed of the generated parser will probably not be much of an issue. Otherwise, I would be willing to write a BNFC grammar and compare its speed to the current parser, assuming we agree on a solution for obstacle 3.
Obstacle 3
Regarding obstacle 3, I have thought of a few different options for mitigating the issue (I've put the "cons" associated with each option in bold, and any notable "pros" in italics):
Mitigations
nativeBuildInput.-fvia-C, check that into a repo, and have Nix depend on the built version of that code.-fvia-Cmay not support the required extensions.patchelfs in the libraries needed for GHC's runtime (GMP, libc, and libpthread, mainly).Personally, I would say that:
-fvia-Cor JHC.Summary
I might be willing to make the BNF for Nix and do the preliminary work to compare efficiency with the existing parser, iff people think this is a good idea.
The main open questions are:
-fvia-Cor JHC?