Add a script to fuzz the parser (courtesy of pysource-codegen)#11015
Add a script to fuzz the parser (courtesy of pysource-codegen)#11015AlexWaygood merged 6 commits intomainfrom
pysource-codegen)#11015Conversation
There was a problem hiding this comment.
No idea if this is the correct place to put this script! This adds a whole new scripts/ directory to the crate
There was a problem hiding this comment.
We have a scripts directory at the project root, I would just put it there in a subdirectory along with a requirements.in (and generated requirements.txt) file. And, the module docstring could go in the README.md of that directory. Refer to scripts/release or scripts/benchmarks.
There was a problem hiding this comment.
Or, if you prefer to keep it in ruff_python_parser, we could rename it to fuzz instead to make the intent clear
There was a problem hiding this comment.
I moved it to the scripts/ directory. I'd rather keep the module docstring, though, as it currently doubles as the text that's shown if you pass -h or --help as a CLI argument to the script:
Details
(ruff) (add-fuzzing-script) % py scripts/fuzz-parser/fuzz.py -h ~/dev/ruff
usage: fuzz.py [-h] [--only-new-bugs] [--quiet] seeds [seeds ...]
Run the parser on randomly generated (but syntactically valid) Python source-code files.
To install all dependencies for this script into an environment using `uv`, run:
uv pip install -r scripts/fuzz-parser/requirements.txt
Example invocations of the script:
- Run the fuzzer using seeds 0, 1, 2, 78 and 93 to generate the code:
`python scripts/fuzz-parser/fuzz.py 0-2 78 93`
- Run the fuzzer concurrently using seeds in range 0-10 inclusive,
but only reporting bugs that are new on your branch:
`python scripts/fuzz-parser/fuzz.py 0-10 --new-bugs-only`
- Run the fuzzer concurrently on 10,000 different Python source-code files,
and only print a summary at the end:
`python scripts/fuzz-parser/fuzz.py 1-10000 --quiet
N.B. The script takes a few seconds to get started, as the script needs to compile
your checked out version of ruff with `--release` as a first step before it
can actually start fuzzing.
positional arguments:
seeds Either a single seed, or an inclusive range of seeds in the format `0-5`
options:
-h, --help show this help message and exit
--only-new-bugs Only report bugs if they exist on the current branch, but *didn't* exist on the released version of Ruff installed into the Python environment we're running in
--quiet Print fewer things to the terminal while running the fuzzer
There also doesn't actually seem to be a README.md file for the scripts/ directory right now
|
carljm
left a comment
There was a problem hiding this comment.
Code looks good to me! Comments are just a typo and something that's probably not worth acting on.
The location of the script seems reasonable to me, but you could wait for someone who actually knows something about preferred ruff directory structure to comment on that, if you want to.
|
|
||
| @dataclass | ||
| class ResolvedCliArgs: | ||
| seeds: list[Seed] |
There was a problem hiding this comment.
I was thinking we'd have to actually partition the list of seeds, but we don't; we just feed them to the executors one by one, which is a lot simpler (and balances better, too.)
That also means technically we wouldn't have to fully materialize the list of seeds like we do now; instead we could just store the list[int | range] here, and have a method that yields seeds one at a time.
But I think this doesn't really matter, and for smaller sizes of list, materializing is probably faster than a generator.
| if len(args.seeds) <= 5: | ||
| bugs = run_fuzzer_sequentially(args) | ||
| else: | ||
| bugs = run_fuzzer_concurrently(args) |
There was a problem hiding this comment.
Did I nerd snip you into making it concurrent :D
There was a problem hiding this comment.
Maybe 😁 @carljm and I looked at it in our pairing session yesterday
There was a problem hiding this comment.
Making it concurrent was a very good call, though. It's much faster now :)
| parser.add_argument( | ||
| "--only-new-bugs", | ||
| action="store_true", | ||
| help=( | ||
| "Only report bugs if they exist on the current branch, " | ||
| "but *didn't* exist on the released version of Ruff " | ||
| "installed into the Python environment we're running in" | ||
| ), | ||
| ) |
There was a problem hiding this comment.
This is nice. So we can also just use it to test the parser in general.
There was a problem hiding this comment.
We have a scripts directory at the project root, I would just put it there in a subdirectory along with a requirements.in (and generated requirements.txt) file. And, the module docstring could go in the README.md of that directory. Refer to scripts/release or scripts/benchmarks.
|
Cc. @15r10nk -- |
|
That's great 🚀. It makes me really happy to know that pysource-codegen is useful for you. |
Summary
This PR adds a script that can be used to fuzz the parser against randomly generated, syntactically correct Python source-code files.
Test Plan
I reverted the
crates/ruff_python_parserdirectory back to how it was as of 13ffb5b (in order to deliberately reintroduce some parser bugs), then ran the script several times with various invocations. Example output with the bugs reintroduced:Details
Screenshot to show how it looks with colour:
Details