Merged
thecrypticace approved these changes on Mar 11, 2025
philipp-spiess approved these changes on Mar 11, 2025
RobinMalfait pushed a commit that referenced this pull request on Mar 26, 2026
## Summary

This specializes the `.jsonl` and `.ndjson` file extensions so they're preprocessed like JSON instead of by the standard scanner. This prevents them from creating thousands of sub machines and reduces scanning time (see #17125, where this was done for `.json` files).

It seems reasonable to handle newline-delimited JSON files as well; otherwise, scanning these files can take quite a long time. It's quite unlikely that these will contain classes so, alternatively, these *could* go in the binary extensions list so they get ignored entirely.

## Test plan

I ran manual tests inside the `oxide` crate against some large-ish JSONL files (5MB–15MB). These changes bring down scanning time from 2s–3s on my M3 Max (via `cargo test --release …`) to less than 20ms.

I also ran tests through a full CLI build pipeline on a low-spec Linux box. This change brought scanning time down from ~90s to ~300ms for a single ~15MB file.
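
The extension handling described in the summary can be pictured roughly as follows. This is only a sketch: `uses_json_preprocessor` is a hypothetical name chosen for illustration, not the function the `oxide` crate actually uses, and the real dispatch may be structured differently.

```rust
use std::path::Path;

/// Hypothetical helper (an assumption, not the actual `oxide` code):
/// decide from the file extension whether a file should be routed
/// through the JSON preprocessor instead of the standard scanner.
fn uses_json_preprocessor(path: &Path) -> bool {
    matches!(
        // `extension()` returns the part after the last `.`, if any.
        path.extension().and_then(|ext| ext.to_str()),
        // `.json` was already handled; `.jsonl` and `.ndjson` are the new cases.
        Some("json" | "jsonl" | "ndjson")
    )
}
```

With this sketch, a path such as `data/events.jsonl` would be routed to the JSON preprocessor, while `index.html` would still go through the standard scanner.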
This PR adds a small JSON preprocessor to improve parsing of JSON files. This is because the extractor creates "sub machines" whenever it encounters a `[` or a `{` in the input. We do this because of things like `%w[…]` strings in Ruby or `className={clsx({flex: true})}` in JSX.

Due to the sheer number of potential `[` and `]` brackets, parsing JSON files could be way slower than it needs to be.

To tackle this, after this PR, we preprocess all the important brackets and braces in a JSON input by replacing them with spaces so the extractor doesn't need special casing.
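
A minimal sketch of this idea, under stated assumptions: `preprocess_json` is a hypothetical function name, and treating "important" as "structural brackets outside string literals" is an interpretation, not something confirmed by the PR text. The actual preprocessor in the `oxide` crate may differ.

```rust
/// Hypothetical sketch (not the actual `oxide` API): replace structural
/// JSON brackets and braces with spaces so a bracket-sensitive scanner
/// does not start a sub machine for each of them.
///
/// Assumption: brackets inside string literals are kept, because they may
/// be part of a class name such as `[&>p]:underline`.
fn preprocess_json(input: &[u8]) -> Vec<u8> {
    // Same length as the input, so byte positions stay aligned.
    let mut out = input.to_vec();
    let mut in_string = false;
    let mut i = 0;

    while i < input.len() {
        match input[i] {
            // Inside a string, skip the escaped byte so `\"` does not end it.
            b'\\' if in_string => i += 1,
            // An unescaped quote toggles string state.
            b'"' => in_string = !in_string,
            // Structural brackets and braces become spaces.
            b'[' | b']' | b'{' | b'}' if !in_string => out[i] = b' ',
            _ => {}
        }
        i += 1;
    }

    out
}
```

In this sketch the output always has the same length as the input: only the structural `[`, `]`, `{`, and `}` bytes change, and string contents pass through untouched.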
We saw this while debugging this issue: #17092
Test plan