Lazy-load syntax Contexts#393
Merged
Merged
Conversation
This enables making `SyntaxSet::load_defaults_newlines()` *much* faster. Intuitively it might seem like this change is enough, but it is also necessary to stop compressing the "outer" data in `assets/default_newlines.packdump`. Since the inner, lazy-loaded data is already compressed, compressing the data another time only makes the data CPU-intensive to decompress, without any gains in reduced size of the data. The final step in lazy-loading will be done in a separate commit. Here are an overview of what kind of improvements are at stake: | Version | `SyntaxSet::load_defaults_newlines()` time | | -------------------- | ----------------------------------------- | | v4.6.0 | 35.3 ms | | This commit | 24.0 ms | | No outer compression | 1.3 ms (!) | Lazy-loading syntaxes is very fast (in the sub-ms range), so time-to-parse is practically unchanged by this commit.
Enselic
added a commit
to Enselic/bat
that referenced
this pull request
Nov 13, 2021
To get fast startup, syntect will instead start to lazy-load syntaxes. See trishume/syntect#393 and discussions in linked PRs.
trishume
approved these changes
Nov 22, 2021
Enselic
added a commit
to Enselic/bat
that referenced
this pull request
Nov 22, 2021
To get fast startup, syntect will instead start to lazy-load syntaxes. See trishume/syntect#393 and discussions in linked PRs.
Enselic
added a commit
to sharkdp/bat
that referenced
this pull request
Nov 22, 2021
To get fast startup, syntect will instead start to lazy-load syntaxes. See trishume/syntect#393 and discussions in linked PRs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This enables making
SyntaxSet::load_defaults_newlines()much faster. Intuitively it might seem like this change is enough, but it is also necessary to stop compressing the "outer" data inassets/default_newlines.packdump. Since the inner, lazy-loaded data is already compressed, compressing the data another time only makes the data CPU-intensive to decompress, without any gains in reduced size of the data.The final step in lazy-loading will be done in a separate commit/PR after this has been merged. Here is an overview of what kind of improvements are at stake:
SyntaxSet::load_defaults_newlines()timeLazy-loading syntaxes is very fast (in the sub-ms range even on low-end desktops for most syntaxes), so time-to-parse is practically unchanged by this commit.
Naturally, the time to build a
SyntaxSet(criterion testlink_syntaxes) has increased significantly, since it now involves serialization and compression. That is a very good trade-off however, since it is better if the one-time link step is slow but loading assets is fast, versus fast one-time linking but slow loading of assets. Loading assets occurs over and over while linking only occurs once.All syntect regression tests pass, and all bat regression tests pass. So risk of regressions is very low.
Overview of the syntax lazy-loading work:
contextsinSyntaxReferencetocontext_ids#379contextsfromSyntaxSetinto individualSyntaxReferences #382SyntaxSetwhen serializing itNote: The target branch of this PR is temporarily
store-contexts-in-syntax-referencessince #382 is not merged yet, but given the discussion in it it appears practically Approved to me, and I figured it wouldn't hurt to get the next PR in the series up for review.