Do not compress the outer parts of a SyntaxSet#398
Conversation
Since the lazy-loaded syntaxes already has their data compressed, it is only detrimental to performance to compress another time. So stop compressing the SyntaxSet and add APIs for clients too.
| /// data that has been embedded in your own binary with the [`include_bytes!`] | ||
| /// macro. | ||
| #[cfg(any(feature = "dump-load", feature = "dump-load-rs"))] | ||
| fn from_uncompressed_data<T: DeserializeOwned>(v: &[u8]) -> Result<T> { |
There was a problem hiding this comment.
In the spirit of #98 I chose to return a Result<T> here instead of T as from_binary. I also chose to call it _data instead of _binary since I think _data is clearer. But I am of course willing to make any changes that my reviewer(s) request.
|
I'm not entirely comfortable to merge this PR without any review, so I'll leave this open for at least another month. After it is merged, it would be fantastic with a new syntect release, because we can then practically immediately make a bat release that makes use of lazy-loading of syntaxes. |
trishume
left a comment
There was a problem hiding this comment.
Looks good, this isn't that big or weird of a change so I would have been fine with you merging this on your own after a month, for reference.
I'm happy to do a release now! Do you know offhand whether any important public APIs changed in a breaking way? I remember one minor change to something public that I'm pretty sure nobody uses, that I'm fine doing in a non-major release, but was there anything other than that? If you don't remember I can look through myself.
Good to know, I'll take this into account when evaluating complexity/risk of PRs in the future 👍 Looking forward to the release! I quickly skimmed through In terms of changes to the public API, I checked the diff using my home-cooked method: #!/usr/bin/env bash
set -o errexit -o nounset -o pipefail
RUSTDOCFLAGS='-Z unstable-options --output-format json' cargo +nightly doc --no-deps
jq '.index | .[] | select(.crate_id == 0) | select(.visibility == "public") | .name' target/doc/syntect.json | sort > public-api-symbols.txtDiffing the output of v4.6.0 with origin/master produces the following list of changes:
To be precise; the diff of adding I did all this in a bit of a rush since now I need to prepare for Christmas, but it should be a good baseline for you when you double-check the diff yourself :) Feel free to remove and add and do whatever changes you like to my draft! |
|
Thanks so much for figuring out the changelog for me, that made things much quicker! I just pushed the release and published Thanks again for all your great contributions and your patience with how spottily I've put time into maintenance. |
Here is the last PR in the lazy-loading series.
Since the lazy-loaded syntaxes already has their data compressed, it is only
detrimental to performance to compress another time. So stop compressing the
SyntaxSetand add APIs for clients too.To summarize impact of lazy-loading work, here are criterion benchmark numbers with v4.6.0 as baseline (with some back-ported benches to make it possible to compare):
Much faster:
from_dump_filetakes 1.3263 ms instead of 42.257 ms, an improvement with -96.859%load_internal_dumpshow similar numbers asfrom_dump_fileSignificant improvements:
load_and_highlight/*: Each test is ~41 ms faster compared to before (due to theload_internal_dumpimprovement). For example,load_and_highlight/"parser.rs"takes ~434 ms instead of ~467 ms, andload_and_highlight/"highlight_test.erb"takes ~32 ms instead of ~62 ms.No practical difference in performance compared to v4.6.0:
parse/*highlight/*add_from_folderload_internal_themesstack_matchinghighlight_htmlMuch slower:
link_syntaxestakes ~200 ms instead of ~17 ms since it now involves serialization and compression. It's for the greater good though of course since linking is a one time thing while loading dumps happens over and over, and affects startup time in a much more fundamental way.Another example of something that has gotten much faster is
syncat. The improvement numbers I gave in the prototype code is still applicable.Overview of the syntax lazy-loading work:
contextsinSyntaxReferencetocontext_ids#379contextsfromSyntaxSetinto individualSyntaxReferences #382(Note that MSRV CI will fail due to #397 not being merged)
(Note that projects that have their own code for dumps (like bat) is not affected by this PR, but for "normal" clients this PR is important.)