YUV->RGBA conversion: Special case the edge pixels, do the middle without index clamping #12
Merged
kmeisthax merged 3 commits into kmeisthax:video-h263 on Jan 21, 2021
…n step This yielded an overall 20% faster decoding speed on the video I tested
This PR splits the color space conversion of the edge pixels and "the rest", to allow fewer operations on the inside pixels that are the straightforward case.
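The general pattern can be sketched on a much simpler problem than the actual YUV->RGBA code. The example below is not taken from this PR: it applies a 3-tap smoothing filter to a single row of samples, and the names `smooth_clamped` and `smooth_split` are hypothetical. The first version clamps the neighbour indices for every sample; the second special-cases the two edge samples and processes the interior without any clamping. Both are meant to produce identical output.

```rust
/// Reference version: every sample clamps its neighbour indices,
/// even though clamping only matters at the two ends of the row.
fn smooth_clamped(src: &[u8]) -> Vec<u8> {
    let last = src.len().saturating_sub(1);
    (0..src.len())
        .map(|x| {
            let l = src[x.saturating_sub(1)] as u16; // clamped to index 0
            let c = src[x] as u16;
            let r = src[(x + 1).min(last)] as u16; // clamped to the last index
            ((l + 2 * c + r + 2) / 4) as u8
        })
        .collect()
}

/// Split version: the two edge samples are handled separately, and the
/// interior goes through `windows(3)`, which needs no index clamping.
fn smooth_split(src: &[u8]) -> Vec<u8> {
    let n = src.len();
    if n < 3 {
        // Too short to have an interior; reuse the clamped version.
        return smooth_clamped(src);
    }
    let tap = |l: u8, c: u8, r: u8| ((l as u16 + 2 * c as u16 + r as u16 + 2) / 4) as u8;

    let mut out = Vec::with_capacity(n);
    // Left edge: the missing left neighbour is the edge sample itself.
    out.push(tap(src[0], src[0], src[1]));
    // Interior: all three neighbours are known to exist.
    out.extend(src.windows(3).map(|w| tap(w[0], w[1], w[2])));
    // Right edge: the missing right neighbour is the edge sample itself.
    out.push(tap(src[n - 2], src[n - 1], src[n - 1]));
    out
}
```

The interior loop is where almost all of the samples are, so removing the per-sample `min`/`saturating_sub` work there is what this kind of split is after; the edge handling only has to be correct, not fast.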
The effects of each included commit (plus two excluded ones) on the runtime of a particular command are:
These numbers are the output of `time` (in seconds) for the following command, run after compilation is done, with each sample averaged from three runs. The error bars are one standard deviation long in both directions.

`time cargo run --package=exporter --release -- ../../Downloads/z0r-de_4145.swf --frames 1000`

I also commented out the actual saving of the frames into files, so the effect on rendering itself is more directly measurable.
While the "utility functions" commit regresses a little bit, doing it is almost a necessity for the one after it, which is the one providing the significant gains.
Overall, these changes sped up the rendering by about 25%.
I also made two more experiments (independently) that I then discarded because they both regressed slightly:
- The first one was doing the bilinear interpolation differently: on `f32` numbers, in two steps (the usual way, in a rotated H-shape).
- The second one was simply omitting the `.min()` and `.max()` calls from `clamp()`, relying on the saturating property of the `f32` to `u8` cast instead.

I don't know if this is starting to stretch the "code simplicity/cleanliness" vs. "runtime performance" trade-off a little bit too far, but at least there is still no `unsafe` anywhere... :)
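For reference, the two discarded experiments can be sketched roughly as below. This is an illustration under assumptions, not the code that was tried: all function names are hypothetical, and the "rotated H-shape" is read here as two horizontal interpolations (top and bottom edge) joined by one vertical interpolation. The cast-only variant relies on the fact that, since Rust 1.45, an `as` cast from a float to an integer saturates at the integer's bounds and maps NaN to 0.

```rust
/// Linear interpolation between `a` and `b` with weight `t` in [0, 1].
fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// Two-step bilinear interpolation on `f32`: interpolate along the top
/// and bottom edges first, then between those two results.
fn bilinear_two_step(tl: f32, tr: f32, bl: f32, br: f32, tx: f32, ty: f32) -> f32 {
    let top = lerp(tl, tr, tx);
    let bottom = lerp(bl, br, tx);
    lerp(top, bottom, ty)
}

/// Explicit clamping before the cast (hypothetical stand-in for `clamp()`).
fn to_u8_clamped(v: f32) -> u8 {
    v.max(0.0).min(255.0) as u8
}

/// Cast only: out-of-range values saturate to 0 or 255, NaN becomes 0.
fn to_u8_cast_only(v: f32) -> u8 {
    v as u8
}
```

Both conversions truncate the fractional part and agree on every input, including out-of-range values, so dropping `.min()` and `.max()` changes only how the clamping is carried out, not the result.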