Merged
Conversation
This establishes baseline performance of the algorithm using some resonable default settings. On my local machine, this yeilds the following bechmark results: goos: linux goarch: amd64 pkg: github.com/esimov/triangle cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz BenchmarkDraw-4 21 3074890062 ns/op PASS ok github.com/esimov/triangle 67.892s Signed-off-by: Chris Waldon <christopher.waldon.dev@gmail.com>
Signed-off-by: Chris Waldon <christopher.waldon.dev@gmail.com>
This commit switches several types from taking 4-bytes-per-pixel to a single byte per pixel. The behavior (so far as I can tell) is unchanged, but the speedup is noticable: goos: linux goarch: amd64 pkg: github.com/esimov/triangle cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz BenchmarkDraw-4 25 2865398063 ns/op PASS ok github.com/esimov/triangle 74.604s Signed-off-by: Chris Waldon <christopher.waldon.dev@gmail.com>
This brings another few percent of performance improvement: goos: linux goarch: amd64 pkg: github.com/esimov/triangle cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz BenchmarkDraw-4 27 2498826294 ns/op PASS ok github.com/esimov/triangle 70.112s Signed-off-by: Chris Waldon <christopher.waldon.dev@gmail.com>
This tweak appears to have a minor performance improvemnt associated, as no dynamic allocation has to occur for the slice embedded within the type. goos: linux goarch: amd64 pkg: github.com/esimov/triangle cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz BenchmarkDraw-4 28 2474394774 ns/op PASS ok github.com/esimov/triangle 71.881s Signed-off-by: Chris Waldon <christopher.waldon.dev@gmail.com>
esimov
reviewed
Apr 9, 2021
Owner
esimov
left a comment
There was a problem hiding this comment.
Hey @whereswaldon thanks a lot for your contribution, I really appreciate. I left a comment on the benchmark test, otherwise everything is just fine and the speed improvement is quite considerable. Please do that slight adjustment and I will merge it back your PR into the master branch.
This preserves support for older go versions. Signed-off-by: Chris Waldon <christopher.waldon.dev@gmail.com>
esimov
reviewed
Apr 10, 2021
Owner
esimov
left a comment
There was a problem hiding this comment.
This looks good for me. I have merged it back to the main branch. Thanks a lot.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hello! Thanks so much for this amazing library. It's been really fun to play with over the last day or so.
This PR is a series of tweaks I made to boost performance. These were guided by
pprofand the included benchmark, but only tested on a single x86_64 Linux machine. They may not be an improvement everywhere.That being said, they appear to represent a 24% speedup on my test hardware without compromising the output in any way.
I have never worked with these algorithms before, and it is definitely possible that my work contains mistakes that compromise the algorithms under some edge case that I failed to consider or did not understand.
I also don't know enough about the algorithms to know if the parameters being benchmarked represent the normal use-case well enough. Perhaps the benchmark should be using different values?
I did all of this because I have a toy program that transforms my webcam output using this library, and I'm trying to boost the realtime processing capability of this library. Even with all of these enhancements applied, I can only get 15-20FPS for very low-res images.
Regardless, I hope these are a useful reference point for optimization, even if they ultimately aren't merged.