Performance problems with detailed art (too many small shapes?)
This is probably a hard problem to crack, but here's a description anyway...
This snowflake pattern in the Context Free gallery renders quickly in the CF app: https://contextfreeart.org/gallery2/index.html#design/3771
Here's the CF code:
shape A {
CIRCLE[]
loop 6 [b.1r-60] A [s.4.5y.5]
}
startshape A
And here's my translation in Stamps:
(define rr random-real)
(define-shape A
(circle)
((loop ([i 6])
(A [s (rr .4 .5)]
[y .5]
[b (* i .1)]
[r (* i -60)]))))
(maximum-render-cycles 200000)
(start-shape A)
I guess that maximum-render-cycles needs to be increased to get the detail, but this results in long render times and out of memory errors.
yes, unfortunately, C++ will beat racket no matter how much we optimize.
Things to try:
- Use Racket Profile to identify the bottleneck
- Using more of typed racket improves performance
- For memory optimization: the path calculation algorithm could compute union of paths to reduce the number of objects
- I haven't explored any concurrency/parallelism - which is definitely possible in this application
- Maybe smarted pruning techniques (stop expanding if shape gets too small, but that can be tricky)
I think exploring the pruning might be beneficial. Here's an example, which renders some circles - about 25 are visible, but 5001 get rendered:
#lang s-exp stamps/lang
(define-shape row
(circle [saturation 1 ]
[hue 0 ]
[brightness 0.5])
(row [translate 0.5 0.5]
[scale 0.8 ]
[hue 2 ])
)
(maximum-render-cycles 10000)
(start-shape row)
I'm guessing the pruning code would go in the function record-paths - is that correct?
Let me know what you think and I'll do some exploring...
Yes, that would be the obvious place.
I guess the sub-problems we need to solve to optimize the performance of your benchmark case are:
- an efficient test to decide whether a polygon completely occludes another (polygon = 2xN matrix)
- an efficient test to decide whether a new polygon added (in
record-pathmethod) occludes any of the existing polygons (inpaths-queueinstance variable) - and then efficiently remove those occluded polygons. Most likely the existing polygons set must be represented in something other than a queue (maybe a quad-tree?).
Hi Rodrigo, My first strategy is to understand fully the recording paths code (especially recursive ones), so I'm writing some test code in a branch here: https://github.com/ericclack/racket-stamps/blob/performance/private/tests/renderer-recursive-tests.rkt
Once I've done that I'll start trying out some pruning strategies, perhaps with some alternative data structures as you suggest.
I'll let you know how I get on... -Eric.
My bad, I misunderstood you! I thought you're referring to the path-record% class (path-record.rkt).
The class intent of the class is simply to be a place where: 1) you can add new "paths", and 2) you can "render" those added paths into a device context (dc). My previous comments refer to ideas to implement in that class.
As for the record-paths function (render.rkt), it's not really recursive, just using a trick of "let" expressions to make a loop work. The purpose is to "keep applying" the renderer to shape rules and getting more shape rules until we reach the maximum render cycles or there are no more rule expansions - all while adding those paths to an instance of path-record%.
I'm not 100% clear how to optimize that function
Ah I see how it works now, with the let-loop.
I have a theory that at some point we can discard the functions returned by (renderer pr) in function record-paths in render.rkt... when their shapes would be too small to see. I'm going to explore this and let you know how I get on.
More progress here: https://github.com/ericclack/racket-stamps/tree/performance
More to do...
Testing seems to show no performance improvement! However it's been interesting for me to see how the code works... we'll see if further work leads anywhere...
Looking further I can see that the pruning is not having the desired effect -- certainly we are drawing less, but the work to record the paths seems to still reach the maximum-render-cycles limit, and this takes most of the time.
See here the low number of shapes and the high render time, which actually is mostly in "recording paths" (from https://github.com/ericclack/racket-stamps/blob/performance/examples/cf-pink-blossom2.rkt)
rendering...recording paths... drawing paths... 2983 shapes, 28595 ms
saving to 1024x768 image output.png (png)...done
So I need to look at my logic here:
https://github.com/ericclack/racket-stamps/blob/2d9ff58e3972d66506e15a2c5fc9a13a2c5cdc0c/private/render.rkt#L72
See option #18