Enable Skia reduceOpsTaskSplitting option#26067
Conversation
This option is enabled on our bots and is rolling out to our clients. It provides a performance improvement by reordering tasks to combine render tasks targeting the same surfaces. For instance, if you call draw saveLayer draw restoreLayer draw By default, Skia will treat this as draw to A draw to B draw B to A draw more to A With this option, we will reorder to draw to B draw to A draw B to A draw more to A This reduction in target swapping is faster but may consume more VRAM. If the reordered version would go over our memory budget, then we'll stick to the original version.
|
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat. If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix? Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing. |
|
@chinmaygarde wdyt? |
chinmaygarde
left a comment
There was a problem hiding this comment.
I fail to see how the example you linked would consume more VRAM if it were reordered. I guess the reordering the following would lead to more VRAM usage however since the second draw render target can be reused today? Just so I understand, is this what you meant?
draw
saveLayer
draw
restore
draw
saveLayer
draw
restore
draw
| // A similar work-around is also used in shell/common/io_manager.cc. | ||
| options.fDisableGpuYUVConversion = true; | ||
|
|
||
| options.fReduceOpsTaskSplitting = GrContextOptions::Enable::kYes; |
There was a problem hiding this comment.
This is only in effect in the GL backend. The identical change must be applied to gpu_surface_vulkan and gpu_surface_metal.
|
Oh, indeed this particular example wouldn't use more VRAM. Instead imagine we have 2 offscreen surfaces. Without reordering, the surface may be reusable for both offscreen draws. With reordering, we need to actually allocate 2 separate surfaces. |
|
We are currently fairly sensitive to increases in VRAM usage. Is there way to enable the optimization only if it doesn't allocate more surfaces for purposes of reordering? In the first case, reducing render target switches makes sense. In the second case, I am not so sure. |
|
Hmm, there is no capability to do that. However we will fall back to the un-reordered version if the reordered version would not fit within the VRAM budget specified by the user. Would it be possible to tune that value? Is there a good way that we could try this option out and get data on the trade-off? |
Right now, it is some multiple of the screen size specified.
There are benchmarks that run on Skia perf. But, it may be that most of them measure time and the few that measure memory may actually be measuring host memory. @dnfield may have more precise guidance on the specific benchmarks. |
|
Is the VRAM budget the same as the resource cache budget? I don't know that we're telling Skia about anything but resource cache budgets. We do not have any VRAM specific benchmarks. We do have some memory benchmarks that are basically polling |
|
Yes the resource cache budget is the VRAM budget for Skia GPU. We'll avoid going over that whenever possible, including falling-back from reordering tasks. |
|
Would you be up for following https://github.com/flutter/flutter/tree/master/dev/devicelab#running-an-ab-test-for-engine-changes and testing out |
|
Hey @dnfield, I'm attempting to do that but running into some trouble on my Mac. I'm trying to run on Android. Specifically, I've used gn to generate the If you have a suitable environment, would you be willing to patch in this change locally and see how it fares? |
|
That paste link seems dead. I can try to take a look at this today. |
|
I did not get to it yesterday but looking now :) |
|
Results for flutter_gallery__memory_nav: The other memory benchmark I was suggesting was having trouble with the AB. However, for flutter_gallery__transition_perf |
|
This seems to more significantly regress memory performance. I'm a bit suspicious of the frame time improvements, since this really should only impact raster time - it's possible that there's som enoise due to my framework commit not using the same hash as what this is based on. However, the rasterizer actually shows a decrease in worst time and 90th percentile, though the average increase is nice. |
|
Given these results I'm not really inclined to say we should land this change. If there's a better benchmark we could run, or a real life application this benefits, it would help. |
chinmaygarde
left a comment
There was a problem hiding this comment.
Given that we are super sensitive to device memory usage right now. I'd say we shouldn't land this right now unless the option to reorder draw calls only when there is a guarantee that additional surfaces wont be allocated is available.
|
I hear you. FWIW, in order to get more information on the trade-off, I temporarily turned this option off on the Skia bots to gather perf data from our mobile device lab. Here's the result for the Pixel4XL on Vulkan: As you can see, the result depends on the specific captured workload (Chrome snapshot) we're looking at but the performance is always improved, and in some cases like Does this move the needle on your thinking? |
|
@apwilson - would you be able to check out how applying this affects benchmarks on Fuchsia? I don't think we have many customers using Vulkan on Android, but Fuchsia is using Vulkan AFAIK. |
|
(The benchmarks I ran were OpenGL based) |
|
Sure, the flag also affects gl. For instance here's the timing data from the iPhone6 in gles on our skp suite. The "hump" is when we temporarily disabled the flag to measure the impact. https://perf.skia.org/e/?begin=1621537044&end=1621648544&keys=X70d8e3c479150e3a2b75931f96229ccf&num_commits=50&request_type=1&xbaroffset=55308 |
This option will be the default in Skia soon. Per discussion in flutter-team-archive#26067, let’s be explicit about disabling it for the time being, and we can revisit the flag in the future if desirable.
This option will be the default in Skia soon. Per discussion in flutter-team-archive#26067, let’s be explicit about disabling it for the time being, and we can revisit the flag in the future if desirable.
This option will be the default in Skia soon. Per discussion in #26067, let’s be explicit about disabling it for the time being, and we can revisit the flag in the future if desirable.
This option will be the default in Skia soon. Per discussion in flutter-team-archive#26067, let’s be explicit about disabling it for the time being, and we can revisit the flag in the future if desirable.
This option will be the default in Skia soon. Per discussion in flutter-team-archive#26067, let’s be explicit about disabling it for the time being, and we can revisit the flag in the future if desirable.
This option is enabled on our bots and is rolling out to our clients. It provides a performance improvement by reordering tasks to combine render tasks targeting the same surfaces.
For instance, if you call
By default, Skia will treat this as
With this option, we will reorder to
This reduction in target swapping is faster but may consume more VRAM. If the reordered version would go over our memory budget, then we'll stick to the original version.