Enable Skia reduceOpsTaskSplitting option by Adlai-Holler · Pull Request #26067 · flutter-team-archive/engine

Adlai-Holler · 2021-05-11T14:41:27Z

This option is enabled on our bots and is rolling out to our clients. It provides a performance improvement by reordering tasks to combine render tasks targeting the same surfaces.

For instance, if you call

draw
saveLayer
  draw
restore
draw

By default, Skia will treat this as

draw to A
draw to B
draw B to A
draw more to A

With this option, we will reorder to

draw to B
draw to A
draw B to A
draw more to A

This reduction in target swapping is faster but may consume more VRAM. If the reordered version would go over our memory budget, then we'll stick to the original version.

This option is enabled on our bots and is rolling out to our clients. It provides a performance improvement by reordering tasks to combine render tasks targeting the same surfaces. For instance, if you call draw saveLayer draw restoreLayer draw By default, Skia will treat this as draw to A draw to B draw B to A draw more to A With this option, we will reorder to draw to B draw to A draw B to A draw more to A This reduction in target swapping is faster but may consume more VRAM. If the reordered version would go over our memory budget, then we'll stick to the original version.

flutter-dashboard · 2021-05-11T14:41:29Z

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat.

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

Adlai-Holler · 2021-05-11T14:41:49Z

@chinmaygarde wdyt?

chinmaygarde

I fail to see how the example you linked would consume more VRAM if it were reordered. I guess the reordering the following would lead to more VRAM usage however since the second draw render target can be reused today? Just so I understand, is this what you meant?

draw
saveLayer
  draw
restore
draw 
saveLayer
  draw
restore
draw

chinmaygarde · 2021-05-11T20:01:58Z

  // A similar work-around is also used in shell/common/io_manager.cc.
  options.fDisableGpuYUVConversion = true;
+
+  options.fReduceOpsTaskSplitting = GrContextOptions::Enable::kYes;


This is only in effect in the GL backend. The identical change must be applied to gpu_surface_vulkan and gpu_surface_metal.

Adlai-Holler · 2021-05-11T20:06:04Z

Oh, indeed this particular example wouldn't use more VRAM. Instead imagine we have 2 offscreen surfaces. Without reordering, the surface may be reusable for both offscreen draws. With reordering, we need to actually allocate 2 separate surfaces.

chinmaygarde · 2021-05-11T20:12:32Z

We are currently fairly sensitive to increases in VRAM usage. Is there way to enable the optimization only if it doesn't allocate more surfaces for purposes of reordering? In the first case, reducing render target switches makes sense. In the second case, I am not so sure.

Adlai-Holler · 2021-05-11T20:18:24Z

Hmm, there is no capability to do that. However we will fall back to the un-reordered version if the reordered version would not fit within the VRAM budget specified by the user. Would it be possible to tune that value? Is there a good way that we could try this option out and get data on the trade-off?

chinmaygarde · 2021-05-11T20:22:14Z

Would it be possible to tune that value?

Right now, it is some multiple of the screen size specified.

Is there a good way that we could try this option out and get data on the trade-off?

There are benchmarks that run on Skia perf. But, it may be that most of them measure time and the few that measure memory may actually be measuring host memory. @dnfield may have more precise guidance on the specific benchmarks.

dnfield · 2021-05-11T21:07:48Z

Is the VRAM budget the same as the resource cache budget? I don't know that we're telling Skia about anything but resource cache budgets.

We do not have any VRAM specific benchmarks. We do have some memory benchmarks that are basically polling adb shell dumpsys meminfo and mainly looking at RSS.

Adlai-Holler · 2021-05-11T21:14:40Z

Yes the resource cache budget is the VRAM budget for Skia GPU. We'll avoid going over that whenever possible, including falling-back from reordering tasks.

dnfield · 2021-05-11T21:27:33Z

Would you be up for following https://github.com/flutter/flutter/tree/master/dev/devicelab#running-an-ab-test-for-engine-changes and testing out flutter_gallery__memory_nav and complex_layout_scroll_perf__devtools_memory?

Adlai-Holler · 2021-05-12T15:10:35Z

Hey @dnfield, I'm attempting to do that but running into some trouble on my Mac. I'm trying to run on Android.

Specifically, I've used gn to generate the android_release_arm64 variant but attempting to build it with ninja -C out/android_release_arm64 generates a ton of errors in C++ and Java and Dart code, only reaching 303/3723 files built. https://pastebin.com/Qge8sVuR

If you have a suitable environment, would you be willing to patch in this change locally and see how it fares?

dnfield · 2021-05-12T16:43:49Z

That paste link seems dead. I can try to take a look at this today.

dnfield · 2021-05-13T17:59:31Z

I did not get to it yesterday but looking now :)

dnfield · 2021-05-13T20:02:59Z

Results for flutter_gallery__memory_nav:

════════════════════════════╡ ••• Raw results ••• ╞═════════════════════════════

start-min:
  A:	105820.00	105990.00	105961.00	
  B:	118927.00	110670.00	101765.00	
start-max:
  A:	113444.00	112400.00	113004.00	
  B:	124858.00	123504.00	120261.00	
start-median:
  A:	105960.00	106358.00	106332.00	
  B:	119227.00	119563.00	119146.00	
end-min:
  A:	230458.00	229345.00	230025.00	
  B:	241728.00	241957.00	242080.00	
end-max:
  A:	234842.00	236941.00	237314.00	
  B:	251830.00	247217.00	248769.00	
end-median:
  A:	231343.00	231931.00	232670.00	
  B:	243571.00	243522.00	244325.00	
diff-min:
  A:	121398.00	123355.00	123343.00	
  B:	122206.00	122394.00	123999.00	
diff-max:
  A:	127169.00	127427.00	128176.00	
  B:	131483.00	136547.00	146390.00	
diff-median:
  A:	125089.00	124541.00	125014.00	
  B:	123913.00	123530.00	124849.00	



═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
start-min	105923.67 (0.07%)	110454.00 (6.34%)	0.96x	
start-max	112949.33 (0.38%)	122874.33 (1.57%)	0.92x	
start-median	106216.67 (0.17%)	119312.00 (0.15%)	0.89x	
end-min	229942.67 (0.20%)	241921.67 (0.06%)	0.95x	
end-max	236365.67 (0.46%)	249272.00 (0.77%)	0.95x	
end-median	231981.33 (0.23%)	243806.00 (0.15%)	0.95x	
diff-min	122698.67 (0.75%)	122866.33 (0.65%)	1.00x	
diff-max	127590.67 (0.33%)	138140.00 (4.48%)	0.92x	
diff-median	124881.33 (0.19%)	124097.33 (0.45%)	1.01x

The other memory benchmark I was suggesting was having trouble with the AB. However, for flutter_gallery__transition_perf

════════════════════════════╡ ••• Raw results ••• ╞═════════════════════════════

missed_transition_count:
  A:	0.00	0.00	0.00	
  B:	0.00	0.00	0.00	
average_frame_build_time_millis:
  A:	1.80	1.76	1.85	
  B:	1.78	1.78	1.78	
worst_frame_build_time_millis:
  A:	36.86	34.21	35.35	
  B:	31.67	34.83	31.00	
90th_percentile_frame_build_time_millis:
  A:	2.12	2.10	2.32	
  B:	2.19	2.14	2.03	
99th_percentile_frame_build_time_millis:
  A:	14.94	13.34	14.02	
  B:	13.34	12.63	12.71	
average_frame_rasterizer_time_millis:
  A:	4.93	5.44	5.32	
  B:	4.77	5.06	5.01	
worst_frame_rasterizer_time_millis:
  A:	60.88	60.09	63.93	
  B:	64.38	80.42	59.08	
90th_percentile_frame_rasterizer_time_millis:
  A:	7.82	10.80	10.19	
  B:	9.70	10.39	10.00	
99th_percentile_frame_rasterizer_time_millis:
  A:	22.52	22.61	23.51	
  B:	22.45	21.42	24.53	



═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
missed_transition_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_frame_build_time_millis	1.80 (2.09%)	1.78 (0.14%)	1.01x	
worst_frame_build_time_millis	35.47 (3.06%)	32.50 (5.14%)	1.09x	
90th_percentile_frame_build_time_millis	2.18 (4.51%)	2.12 (3.05%)	1.03x	
99th_percentile_frame_build_time_millis	14.10 (4.64%)	12.89 (2.44%)	1.09x	
average_frame_rasterizer_time_millis	5.23 (4.11%)	4.94 (2.56%)	1.06x	
worst_frame_rasterizer_time_millis	61.63 (2.68%)	67.96 (13.35%)	0.91x	
90th_percentile_frame_rasterizer_time_millis	9.60 (13.37%)	10.03 (2.83%)	0.96x	
99th_percentile_frame_rasterizer_time_millis	22.88 (1.95%)	22.80 (5.67%)	1.00x

dnfield · 2021-05-13T20:04:27Z

This seems to more significantly regress memory performance. I'm a bit suspicious of the frame time improvements, since this really should only impact raster time - it's possible that there's som enoise due to my framework commit not using the same hash as what this is based on. However, the rasterizer actually shows a decrease in worst time and 90th percentile, though the average increase is nice.

dnfield · 2021-05-13T20:14:50Z

Given these results I'm not really inclined to say we should land this change. If there's a better benchmark we could run, or a real life application this benefits, it would help.

chinmaygarde

Given that we are super sensitive to device memory usage right now. I'd say we shouldn't land this right now unless the option to reorder draw calls only when there is a guarantee that additional surfaces wont be allocated is available.

Adlai-Holler · 2021-05-21T19:37:27Z

I hear you. FWIW, in order to get more information on the trade-off, I temporarily turned this option off on the Skia bots to gather perf data from our mobile device lab. Here's the result for the Pixel4XL on Vulkan:

https://perf.skia.org/e/?begin=1621542595&end=1621618244&keys=X6a1d49475c5eb82730dfe1872fe7701c&num_commits=50&request_type=1&xbaroffset=55307

As you can see, the result depends on the specific captured workload (Chrome snapshot) we're looking at but the performance is always improved, and in some cases like desk_weather.skp (weather.com) and mobi_slashdot, the numbers moved quite dramatically – from 4.8ms median to 3.16ms in the latter case.

Does this move the needle on your thinking?

dnfield · 2021-05-21T20:21:33Z

@apwilson - would you be able to check out how applying this affects benchmarks on Fuchsia?

I don't think we have many customers using Vulkan on Android, but Fuchsia is using Vulkan AFAIK.

dnfield · 2021-05-21T20:21:46Z

(The benchmarks I ran were OpenGL based)

Adlai-Holler · 2021-05-25T14:53:29Z

Sure, the flag also affects gl. For instance here's the timing data from the iPhone6 in gles on our skp suite. The "hump" is when we temporarily disabled the flag to measure the impact. https://perf.skia.org/e/?begin=1621537044&end=1621648544&keys=X70d8e3c479150e3a2b75931f96229ccf&num_commits=50&request_type=1&xbaroffset=55308

This option will be the default in Skia soon. Per discussion in flutter-team-archive#26067, let’s be explicit about disabling it for the time being, and we can revisit the flag in the future if desirable.

This option will be the default in Skia soon. Per discussion in #26067, let’s be explicit about disabling it for the time being, and we can revisit the flag in the future if desirable.

This option will be the default in Skia soon. Per discussion in flutter-team-archive#26067, let’s be explicit about disabling it for the time being, and we can revisit the flag in the future if desirable.

google-cla Bot added the cla: yes label May 11, 2021

chinmaygarde reviewed May 11, 2021

View reviewed changes

Apply to other skia contexts

dc65fea

chinmaygarde suggested changes May 21, 2021

View reviewed changes

Adlai-Holler mentioned this pull request Jun 3, 2021

Disable Skia reduceOpsTaskSplitting #26568

Merged

Adlai-Holler closed this Jun 3, 2021

Adlai-Holler deleted the reduce_ops_task_splitting branch June 3, 2021 18:06

Uh oh!

Conversation

Adlai-Holler commented May 11, 2021

Uh oh!

flutter-dashboard Bot commented May 11, 2021

Uh oh!

Adlai-Holler commented May 11, 2021

Uh oh!

chinmaygarde left a comment

Choose a reason for hiding this comment

Uh oh!

chinmaygarde May 11, 2021

Choose a reason for hiding this comment

Uh oh!

Adlai-Holler commented May 11, 2021

Uh oh!

chinmaygarde commented May 11, 2021

Uh oh!

Adlai-Holler commented May 11, 2021

Uh oh!

chinmaygarde commented May 11, 2021

Uh oh!

dnfield commented May 11, 2021

Uh oh!

Adlai-Holler commented May 11, 2021

Uh oh!

dnfield commented May 11, 2021

Uh oh!

Adlai-Holler commented May 12, 2021

Uh oh!

dnfield commented May 12, 2021

Uh oh!

dnfield commented May 13, 2021

Uh oh!

dnfield commented May 13, 2021

Uh oh!

dnfield commented May 13, 2021

Uh oh!

dnfield commented May 13, 2021

Uh oh!

chinmaygarde left a comment

Choose a reason for hiding this comment

Uh oh!

Adlai-Holler commented May 21, 2021

Uh oh!

dnfield commented May 21, 2021

Uh oh!

dnfield commented May 21, 2021

Uh oh!

Adlai-Holler commented May 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Adlai-Holler commented May 25, 2021 •

edited

Loading