
Conversation

Contributor

@CareF CareF commented Jul 13, 2020

Description

Currently we have five LiveTestWidgetsFlutterBindingFramePolicy values, of which only fullyLive is intended for running tests on a device while respecting the engine's frame scheduling. The problem is that fullyLive still responds to pumps, both from explicit calls in a test and from simulated test gestures, which produces unexpected frame requests (and therefore extra delay while awaiting those frames) on top of the requests from the engine and framework animations. This prevents a faithful simulation of performance test cases running on a device.

For a test running without a device (i.e. AutomatedTestWidgetsFlutterBinding), pumping is unavoidable both when writing a test case and when implementing our APIs for driving a test app (e.g. in flingFrom or handlePointerEventRecord), because the location of a widget may change across a pump, which changes the hit test result and therefore the response to an input. However, for a test running on a device (i.e. LiveTestWidgetsFlutterBinding), adding pumps interferes with the engine's frame scheduling and results in an unfaithful performance measurement.

For integration testing purposes we want a frame policy that is close to what a real app does: respect frame requests from the engine and the framework, but ignore any extra frame requests caused by driving the test.

A usage demo is the test case listed below, where we can see that with the new frame policy flag the delay for simulating gestures is significantly reduced.

It's currently still labeled draft because the test case about frame count seems platform-dependent and is flaky on Cirrus.

Related Issues

Resolves #60739

Tests

I added the following tests:

  • dev/devicelab/bin/tasks/frame_policy_delay_test_android.dart from dev/integration_tests/simple_scroll/test/frame_policy.dart.

A typical result for the device lab test running locally on a Pixel 4 is:

{
  "success": true,
  "data": {
    "average_delay_fullyLive_millis": 9.096860655737704,
    "average_delay_benchmarkLive_millis": 1.986188524590164,
    "90th_percentile_delay_fullyLive_millis": 17.468,
    "90th_percentile_delay_benchmarkLive_millis": 4.105
  },
  "benchmarkScoreKeys": [
    "average_delay_fullyLive_millis",
    "average_delay_benchmarkLive_millis",
    "90th_percentile_delay_fullyLive_millis",
    "90th_percentile_delay_benchmarkLive_millis"
  ]
}

and on a Moto G4 is:

{
  "success": true,
  "data": {
    "average_delay_fullyLive_millis": 10.99943442622951,
    "average_delay_benchmarkLive_millis": 4.632606557377049,
    "90th_percentile_delay_fullyLive_millis": 22.621,
    "90th_percentile_delay_benchmarkLive_millis": 8.177
  },
  "benchmarkScoreKeys": [
    "average_delay_fullyLive_millis",
    "average_delay_benchmarkLive_millis",
    "90th_percentile_delay_fullyLive_millis",
    "90th_percentile_delay_benchmarkLive_millis"
  ]
}
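The summary statistics in the result blocks above can be illustrated with a short sketch. This is a hypothetical Python illustration of the aggregation (the actual devicelab test is written in Dart); the function name `summarize` and the nearest-rank percentile choice are assumptions, not the PR's exact implementation:

```python
def summarize(delays_millis):
    """Aggregate per-event delays into average and 90th-percentile metrics."""
    ordered = sorted(delays_millis)
    average = sum(ordered) / len(ordered)
    # Nearest-rank 90th percentile: the sample at the 90% position,
    # clamped to the last index for small sample counts.
    p90 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.9))]
    return {
        "average_delay_millis": average,
        "90th_percentile_delay_millis": p90,
    }

print(summarize([1.0, 2.0, 3.0, 4.0, 10.0]))
```

The devicelab task reports one such pair of numbers per frame policy (fullyLive and benchmarkLive), which is where the four benchmark score keys come from.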

This simple_scroll app may also be used for other scrolling tests. I avoided using complex_layout because it is noisier due to its complex layout.

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process.

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I signed the CLA.
  • I read and followed the Flutter Style Guide, including Features we expect every widget to implement.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I updated/added relevant documentation (doc comments with ///).
  • All existing and new tests are passing.
  • The analyzer (flutter analyze --flutter-repo) does not report any problems on my PR.
  • I am willing to follow-up on review comments in a timely manner.

Breaking Change

Did any tests fail when you ran them? Please read Handling breaking changes.

  • No, no existing tests failed, so this is not a breaking change.

@CareF CareF requested a review from liyuqian July 13, 2020 21:30
@CareF CareF requested a review from dnfield July 13, 2020 21:30
@CareF CareF self-assigned this Jul 13, 2020
@goderbauer goderbauer added a: tests "flutter test", flutter_test, or one of our tests framework flutter/packages/flutter repository. See also f: labels. labels Jul 13, 2020
@CareF CareF marked this pull request as draft July 14, 2020 02:36
Contributor Author

CareF commented Jul 14, 2020

The test passes on my local machine, but I'm guessing the CI system has a different refresh rate when running flutter test with LiveTestWidgetsFlutterBinding.

Contributor

It would help to explain how this differs from benchmark and fullyLive, and when I'd want to use this instead of one of those.

Contributor Author

benchmark is actually very different: it ignores all animations. But yes, I should update the docs and restructure the two parts from https://api.flutter.dev/flutter/flutter_test/LiveTestWidgetsFlutterBindingFramePolicy-class.html and
https://api.flutter.dev/flutter/flutter_test/LiveTestWidgetsFlutterBinding/framePolicy.html . But before that I would still like to hear whether the general idea is reasonable.

Contributor Author

CareF commented Jul 14, 2020

After discussion with @liyuqian, we feel it's probably better to move the test case to the device lab and measure the delay reported by handlePointerEventRecord as a metric (average, 90th percentile), treating it as a performance test. The frame counts are not very different between fullyLive and benchmarkLive anyway, so we will ignore them.

@dnfield What do you think?

Contributor

dnfield commented Jul 14, 2020

Sgtm

@CareF CareF marked this pull request as ready for review July 15, 2020 23:28
@fluttergithubbot fluttergithubbot added the c: contributor-productivity Team-specific productivity, code health, technical debt. label Jul 15, 2020
Contributor Author

CareF commented Jul 15, 2020

A typical result for the device lab test running locally is:

{
  "success": true,
  "data": {
    "average_delay_fullyLive_millis": 11.110319672131148,
    "average_delay_benchmarkLive_millis": 3.6040245901639345,
    "90th_percentile_delay_fullyLive_millis": 19.567,
    "90th_percentile_delay_benchmarkLive_millis": 9.01
  },
  "benchmarkScoreKeys": [
    "average_delay_fullyLive_millis",
    "average_delay_benchmarkLive_millis",
    "90th_percentile_delay_fullyLive_millis",
    "90th_percentile_delay_benchmarkLive_millis"
  ]
}

Contributor Author

CareF commented Jul 15, 2020

There are already too many things in this PR. I'll open another PR to deal with the docs after this lands, so that I can move forward with the new flag while we update the language. The main purpose of the new PR would be to combine the two separate docs explaining the flag, but it will also clarify the usage.

@CareF CareF requested review from dnfield and liyuqian July 15, 2020 23:56
@CareF CareF mentioned this pull request Jul 16, 2020
Contributor

liyuqian commented Jul 16, 2020

I can no longer see the "Draft" status. Does "It's currently still labeled draft because the test case about frame count seems platform dependent and is flaky on cirrus" in the PR description still apply?

Contributor Author

CareF commented Jul 16, 2020

@liyuqian It shouldn't be considered a draft now. That part of the description is out of date because I removed the unit test and moved it to a device lab test.
Although, to enable merging, I'll need to wait for package updates.

@CareF CareF requested a review from liyuqian July 17, 2020 00:06
Contributor Author

By doing so I avoid an extra operation to select the page. It might be a good idea to do the same for other pages with an E2E test version, or to put this together with the util code in #61509.

Contributor

This looks strange. Does handlePointerEventRecord mostly return negative delays? Shall we use abs just in case of positive delays? We might need to add some documentation to handlePointerEventRecord to describe the meaning of positive and negative delays, and to set the expectation of whether most of them are negative or positive.

Contributor Author

@CareF CareF Jul 17, 2020

Yes, mostly negative. It's positive only when Future.delayed completes earlier than the set time. My local test shows 0-3 out of 200+ events with tiny positive values < 100us (microseconds). I don't think I should use abs, because this list is only shown in debug mode and abs discards information. But it does make sense to change the behavior of handlePointerEventRecord so that the typical result is positive, since we call it a delay, and to add docs for that.

Contributor Author

@CareF CareF Jul 18, 2020

An update about the numbers. When running this in a more complex environment, sometimes (I'm not sure when; it differs from run to run) the fluctuation becomes larger, and positive values can occur in up to ~10% of events and reach up to 500us. Still tiny compared to the frame interval, though.
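The sign convention discussed in this thread can be sketched as follows. This is a hypothetical Python illustration (the real implementation is Dart in flutter_test), and the function name `event_delays` is an assumption: the delay is taken as the scheduled timestamp minus the actual dispatch time, so a timer that fires late yields a negative value, which matches the observation that most reported delays are negative and positives appear only when the timer completes early:

```python
def event_delays(scheduled_millis, actual_millis):
    """Per-event delay: scheduled time minus actual dispatch time.

    Negative when the event is dispatched late (actual > scheduled);
    positive only when the timer completes before the scheduled time.
    """
    return [s - a for s, a in zip(scheduled_millis, actual_millis)]

# An event scheduled at t=16 ms but dispatched at t=18 ms is 2 ms late:
print(event_delays([16.0], [18.0]))  # [-2.0]
```

Flipping the subtraction (actual minus scheduled) would make the common case positive, which is the doc change suggested above for handlePointerEventRecord.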

@CareF CareF requested a review from liyuqian July 17, 2020 19:26
Contributor Author

CareF commented Jul 17, 2020

I also just realized I should use flutter/plugins#2873, the new feature for reporting results in e2e, to implement this, rather than using the timeline. Added a new commit. @liyuqian

Contributor Author

CareF commented Jul 17, 2020

I may also need to update the macrobenchmark/readme.md, which is also the case for #61509. That will be done in whichever of the two PRs lands later.

Contributor

@liyuqian liyuqian left a comment

LGTM

@CareF CareF merged commit 54c9441 into flutter:master Jul 17, 2020
@CareF CareF deleted the benchmarkLive branch July 21, 2020 17:17
Pragya007 pushed a commit to Pragya007/flutter that referenced this pull request Aug 11, 2020
…enchmark on device (flutter#61388)

* add benchmarkLive flag and tests

* update handlePointerEventRecord doc

* using e2e 0.6.1
mingwandroid pushed a commit to mingwandroid/flutter that referenced this pull request Sep 6, 2020
…enchmark on device (flutter#61388)

* add benchmarkLive flag and tests

* update handlePointerEventRecord doc

* using e2e 0.6.1
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 30, 2021

Labels

  • a: tests — "flutter test", flutter_test, or one of our tests
  • c: contributor-productivity — Team-specific productivity, code health, technical debt.
  • framework — flutter/packages/flutter repository. See also f: labels.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a new LiveTestWidgetsFlutterBindingFramePolicy for benchmark on device

6 participants