
Conversation

Contributor

@CareF CareF commented Jul 13, 2020

Description

Currently we have five LiveTestWidgetsFlutterBindingFramePolicy values, of which only fullyLive is intended for running tests on a device while respecting the engine's frame scheduling. The problem is that fullyLive still responds to pumps, both from explicit calls in a test and from simulated test gestures, which produces unexpected frame requests (and therefore extra delay while awaiting those frames) on top of the requests from the engine and framework animations. This prevents a faithful simulation of performance test cases running on a device.

For a test running without a device (i.e. AutomatedTestWidgetsFlutterBinding), pumping is unavoidable both when writing a test case and when implementing our APIs for driving a test app (e.g. in flingFrom or handlePointerEventRecord), because the location of a widget may change across a pump, which changes the hit test result and therefore the response to an input. However, for a test running on a device (i.e. LiveTestWidgetsFlutterBinding), adding pumps interferes with the engine's frame scheduling and results in an unfaithful performance measurement.

For integration testing purposes we want a frame policy that is close to what a real app does: respect frame requests from the engine and the framework, but ignore any extra frame requests caused by driving the test.

A usage demo is the test case listed below, where we can see that with the new frame policy flag the delay for simulating gestures is significantly reduced.

It's currently still labeled draft because the test case about frame count seems platform-dependent and is flaky on Cirrus.

Related Issues

Resolves #60739

Tests

I added the following tests:

  • dev/devicelab/bin/tasks/frame_policy_delay_test_android.dart from dev/integration_tests/simple_scroll/test/frame_policy.dart.

A typical result for the device lab test running locally on a Pixel 4 is:

{
  "success": true,
  "data": {
    "average_delay_fullyLive_millis": 9.096860655737704,
    "average_delay_benchmarkLive_millis": 1.986188524590164,
    "90th_percentile_delay_fullyLive_millis": 17.468,
    "90th_percentile_delay_benchmarkLive_millis": 4.105
  },
  "benchmarkScoreKeys": [
    "average_delay_fullyLive_millis",
    "average_delay_benchmarkLive_millis",
    "90th_percentile_delay_fullyLive_millis",
    "90th_percentile_delay_benchmarkLive_millis"
  ]
}

and on a Moto G4 is:

{
  "success": true,
  "data": {
    "average_delay_fullyLive_millis": 10.99943442622951,
    "average_delay_benchmarkLive_millis": 4.632606557377049,
    "90th_percentile_delay_fullyLive_millis": 22.621,
    "90th_percentile_delay_benchmarkLive_millis": 8.177
  },
  "benchmarkScoreKeys": [
    "average_delay_fullyLive_millis",
    "average_delay_benchmarkLive_millis",
    "90th_percentile_delay_fullyLive_millis",
    "90th_percentile_delay_benchmarkLive_millis"
  ]
}
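The summary statistics in the result blocks above can be illustrated with a short sketch. This is a hypothetical Python illustration of the aggregation (the actual devicelab test is written in Dart); the function name `summarize` and the nearest-rank percentile choice are assumptions, not the PR's exact implementation:

```python
def summarize(delays_millis):
    """Aggregate per-event delays into average and 90th-percentile metrics."""
    ordered = sorted(delays_millis)
    average = sum(ordered) / len(ordered)
    # Nearest-rank 90th percentile: the sample at the 90% position,
    # clamped to the last index for small sample counts.
    p90 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.9))]
    return {
        "average_delay_millis": average,
        "90th_percentile_delay_millis": p90,
    }

print(summarize([1.0, 2.0, 3.0, 4.0, 10.0]))
```

The devicelab task reports one such pair of numbers per frame policy (fullyLive and benchmarkLive), which is where the four benchmark score keys come from.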

This simple_scroll app may also be used for other scrolling tests. I avoided using complex_layout because it is noisier due to its complex layout.

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process.

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I signed the CLA.
  • I read and followed the Flutter Style Guide, including Features we expect every widget to implement.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I updated/added relevant documentation (doc comments with ///).
  • All existing and new tests are passing.
  • The analyzer (flutter analyze --flutter-repo) does not report any problems on my PR.
  • I am willing to follow-up on review comments in a timely manner.

Breaking Change

Did any tests fail when you ran them? Please read Handling breaking changes.

  • No, no existing tests failed, so this is not a breaking change.

@CareF CareF requested a review from liyuqian July 13, 2020 21:30
@CareF CareF requested a review from dnfield July 13, 2020 21:30
@CareF CareF self-assigned this Jul 13, 2020
@goderbauer goderbauer added a: tests "flutter test", flutter_test, or one of our tests framework flutter/packages/flutter repository. See also f: labels. labels Jul 13, 2020
@CareF CareF marked this pull request as draft July 14, 2020 02:36
Contributor Author

CareF commented Jul 14, 2020

The test passes on my local machine, but I'm guessing the CI system has a different refresh rate when running flutter test with LiveTestWidgetsFlutterBinding.

Contributor

It would help to explain how this differs from benchmark and fullyLive, and when I'd want to use this instead of one of those.

Contributor Author

benchmark is actually very different: it ignores all animations. But yes, I should update the docs and restructure the two parts from https://api.flutter.dev/flutter/flutter_test/LiveTestWidgetsFlutterBindingFramePolicy-class.html and
https://api.flutter.dev/flutter/flutter_test/LiveTestWidgetsFlutterBinding/framePolicy.html . But before that I would still like to hear whether the general idea is reasonable.

Contributor Author

CareF commented Jul 14, 2020

After discussion with @liyuqian, we feel it's probably better to move the test case to the device lab and measure the delay reported by handlePointerEventRecord as a metric (average, 90th percentile), treating it as a performance test. The frame counts are not very different between fullyLive and benchmarkLive anyway, so we will ignore them.

@dnfield What do you think?

Contributor

dnfield commented Jul 14, 2020

Sgtm

@CareF CareF marked this pull request as ready for review July 15, 2020 23:28
@fluttergithubbot fluttergithubbot added the c: contributor-productivity Team-specific productivity, code health, technical debt. label Jul 15, 2020
Contributor Author

CareF commented Jul 15, 2020

A typical result for the device lab test running locally is:

{
  "success": true,
  "data": {
    "average_delay_fullyLive_millis": 11.110319672131148,
    "average_delay_benchmarkLive_millis": 3.6040245901639345,
    "90th_percentile_delay_fullyLive_millis": 19.567,
    "90th_percentile_delay_benchmarkLive_millis": 9.01
  },
  "benchmarkScoreKeys": [
    "average_delay_fullyLive_millis",
    "average_delay_benchmarkLive_millis",
    "90th_percentile_delay_fullyLive_millis",
    "90th_percentile_delay_benchmarkLive_millis"
  ]
}

Contributor Author

CareF commented Jul 15, 2020

There are already too many things in this PR. I'll open another PR to deal with the docs after this lands, so that I can move forward with the new flag while we update the language. The main purpose of the new PR would be to combine the two separate docs explaining the flag, but it will also clarify the usage.

@CareF CareF requested review from dnfield and liyuqian July 15, 2020 23:56
@CareF CareF mentioned this pull request Jul 16, 2020
Contributor

liyuqian commented Jul 16, 2020

I can no longer see the "Draft" status. Does "It's currently still labeled draft because the test case about frame count seems platform dependent and is flaky on cirrus" in the PR description still apply?

Contributor Author

CareF commented Jul 16, 2020

@liyuqian It shouldn't be considered a draft now. That part of the description is out of date because I removed the unit test and moved it to a device lab test.
Although, to enable merging, I'll need to wait for package updates.

@CareF CareF requested a review from liyuqian July 17, 2020 00:06
Contributor Author

By doing so I avoid an extra operation to select the page. It might be a good idea to do the same for other pages with an E2E test version, or to put this together with the util code in #61509.

Contributor

This looks strange. Does handlePointerEventRecord mostly return negative delays? Shall we use abs just in case of positive delays? We might need to add some documentation to handlePointerEventRecord to describe the meaning of positive and negative delays, and to set the expectation of whether most of them are negative or positive.

Contributor Author

@CareF CareF Jul 17, 2020

Yes, mostly negative. It's positive only when Future.delayed completes earlier than the set time. My local test shows 0-3 out of 200+ events with tiny positive values < 100us (microseconds). I don't think I should use abs, because this list is only shown in debug mode and abs discards information. But it does make sense to change the behavior of handlePointerEventRecord so that the typical result is positive, since we call it a delay, and to add docs for that.

Contributor Author

@CareF CareF Jul 18, 2020

An update about the numbers. When running this in a more complex environment, sometimes (I'm not sure when; it differs from run to run) the fluctuation becomes larger, and positive values can occur in up to ~10% of events and reach up to 500us. Still tiny compared to the frame interval, though.
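The sign convention discussed in this thread can be sketched as follows. This is a hypothetical Python illustration (the real implementation is Dart in flutter_test), and the function name `event_delays` is an assumption: the delay is taken as the scheduled timestamp minus the actual dispatch time, so a timer that fires late yields a negative value, which matches the observation that most reported delays are negative and positives appear only when the timer completes early:

```python
def event_delays(scheduled_millis, actual_millis):
    """Per-event delay: scheduled time minus actual dispatch time.

    Negative when the event is dispatched late (actual > scheduled);
    positive only when the timer completes before the scheduled time.
    """
    return [s - a for s, a in zip(scheduled_millis, actual_millis)]

# An event scheduled at t=16 ms but dispatched at t=18 ms is 2 ms late:
print(event_delays([16.0], [18.0]))  # [-2.0]
```

Flipping the subtraction (actual minus scheduled) would make the common case positive, which is the doc change suggested above for handlePointerEventRecord.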

@CareF CareF requested a review from liyuqian July 17, 2020 19:26
Contributor Author

CareF commented Jul 17, 2020

I also just realized I should use flutter/plugins#2873, the new feature for reporting results in e2e, to implement this, rather than using the timeline. Added a new commit. @liyuqian

Contributor Author

CareF commented Jul 17, 2020

I may also need to update the macrobenchmark/readme.md, which is also the case for #61509. That will be done in whichever of the two PRs lands later.

Contributor

@liyuqian liyuqian left a comment

LGTM

@CareF CareF merged commit 54c9441 into flutter:master Jul 17, 2020
@CareF CareF deleted the benchmarkLive branch July 21, 2020 17:17
Pragya007 pushed a commit to Pragya007/flutter that referenced this pull request Aug 11, 2020
…enchmark on device (flutter#61388)

* add benchmarkLive flag and tests

* update handlePointerEventRecord doc

* using e2e 0.6.1
mingwandroid pushed a commit to mingwandroid/flutter that referenced this pull request Sep 6, 2020
…enchmark on device (flutter#61388)

* add benchmarkLive flag and tests

* update handlePointerEventRecord doc

* using e2e 0.6.1
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 30, 2021

Labels

  • a: tests — "flutter test", flutter_test, or one of our tests
  • c: contributor-productivity — Team-specific productivity, code health, technical debt.
  • framework — flutter/packages/flutter repository. See also f: labels.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a new LiveTestWidgetsFlutterBindingFramePolicy for benchmark on device

6 participants