-
Notifications
You must be signed in to change notification settings - Fork 29.8k
benchmarkLive: a new LiveTestWidgetsFlutterBindingFramePolicy for benchmark on device
#61388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The test passes on my local machine, but I'm guessing the CI system when running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help to explain how this differs from benchmark and fullyLive, and when I'd want to use this instead of one of those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
benchmark is actually very different: it's ignoring all animations. But yeah I should update the docs, and restructure the two parts from https://api.flutter.dev/flutter/flutter_test/LiveTestWidgetsFlutterBindingFramePolicy-class.html and
https://api.flutter.dev/flutter/flutter_test/LiveTestWidgetsFlutterBinding/framePolicy.html . But before that I still would like to hear if the general idea is reasonable.
|
After discussion with @liyuqian we feel it's probably better to move the test case to device lab and measure the delayed time reported by @dnfield What do you think? |
|
Sgtm |
|
A typical result for the device lab test running locally is: {
"success": true,
"data": {
"average_delay_fullyLive_millis": 11.110319672131148,
"average_delay_benchmarkLive_millis": 3.6040245901639345,
"90th_percentile_delay_fullyLive_millis": 19.567,
"90th_percentile_delay_benchmarkLive_millis": 9.01
},
"benchmarkScoreKeys": [
"average_delay_fullyLive_millis",
"average_delay_benchmarkLive_millis",
"90th_percentile_delay_fullyLive_millis",
"90th_percentile_delay_benchmarkLive_millis"
]
} |
|
There're already too many things. I'll open another PR to deal with the docs after this lands, so that while we updating the language I can move forward with the new flag. The major purpose for the new PR would be to combine the two separate docs explaining the flag, but it will also be used to clarify the usage. |
|
I can no longer see the "Draft" status. Does "It's currently still labeled draft because the test case about frame count seems platform dependent and is flaky on cirrus" in the PR description still apply? |
|
@liyuqian It shouldn't be considered draft now. That's out-of-date because I removed the unit test and moved it to a device lab test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By doing so I'm avoiding extra operation to select the page. It might be a good idea to do the same for other pages with E2E test version, or to put this together with the util code in #61509
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is modified from https://github.com/flutter/plugins/blob/master/packages/e2e/lib/e2e_driver.dart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks strange. Does handlePointerEventRecord mostly return all negative delays? Shall we use abs just in case of positive delays? We might need to add some documentations to handlePointerEventRecord to describe the meaning of positive and negative delays, and set the expectation if most of them are negative or positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes mostly negative. It's positive only when Future.delayed complete earlier than the set time. My local test shows it's 0~3 out of 200+ events with tiny positive values < 100us (microseconds). I don't think I should use abs because this list is only shown in debug mode and abs makes less use of the info. But it does make sense to change the behavior of handlePointerEventRecord for the major result being positive since we are calling it delay, and add docs to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update about the number. When putting this on a more complex environment, sometimes (not sure when, it differs from run to run) the fluctuation becomes larger, and positives can go as much as ~10% and up to 500us. Still tiny compare to frame interval though.
|
And I also just realized I should use flutter/plugins#2873 this new feature of reporting result in e2e to implement this rather than using timeline. Added a new commit. @liyuqian |
|
I may also need to update the macrobenchmark/readme.md, which is also the case for #61509. That will be done on whichever of the two PR who lands later. |
liyuqian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…enchmark on device (flutter#61388) * add benchmarkLive flag and tests * update handlePointerEventRecord doc * using e2e 0.6.1
…enchmark on device (flutter#61388) * add benchmarkLive flag and tests * update handlePointerEventRecord doc * using e2e 0.6.1
Description
Currently we have 5
LiveTestWidgetsFlutterBindingFramePolicys, out of which, onlyfullyLivecan or is intended to be used for running test on a device that respects the frame scheduling from engine. The problem is that it's still responding topumps from both user calls of a test app or from some test gestures, which leads to unexpected frame requests (and therefore extra delay for awaiting the frame) besides those from engine and framework animation. This prevents a faithful simulation for performance test cases running on a device.For a test running without a device (i.e.
AutomatedTestWidgetsFlutterBinding)pumpis inevitable in coding a test case and also in implementing our API (e.g. inflingFromorhandlePointerEventRecord) for driving a test app because before/after pump the location of a widget may change, which will change the hit test result and change the behavior to an input. However, for test running on a device (i.e.LiveTestWidgetsFlutterBinding) addingpumps will interfere with the engine frame scheduling, results in an unfaithful performance test result.For integrating test purposes we want to have a frame policy that's close to what a real app is doing: respect the frame request from the engine and from the framework, but ignore any extra frame request from the driving the test.
Use demo is also the test case listed below, where we can see that with the new frame policy flag, the delay for simulating gestures are significantly reduced.
It's currently still labeled draft because the test case about frame count seems platform dependent and is flaky on cirrusRelated Issues
Resolves #60739
Tests
I added the following tests:
dev/devicelab/bin/tasks/frame_policy_delay_test_android.dartfromdev/integration_tests/simple_scroll/test/frame_policy.dart.A typical result for the device lab test running locally on Pixel4 is:
{ "success": true, "data": { "average_delay_fullyLive_millis": 9.096860655737704, "average_delay_benchmarkLive_millis": 1.986188524590164, "90th_percentile_delay_fullyLive_millis": 17.468, "90th_percentile_delay_benchmarkLive_millis": 4.105 }, "benchmarkScoreKeys": [ "average_delay_fullyLive_millis", "average_delay_benchmarkLive_millis", "90th_percentile_delay_fullyLive_millis", "90th_percentile_delay_benchmarkLive_millis" ] }on Moto G4 is:
{ "success": true, "data": { "average_delay_fullyLive_millis": 10.99943442622951, "average_delay_benchmarkLive_millis": 4.632606557377049, "90th_percentile_delay_fullyLive_millis": 22.621, "90th_percentile_delay_benchmarkLive_millis": 8.177 }, "benchmarkScoreKeys": [ "average_delay_fullyLive_millis", "average_delay_benchmarkLive_millis", "90th_percentile_delay_fullyLive_millis", "90th_percentile_delay_benchmarkLive_millis" ] }This
simple_scrollmay also be used for other scrolling tests. I avoid usecomplex_layoutbecause that one is more noisy due to a complex layout.Checklist
Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes (
[x]). This will ensure a smooth and quick review process.///).flutter analyze --flutter-repo) does not report any problems on my PR.Breaking Change
Did any tests fail when you ran them? Please read Handling breaking changes.